Bug #1224: [repo] db-import's repo-add locks up sometimes - Servers - Parabola Issue Tracker

Bug #1224

[repo] db-import's repo-add locks up sometimes

Added by lukeshu about 7 years ago. Updated almost 6 years ago.

Status:

fixed

Priority:

bug

Assignee:

lukeshu

% Done:

100%

Description

So this is the second time this has happened.

db-import-*, which import packages from Arch/ALARM, calls repo-add, which
is a shell program owned by pacman.

Sometimes, repo-add itself (that is: the /bin/bash binary) locks up,
pegging one of the CPU cores at 100%.

The first time, I announced it on IRC, and killed it after about 6
hours.

It happened again (at 2017-02-26, 7:00AM EST (UTC-5)); and having 8
cores, the rest of which are pretty idle, I decided to wait, and see
what would happen. It's been several days, and nothing has
happened.

There's a patch out for bash (4.4.011-2 ->
4.4.012-2), but that doesn't seem to be related.

I installed gdb, and attached to the process, but without debugging symbols, I didn't have much luck.

I'm re-compiling bash with debugging symbols and installing that. Then I'll kill the process and wait for it to happen again.

History

Updated by isacdaavid about 7 years ago

debugging symbols!?

Only db-import-archlinuxarm-pkg is calling repo-add. It does so to create pristine package databases (yes, every time db-import-archlinuxarm-pkg is called) and thus avoid the spoiled databases ALARM is serving (missing fields and mismatches between the .files and .db databases).

I added that extra step so that certain consistency checks in Parabolaweb wouldn't fail and skip over populating file lists, which in turn was the cause of further errors. ALARM doesn't use something like Archweb, nor does their website know anything about a package's files, so they don't care about or don't notice the issue. I knew the process was very slow and wasteful, but I didn't expect it to hang completely (or even to take longer than the interval at which cron calls the script again). I can't recall exactly how long it took in my test environment, although the figures weren't that bad. Only a single-core VM, half a GiB of RAM and, admittedly, an SSD were used. Maybe my expectations were wrong.

It would be nice to try to record the full repo-add command line, as seen in the process tree, the next time feces hit the fan. This could be related to a specific package.
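One way to capture that command line on Linux is to read /proc/<pid>/cmdline, where the kernel stores the arguments separated by NUL bytes. The following is only a minimal sketch: the helper names parse_cmdline and find_processes are made up for illustration, and it assumes a Linux /proc filesystem and enough permission to read the entries of the user running db-import.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    # Every argument is terminated by a NUL byte, so drop the trailing one
    # to avoid a spurious empty argument at the end.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    if not raw:
        return []
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\0")]


def find_processes(needle: str) -> dict[int, list[str]]:
    """Return {pid: argv} for every process with `needle` in some argument."""
    found: dict[int, list[str]] = {}
    proc = Path("/proc")
    if not proc.is_dir():  # not Linux, or /proc is not mounted
        return found
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            argv = parse_cmdline((entry / "cmdline").read_bytes())
        except OSError:
            continue  # the process exited, or we may not read it
        if any(needle in arg for arg in argv):
            found[int(entry.name)] = argv
    return found


if __name__ == "__main__":
    # Hypothetical usage: print every running process that mentions repo-add.
    for pid, argv in find_processes("repo-add").items():
        print(pid, " ".join(argv))
```

Run from cron or by hand while the process is spinning, this would show which database and package arguments the stuck repo-add was given.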

Updated by lukeshu about 7 years ago

It's going on right now. I've attached to it with gdb. I've identified the problem, but not the root cause.

Bash's bgpids.storage is wrong somehow. It gets stuck in the loop in bgp_delete (jobs.c:867). A comment describes bgpids.storage as being circular, but that seems to be contradicted by the pid != NO_PIDSTAT check, so I'm not entirely sure what "correct" is.
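To illustrate the kind of failure described here, the sketch below models a lookup that follows index links until it reaches a sentinel. It is not bash's code. It assumes, purely for illustration, that the storage is an array of (pid, bucket_next) entries chained by index with NO_PIDSTAT as the terminator; the function name walk_chain and the max_steps guard are hypothetical. If a chain links back on itself and the wanted pid is not on it, a loop with only the sentinel check never exits, which would match a process spinning at 100% CPU.

```python
NO_PIDSTAT = -1  # sentinel index meaning "end of chain" (name taken from the report)


def walk_chain(storage, start, pid, max_steps=None):
    """Follow bucket_next links from `start`, looking for `pid`.

    Returns the index of the matching entry, or NO_PIDSTAT if the chain
    ends without a match. Raises RuntimeError if more entries are visited
    than the storage holds, which can only happen if the chain loops.
    """
    if max_steps is None:
        max_steps = len(storage)
    steps = 0
    idx = start
    while idx != NO_PIDSTAT:
        if steps >= max_steps:
            # Without this guard the loop would spin forever on a cycle.
            raise RuntimeError("chain does not terminate")
        entry_pid, bucket_next = storage[idx]
        if entry_pid == pid:
            return idx
        idx = bucket_next
        steps += 1
    return NO_PIDSTAT


if __name__ == "__main__":
    # A well-formed chain ends in NO_PIDSTAT.
    healthy = [(100, 1), (200, 2), (300, NO_PIDSTAT)]
    print(walk_chain(healthy, 0, 300))
    print(walk_chain(healthy, 0, 999))

    # A corrupted chain: entry 1 points back at entry 0.
    corrupt = [(100, 1), (200, 0)]
    try:
        walk_chain(corrupt, 0, 999)
    except RuntimeError as exc:
        print("detected:", exc)
```

The step counter is just one way to make such a walk fail loudly instead of hanging; it says nothing about how the links became inconsistent in the first place.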