
Issue 924270

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Closed: Today
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




master-paladin is hanging

Project Member Reported by athilenius@chromium.org, Today (9 hours ago)

Issue description

The master-paladin is hanging (https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8923644502201215952) on the CommitQueueCompletion stage (https://logs.chromium.org/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8923644502201215952/+/steps/CommitQueueCompletion/0/stdout). It appears to be a CIDB hang, but CIDB is working fine (as far as I can tell). The bot has a 'python2' process stuck at 100% CPU, so it's looping on something.

Going to try and attach a debugger rather than just restarting the master this time.
 

Comment 1 by athilenius@chromium.org, Today (9 hours ago)

Call stack trace of the hung process:

#0  0x00007f5f6cd141a1 in vio_should_retry () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#1  0x00007f5f6cd041e4 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#2  0x00007f5f6cd04d2d in my_net_read () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#3  0x00007f5f6ccfe08c in cli_safe_read () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#4  0x00007f5f6ccff292 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#5  0x00007f5f6cd0002e in mysql_real_query () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#6  0x00007f5f6d20a34f in ?? () from /usr/lib/python2.7/dist-packages/_mysql.so

ps aux | grep 15439:

root      7206  0.0  0.0  69740  2216 pts/0    S+   12:48   0:00 sudo gdb --pid 15439
root      7207  1.4  0.0 108048 61920 pts/0    S+   12:48   0:04 gdb --pid 15439
athilen+  7369  0.0  0.0  12200   924 pts/1    S+   12:53   0:00 grep --color=auto 15439
chrome-+ 15439 62.0  0.0 295800 129456 ?       t    01:17 431:46 python2 chromite/bin/cbuildbot master-paladin --buildroot /b/swarming/w/ir/cache/cbuild/repository --branch master --buildbucket-id 8923644502201215952 --git-cache-dir /b/swarming/w/ir/cache/git --goma_dir /b/swarming/w/ir/cache/goma/client --goma_client_json /creds/service_accounts/service-account-goma-client.json --buildbot --previous-build-state eyJzdGF0dXMiOiAicGFzcyIsICJtYXN0ZXJfYnVpbGRfaWQiOiAzMzg1NjgxLCAiYnVpbGRidWNrZXRfaWQiOiAiODkyMzY1NzMwNjYwODY3MjQxNiIsICJkaXN0ZmlsZXNfdHMiOiAxNTQ3NjE3MDI0LjAyODUwNSwgImJ1aWxkcm9vdF9sYXlvdXQiOiAyLCAiYnJhbmNoIjogInJlbGVhc2UtUjcxLTExMTUxLkIifQ== --workspace /b/swarming/w/ir/cache/cbuild/workspace --ts-mon-task-num 1 --ts-mon-task-num 2 --resume --timeout 0 --notee --nocgroups --buildroot /b/swarming/w/ir/cache/cbuild/repository --version 11627.0.0-rc2 --validation_pool /b/swarming/w/ir/cache/cbuild/repository/validation_pool.dump --metadata_dump /b/swarming/w/ir/tmp/t/cbuildbot-tmpoBAQRH/metadata7Zwvvf


This has been blocking all of CQ for a while, so I'm killing the master-paladin now. We can try to trace through what is actually hanging after the fact, although this is the second SQL hang we have seen recently, which is concerning.
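
The gdb trace above shows only the C frames inside libmysqlclient, not which Python code issued the query. One way to capture Python-level frames from a process that stops making progress is a watchdog that dumps every thread's stack after a deadline. Below is a minimal sketch using the `faulthandler` module from the Python 3 standard library. cbuildbot in this report runs under python2, so this is an illustration of the idea and not a drop-in change; the function name `run_with_stack_dump` and its parameters are hypothetical.

```python
import faulthandler
import sys
import time


def run_with_stack_dump(func, hang_after_seconds, log_file=None):
    """Run func(); if it is still running after the deadline, dump all thread stacks.

    The dump is diagnostic only: func() is NOT interrupted, it keeps running.
    log_file must be a real file object (it needs a file descriptor).
    """
    target = log_file if log_file is not None else sys.stderr
    # Arm the watchdog: after hang_after_seconds, write the tracebacks of all
    # threads to `target`. exit=False means the process is left alive.
    faulthandler.dump_traceback_later(hang_after_seconds, exit=False, file=target)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in for a blocking database call: sleeps past the 0.1 s deadline,
    # so the stacks are written to stderr while the sleep continues.
    run_with_stack_dump(lambda: time.sleep(0.5), hang_after_seconds=0.1)
```

The watchdog runs in a separate C-level thread, so it can still fire while the main thread is blocked inside an extension module such as `_mysql.so`.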

Comment 2 by athilenius@chromium.org, Today (9 hours ago)

Labels: -Pri-0 Pri-1
Ran `sudo kill -9 15439` and the bot cleaned up then rebooted. Dropping down to a P1 and going to poke around to see if I can figure out *why* it hung.

This is likely the same as crbug.com/923571

Comment 3 by mikenichols@chromium.org, Today (6 hours ago)

We believe this is due to the Buildbucket timeout (some sort of hang), which will be resolved by providing a default timeout. Unfortunately I've been swamped with other issues and haven't had a chance to implement it completely. Hope to get to it this week.

-- Mike
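
To illustrate the default-timeout idea from Comment 3, here is a minimal sketch of a wrapper that stops waiting on a blocking call after a deadline. It is not the actual Buildbucket or chromite fix; `call_with_timeout`, `CallTimedOut`, and the 30-second default are hypothetical names and values chosen for the example.

```python
import threading
import time

DEFAULT_TIMEOUT_SECONDS = 30.0  # hypothetical default, not taken from the issue


class CallTimedOut(Exception):
    """Raised when a wrapped call does not finish before its deadline."""


def call_with_timeout(func, *args, timeout=DEFAULT_TIMEOUT_SECONDS, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and wait at most `timeout` seconds."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    # daemon=True so a worker that never returns cannot block interpreter exit.
    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The worker is abandoned, not killed: the caller just stops waiting.
        raise CallTimedOut("call did not finish within %.1f seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


if __name__ == "__main__":
    print(call_with_timeout(lambda: "ok", timeout=1.0))
    try:
        # Stand-in for a request that hangs: sleeps longer than the deadline.
        call_with_timeout(time.sleep, 2.0, timeout=0.1)
    except CallTimedOut as exc:
        print("timed out:", exc)
```

A wrapper like this only bounds how long the caller waits. The underlying call keeps running in the background, so a timeout configured on the client or socket itself, where the library supports one, is usually the cleaner fix.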

Comment 4 by mikenichols@chromium.org, Today (6 hours ago)

Mergedinto: 903713
Status: Duplicate (was: Untriaged)
