Blocking:-774301 Owner: shenghua...@chromium.org Status: Untriaged (was: WontFix) Summary: Mac trybots sometimes time out archiving layout test results (was: mac_chromium_10.13_rel_ng trybot times out archiving layout test results)
I ssh-ed onto the bot running https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/642715
I ran "dtruss -p <PID>" (dtruss is the macOS counterpart of strace) on one of the processes that the gsutil -m cp command had spawned, and got this:
SYSCALL(args) = return
read(0x36, "\200\002U\006#ERRORq\001cQueue\nEmpty\nq\002)Rq\003\206q\004.\0", 0x23) = 35 0
write(0x36, "\0", 0x31) = 49 0
read(0x36, "\0", 0x4) = 4 0
read(0x36, "\200\002U\006#ERRORq\001cQueue\nEmpty\nq\002)Rq\003\206q\004.\0", 0x23) = 35 0
write(0x36, "\0", 0x31) = 49 0
read(0x36, "\0", 0x4) = 4 0
read(0x36, "\200\002U\006#ERRORq\001cQueue\nEmpty\nq\002)Rq\003\206q\004.\0", 0x23) = 35 0
write(0x36, "\0", 0x31) = 49 0
I don't know syscalls super well, but seeing ERROR Queue Empty in there is strange. So something weird seems to be happening.
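For what it's worth, the repeated 35-byte read looks like a Python pickle (protocol 2) rather than raw garbage. Here is a minimal sketch that decodes it, assuming the bytes shown by dtruss are exact and that the trailing "\0" is not part of the 0x23-byte buffer. It is written for Python 3, where pickle maps the Python 2 module name Queue to queue. The result is a tuple of the string '#ERROR' and a Queue.Empty exception. That is consistent with how Python's multiprocessing managers report an exception raised by a proxied call, which would make this a worker polling an empty shared queue, but that reading is an assumption and not something confirmed on the bot.

```python
import pickle
import pickletools
import queue

# Bytes copied from the repeated read() in the dtruss output above,
# without the trailing "\0" that follows the buffer in the trace.
payload = b"\200\002U\006#ERRORq\001cQueue\nEmpty\nq\002)Rq\003\206q\004."

# The length should match the 0x23 (35) passed to read().
print(len(payload))

# List the pickle opcodes without executing anything.
opcodes = [op.name for op, arg, pos in pickletools.genops(payload)]
print(opcodes)

# Unpickling is acceptable here only because the payload is a known,
# tiny blob that references nothing but the stdlib queue module.
decoded = pickle.loads(payload)
print(decoded[0], type(decoded[1]))
```

If that reading is right, the loop in the trace is a process repeatedly asking for work and being told the queue is empty, so the hang would be in whatever is supposed to fill or drain that queue, not in this process itself.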
That doesn't work on macOS as far as I know. gdb isn't installed on our machines, and I don't think we can get Python debugging symbols installed on them either.
I made a PLX query looking for archive steps that took longer than 10 minutes. http://shortn/_LwYV6ZO2sh
It looks like this has been happening more frequently since the 24th.
I glanced through tools/build and depot_tools git log, and didn't see anything suspicious. Not sure why it's spiking now.
It's also not clear if there's a quick way to solve this. We could turn off this upload, which would solve the problem, but that will have to go through CQ, so existing jobs will fail, and we will lose data.
Also glanced through //third_party/WebKit/Tools, and didn't see anything.
https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/620004, a build from a month ago, has about the same number of files, but I think it took much less time. So something else weird is going on here.
I might just disable uploading layout test results for this builder so that the builder isn't always failing.
I would disable the recursive upload first, before turning off the whole thing. Better a partial upload than none at all, and I'm pretty sure it's the recursive upload that's the problem.
Comment 1 by robertma@chromium.org, Feb 1 2018