gclient runhooks failing on bots while trying to download NaCl toolchain
Issue description
Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of yigu@chromium.org
gclient runhooks failing on chromium.mac/Mac Builder (dbg)
Builders failed on:
- Mac Builder (dbg): https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Builder%20%28dbg%29
Oct 23
This is still causing the tree to close.
Oct 23
Looks like these are being caused by Google Storage timeouts? Not sure why this is happening though. It's being very flaky, and I don't think our network usually does that... https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931880141783353872/+/steps/gclient_runhooks/0/stdout and https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931851555621197216/+/steps/gclient_runhooks/0/stdout are two sample builds. Not sure what we can do about this :(
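For reference, one common way to tolerate flaky storage timeouts is to retry the download with exponential backoff. The following is only a minimal sketch of that idea, not what download_nacl_toolchains.py actually does; `download_with_retries`, its parameters, and the injected `fetch` callable are all hypothetical names introduced for illustration.

```python
import time


def download_with_retries(fetch, url, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying timeout-like failures with exponential backoff.

    `fetch` is a hypothetical callable supplied by the caller; it is assumed
    to raise TimeoutError or ConnectionError on a transient failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            # Wait 1x, 2x, 4x, ... the base delay, but not after the last try.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` in as a parameter keeps the sketch easy to exercise without real delays. Retries would only mask the symptom here, so they would not explain why the timeouts started.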
Oct 23
This seems to be more widespread than just Mac Builder (dbg). CCing some NaCl people. It looks like things have changed recently? Compare https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931838916019215728/+/steps/gclient_runhooks/0/stdout (a recent broken gclient runhooks run) to https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8932506464447570928/+/steps/gclient_runhooks/0/stdout (an older gclient runhooks run). The old one doesn't seem to have the file which is causing errors in the recent runs. I didn't see any changes to the actual download_nacl_toolchains.py in 11 months, so I don't know where to look for the changes that broke this.
Oct 24
It looks like the successful run isn't printing out the full URL of the mac toolchain archive, but there hasn't been any change in recent months that would have altered that URL (and as you noted, there also hasn't been any change in the script that downloads it). The last time the toolchain archive was updated was Feb 2.
Oct 24
It's actually failing now on Windows bots too. Why they are trying to download the mac toolchain, I don't know! https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20Tests%20%28dbg%29%281%29/72494
Oct 24
Err, scratch that. I skimmed too fast: that's "nACL", not "MAC". Sorry for the noise!
Oct 24
Same issue though: it's failing to download the NaCl toolchain.
Oct 24
I think this is probably something we should try to get Cloud oncall involved in at this point ...
Oct 24
Note that it appears that the "gclient runhooks" cache isn't being persisted across builds on that bot. For each build, the step takes 9-10 min. Compare that to the similar bot https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Builder, where it takes ~20 sec every time. (+thakis who I think saw something similar somewhere else) The swarming bot's likely clobbering the cache after each build to free up the required disk space. Worth looking into.
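The comparison above (9-10 min per build versus ~20 sec on the similar bot) can be turned into a rough check for a cache that is not persisting. This is a sketch under stated assumptions: the function names, the 20-second warm baseline, and the 10x threshold are illustrative choices, not values taken from any Chromium infra tooling.

```python
def looks_cold(runhooks_seconds, warm_baseline_seconds=20.0, factor=10.0):
    """Guess whether a runhooks step ran against an empty (cold) cache.

    The baseline and factor are assumed thresholds: a step that takes more
    than `factor` times the warm baseline is treated as cold.
    """
    return runhooks_seconds > warm_baseline_seconds * factor


def cold_cache_fraction(durations_seconds, warm_baseline_seconds=20.0, factor=10.0):
    """Return the fraction of builds whose runhooks step looks cold."""
    if not durations_seconds:
        return 0.0
    cold = sum(
        1 for d in durations_seconds
        if looks_cold(d, warm_baseline_seconds, factor)
    )
    return cold / len(durations_seconds)
```

A fraction near 1.0 across consecutive builds on one bot would be consistent with the cache being clobbered after every build; an occasional cold build would instead look like normal eviction.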
Oct 24
My similar issue is issue 897854, but that's with a non-standard recipe. I'd imagine that the default recipe used by Mac Builder (dbg) should get that part right.
Oct 24
So maybe we need larger hard drives in that pool?
Oct 25
Dropping all debug_bots (in the mb sense) to symbol_level=1 as a mitigation that should help w/ builder caches getting blown away: https://chromium-review.googlesource.com/c/chromium/src/+/1299776
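As a rough illustration of what the mitigation amounts to: an mb-style config composes named mixins into a flat list of GN args, so adding a reduced-symbols mixin to the debug bots' mixin changes the args for every bot that uses it. The sketch below is hypothetical. The mixin names and layout are invented for illustration and are not copied from tools/mb/mb_config.pyl, and `flatten_mixin` does not reproduce mb's real merge rules; only `is_debug` and `symbol_level` are real GN args.

```python
# Hypothetical mixin table in the spirit of an mb config (names are assumed).
MIXINS = {
    "debug_bot": {"mixins": ["debug", "minimal_symbols"]},
    "debug": {"gn_args": "is_debug=true"},
    "minimal_symbols": {"gn_args": "symbol_level=1"},
}


def flatten_mixin(name, mixins):
    """Expand a mixin into GN args: its own args first, then its children's."""
    entry = mixins[name]
    args = []
    if "gn_args" in entry:
        args.extend(entry["gn_args"].split())
    for child in entry.get("mixins", []):
        args.extend(flatten_mixin(child, mixins))
    return args


print(" ".join(flatten_mixin("debug_bot", MIXINS)))
```

Lower symbol levels produce smaller build outputs, which is why this should reduce the disk pressure that gets builder caches evicted.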
Oct 26
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/904b3a9293660b5d2c6e70b31961ff57acac7d90

commit 904b3a9293660b5d2c6e70b31961ff57acac7d90
Author: John Budorick <jbudorick@chromium.org>
Date: Fri Oct 26 16:29:19 2018

Switch debug_bots to symbol_level=1 by default.

Bug: 898161
Change-Id: I0f9f09f64318424fedad0ce3fabb116902c561aa
Reviewed-on: https://chromium-review.googlesource.com/c/1299776
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#603110}

[modify] https://crrev.com/904b3a9293660b5d2c6e70b31961ff57acac7d90/tools/mb/mb_config.pyl
Oct 29
[Sheriff update] Around 180 consecutive builds now without the failure. I'm removing this from the sheriff queue and assigning it to jbudorick@, since you made the change that avoids the problem. I'll leave it to those involved in the discussion to close it or keep it open.
Oct 29
Marking fixed. We're keeping the builder cache after #20, as evinced by:
cycle time: http://shortn/_KJ33PPKk0t
disk usage: http://shortn/_HJX1ykNwaX
Comment 1 by yigu@chromium.org, Oct 23