Issue 898161

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 29
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows, Mac
Pri: 1
Type: Bug



gclient runhooks failing on bots while trying to download NaCl toolchain

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Oct 23

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of yigu@chromium.org

gclient runhooks failing on chromium.mac/Mac Builder (dbg)

Builders failed on: 
- Mac Builder (dbg): 
  https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Builder%20%28dbg%29


 
Labels: Infra-Troopers
We have seen gclient runhooks flake more than three times across platforms. The tree has been closed twice because of this issue. Is this something infra can take a look at?
Components: Infra>Client>Chrome
Cc: gbeaty@chromium.org
This is still causing the tree to close.
Status: Untriaged (was: Available)
These look like they are being caused by Google Storage timeouts, though I'm not sure why. The failures are very flaky, which I don't think is typical of our network.

https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931880141783353872/+/steps/gclient_runhooks/0/stdout and https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931851555621197216/+/steps/gclient_runhooks/0/stdout are two sample builds.

Not sure what we can do about this :(
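
One common mitigation for transient storage timeouts is to retry the download with exponential backoff. The following is a minimal sketch of that idea only; it is not what download_nacl_toolchains.py does, and the helper name `fetch_with_retries` and its parameters are hypothetical:

```python
import time


def fetch_with_retries(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are exhausted.

    Hypothetical helper: waits base_delay * 2**n seconds after the n-th
    failed attempt. OSError covers TimeoutError and urllib's URLError.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as error:
            last_error = error
            # Do not sleep after the final attempt; just report the failure.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Here `fetch` would be any zero-argument callable that performs the download, for example a lambda wrapping `urllib.request.urlopen(url, timeout=60).read()`. Retrying hides flakiness but would not explain why the timeouts started.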
Cc: bradnelson@chromium.org dschuff@chromium.org hinoka@chromium.org
Summary: gclient runhooks failing on mac bots (was: gclient runhooks failing on chromium.mac/Mac Builder (dbg))
This seems to be more widespread than just Mac Builder (dbg).

cc-ing some NaCl people. It looks like things have changed recently? Compare https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8931838916019215728/+/steps/gclient_runhooks/0/stdout (a recent broken gclient runhooks run) to https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8932506464447570928/+/steps/gclient_runhooks/0/stdout (an older gclient runhooks run). The old one doesn't seem to have the file which is causing errors in the recent runs.

I didn't see any changes to the actual download_nacl_toolchains.py in 11 months, so I don't know where to look for the changes that broke this.
It looks like the successful run isn't printing out the full URL of the mac toolchain archive, but no change in recent months would have altered that URL (and as you noted, there also hasn't been any change in the script that downloads it). The last time the toolchain archive was updated was Feb 2.
Labels: Type-Bug
It's actually failing on Windows bots now too. Why they are trying to download the mac toolchain, I don't know!

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20Tests%20%28dbg%29%281%29/72494
Summary: gclient runhooks failing on bots (was: gclient runhooks failing on mac bots)
Summary: gclient runhooks failing on mac bots (was: gclient runhooks failing on bots)
Err, scratch that. I skim-read too fast; that's "nACL", not "MAC". Sorry for the noise!
Labels: OS-Mac OS-Windows
Summary: gclient runhooks failing on bots (was: gclient runhooks failing on mac bots)
Same issue though, it's failing to download nacl toolchain.
Components: Infra>Client>NaCl
Summary: gclient runhooks failing on bots while trying to download NaCl toolchain (was: gclient runhooks failing on bots)
Cc: -yigu@chromium.org
I think this is probably something we should try to get Cloud oncall involved in at this point ...
Cc: -bradnelson@chromium.org thakis@chromium.org
Note that it appears that the "gclient runhooks" cache isn't being persisted across builds on that bot. For each build, the step takes 9-10 min. Compare that to the similar bot https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Builder, where it takes ~20 sec every time. (+thakis who I think saw something similar somewhere else)

The swarming bot's likely clobbering the cache after each build to free up the required disk space. Worth looking into.
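
To check this hypothesis on a bot, one could compare the free space on the cache volume against what the build needs. This is a rough sketch, assuming the cache is evicted whenever free space falls below a required threshold; the function name `cache_at_risk` and the threshold are hypothetical and not taken from swarming:

```python
import shutil


def cache_at_risk(path, required_free_bytes, disk_usage=shutil.disk_usage):
    """Return True if free space on the volume holding `path` is below
    `required_free_bytes`.

    Hypothetical check: assumes the bot frees space by dropping its
    builder cache when the threshold is not met.
    """
    usage = disk_usage(path)
    return usage.free < required_free_bytes
```

For example, `cache_at_risk(".", 50 * 1024**3)` reports whether the current volume has less than 50 GiB free. The `disk_usage` parameter exists only so the check can be exercised without touching a real disk.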
My similar issue is issue 897854, but that's with a non-standard recipe. I'd imagine that the default recipe used by Mac Builder (dbg) should get that part right.
So maybe we need larger hard drives in that pool?
Dropping all debug_bots (in the mb sense) to symbol_level=1 as a mitigation that should help w/ builder caches getting blown away: https://chromium-review.googlesource.com/c/chromium/src/+/1299776
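
For readers unfamiliar with mb configs, the idea is that a shared mixin contributes GN args to every bot config that includes it, so adding symbol_level=1 in one place affects all debug bots. A simplified sketch of that flattening follows; the mixin names and contents here are hypothetical and not copied from tools/mb/mb_config.pyl, and `flatten` is an illustrative helper rather than mb's own code:

```python
# Hypothetical mixin table in the style of a .pyl (Python literal) config.
MIXINS = {
    "debug": {"gn_args": "is_debug=true"},
    "minimal_symbols": {"gn_args": "symbol_level=1"},
    "debug_bot": {"mixins": ["debug", "minimal_symbols"]},
}


def flatten(name, mixins=MIXINS):
    """Collect GN args for `name`: nested mixins first, then its own args."""
    entry = mixins[name]
    args = []
    for child in entry.get("mixins", []):
        args.extend(flatten(child, mixins))
    if "gn_args" in entry:
        args.extend(entry["gn_args"].split())
    return args
```

With this table, `flatten("debug_bot")` yields both `is_debug=true` and `symbol_level=1`. A lower symbol level produces smaller build outputs, which is why it should ease disk pressure.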
Project Member

Comment 20 by bugdroid1@chromium.org, Oct 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/904b3a9293660b5d2c6e70b31961ff57acac7d90

commit 904b3a9293660b5d2c6e70b31961ff57acac7d90
Author: John Budorick <jbudorick@chromium.org>
Date: Fri Oct 26 16:29:19 2018

Switch debug_bots to symbol_level=1 by default.

Bug:  898161 
Change-Id: I0f9f09f64318424fedad0ce3fabb116902c561aa
Reviewed-on: https://chromium-review.googlesource.com/c/1299776
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#603110}
[modify] https://crrev.com/904b3a9293660b5d2c6e70b31961ff57acac7d90/tools/mb/mb_config.pyl

Labels: -Sheriff-Chromium
Owner: jbudorick@chromium.org
Status: Assigned (was: Untriaged)
[Sheriff update] Around 180 consecutive builds now without the failure. I'm removing from sheriff queue and assigning to jbudorick@ since you made the change to avoid the problem. I'll leave it to you all involved in the discussions to close or keep open.
Status: Fixed (was: Assigned)
Marking fixed. We're keeping the builder cache after #20 as evinced by:

  cycle time: http://shortn/_KJ33PPKk0t
  disk usage: http://shortn/_HJX1ykNwaX
