InitSDK failing intermittently on several builders
Issue description
************************************************************
** Start Stage InitSDK - Tue, 04 Sep 2018 08:07:54 -0700 (PDT)
**
** Stage that is responsible for initializing the SDK.
************************************************************
08:07:54: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f1629d2d5d0>
Preconditions for the stage successfully met. Beginning to execute stage...
08:07:54: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f1629d2d150>
08:07:54: INFO: RunCommand: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk --buildbot-log-version --create --replace in /b/swarming/w/ir/cache/cbuild/repository
08:07:55: NOTICE: Deleting chroot.
08:08:01: NOTICE: Mounted /b/swarming/w/ir/cache/cbuild/repository/chroot.img on chroot
STEP_TEXT: 2018.09.03.043946
08:08:01: NOTICE: Downloading SDK tarball...
cros_sdk: Unhandled exception:
Traceback (most recent call last):
File "/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk", line 169, in <module>
DoMain()
File "/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk", line 165, in DoMain
commandline.ScriptWrapperMain(FindTarget)
File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/commandline.py", line 912, in ScriptWrapperMain
ret = target(argv[1:])
File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_sdk.py", line 1093, in main
sdk_cache, urls, 'stage3' if options.bootstrap else 'SDK')
File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_sdk.py", line 149, in FetchRemoteTarballs
raise ValueError('No valid URLs found!')
ValueError: No valid URLs found!
08:08:32: ERROR:
return code: 1; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk --buildbot-log-version --create --replace
cmd=['/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk', '--buildbot-log-version', '--create', '--replace'], cwd=/b/swarming/w/ir/cache/cbuild/repository, extra env={'USE': u'chrome_internal', 'FEATURES': 'separatedebug'}
08:08:32: ERROR: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk failed (code=1)
08:08:32: INFO: Translating result /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk failed (code=1) to fail.
08:08:32: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f1629d2d7d0>
08:08:32: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f1629d2d250>
************************************************************
** Finished Stage InitSDK - Tue, 04 Sep 2018 08:08:32 -0700 (PDT)
************************************************************
08:08:32: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f1629d2d310>
Sep 4
Is grunt the only board where this is happening? The error looks like it failed to download the current stable chroot image, which wouldn't be grunt-specific.
Sep 4
Probably not. grunt was the failure that I saw, but I would expect this to be happening across the board(s).
Sep 4
I can only find this one occurrence of the issue. Ideally there would be more logging, but it appears to be caused by an invalid SDK version for which no SDK URL could be generated, so FetchRemoteTarballs raised its intentional "No valid URLs found!" exception. I'm lowering the priority because the issue is not blocking, but I'm leaving this open because we should improve the logging so this can be debugged in the future: the URLs are generated in a few different ways, so without more logging we can't tell which path is failing. -- Mike
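A minimal sketch of the extra logging described above, assuming candidate URLs are checked one at a time by a probe callable. `find_first_valid_url` and `probe` are hypothetical names, not chromite's FetchRemoteTarballs; the point is only that each rejected URL and its reason end up in the log and in the final error, and that "no URL was generated" is reported separately from "every URL failed".

```python
import logging

log = logging.getLogger(__name__)


def find_first_valid_url(urls, probe):
    """Return the first candidate URL for which probe(url) succeeds.

    Hypothetical helper, not chromite's FetchRemoteTarballs. `probe` is any
    callable that raises when the remote file is missing or unreachable
    (for example a wrapper around an HTTP HEAD request).
    """
    urls = list(urls)
    if not urls:
        # Distinguish "no URL could be generated" (e.g. a bad SDK version)
        # from "every generated URL failed".
        raise ValueError('No valid URLs found! No candidate URLs were generated.')

    failures = []
    for url in urls:
        try:
            probe(url)
        except Exception as e:
            # Log each rejection instead of silently moving on.
            log.warning('Rejected candidate SDK URL %s: %r', url, e)
            failures.append('%s -> %r' % (url, e))
            continue
        log.info('Using SDK URL %s', url)
        return url

    # Every candidate failed: put all of the reasons into the error itself.
    raise ValueError('No valid URLs found! Tried: ' + '; '.join(failures))
```

The message keeps the original "No valid URLs found!" text, so the failure is still recognizable in builder logs, but it now also says which candidates were tried and why each one was rejected.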
Sep 4
Here's a samus one:
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fsamus-paladin%2F18125%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout
I'm not sure this counts as non-blocking if it's failing people's PreCQ runs.
Sep 4
Not all of the runs for this config are failing. Looking at the recent history, it seems to be roughly 20% of them:
https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=grunt-no-vmtest-pre-cq&buildBranch=master
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936316040226884928
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936306996226461088
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936305880822294752
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936304974079700304
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936304442230700896

The tarball download logic can be a little deceptive. We iterate over several possible URIs in cros_sdk:FetchRemoteTarballs and then don't show any of the actual errors when it fails. This goes back to when we had both bz2 and xz inputs, so a missing tarball wasn't necessarily an error. We should probably strip this code down so that failures are verbose and we can see the underlying network failure. Specifically:
- delete COMPRESSION_PREFERENCE
- inline 'xz' in GetArchStageTarballs and GetStage3Urls (and drop support for bz2)
- simplify FetchRemoteTarballs to accept a single url instead of a list of urls
- delete the |for url in urls| loop entirely, so we don't bother with the `curl -I` logic, which exists only to probe whether the remote file exists
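The simplification proposed above could look roughly like this sketch: one URL with 'xz' hard-coded, no `curl -I` existence probe, and a single download whose network error is surfaced with the URL attached. `sdk_tarball_url`, `fetch_sdk_tarball`, the `opener` parameter, and the `cros-sdk-<version>.tar.xz` file name are hypothetical placeholders, not the real cros_sdk code or SDK bucket layout, and the download uses Python 3's `urllib.request`, which may differ from what the script actually uses.

```python
import shutil
import urllib.error
import urllib.request


def sdk_tarball_url(base_url, version):
    """Build the single SDK tarball URL, with 'xz' inlined.

    Hypothetical layout: the real bucket path used by cros_sdk may differ.
    There is no compression-preference list and no bz2 fallback.
    """
    if not version:
        raise ValueError('Cannot build SDK URL: empty SDK version %r' % (version,))
    return '%s/cros-sdk-%s.tar.xz' % (base_url.rstrip('/'), version)


def fetch_sdk_tarball(url, dest_path, opener=urllib.request.urlopen):
    """Download exactly one URL, with no existence probe and no retry loop.

    URLError (including HTTPError) is re-raised as a RuntimeError that names
    the URL and chains the original error; any other exception propagates
    unchanged. Either way the builder log shows the real cause.
    """
    try:
        # opener(url) runs before the output file is opened, so a failed
        # request does not leave an empty file behind.
        with opener(url) as resp, open(dest_path, 'wb') as out:
            shutil.copyfileobj(resp, out)
    except urllib.error.URLError as e:
        raise RuntimeError('Failed to download SDK tarball %s: %s' % (url, e)) from e
    return dest_path
```

Passing `opener` in makes the failure path testable without network access, and `raise ... from e` keeps the original network exception in the traceback instead of replacing it with a generic "No valid URLs found!".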
Comment 1 by jorgelo@chromium.org, Sep 4