
Issue 880343


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




InitSDK failing intermittently on several builders

Project Member Reported by jorgelo@chromium.org, Sep 4

Issue description

************************************************************
** Start Stage InitSDK - Tue, 04 Sep 2018 08:07:54 -0700 (PDT)
** 
** Stage that is responsible for initializing the SDK.
************************************************************
08:07:54: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f1629d2d5d0>
Preconditions for the stage successfully met. Beginning to execute stage...
08:07:54: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f1629d2d150>
08:07:54: INFO: RunCommand: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk --buildbot-log-version --create --replace in /b/swarming/w/ir/cache/cbuild/repository
08:07:55: NOTICE: Deleting chroot.
08:08:01: NOTICE: Mounted /b/swarming/w/ir/cache/cbuild/repository/chroot.img on chroot
STEP_TEXT: 2018.09.03.043946
08:08:01: NOTICE: Downloading SDK tarball...
cros_sdk: Unhandled exception:
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk", line 169, in <module>
    DoMain()
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk", line 165, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/commandline.py", line 912, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_sdk.py", line 1093, in main
    sdk_cache, urls, 'stage3' if options.bootstrap else 'SDK')
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/scripts/cros_sdk.py", line 149, in FetchRemoteTarballs
    raise ValueError('No valid URLs found!')
ValueError: No valid URLs found!
08:08:32: ERROR: 
return code: 1; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk --buildbot-log-version --create --replace
cmd=['/b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk', '--buildbot-log-version', '--create', '--replace'], cwd=/b/swarming/w/ir/cache/cbuild/repository, extra env={'USE': u'chrome_internal', 'FEATURES': 'separatedebug'}
08:08:32: ERROR: /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk failed (code=1)
08:08:32: INFO: Translating result /b/swarming/w/ir/cache/cbuild/repository/chromite/bin/cros_sdk failed (code=1) to fail.
08:08:32: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f1629d2d7d0>
08:08:32: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f1629d2d250>
************************************************************
** Finished Stage InitSDK - Tue, 04 Sep 2018 08:08:32 -0700 (PDT)
************************************************************
08:08:32: INFO: Running cidb query on pid 17269, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f1629d2d310>
 
Is grunt the only board where this is happening?  The error looks like it failed to download the current stable chroot image, which wouldn't be grunt-specific.
Probably not. grunt was the failure that I saw, but I would expect this to be happening across the board(s).
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>CI
Labels: -Pri-1 Pri-2
Status: Available (was: Untriaged)
I can only find the one occurrence of this issue.  

Ideally there would be additional logging, but the failure appears to be caused by an invalid SDK version: nothing matched when generating the SDK URL, so FetchRemoteTarballs raised its planned 'No valid URLs found!' exception.

I'm going to lower the priority, as the issue is not blocking, but leave this open because I feel we should improve logging so that this can be debugged in the future. URLs are generated a few different ways, so without knowing which path failed, the cause remains unknown.
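
A rough sketch of the kind of logging meant here, not the actual chromite code. The helper name pick_first_valid_url and the probe callable are hypothetical, invented for illustration; the only thing taken from the log above is the 'No valid URLs found!' message. The idea is to record every candidate URL and why it was rejected, and to distinguish an empty candidate list from a list where every probe failed.

```python
import logging

logger = logging.getLogger("sdk_fetch_sketch")


def pick_first_valid_url(urls, probe):
    """Return the first URL that `probe` accepts.

    `probe` is a hypothetical callable: it returns None when the URL is
    usable, or a short string explaining why the URL was rejected.
    """
    failures = []
    for url in urls:
        error = probe(url)
        if error is None:
            return url
        # Log each rejection so the build log shows what was tried and why.
        logger.warning("SDK URL rejected: %s (%s)", url, error)
        failures.append((url, error))

    if not failures:
        # No URL was generated at all, e.g. nothing matched the SDK version.
        detail = "the candidate URL list was empty"
    else:
        detail = "; ".join("%s: %s" % (u, e) for u, e in failures)
    raise ValueError("No valid URLs found! Tried: %s" % detail)
```

With something like this, a failure would at least say whether URL generation or the network probe was the step that fell over.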

-- Mike
Here's a samus one: https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fsamus-paladin%2F18125%2F%2B%2Frecipes%2Fsteps%2FInitSDK%2F0%2Fstdout

I'm not sure we can call this non-blocking if it's failing people's PreCQ.
Summary: InitSDK failing on grunt-no-vmtest-pre-cq builder (was: InitSDK failing on grunt PreCQ builder)
not all of the runs for this config are failing.  if you look at the recent history, it's something like 20% of them.
https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=grunt-no-vmtest-pre-cq&buildBranch=master

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936316040226884928
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936306996226461088
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936305880822294752
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936304974079700304
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936304442230700896

the tarball download logic can be a little deceptive.  we iterate over some possible URIs in cros_sdk:FetchRemoteTarballs and then don't show any actual errors when it fails.  this goes back to when we used to have bz2 and xz inputs, so a missing tarball wasn't exactly an error.  we probably should strip this code down so that failures are verbose and we can see the underlying network failure.

specifically:
- delete COMPRESSION_PREFERENCE
- inline 'xz' in GetArchStageTarballs and GetStage3Urls (and drop support for bz2)
- simplify FetchRemoteTarballs to only accept one url instead of a list of urls
- delete the |for url in urls| loop entirely so we don't bother with the `curl -I` logic which only exists to probe existence of the remote file
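
A minimal sketch of where the list above might end up, assuming Python 3 and a curl binary on PATH. This is not the real FetchRemoteTarballs; the function name fetch_tarball and the injectable run parameter are hypothetical and exist only to keep the example self-contained. It takes a single URL, skips the `curl -I` probe, and puts curl's own stderr into the exception so the underlying network failure is visible.

```python
import subprocess


def fetch_tarball(url, dest, run=subprocess.run):
    """Download one URL to `dest` with curl and fail loudly on error.

    `run` defaults to subprocess.run; it is a parameter only so the
    sketch can be exercised without touching the network.
    """
    cmd = [
        "curl",
        "--fail",        # treat HTTP errors (e.g. 404) as failures
        "--location",    # follow redirects
        "--silent",      # no progress meter in the build log
        "--show-error",  # but still print the error message
        "--output", dest,
        url,
    ]
    result = run(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        # Surface curl's exit code and message instead of a generic error.
        raise RuntimeError(
            "Download of %s failed (curl exit %d): %s"
            % (url, result.returncode, result.stderr.strip())
        )
    return dest
```

The trade-off is that a missing tarball is now always a hard error, which matches the point above that the fallback only mattered while both bz2 and xz inputs existed.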
Summary: InitSDK failing intermittently on several builders (was: InitSDK failing on grunt-no-vmtest-pre-cq builder)
Components: Infra>Client>ChromeOS>Build
