Issue 614606

Issue metadata

Status: Verified
Closed: May 2016
OS: Chrome
Pri: 2
Type: Bug

nyan-release consistently failing signing

Reported by djkurtz@chromium.org, May 25 2016

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release

9 of the last 10 builds failed signing.

The log shows:
** Start Stage Signing - Tue, 24 May 2016 21:01:28 -0700 (PDT)
...
21:01:28: INFO: Waiting for image signing for: nyan, 8369.0.0
21:01:28: INFO: GS errors are a normal part of the polling for results.
21:01:28: INFO: Waiting for signer results.
...
21:51:22: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' cat gs://chromeos-releases/canary-channel/nyan/8369.0.0/ChromeOS-recovery-R53-8369.0.0-nyan.instructions.json
21:52:22: WARNING: GS_ERROR: Traceback (most recent call last):
...
socket.timeout: timed out

So, the signing stage times out after ~50 minutes.
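The log is consistent with a poll-until-deadline loop: the bot repeatedly tries to `gsutil cat` the signer's instructions.json, treats GS errors as "not ready yet", and gives up once the stage deadline passes. A minimal sketch of that pattern, assuming the result file only appears once signing is done. `wait_for_signer_result`, `SignerTimeout`, and the injected `fetch`/`clock`/`sleep` callables are hypothetical names, not cbuildbot's actual code.

```python
import json
import time
from typing import Callable


class SignerTimeout(Exception):
    """Raised when no signer result shows up before the deadline."""


def wait_for_signer_result(
    fetch: Callable[[], str],
    timeout_s: float,
    poll_interval_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch` until it returns parseable JSON or `timeout_s` elapses.

    `fetch` is a hypothetical stand-in for `gsutil cat .../*.instructions.json`.
    Any exception (missing object, socket timeout, bad JSON) is treated like
    a GS error: the result is not ready yet, so try again after a pause.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return json.loads(fetch())
        except Exception:
            pass  # "GS errors are a normal part of the polling for results."
        if clock() >= deadline:
            raise SignerTimeout(f"no signer result after {timeout_s:.0f}s")
        sleep(poll_interval_s)
```

With a fixed deadline, a result that lands a minute late (as in this build) is handled exactly like one that never arrives.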

I can manually run the command:
gsutil -o 'Boto:num_retries=10' cat gs://chromeos-releases/canary-channel/nyan/8369.0.0/ChromeOS-recovery-R53-8369.0.0-nyan.instructions.json

Here is the result:

{"status": {"status": "passed", "summary": "", "details": "", "current-time": "Wed, 25 May 2016 04:53:25 +0000 (UTC)"}, 
 "keyset_is_mp": false, "metadata-version": "1", "input-archive": "ChromeOS-recovery-R53-8369.0.0-nyan.tar.xz",
 "bot-hostname": "chromeos-signing16.hot.corp.google.com", "keyset": "nyan-premp",
 "version": {"platform": "8369.0.0", "full": "R53-8369.0.0", "milestone": "53"}, 
 "board": "nyan", "time": {"finish": "Wed, 25 May 2016 04:53:12 +0000 (UTC)", 
 "upload": { "chromeos_8369.0.0_nyan_recovery_canary-channel_premp.bin": {"duration": "0:02:16.015533", "start": "Wed, 25 May 2016 04:50:54 +0000 (UTC)"} },
 "sign": {"recovery_image.bin": {"duration": "0:04:17.036300", "start": "Wed, 25 May 2016 04:46:21 +0000 (UTC)"}}, 
 "start": "Wed, 25 May 2016 04:45:11 +0000 (UTC)", 
 "download": {"duration": "0:00:10.790948", "start": "Wed, 25 May 2016 04:45:16 +0000 (UTC)"}, 
 "unpack": {"duration": "0:00:46.564516", "start": "Wed, 25 May 2016 04:45:31 +0000 (UTC)"}}, 
 "type": "recovery", "channel": "canary"
}

So, signing did eventually succeed, but it took a while, finishing at:
 Wed, 25 May 2016 04:53:25 +0000 (UTC)

Which, converted to PDT, is 21:53:25: about 1 minute after the bot gave up.
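The time-zone arithmetic above can be checked with a short Python sketch. It assumes PDT is a fixed UTC-7 offset (daylight time was in effect in late May) and takes 21:52:22 PDT, the timestamp of the last GS_ERROR in the log, as the moment the bot gave up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT is a fixed UTC-7 offset (daylight time in May 2016).
PDT = timezone(timedelta(hours=-7), "PDT")

# "current-time" from the instructions JSON, minus the trailing " (UTC)" label.
raw = "Wed, 25 May 2016 04:53:25 +0000 (UTC)"
signer_done = datetime.strptime(raw.rsplit(" (", 1)[0], "%a, %d %b %Y %H:%M:%S %z")
signer_done_pdt = signer_done.astimezone(PDT)

# Timestamps from the build log (PDT): stage start and the last GS_ERROR.
stage_start = datetime(2016, 5, 24, 21, 1, 28, tzinfo=PDT)
bot_gave_up = datetime(2016, 5, 24, 21, 52, 22, tzinfo=PDT)

print(signer_done_pdt.strftime("%a %d %b %Y %H:%M:%S %Z"))  # Tue 24 May 2016 21:53:25 PDT
print(bot_gave_up - stage_start)      # 0:50:54, the ~50 minute stage timeout
print(signer_done_pdt - bot_gave_up)  # 0:01:03
```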

So, is there a 'signer time out' that needs to be extended?
 

Comment 1 by vapier@chromium.org, May 25 2016

we might need to add some more signer instances. is this the only bot hitting a timeout? can you check the results for the other runs?
It is true that the ungrouping will have reduced the distribution of signing requests, and so will probably cause a bigger spike in signing load.
Had a one-off signing failure here, but it also failed the SignerTest, so this might be expected.

https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-release/builds/58
Since Monday, our slowest release build has been 6:45:26. That does give us some room to increase timeouts, but I'd probably rather give that time to hwtests than signing since stability in the lab has been a harder problem to solve.

So... if nyan-release is timing out because our load is too spikey, I'd rather add a few more signing servers and speed things up.
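To see why extra signing servers should help with a spiky load, here is a deliberately crude model: requests are served in order, every job takes the same time, and all signers start out free. The job length of 8:01 comes from the "start" and "finish" times in the nyan instructions JSON above; the queue depth and signer counts in the example calls are hypothetical, since the real numbers are not given in this thread.

```python
from datetime import timedelta


def time_until_done(queued_ahead: int, signers: int, job: timedelta) -> timedelta:
    """Crude FIFO model (all assumptions): identical job lengths, and all
    `signers` are idle when our request is posted behind `queued_ahead` others."""
    if signers <= 0:
        raise ValueError("need at least one signer")
    # Jobs are handed out in waves of `signers`; our request runs in wave
    # number queued_ahead // signers (0-based), then takes one job length.
    waves_before_ours = queued_ahead // signers
    return job * (waves_before_ours + 1)


# Signer job length for this nyan build: 04:45:11 -> 04:53:12 UTC.
JOB = timedelta(minutes=8, seconds=1)

print(time_until_done(20, 10, JOB))  # 0:24:03 with 10 signers (hypothetical)
print(time_until_done(20, 13, JOB))  # 0:16:02 after adding 3 (hypothetical)
```

Real signer load is neither FIFO-only nor uniform, so this only shows the direction of the effect, not its size.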

It does seem weird that it's always the same board hitting the timeout.

Comment 5 by vapier@chromium.org, May 25 2016

my hypothesis:
if the builders are consistent-ish in how long they take to build & upload, then the overall queue build up would probably look the same, so if nyan is hitting the same point in the upload queue, i could see it timing out, especially because the signers currently favor older requests over newer requests when they're posted at the same priority #.
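The ordering rule in this hypothesis (same priority number, older request first) amounts to a priority queue keyed on (priority, posting time). A minimal sketch with Python's heapq; `SigningRequest`, `SigningQueue`, `service_order`, and the board name `other_board` are hypothetical, and treating a lower number as more urgent is an assumption.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class SigningRequest:
    # Sort key: priority number first, then posting time, so at equal
    # priority the older request wins. Assumption: lower number = more urgent.
    priority: int
    posted_at: float
    board: str = field(compare=False)


class SigningQueue:
    """Hypothetical model of a signer queue; not the real cros-signing code."""

    def __init__(self) -> None:
        self._heap: list[SigningRequest] = []

    def post(self, req: SigningRequest) -> None:
        heapq.heappush(self._heap, req)

    def next_request(self) -> SigningRequest:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def service_order(requests: list[SigningRequest]) -> list[str]:
    """Return boards in the order the queue would hand them to signers."""
    q = SigningQueue()
    for r in requests:
        q.post(r)
    return [q.next_request().board for _ in range(len(q))]
```

For example, if nyan posts at priority 50 after veyron_minnie and other_board have posted at the same priority, it is served last: `service_order(...)` yields `["veyron_minnie", "other_board", "nyan"]`. A builder that consistently finishes late will keep landing at the back.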

Comment 6 by vapier@chromium.org, May 25 2016

i'll add 3 more instances tonight.  let's see if that helps.  if it does, i'll prob request a few more after that.
Thanks!

Comment 8 by bugdroid1@chromium.org, May 25 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/97fce6c0a523ac80cbe49e93a46e5614a54309a2

commit 97fce6c0a523ac80cbe49e93a46e5614a54309a2
Author: Mike Frysinger <vapier@chromium.org>
Date: Wed May 25 22:30:59 2016

Comment 9 by vapier@chromium.org, May 25 2016

3 new bots are live
The 2 most recent nyan-release builds succeeded. More instances help.
did any other bots fail due to timeouts?

Comment 12 by bugdroid1@chromium.org, May 28 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/117a68f6479fc5d9532794a199d6747f5a8b65e7

commit 117a68f6479fc5d9532794a199d6747f5a8b65e7
Author: Mike Frysinger <vapier@chromium.org>
Date: Sat May 28 06:00:00 2016

assuming fixed now after adding 8 more instances
Status: Fixed (was: Assigned)
Status: Verified (was: Fixed)
Bulk verified