nyan-release consistently failing signing
Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release

9 of the last 10 builds failed signing. The log shows:

  ** Start Stage Signing - Tue, 24 May 2016 21:01:28 -0700 (PDT)
  ...
  21:01:28: INFO: Waiting for image signing for: nyan, 8369.0.0
  21:01:28: INFO: GS errors are a normal part of the polling for results.
  21:01:28: INFO: Waiting for signer results.
  ...
  21:51:22: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' cat gs://chromeos-releases/canary-channel/nyan/8369.0.0/ChromeOS-recovery-R53-8369.0.0-nyan.instructions.json
  21:52:22: WARNING: GS_ERROR: Traceback (most recent call last):
  ...
  socket.timeout: timed out

So the signing stage times out after ~50 minutes.

I can manually run the command:

  gsutil -o 'Boto:num_retries=10' cat gs://chromeos-releases/canary-channel/nyan/8369.0.0/ChromeOS-recovery-R53-8369.0.0-nyan.instructions.json

Here is the result:

  {"status": {"status": "passed", "summary": "", "details": "", "current-time": "Wed, 25 May 2016 04:53:25 +0000 (UTC)"},
   "keyset_is_mp": false,
   "metadata-version": "1",
   "input-archive": "ChromeOS-recovery-R53-8369.0.0-nyan.tar.xz",
   "bot-hostname": "chromeos-signing16.hot.corp.google.com",
   "keyset": "nyan-premp",
   "version": {"platform": "8369.0.0", "full": "R53-8369.0.0", "milestone": "53"},
   "board": "nyan",
   "time": {"finish": "Wed, 25 May 2016 04:53:12 +0000 (UTC)",
            "upload": {"chromeos_8369.0.0_nyan_recovery_canary-channel_premp.bin": {"duration": "0:02:16.015533", "start": "Wed, 25 May 2016 04:50:54 +0000 (UTC)"}},
            "sign": {"recovery_image.bin": {"duration": "0:04:17.036300", "start": "Wed, 25 May 2016 04:46:21 +0000 (UTC)"}},
            "start": "Wed, 25 May 2016 04:45:11 +0000 (UTC)",
            "download": {"duration": "0:00:10.790948", "start": "Wed, 25 May 2016 04:45:16 +0000 (UTC)"},
            "unpack": {"duration": "0:00:46.564516", "start": "Wed, 25 May 2016 04:45:31 +0000 (UTC)"}},
   "type": "recovery",
   "channel": "canary"}

So signing did eventually succeed, but it took a while, finishing at:

  Wed, 25 May 2016 04:53:25 +0000 (UTC)

which, converted to PDT, is:

  21:53:25

or about 1 minute after the bot gave up.

So, is there a 'signer time out' that needs to be extended?
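The time-zone arithmetic in the description can be double-checked with a short script. This is only a sketch: the three timestamps are copied from the log and the instructions.json output quoted above, the variable names are made up for illustration, and treating the 21:52:22 GS_ERROR line as the moment the bot gave up is the reporter's reading of the log.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset for PDT; enough for a one-off check of these timestamps.
PDT = timezone(timedelta(hours=-7), "PDT")

# "current-time" from the signer's instructions.json ("(UTC)" suffix dropped).
signer_status_utc = datetime.strptime(
    "Wed, 25 May 2016 04:53:25 +0000", "%a, %d %b %Y %H:%M:%S %z")

# Builder log timestamps (the builder logs in PDT).
stage_start_pdt = datetime(2016, 5, 24, 21, 1, 28, tzinfo=PDT)
bot_gave_up_pdt = datetime(2016, 5, 24, 21, 52, 22, tzinfo=PDT)

signer_status_pdt = signer_status_utc.astimezone(PDT)

print(signer_status_pdt.strftime("%H:%M:%S %Z"))  # 21:53:25 PDT
print(signer_status_utc - bot_gave_up_pdt)        # 0:01:03
print(bot_gave_up_pdt - stage_start_pdt)          # 0:50:54
```

The result agrees with the description: the signer reported success about a minute after the last failed poll, and the stage had been waiting for roughly 50 minutes by then.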
May 25 2016
It is true that the ungrouping will have reduced the distribution of signing requests, and so will probably cause a bigger spike in signing load.
May 25 2016
Had a one-off signing failure here, but it also failed the SignerTest, so this might be expected. https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-release/builds/58
May 25 2016
Since Monday, our slowest release build has been 6:45:26. That does give us some room to increase timeouts, but I'd probably rather give that time to hwtests than signing since stability in the lab has been a harder problem to solve. So... if nyan-release is timing out because our load is too spiky, I'd rather add a few more signing servers and speed things up. It does seem weird that it's always the same board hitting the timeout.
May 25 2016
my hypothesis: if the builders are consistent-ish in how long they take to build & upload, then the overall queue build up would probably look the same, so if nyan is hitting the same point in the upload queue, i could see it timing out. especially because the signers currently favor older requests over newer requests when they're posted at the same priority #.
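A toy simulation can illustrate this hypothesis. It is a sketch under loud assumptions, not the real cros-signing scheduler: the `simulate` function, the board names other than nyan, the fixed 10-minute signing time, the posting times, and the signer counts are all invented for illustration. The only behavior taken from the comment above is that requests at the same priority are served oldest first.

```python
import heapq

def simulate(requests, num_signers, sign_minutes):
    """Return {name: finish_minute} for (priority, posted_minute, name) requests.

    Lower priority number wins; ties go to the older request (assumed model).
    """
    pending = sorted(requests, key=lambda r: r[1])  # arrival order
    queue = []                     # heap of (priority, posted_minute, name)
    signers = [0] * num_signers    # heap of the minute each signer is free
    heapq.heapify(signers)
    finish = {}
    i = 0
    while i < len(pending) or queue:
        free_at = heapq.heappop(signers)
        # With nothing queued, the signer idles until the next request arrives.
        if not queue and pending[i][1] > free_at:
            free_at = pending[i][1]
        # Everything posted by now joins the queue.
        while i < len(pending) and pending[i][1] <= free_at:
            heapq.heappush(queue, pending[i])
            i += 1
        _priority, _posted, name = heapq.heappop(queue)
        done = free_at + sign_minutes
        finish[name] = done
        heapq.heappush(signers, done)
    return finish

# Hypothetical load: 11 boards post one minute apart, nyan posts last (minute 11).
requests = [(5, m, f"board-{m}") for m in range(11)] + [(5, 11, "nyan")]

wait_2 = simulate(requests, num_signers=2, sign_minutes=10)["nyan"] - 11
wait_5 = simulate(requests, num_signers=5, sign_minutes=10)["nyan"] - 11
print(wait_2, wait_5)  # 50 20
```

In this made-up setup the board that always posts last always waits longest, and adding signers shortens its wait without touching any timeout.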
May 25 2016
i'll add 3 more instances tonight. let's see if that helps. if it does, i'll prob request a few more after that.
May 25 2016
Thanks!
May 25 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/cros-signing/+/97fce6c0a523ac80cbe49e93a46e5614a54309a2 commit 97fce6c0a523ac80cbe49e93a46e5614a54309a2 Author: Mike Frysinger <vapier@chromium.org> Date: Wed May 25 22:30:59 2016
May 25 2016
3 new bots are live
May 26 2016
The 2 most recent nyan-release builds completed successfully. More instances help.
May 26 2016
did any other bots fail due to timeouts?
May 28 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/cros-signing/+/117a68f6479fc5d9532794a199d6747f5a8b65e7 commit 117a68f6479fc5d9532794a199d6747f5a8b65e7 Author: Mike Frysinger <vapier@chromium.org> Date: Sat May 28 06:00:00 2016
May 31 2016
assuming fixed now after adding 8 more instances
May 31 2016
Jul 1 2016
Bulk verified
Comment 1 by vapier@chromium.org, May 25 2016