paygen signing requests timing out
Issue description
We still see signer failures in canaries started today at 2AM, for instance nyan-release build 487. They may be the same issue discussed on the email thread yesterday (sorry, I should have opened a bug then). Here's part of the thread:

07:05:27: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' stat -- gs://chromeos-releases/canary-channel/cyan/8893.0.0/payloads/signing/30855-140362444756800/1.payload.hash.update_signer.signed.bin
07:05:27: WARNING: GS_ERROR: No URLs matched: gs://chromeos-releases/canary-channel/cyan/8893.0.0/payloads/signing/30855-140362444756800/1.payload.hash.update_signer.signed.bin
07:05:37: ERROR: Signer request timed out.

Mike replied: "a bad CL was landed in the signer. i'm fixing it now."

But I don't know which repo it's in, so I have trouble tracking this problem. Thanks!
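The log above shows the client side of a signing request: it runs `gsutil stat` on the expected `.signed.bin` URL until the file appears or it gives up. A minimal Python sketch of that poll-until-deadline pattern follows. It is not the cbuildbot code: `wait_for_signed_file`, the `exists` callback (a stand-in for the `gsutil stat` call) and the 10-second interval are hypothetical, and the 30-minute default only mirrors the wait reported for build 487 later in this thread.

```python
import time
from typing import Callable


def wait_for_signed_file(
    exists: Callable[[str], bool],
    uri: str,
    timeout_s: float = 30 * 60,
    poll_interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `uri` exists or `timeout_s` seconds have elapsed.

    Returns True if the signed file appeared, False on timeout (the case
    reported in the log as "Signer request timed out.").
    """
    deadline = clock() + timeout_s
    while True:
        # Hypothetical stand-in for `gsutil stat <uri>` succeeding.
        if exists(uri):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeClock:
    """Deterministic clock so the sketch can be tried without real sleeps."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds
```

The clock and sleep functions are injected only to make the loop testable; this thread does not say whether the real client polls at a fixed interval.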
Oct 14 2016
Oct 14 2016
I don't think that breakage was relevant here; the signers should have been running that fix by then. That log snippet doesn't seem to be from the referenced buildbot. This is build 487:
https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release/builds/487/steps/Paygen/logs/stdio/text
It ran during 05:00, but that log snippet is from 07:00.

That said, that log does fail with "Signing of hashes failed". Earlier in the log is:

05:20:32: INFO: RunCommand: tar -cjf /tmp/cbuildbot-tmpa4s8Cm/tmpqulnk6 1.payload.hash 2.payload.hash in /tmp/cbuildbot-tmpa4s8Cm/tmpM5oMZq
...
05:50:37: ERROR: Signer request timed out.

So it waited 30 min and gave up. I'm guessing the signers were just fully loaded at that time and didn't get back around to processing the request in time, which probably means we're again hitting capacity.

Since the paygen step tends to be a critical one with lots of small files, we could add yet another knob here: a filesize cap. Then we can configure one signer to never process requests that are over a certain size (say 100MB). That way, even if all the signers are processing recovery images (which are 1.5GB+), we have one that'll churn through the tiny requests. How big are the normal paygen requests?
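The filesize-cap knob proposed above can be sketched as a per-signer filter on the request queue. This is a minimal Python illustration under stated assumptions, not the cros-signing implementation: `SigningRequest`, `Signer`, `max_request_bytes` and `next_request` are hypothetical names, and the 100MB cap is just the example figure from the comment.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024


@dataclass
class SigningRequest:
    name: str
    size_bytes: int


@dataclass
class Signer:
    name: str
    # Hypothetical knob. None means "no cap": accept requests of any size.
    max_request_bytes: Optional[int] = None

    def accepts(self, req: SigningRequest) -> bool:
        return self.max_request_bytes is None or req.size_bytes <= self.max_request_bytes


def next_request(signer: Signer, queue: List[SigningRequest]) -> Optional[SigningRequest]:
    """Pop the oldest request this signer is willing to handle, or None.

    Oversized requests are skipped (left in the queue) so that an
    uncapped signer can still pick them up later.
    """
    for i, req in enumerate(queue):
        if signer.accepts(req):
            return queue.pop(i)
    return None
```

With a 100MB cap, the capped signer passes over a 1.5GB recovery image and takes the small paygen hash request queued behind it, while an uncapped signer still drains the large images. Skipping rather than rejecting keeps oversized requests available to the other signers.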
Oct 14 2016
#3: Thank you for reopening. Sorry if this wasn't clear: the snippet is from yesterday, when we had the email thread. Your reply "I am fixing it now" was sent after that snippet. That snippet is from cyan-release build 484; nyan-release build 487 is from this morning and shows the same pattern. Are they different failures?
Oct 14 2016
Timeouts from earlier were probably due to the bad CL, but timeouts after that are probably from hitting capacity. I think there were more timeouts (most of them?) before the fix landed.
Oct 14 2016
The payload hash signatures are very small: just hashes of the payloads, not the actual payloads. I'd have to look up the exact size, but probably tens of bytes.
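To illustrate why these requests stay tiny: a cryptographic hash has a fixed-size output no matter how large its input is. The sketch below assumes SHA-256 purely for illustration (the comment does not say which hash paygen uses), and `payload_hash` is a hypothetical helper.

```python
import hashlib


def payload_hash(payload: bytes) -> bytes:
    """Hypothetical helper: digest of a payload, assuming SHA-256."""
    # A SHA-256 digest is always 32 bytes, however large the input is.
    return hashlib.sha256(payload).digest()


tiny = payload_hash(b"x")
big = payload_hash(b"\0" * (8 * 1024 * 1024))  # 8 MiB stand-in payload
print(len(tiny), len(big))  # prints: 32 32
```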
Oct 14 2016
#6: Thanks, so it's not network capacity. Could it be CPU capacity when computing the payload?
Oct 14 2016
Could be that there simply aren't enough signers, or that there are some scheduling inefficiencies. A while back I made a change that increased the cost of signing a recovery image (but helped our recovery image publishing tools), which would have reduced our overall signing capacity.
Oct 14 2016
Oh right, paygen requests are just hashes. Let me find the normal size for firmware requests and I'll set the initial bar there. Signing firmware should be fast, so dedicating one instance to those and paygen requests should be OK.
Dec 20 2016
Issue 675980 has been merged into this issue.
Feb 9 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/3772a4696f6200c5195264d5fbf540708fad005c
commit 3772a4696f6200c5195264d5fbf540708fad005c
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Feb 09 01:18:53 2017
Feb 11 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/5c3c8e73cb9003b464dc9d2910a61705bda86972
commit 5c3c8e73cb9003b464dc9d2910a61705bda86972
Author: Mike Frysinger <vapier@chromium.org>
Date: Sat Feb 11 17:55:25 2017
Feb 13 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/a3c338696b4b7d4348946feedc0d717957ce09cc
commit a3c338696b4b7d4348946feedc0d717957ce09cc
Author: Mike Frysinger <vapier@chromium.org>
Date: Mon Feb 13 20:06:40 2017
Feb 14 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/2f6c277dd7c2d817be606f5ec3c107ba1db5f194
commit 2f6c277dd7c2d817be606f5ec3c107ba1db5f194
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Feb 14 19:29:39 2017
Feb 14 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/0cf6dc1a5620d7639d9636f5a6cff9f05b2db7fd
commit 0cf6dc1a5620d7639d9636f5a6cff9f05b2db7fd
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Feb 14 23:26:50 2017
Feb 14 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/f12eb3e9fb51229136a34c7a60ab8e06fe0e3309
commit f12eb3e9fb51229136a34c7a60ab8e06fe0e3309
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Feb 14 23:41:50 2017
Feb 15 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/cros-signing/+/b474500ad6df3d294e8bf5818b7d51cac35f35b4
commit b474500ad6df3d294e8bf5818b7d51cac35f35b4
Author: Mike Frysinger <vapier@chromium.org>
Date: Wed Feb 15 19:07:13 2017
Feb 15 2017
I checked the logs of 9.cbf and it seems to be working: it skips the recovery images due to their size, but processes all the paygen requests fine. This should address the issues with paygen requests not getting processed fast enough.
Feb 15 2017
Thanks a lot!
Apr 17 2017
May 30 2017
Aug 1 2017
Oct 14 2017
Jun 21 2018
Comment 1 by aaboagye@chromium.org, Oct 14 2016