
Issue 656023

Starred by 1 user

Issue metadata

Status: Fixed
Merged: issue 655849
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




paygen signing requests timing out

Project Member Reported by semenzato@chromium.org, Oct 14 2016

Issue description

We still see signer failures in canaries that started today at 2 AM, for instance nyan-release build 487. They may be the same issue discussed on the email thread yesterday (sorry, I should have opened a bug then). Here's part of the thread.

07:05:27: INFO: RunCommand: /b/cbuild/internal_master/.cache/common/gsutil_4.19.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' stat -- gs://chromeos-releases/canary-channel/cyan/8893.0.0/payloads/signing/30855-140362444756800/1.payload.hash.update_signer.signed.bin
07:05:27: WARNING: GS_ERROR: No URLs matched: gs://chromeos-releases/canary-channel/cyan/8893.0.0/payloads/signing/30855-140362444756800/1.payload.hash.update_signer.signed.bin 
07:05:37: ERROR: Signer request timed out.

Mike replied:

a bad CL was landed in the signer.  i'm fixing it now.

but I don't know which repo it's in, so I have trouble tracking this problem.  Thanks!


 
Apparently the fix[0] was merged yesterday. But, I have no insight into the operation of the signers. Not sure if they needed a restart or something like that.

 bug 655849  was opened to help prevent this breakage in the future.

0 - https://chrome-internal-review.googlesource.com/#/c/296096/
Mergedinto: 655849
Status: Duplicate (was: Untriaged)
Thank you Aseda!

Comment 3 by vapier@chromium.org, Oct 14 2016

Cc: -aaboagye@chromium.org -snanda@chromium.org -keta...@chromium.org -shchen@chromium.org dgarr...@chromium.org
Status: Available (was: Duplicate)
Summary: paygen signing requests timing out (was: signer errors in canary builds)
i don't think that breakage was relevant here.  the signers should have been running that fix by then.

that log snippet doesn't seem to be from the referenced buildbot.  this is build 487:
https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release/builds/487/steps/Paygen/logs/stdio/text
that ran during 05:00, but that log snippet is from 07:00

that said, that log does fail with Signing of hashes failed.  earlier in the log is:
05:20:32: INFO: RunCommand: tar -cjf /tmp/cbuildbot-tmpa4s8Cm/tmpqulnk6 1.payload.hash 2.payload.hash in /tmp/cbuildbot-tmpa4s8Cm/tmpM5oMZq
...
05:50:37: ERROR: Signer request timed out.

so it waited 30 min and gave up.  i'm guessing the signers were just fully loaded at that time and didn't get back around to processing the request in time.  which probably means we're again hitting capacity.
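
To illustrate the wait-then-give-up behavior described here, below is a minimal sketch of a polling loop with a deadline. It is not the actual paygen code: the function name, the 10-second poll interval, and the injected `exists`, `clock`, and `sleep` callables are all assumptions made for illustration. Only the 30-minute wait comes from the log above.

```python
import time


def wait_for_signature(exists, timeout_s=30 * 60, poll_s=10,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll exists() until it returns True or timeout_s has elapsed.

    Returns True if the signed file showed up, False on timeout.
    All names and defaults here are hypothetical, not taken from paygen.
    """
    deadline = clock() + timeout_s
    while True:
        if exists():
            return True
        if clock() >= deadline:
            return False  # caller would log "Signer request timed out."
        sleep(poll_s)


class FakeClock:
    """Simulated clock so the demo finishes instantly instead of in 30 minutes."""

    def __init__(self):
        self.now = 0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# Demo: a signer that never produces the file, as in the busy-signer case.
fake = FakeClock()
signed = wait_for_signature(lambda: False, clock=fake.time, sleep=fake.sleep)
waited_s = fake.now
```

In a real caller, `exists` would wrap whatever check reports that the signed output is present. The loop itself has no way to tell a busy signer from a broken one, which is why both causes look identical in the logs.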

since the paygen step tends to be a critical one with lots of small files, we could add yet-another-knob here where we set a filesize cap.  then we can configure one signer to never process requests that are over a certain size (say 100MB).  that way even if all the signers are processing recovery images (which are 1.5GB+), we have one that'll churn through the tiny requests.
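
A rough sketch of the size-cap idea, assuming each signer takes the first pending request it is allowed to process. The `SigningRequest` type, the `next_request` helper, the request names, and the 64-byte size are hypothetical and not part of the real signer code. The 100MB cap and the 1.5GB recovery image size are the figures suggested above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MB = 1024 * 1024


@dataclass(frozen=True)
class SigningRequest:
    name: str
    size_bytes: int


def next_request(pending: Iterable[SigningRequest],
                 max_bytes: Optional[int] = None) -> Optional[SigningRequest]:
    """Return the first pending request this signer may take, or None.

    max_bytes=None models an uncapped signer; a capped signer skips
    anything larger than its limit and leaves it for the other signers.
    """
    for request in pending:
        if max_bytes is None or request.size_bytes <= max_bytes:
            return request
    return None


# Queue where large recovery images are ahead of a tiny paygen request.
pending = [
    SigningRequest("recovery-image-1", 1536 * MB),
    SigningRequest("recovery-image-2", 1536 * MB),
    SigningRequest("paygen-hashes", 64),  # illustrative size only
]

capped_pick = next_request(pending, max_bytes=100 * MB)  # the paygen request
uncapped_pick = next_request(pending)                    # the first recovery image
```

With this arrangement the capped signer is never tied up by a recovery image, so small requests wait only behind other small requests.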

how big are the normal paygen requests ?
#3 thank you for reopening.

Sorry if this wasn't clear: the snippet is from yesterday, when we had the email thread.  Your reply "I am fixing it now" was sent after that snippet.  That snippet is from cyan-release build 484.

nyan-release build 487 is from this morning and shows the same pattern.  Are they different failures?

Comment 5 by vapier@chromium.org, Oct 14 2016

timeouts from earlier were prob due to the bad CL, but timeouts after that are probably from hitting capacity.  i think there were more timeouts (most of them?) before the fix landed.
The payload hash signatures are very small, just hashes of the payloads, not actual payloads. I'd have to look up the exact size, but probably tens of bytes.
#6 thanks, so it's not network capacity.  Could it be CPU capacity when computing the payload?
Could be that there simply aren't enough signers, or that there are some scheduling inefficiencies happening.

I made a change a while back that increased the cost of signing a recovery image (but helped our recovery image publishing tools), which would have reduced our overall signing capacity.

Comment 9 by vapier@chromium.org, Oct 14 2016

oh right, paygen requests are just hashes

lemme find the normal size for firmware requests and i'll set the initial bar there.  signing firmware should be fast, so dedicating one instance to those and paygen requests should be OK.
Cc: nxia@chromium.org diand...@chromium.org itspeter@chromium.org akes...@chromium.org sbasi@chromium.org
 Issue 675980  has been merged into this issue.

Comment 11 by bugdroid1@chromium.org, Feb 9 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/3772a4696f6200c5195264d5fbf540708fad005c

commit 3772a4696f6200c5195264d5fbf540708fad005c
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Feb 09 01:18:53 2017

Comment 12 by bugdroid1@chromium.org, Feb 11 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/5c3c8e73cb9003b464dc9d2910a61705bda86972

commit 5c3c8e73cb9003b464dc9d2910a61705bda86972
Author: Mike Frysinger <vapier@chromium.org>
Date: Sat Feb 11 17:55:25 2017

Comment 13 by bugdroid1@chromium.org, Feb 13 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/a3c338696b4b7d4348946feedc0d717957ce09cc

commit a3c338696b4b7d4348946feedc0d717957ce09cc
Author: Mike Frysinger <vapier@chromium.org>
Date: Mon Feb 13 20:06:40 2017

Comment 14 by bugdroid1@chromium.org, Feb 14 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/2f6c277dd7c2d817be606f5ec3c107ba1db5f194

commit 2f6c277dd7c2d817be606f5ec3c107ba1db5f194
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Feb 14 19:29:39 2017

Comment 16 by bugdroid1@chromium.org, Feb 14 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/0cf6dc1a5620d7639d9636f5a6cff9f05b2db7fd

commit 0cf6dc1a5620d7639d9636f5a6cff9f05b2db7fd
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Feb 14 23:26:50 2017

Comment 17 by bugdroid1@chromium.org, Feb 14 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/f12eb3e9fb51229136a34c7a60ab8e06fe0e3309

commit f12eb3e9fb51229136a34c7a60ab8e06fe0e3309
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Feb 14 23:41:50 2017

Comment 18 by bugdroid1@chromium.org, Feb 15 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/cros-signing/+/b474500ad6df3d294e8bf5818b7d51cac35f35b4

commit b474500ad6df3d294e8bf5818b7d51cac35f35b4
Author: Mike Frysinger <vapier@chromium.org>
Date: Wed Feb 15 19:07:13 2017

Status: Fixed (was: Available)
i checked the logs of 9.cbf and it seems to be working -- it skips the recovery images due to the size, but processes all the paygen requests fine

this should address issues with paygen requests not getting processed fast enough
Thanks a lot!

Comment 21 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 22 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 24 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
Status: Fixed (was: Archived)
