Increase the builder's GoB quotas to better accommodate current load
Issue description

Basically the same thing happened as was described in bug #769088. This paladin:
https://luci-milo.appspot.com/buildbot/chromeos/fizz-paladin/3054
...failed in CommitQueueSync. In the text of the error you see much of:
---
09:52:42: WARNING: git reported transient error (cmd=fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-fizz-private refs/changes/39/540139/2); retrying
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 177, in _Wrapper
    ret = func(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 243, in _run
    return functor(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 654, in RunCommand
    raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-fizz-private refs/changes/39/540139/2
fatal: remote error: Short term ls-remote-gerrit rate limit exceeded for <redacted>
fatal: The remote end hung up unexpectedly
[W git.go:283] Transient error string identified in STDERR: "fatal: The remote end hung up unexpectedly\n"
[W git.go:294] Retrying after 3s (rc=128): transient error string encountered
---
The previous bug was closed as WontFix since the problem didn't reproduce.
,
Jan 10 2018
Looking at the logs on the master paladin, it seems that the master first tried fizz-paladin/3054, observed the failure, and restarted/retried with fizz-paladin/3055. That second run passed. The theory from dgarrett@ is that since we added new GCE builders some months ago, our usage has increased enough that we're brushing up against our GoB quota limits. In this particular case "brushing up against" became "outright exceeded". Since the quota we hit was short term, the retry worked in that case. Assuming the theory is correct, the correct response will be to request more quota.
,
Jan 16 2018
This didn't get done last week. We need to find time to move it forward.
,
Jan 22 2018
-> don to annotate this bug with the metric that shows the problem, then rediscuss in next meeting
,
Jan 22 2018
This graph shows our usage over the last year. https://viceroy.corp.google.com/chromeos/gerrit?duration=38707218
,
Jan 22 2018
,
Jan 24 2018
I had a thought. We currently have wrappers around git commands that look for a variety of errors. On builders, there is another wrapper that also does retries. Are we using excessive quota in some cases because of an excessive number of retries? <outer retries> * <inner retries> ?
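To make the multiplication concern concrete, here is a minimal sketch of two stacked retry wrappers around a fetch that always fails. The names (`TransientError`, `with_retries`, `count_requests`) are hypothetical and are not chromite's actual retry_util API; the retry counts are made up for illustration.

```python
class TransientError(Exception):
    """Stand-in for a retryable git failure (hypothetical, not chromite's)."""


def with_retries(func, max_retries):
    """Call func, retrying up to max_retries times on TransientError."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise


def count_requests(outer_retries, inner_retries):
    """Count server requests when every fetch fails and both layers retry."""
    requests = 0

    def fetch():
        nonlocal requests
        requests += 1  # each call would consume one unit of server quota
        raise TransientError("rate limit exceeded")

    def inner():
        # Inner wrapper: retries the raw fetch.
        return with_retries(fetch, inner_retries)

    try:
        # Outer wrapper: retries the already-retrying inner wrapper.
        with_retries(inner, outer_retries)
    except TransientError:
        pass
    return requests


print(count_requests(outer_retries=2, inner_retries=3))
```

With these assumed counts, one logical fetch turns into (2 + 1) * (3 + 1) = 12 requests, so a persistent rate-limit failure would itself consume quota at a multiplied rate.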
,
Jan 29 2018
Will ask to increase quota.
,
Jan 30 2018
I see several paladins failing with the same error now.
,
Jan 30 2018
A few of them:
https://uberchromegw.corp.google.com/i/chromeos/builders/chell-paladin/builds/3813
https://uberchromegw.corp.google.com/i/chromeos/builders/daisy_skate-paladin/builds/12494
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-paladin/builds/2248
https://uberchromegw.corp.google.com/i/chromeos/builders/falco-paladin/builds/17779
https://uberchromegw.corp.google.com/i/chromeos/builders/fizz-paladin/builds/3213
https://uberchromegw.corp.google.com/i/chromeos/builders/kahlee-paladin/builds/1247
https://uberchromegw.corp.google.com/i/chromeos/builders/kip-paladin/builds/4223
,
Jan 30 2018
+deputies
,
Jan 30 2018
Filing at go/gob-quota-ticket
,
Jan 30 2018
akeshet, any idea how to increase the quota? That seems like the right fix.
,
Jan 30 2018
,
Feb 10 2018
Have not had a chance to follow up on above b/ . I don't believe this still needs to be a Chase bug. Recommend dropping the Chase label and downgrading to P2.
,
Feb 12 2018
Saw a few instances of quota being exceeded this morning:
InfrastructureFailure: <class 'chromite.lib.cros_build_lib.RunCommandError'>: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-nautilus-private refs/changes/98/566698/2 fatal: remote error: Short te
RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-nautilus-private refs/changes/98/566698/2 fatal: remote error: Short term ls-remote-gerrit rate limit exceeded for 3su6n15k.de
RunCommandError: return code: 128; command: git fetch -f https://chromium-review.googlesource.com/chromiumos/platform/arc-camera refs/changes/66/906266/1 fatal: remote error: Short term ls-remote-gerrit rate limit exceeded for 3su6n15k.default@developer.gse
https://luci-milo.appspot.com/buildbot/chromeos/auron_yuna-paladin/2172
https://luci-milo.appspot.com/buildbot/chromeos/daisy_spring-paladin/17987
https://luci-milo.appspot.com/buildbot/chromeos/parrot-paladin/25715
,
Feb 12 2018
Ok, back to Chase it is.
,
Feb 20 2018
Still awaiting assistance on https://b.corp.google.com/issues/72697187
,
Mar 7 2018
Fixed, see https://b.corp.google.com/issues/72697187 for context. However, the steps described there to add reader permissions will need to be repeated whenever new internal repos are added. The command to do so is:
gob-ctl acl chrome-internal/NEW/REPO/NAME --reader=3su6n15k.default@developer.gserviceaccount.com
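If several repos are added at once, the command could be generated per repo rather than typed by hand. The sketch below only formats the argument list in the shape quoted above and does not invoke gob-ctl; the helper name `reader_acl_command` and the example repo path are hypothetical.

```python
# Service account taken from the command quoted in this bug.
SERVICE_ACCOUNT = "3su6n15k.default@developer.gserviceaccount.com"


def reader_acl_command(repo_name, account=SERVICE_ACCOUNT):
    """Build the argv for granting reader access on one chrome-internal repo.

    Only mirrors the command shape quoted in the comment above; it makes no
    assumptions about other gob-ctl flags or behavior.
    """
    return [
        "gob-ctl",
        "acl",
        f"chrome-internal/{repo_name}",
        f"--reader={account}",
    ]


# Hypothetical repo name, for illustration only.
for repo in ["chromeos/overlays/overlay-example-private"]:
    print(" ".join(reader_acl_command(repo)))
```

Printing the commands first, instead of running them, leaves room to review the repo list before changing any ACLs.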
,
Mar 12 2018
Comment 1 by diand...@chromium.org
, Jan 10 2018