Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Increase the builder's GoB quotas to better accommodate current load

Project Member Reported by diand...@chromium.org, Jan 10

Issue description

Basically the same thing happened as was described in bug #769088.  This paladin:

https://luci-milo.appspot.com/buildbot/chromeos/fizz-paladin/3054

...failed in CommitQueueSync.

The error text contains many instances of:

---

09:52:42: WARNING: git reported transient error (cmd=fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-fizz-private refs/changes/39/540139/2); retrying
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 177, in _Wrapper
    ret = func(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 243, in _run
    return functor(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 654, in RunCommand
    raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-fizz-private refs/changes/39/540139/2
fatal: remote error: Short term ls-remote-gerrit rate limit exceeded for <redacted>
fatal: The remote end hung up unexpectedly
[W git.go:283] Transient error string identified in STDERR: "fatal: The remote end hung up unexpectedly\n"
[W git.go:294] Retrying after 3s (rc=128): transient error string encountered

---

The previous bug was closed as WontFix since the problem didn't reproduce.


 
...actually, this happened on at least one other paladin too.  I'll see if I can find any more as well...

  https://luci-milo.appspot.com/buildbot/chromeos/peach_pit-paladin/18175


Cc: dgarr...@chromium.org
Status: Assigned (was: Untriaged)
Looking at the logs on the master paladin, it seems that the master
first tried fizz-paladin/3054, observed the failure, and restarted/retried
with fizz-paladin/3055.  That second run passed.

The theory from dgarrett@ is that since we added new GCE builders some
months ago, our usage has increased enough that we're brushing up against
our GoB quota limits.  In this particular case "brushing up against" became
"outright exceeded".  Since the quota we hit was short term, a retry works in
that case.

Assuming the theory is correct, the correct response will be to request more
quota.
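
For illustration only, here is a minimal sketch of why a retry gets past a short-term limit: wait, then try again, with the delay growing on each attempt. This is not chromite's retry_util and not the Go wrapper from the log. The names `is_transient`, `fetch_with_backoff` and `run_fetch` are hypothetical, the marker strings are copied from the log excerpt above, and the 3s base delay only mirrors the "Retrying after 3s" line.

```python
import time

# Marker strings copied from the log excerpt in this bug. Treating exactly
# these strings as "transient" is an assumption made for illustration.
TRANSIENT_MARKERS = (
    "rate limit exceeded",
    "The remote end hung up unexpectedly",
)


def is_transient(stderr):
    """Return True if stderr contains one of the transient marker strings."""
    return any(marker in stderr for marker in TRANSIENT_MARKERS)


def fetch_with_backoff(run_fetch, max_attempts=4, base_delay=3.0,
                       sleep=time.sleep):
    """Call run_fetch() until it succeeds, fails hard, or attempts run out.

    run_fetch is a hypothetical callable returning (returncode, stderr).
    Returns (returncode, stderr, attempts_made).
    """
    for attempt in range(1, max_attempts + 1):
        returncode, stderr = run_fetch()
        # Stop on success, or on a failure that waiting will not fix.
        if returncode == 0 or not is_transient(stderr):
            return returncode, stderr, attempt
        # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return returncode, stderr, max_attempts
```

Backing off only hides the problem while the limit is short term. If usage stays at the limit, the retries themselves add to the load.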

Labels: Chase-Pending
Owner: ----
Status: Available (was: Assigned)
Summary: Increase the builder's GoB quotas to better accommodate current load (was: Short term ls-remote-gerrit rate limit exceeded)
This didn't get done last week.  We need to find time to move it forward.

Owner: dgarr...@chromium.org
Status: Assigned (was: Available)
-> don to annotate this bug with the metric that shows the problem, then rediscuss in next meeting
This graph shows our usage over the last year.


https://viceroy.corp.google.com/chromeos/gerrit?duration=38707218
Owner: akes...@chromium.org
I had a thought.

We currently have wrappers around git commands that look for a variety of errors. On builders, there is another wrapper that also does retries.

Are we using excessive quota in some cases because of an excessive number of retries?  <outer retries> * <inner retries> ?
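
To make the multiplication concrete, here is a toy sketch that counts the requests reaching the server when one retry wrapper is nested inside another and every attempt fails. `with_retries` and `count_requests` are hypothetical helpers, not the actual wrappers on the builders. Each layer makes retries + 1 attempts, so the worst case is (outer retries + 1) * (inner retries + 1) requests.

```python
def with_retries(func, retries):
    """Wrap func so it is called up to retries + 1 times until it succeeds."""
    def wrapper():
        last_error = None
        for _ in range(retries + 1):
            try:
                return func()
            except RuntimeError as error:
                last_error = error
        raise last_error
    return wrapper


def count_requests(inner_retries, outer_retries):
    """Count the requests made when every attempt is rate limited."""
    requests = 0

    def always_rate_limited():
        # Stand-in for a git fetch that always hits the quota.
        nonlocal requests
        requests += 1
        raise RuntimeError("rate limit exceeded")

    nested = with_retries(
        with_retries(always_rate_limited, inner_retries), outer_retries)
    try:
        nested()
    except RuntimeError:
        pass  # Expected: every attempt fails in this worst-case scenario.
    return requests
```

With 3 retries at each layer, `count_requests(3, 3)` returns 16, so a single failing fetch can cost 16 requests against the quota.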
Labels: -Chase-Pending Chase
Will ask for a quota increase.
I see several paladins failing with the same error now.
Cc: jkop@chromium.org shuqianz@chromium.org
+deputies
Filing at go/gob-quota-ticket
akeshet, any idea how to increase the quota?  That seems like the right fix.

Have not had a chance to follow up on above b/ .

I don't believe this still needs to be a Chase bug. Recommend dropping the Chase label and downgrading to P2.
Saw a few instances of quota being exceeded this morning:

InfrastructureFailure: <class 'chromite.lib.cros_build_lib.RunCommandError'>: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-nautilus-private refs/changes/98/566698/2 fatal: remote error: Short te
RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/overlays/overlay-nautilus-private refs/changes/98/566698/2 fatal: remote error: Short term ls-remote-gerrit rate limit exceeded for 3su6n15k.de
RunCommandError: return code: 128; command: git fetch -f https://chromium-review.googlesource.com/chromiumos/platform/arc-camera refs/changes/66/906266/1 fatal: remote error: Short term ls-remote-gerrit rate limit exceeded for 3su6n15k.default@developer.gse



https://luci-milo.appspot.com/buildbot/chromeos/auron_yuna-paladin/2172
https://luci-milo.appspot.com/buildbot/chromeos/daisy_spring-paladin/17987
https://luci-milo.appspot.com/buildbot/chromeos/parrot-paladin/25715
Ok, back to Chase it is.
Still awaiting assistance on https://b.corp.google.com/issues/72697187
Fixed, see https://b.corp.google.com/issues/72697187 for context. However, the steps described there for adding reader permissions will need to be repeated whenever new internal repos are added.

The command to do so is:
gob-ctl acl chrome-internal/NEW/REPO/NAME --reader=3su6n15k.default@developer.gserviceaccount.com
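
If several repos are added at once, the same command can be generated per repo. This is a dry-run sketch under assumptions: `reader_acl_commands` is a hypothetical helper, the repo name in the example is a placeholder, and the only gob-ctl syntax used is the exact form shown above. It only builds and prints the command lines and does not run gob-ctl.

```python
# Service account copied from the command in this bug.
READER = "3su6n15k.default@developer.gserviceaccount.com"


def reader_acl_commands(repo_names, reader=READER):
    """Return one gob-ctl argument list per repo under chrome-internal/."""
    return [
        ["gob-ctl", "acl", "chrome-internal/" + name, "--reader=" + reader]
        for name in repo_names
    ]


if __name__ == "__main__":
    # Placeholder repo name, for illustration only.
    for command in reader_acl_commands(["NEW/REPO/NAME"]):
        print(" ".join(command))
```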
Status: Fixed (was: Assigned)
