Issue 639393

Starred by 2 users

Issue metadata

Status: Duplicate
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug


Gerrit CQ experiences BatchRefUpdate failures, which leave issue in a broken state.

Project Member Reported by iannucci@chromium.org, Aug 19 2016

Issue description

Example: https://chromium-review.googlesource.com/c/373164/

Error submitting change: 
BatchRefUpdate failed: TransitionalBatchRefUpdate[
  CREATE: 0000000000000000000000000000000000000000 bd229aac3baee3dab6299ce17a17c8a6a85fa528 refs/changes/64/373164/2  (REJECTED_OTHER_REASON: transaction aborted)
  UPDATE: ad686d6aa094591806045160d4243623da8cf5a7 bd229aac3baee3dab6299ce17a17c8a6a85fa528 refs/heads/master  (LOCK_FAILURE)
]

Please, unvote/vote on Commit Queue label to re-trigger on the same patchset.
Bot data: {"action": "cancel", "triggered_at": "2016-08-19T18:29:36.0Z", "revision": "7e70ec454f6a8b86995f0a358fafc1eb1565b56c"}

As far as I can tell, this is a retryable error. 

Additional context:

The tree was closed for ~50 minutes, and I suspect that multiple people had changes queued up. I investigated the closure (a flake) and reopened the tree. As soon as that happened, 6 new changes were immediately committed by the CQ (I assume that 6 people didn't just run `git cl land` at exactly the same time). At the same time, the CQ rejected my change with the error above (I suspect that other changes may have failed at the same time for similar reasons, but I cannot confirm this).

I suspect this may have to do with googlesource's eventual consistency guarantees, and a retry loop with some backoff would have been able to submit this without issue.
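For illustration, the kind of retry loop suggested above could look roughly like the sketch below. It is not the CQ's actual code: `TransientSubmitError`, `submit_with_backoff`, and the `submit` callable are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import random
import time


class TransientSubmitError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. LOCK_FAILURE)."""


def submit_with_backoff(submit, max_attempts=5, base_delay=1.0,
                        max_delay=30.0, sleep=time.sleep):
    """Call submit() until it succeeds or max_attempts is exhausted.

    `submit` is any zero-argument callable (an assumption of this sketch)
    that raises TransientSubmitError on a retryable failure.
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except TransientSubmitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            # Exponential backoff capped at max_delay, with full jitter so
            # that many queued changes do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The jitter matters in the scenario described here, where several changes are released at once when the tree reopens and would otherwise contend for the same ref again on each retry.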
 
Owner: tandrii@chromium.org
Status: Started (was: Untriaged)
Yeah, your analysis is correct. I'll dig up CQ logs tomorrow and fix it.

These Gerrit consistency bugs are getting really annoying, though. I've been investigating them for almost 2 weeks now.
Issue 639425 has been merged into this issue.
Labels: Proj-Gerrit-Migration
Owner: ----
Status: ExternalDependency (was: Started)
OK, I've pulled the logs. The problem is that the Gerrit REST API returned 409, which is also what it returns when a change can't be landed at all (say, because of a missing LGTM).

I've filed internal bug b/31058068 asking for a different code to be returned, so that the CQ can distinguish transient failures from permanent ones.
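A rough sketch of the distinction being asked for, assuming the server reports transient failures with a 5xx status and keeps 4xx codes such as 409 for permanent rejections. The names `SubmitOutcome` and `classify_submit_status` are hypothetical and do not come from the CQ codebase.

```python
from enum import Enum


class SubmitOutcome(Enum):
    OK = "ok"
    PERMANENT = "permanent"   # e.g. 409 for a missing LGTM: do not retry
    TRANSIENT = "transient"   # server-side hiccup: safe to retry


def classify_submit_status(http_status):
    """Map the HTTP status of a submit request to a retry decision.

    Assumption of this sketch: only 5xx responses signal transient
    failures; every other non-2xx response is treated as permanent.
    """
    if 200 <= http_status < 300:
        return SubmitOutcome.OK
    if http_status >= 500:
        return SubmitOutcome.TRANSIENT
    return SubmitOutcome.PERMANENT
```

With a single 409 covering both cases, no such mapping is possible on the client side, which is why the fix has to come from the server.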
Components: Infra>Codereview>Gerrit
Owner: tandrii@chromium.org
Labels: -Pri-3 Milestone-Fishfood Pri-2
Summary: Gerrit CQ experiences BatchRefUpdate failures, which leave issue in a broken state. (was: CQ(gerrit) should retry on BatchRefUpdate failures)
b/31058068 is resolved. Now this error will result in a 500. However, this is still not good enough, because in all such cases the issue has actually been committed.
b/31363319 is the new external issue blocking this one. There is nothing to be done on the infra side.
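To illustrate why an error response alone is not enough here: since the change may already be committed when the error comes back, a client would have to re-read the change's state before deciding what to do. The sketch below is purely illustrative and is not a fix proposed in this thread. `action_after_failed_submit` and its return values are hypothetical; `MERGED` and `ABANDONED` are the status strings Gerrit reports for a change.

```python
def action_after_failed_submit(change_status):
    """Decide what to do after a submit request returned an error.

    `change_status` is the status of the change as re-read from Gerrit
    after the failed request (fetching it is outside this sketch).
    """
    if change_status == "MERGED":
        # The commit landed even though the request reported an error.
        return "report_landed"
    if change_status == "ABANDONED":
        # Nothing left to submit.
        return "stop"
    # Still open: the failure was real, so retrying is reasonable.
    return "retry"
```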
Mergedinto: 644980
Status: Duplicate (was: ExternalDependency)