Issue 679974

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Many PreCQ builds failed with non-unique buildbucket id.

Project Member Reported by dgarr...@chromium.org, Jan 11 2017

Issue description

During a brief window, many PreCQ builds failed repeatedly with the following SQL error message during the BuildStart step.

  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq/builds/19092/steps/BuildStart/logs/stdio

Error:
  Duplicate entry '8990800877361077120' for key 'buildbucket_id_index'

Sample CL:
  https://chromium-review.googlesource.com/#/c/408780/

Sample build:
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq/builds/19092

PreCQ Launcher build:
  https://uberchromegw.corp.google.com/i/chromeos/builders/pre-cq-launcher/builds/8425


Luci build links generated for these builds appear to point to the wrong build results (perhaps pointing to later builds):
  https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/compile_only_pre_cq/19074
  https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq/builds/19074/

 
I killed the PreCQ Launcher, and the problem went away.

I'm not sure if I actually fixed anything with the launcher restart, or if the issue was transient and the timing coincidental. Affected CLs appear to be testing normally now.

Comment 2 by nxia@chromium.org, Jan 11 2017

Status: WontFix (was: Untriaged)
There should have been a waterfall restart around that time (not sure how to confirm, but I saw a bunch of builds get killed before the failure). When the master comes back from a restart, it picks up the builds canceled by the restart and re-runs them with the same configuration (including buildbucket_id). We want each buildbucket_id to be a unique value in CIDB, so we want the re-run build with the old buildbucket_id to fail. This is expected, closing as won't fix.
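
For readers unfamiliar with the mechanism, here is a minimal sketch of how a unique index rejects a re-run that reuses an old buildbucket_id. It uses Python's sqlite3 as a stand-in for CIDB. Only the index name `buildbucket_id_index` and the sample id come from this report. The table name, columns, and the `make_db` / `insert_build` helpers are hypothetical.

```python
import sqlite3


def make_db():
    # In-memory stand-in for CIDB; the schema here is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE buildTable ("
        "id INTEGER PRIMARY KEY, builder TEXT, buildbucket_id TEXT)"
    )
    # Index name taken from the error message in this issue.
    conn.execute(
        "CREATE UNIQUE INDEX buildbucket_id_index "
        "ON buildTable (buildbucket_id)"
    )
    return conn


def insert_build(conn, builder, buildbucket_id):
    # Raises sqlite3.IntegrityError if buildbucket_id is already present.
    cur = conn.execute(
        "INSERT INTO buildTable (builder, buildbucket_id) VALUES (?, ?)",
        (builder, buildbucket_id),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = make_db()
    insert_build(conn, "compile_only_pre_cq", "8990800877361077120")
    try:
        # A build re-run after the restart reuses the same id.
        insert_build(conn, "compile_only_pre_cq", "8990800877361077120")
    except sqlite3.IntegrityError as e:
        print("second insert rejected:", e)
```

The second insert fails at the database layer, which is the behavior described above: the re-run build cannot register itself and fails early.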
Status: Available (was: WontFix)
I think we need the PreCQ launcher to detect what's happening and not spam CLs with confusing messages.

Maybe by having it notice that the CL already has a valid CIDB entry and not trying to create a new one?
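
A rough sketch of that idea, assuming a lookup keyed on buildbucket_id: check for an existing row before inserting, and report whether a new row was created. This again uses sqlite3 as a stand-in for CIDB. The table layout and the `open_db` / `get_or_create_build` helpers are hypothetical and are not the actual launcher or CIDB API.

```python
import sqlite3


def open_db():
    # In-memory stand-in for CIDB; the schema here is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE buildTable ("
        "id INTEGER PRIMARY KEY, builder TEXT, buildbucket_id TEXT)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX buildbucket_id_index "
        "ON buildTable (buildbucket_id)"
    )
    return conn


def get_or_create_build(conn, builder, buildbucket_id):
    """Return (row_id, created) for the given buildbucket_id."""
    row = conn.execute(
        "SELECT id FROM buildTable WHERE buildbucket_id = ?",
        (buildbucket_id,),
    ).fetchone()
    if row is not None:
        # An entry already exists: reuse it instead of failing the insert.
        return row[0], False
    cur = conn.execute(
        "INSERT INTO buildTable (builder, buildbucket_id) VALUES (?, ?)",
        (builder, buildbucket_id),
    )
    conn.commit()
    return cur.lastrowid, True


if __name__ == "__main__":
    conn = open_db()
    print(get_or_create_build(conn, "compile_only_pre_cq", "8990800877361077120"))
    print(get_or_create_build(conn, "compile_only_pre_cq", "8990800877361077120"))
```

The SELECT followed by INSERT is not atomic, so two concurrent writers could still race. The unique index stays in place as the final guard, and a caller would still need to handle the duplicate-key error in that case.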

Comment 4 by nxia@chromium.org, Jan 11 2017

Looked into the CL: it did report that the failure was an infra issue and that the build would be retried automatically.


"
NOTE: The Pre-Commit Queue will retry your change automatically.
The following build(s) failed:
mixed-b-pre-cq: cbuildbot failed in https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/compile_only_pre_cq/19074
The build encountered Chrome OS Lab infrastructure issues.  Your change will not be blamed for the failure.
Commit queue documentation: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium-os/commit-queue-overview
"
But it did it 5 times in 7 minutes. And it looked like it would have kept doing it non-stop until I restarted the PreCQ launcher.

Worse, the attached links pointed to what looked like a valid in-progress build, so it was all very hard to understand what was wrong.

Comment 6 by nxia@chromium.org, Jan 11 2017

Cc: akes...@chromium.org
Looking at https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq?numbuilds=200

there are 14 failures caused by duplicated buildbucket_ids, because 14 different builds were canceled by the restart. PreCQ-launcher shouldn't affect this, as the builds were not retried by PreCQ-launcher (PreCQ-launcher always assigns a new buildbucket_id when it triggers a build).

Comment 7 by autumn@chromium.org, Jan 17 2017

Labels: -current-issue

Comment 8 by nxia@chromium.org, Jan 20 2017

Status: WontFix (was: Available)
It's intentional to fail builds that are retried with a non-unique buildbucket_id.
