Many PreCQ builds failed with non-unique buildbucket id.
Issue description

During a brief window, many PreCQ builds failed repeatedly with the following SQL error message during BuildStartup.

https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq/builds/19092/steps/BuildStart/logs/stdio

Error: Duplicate entry '8990800877361077120' for key 'buildbucket_id_index'

Sample CL: https://chromium-review.googlesource.com/#/c/408780/
Sample build: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq/builds/19092
PreCQ Launcher build: https://uberchromegw.corp.google.com/i/chromeos/builders/pre-cq-launcher/builds/8425

Luci build links generated for these builds appear to point to the wrong build results (perhaps pointing to later builds):
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/compile_only_pre_cq/19074
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq/builds/19074/
,
Jan 11 2017
There was most likely a waterfall restart around that time (I'm not sure how to confirm it, but I saw a bunch of builds get killed before the failures). When the master comes back from a restart, it picks up the builds that were canceled by the restart and re-runs them with the same configuration, including the same buildbucket_id. We want each buildbucket_id to be unique in CIDB, so we want the re-run build that reuses an old buildbucket_id to fail. This is expected; closing as won't fix.
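The failure mode described above can be reproduced in miniature. The sketch below is only an illustration, not the CIDB code: it uses an in-memory SQLite database as a stand-in for CIDB, and the table name `builds` and the helper `insert_build` are hypothetical. Only the index name `buildbucket_id_index` and the id value come from the error message in this issue.

```python
import sqlite3

# In-memory SQLite stands in for CIDB here; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE builds (id INTEGER PRIMARY KEY, buildbucket_id TEXT)"
)
# A unique index rejects a second row with the same buildbucket_id.
conn.execute(
    "CREATE UNIQUE INDEX buildbucket_id_index ON builds (buildbucket_id)"
)


def insert_build(conn, buildbucket_id):
    """Hypothetical helper: return True if the row was inserted,
    False if the unique index rejected it as a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO builds (buildbucket_id) VALUES (?)",
                (buildbucket_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The original build registers its id.
first = insert_build(conn, "8990800877361077120")
# The build re-run after the restart reuses the same id and is rejected.
second = insert_build(conn, "8990800877361077120")
row_count = conn.execute("SELECT COUNT(*) FROM builds").fetchone()[0]
```

In this sketch the second insert is rejected and the table still holds a single row, which mirrors why a build re-run with an old buildbucket_id fails at startup.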
,
Jan 11 2017
I think we need the PreCQ launcher to detect what's happening and not spam CLs with confusing messages. Maybe by having it notice that the CL already has a valid CIDB entry and not trying to create a new one?
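As a rough sketch of the "notice the existing entry" idea, assuming a lookup by buildbucket_id is available: the table name `builds` and the functions `find_existing_build` and `get_or_create_build` are hypothetical, and SQLite again stands in for CIDB. This is not how the PreCQ launcher or CIDB is actually written.

```python
import sqlite3


def find_existing_build(conn, buildbucket_id):
    """Hypothetical lookup: return the row id for this buildbucket_id, or None."""
    row = conn.execute(
        "SELECT id FROM builds WHERE buildbucket_id = ?", (buildbucket_id,)
    ).fetchone()
    return row[0] if row else None


def get_or_create_build(conn, buildbucket_id):
    """Return (row_id, created). Reuses an existing entry instead of
    attempting a second insert that the unique index would reject."""
    existing = find_existing_build(conn, buildbucket_id)
    if existing is not None:
        return existing, False
    with conn:
        cur = conn.execute(
            "INSERT INTO builds (buildbucket_id) VALUES (?)", (buildbucket_id,)
        )
    return cur.lastrowid, True


# Assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE builds (id INTEGER PRIMARY KEY, buildbucket_id TEXT UNIQUE)"
)

first_id, first_created = get_or_create_build(conn, "8990800877361077120")
second_id, second_created = get_or_create_build(conn, "8990800877361077120")
```

With a check like this, a caller could tell "already registered" apart from a fresh build and skip posting a new failure message. Note that the check and the insert are not atomic here, so the unique index would still be needed as the final safeguard.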
,
Jan 11 2017
I looked into the CL. It did report that the failure was an infra issue and that the build would be retried automatically:

"NOTE: The Pre-Commit Queue will retry your change automatically.

The following build(s) failed:
mixed-b-pre-cq: cbuildbot failed in https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/compile_only_pre_cq/19074

The build encountered Chrome OS Lab infrastructure issues. Your change will not be blamed for the failure.

Commit queue documentation: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium-os/commit-queue-overview"
,
Jan 11 2017
But it did so 5 times in 7 minutes, and it appeared to keep doing it non-stop until I restarted the PreCQ launcher. Worse, the attached links pointed to what looked like a valid in-progress build, so it was very hard to understand what was wrong.
,
Jan 11 2017
Looking at https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/compile_only_pre_cq?numbuilds=200, there are 14 failures caused by duplicated buildbucket_ids, because 14 different builds were canceled by the restart. The PreCQ launcher shouldn't be a factor here, since these builds were not retried by the PreCQ launcher (it always assigns a new buildbucket_id when it triggers a build).
,
Jan 17 2017
,
Jan 20 2017
It's intentional to fail builds that are retried with a non-unique buildbucket_id.
Comment 1 by dgarr...@chromium.org, Jan 11 2017