Pre-CQ flake: The InitSDK stage failed: (15, 'Received signal 15; shutting down')
Issue description
I keep getting hit by this Pre-CQ flake, which unmarks all my CLs. This is a tracking bug to show that it happens often enough to be worth looking into.
https://chromium-review.googlesource.com/c/549087/
https://chromium-review.googlesource.com/c/549088/
Jun 30 2017
Jun 30 2017
Wasn't there a waterfall restart? That kills all in-progress builds, and the CLs get blamed.
Jun 30 2017
Looking a little more closely, I'm confused. Those two CLs show the build failures BEFORE any comment showing they were picked up by the PreCQ. And the first CL was marked PreCQ-ready when the builds started, but the second one wasn't. So why was the second CL included in the PreCQ run? Further, the second CL was rebased very shortly before the PreCQ builder started. Ningning added logic to kill PreCQ builds if their CLs are rebased; maybe the builds took a while to start and that logic was triggered. But the timeline seems a little off.
Jul 6 2017
Jul 6 2017
The reason is that the Pre-CQs for the old patch set were cancelled when a new patch set was uploaded, but the failure messages of the aborted Pre-CQs weren't handled properly by the validation_pool.
Jul 7 2017
Thanks. Somehow my workflow is really great at surfacing weird bugs.
Jul 14 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/29b0f229d4c0615eff960af0821010048bfed2fc

commit 29b0f229d4c0615eff960af0821010048bfed2fc
Author: Ningning Xia <nxia@chromium.org>
Date: Fri Jul 14 02:46:01 2017

Record trybot_cancelled action with build_id of pre-cq build.

Pre-cqs with a stale patch_number may get cancelled by pre-cq-launcher. Instead of recording the trybot_cancelled action with the build_id of pre-cq-launcher, use the build_id of the cancelled pre-cq.

A follow-up CL will change the pre-cq failure triaging logic: when a pre-cq reaches the CompletionStage and tries to triage the failures, if it finds a trybot_cancelled action associated with its build_id, it will consider the failure an infra_only failure and will not blame the CLs being tested.

BUG=chromium:738179
TEST=unit_tests

Change-Id: I8920687d5c6033dd386d62da3f78124e9971cd29
Reviewed-on: https://chromium-review.googlesource.com/562655
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/29b0f229d4c0615eff960af0821010048bfed2fc/cbuildbot/stages/sync_stages_unittest.py
[modify] https://crrev.com/29b0f229d4c0615eff960af0821010048bfed2fc/cbuildbot/stages/sync_stages.py
Jul 14 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/782342ed074772d7d06d02249a667ce14f1db62b

commit 782342ed074772d7d06d02249a667ce14f1db62b
Author: Ningning Xia <nxia@chromium.org>
Date: Fri Jul 14 02:46:01 2017

Do not add CLs to suspect candidates if the Pre-CQ was cancelled.

A Pre-CQ can be cancelled because its patch number is stale, in which case the Pre-CQ shouldn't blame the CLs it's testing.

BUG=chromium:738179
TEST=unit_tests

Change-Id: I80fb189155521addf238c7a96a770ae79620b1b9
Reviewed-on: https://chromium-review.googlesource.com/563488
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/782342ed074772d7d06d02249a667ce14f1db62b/cbuildbot/validation_pool.py
[modify] https://crrev.com/782342ed074772d7d06d02249a667ce14f1db62b/lib/clactions_unittest.py
[modify] https://crrev.com/782342ed074772d7d06d02249a667ce14f1db62b/cbuildbot/validation_pool_unittest.py
[modify] https://crrev.com/782342ed074772d7d06d02249a667ce14f1db62b/lib/clactions.py
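The triage rule described across these two commits can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not chromite's actual code: `CLAction`, `TRYBOT_CANCELLED`, `is_cancelled_precq`, and `get_suspect_cls` are hypothetical names, and the real change touched `cbuildbot/validation_pool.py` and `lib/clactions.py`. The key point is that the cancellation is recorded under the cancelled Pre-CQ's own build_id, so the Pre-CQ can find it when it triages its failures.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical constant, modeled on the "trybot_cancelled" action named in the commit message.
TRYBOT_CANCELLED = "trybot_cancelled"


@dataclass(frozen=True)
class CLAction:
    """Hypothetical record of an action taken on a CL, tagged with a build_id."""
    build_id: int
    action: str
    change: str


def is_cancelled_precq(build_id: int, actions: Iterable[CLAction]) -> bool:
    """Return True if a trybot_cancelled action was recorded against build_id.

    This only works if the launcher records the cancellation with the
    build_id of the cancelled Pre-CQ rather than its own build_id.
    """
    return any(a.build_id == build_id and a.action == TRYBOT_CANCELLED
               for a in actions)


def get_suspect_cls(build_id: int, tested_cls: List[str],
                    actions: Iterable[CLAction]) -> List[str]:
    """Return the CLs to blame for a failed Pre-CQ (hypothetical helper).

    A cancelled Pre-CQ is treated as an infra-only failure, so no CL is
    blamed. Otherwise this sketch simply blames every tested CL, which is
    a simplification of the real triage logic.
    """
    if is_cancelled_precq(build_id, list(actions)):
        return []
    return list(tested_cls)


if __name__ == "__main__":
    recorded = [CLAction(build_id=42, action=TRYBOT_CANCELLED, change="549088")]
    print(get_suspect_cls(42, ["549087", "549088"], recorded))  # []
    print(get_suspect_cls(43, ["549087"], recorded))  # ['549087']
```

Keying the lookup on the Pre-CQ's build_id, rather than on the launcher's, is what lets the cancelled build distinguish "I was aborted" from "the CLs I tested broke something".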
Aug 18 2017
Comment 1 by jrbarnette@chromium.org, Jun 30 2017
Status: Assigned (was: Untriaged)