Pre-CQ should stop retrying broken changes |
|||||
Issue descriptionI like to use "invalid" CQ-DEPEND lines to remind myself that I still need to add a CQ-DEPEND target (e.g. when trying to establish a circular dependency, or waiting for an upstream submission to sync). I uploaded a CL like that at https://chromium-review.googlesource.com/754283, and since it's otherwise fine it has already received a +2 (which is nice because that would stick around when I later update the CQ-DEPEND line in the commit message). This caused the Pre-CQ to try it, which failed obviously, but for some reason this didn't cause it to wait until the CL got updated again to retry. It just keeps going again and again every few minutes. IIRC if a normal patch fails the Pre-CQ will wait with the retry until it gets updated again. So I assume this error is somehow handled incorrectly (maybe it treats it as an infra flake that should be retried automatically?). It didn't even stop when I explicitly marked the CL -2 (which intuitively should stop any bot from doing anything with it).
,
Nov 4 2017
Yeah, this should have been marked as failed (with a pre-cq-failed claction) and then it would not have been picked up again by speculative-pre-cq. CR-2 also should have blocked it, but apparently did not.
,
Nov 4 2017
pre-cq logs show it is choking with 15:55:26: INFO: Failed creating transaction for jwerner:754283:9e5b0aa5: CL:754283 has a malformed CQ-DEPEND target: Cannot parse the dependency: TODO(https://review.coreboot.org/22323) This is probably failing in such a way that it doesn't even get far enough to mark as pre-cq-failed.
,
Nov 4 2017
pre-cq sync is calling into validation_pool.py's version of CreateDisjointTrasactions. This seems to be the last point at which we still have a failure handle. We are calling FilterDependencyErrors and then HandleApplyFailure.
My guess -- this is being considered a DependencyError and thus not getting handled...
def CreateDisjointTransactions(self, manifest, changes, max_txn_length=None):
"""Create a list of disjoint transactions from the changes in the pool.
Args:
manifest: Manifest to use.
changes: List of changes to use.
max_txn_length: The maximum length of any given transaction. By default,
do not limit the length of transactions.
Returns:
A list of disjoint transactions. Each transaction can be tried
independently, without involving patches from other transactions.
Each change in the pool will included in exactly one of transactions,
unless the patch does not apply for some reason.
"""
patches = patch_series.PatchSeries(
self.build_root, forced_manifest=manifest)
plans, failed = patches.CreateDisjointTransactions(
changes, max_txn_length=max_txn_length)
failed = self._FilterDependencyErrors(failed)
if failed:
self._HandleApplyFailure(failed)
return plans
,
Nov 4 2017
Actually no, we are calling _HandleApplyFailure but it also isn't doing anything to mark the cl as pre-cq-failed
def _HandleApplyFailure(self, failures):
"""Handles changes that were not able to be applied cleanly.
Args:
failures: List of cros_patch.PatchException instances to handle.
"""
for failure in failures:
logging.info('Change %s did not apply cleanly.', failure.patch)
if self.is_master:
self._HandleCouldNotApply(failure)
def _HandleCouldNotApply(self, failure):
"""Handler for when Paladin fails to apply a change.
This handler notifies set CodeReview-2 to the review forcing the developer
to re-upload a rebased change.
Args:
failure: cros_patch.PatchException instance to operate upon.
"""
msg = ('%(queue)s failed to apply your change in %(build_log)s .'
' %(failure)s')
self.SendNotification(failure.patch, msg, failure=failure)
self.RemoveReady(failure.patch)
,
Nov 4 2017
(I bet this is also responsible for the pre-cq retry loops we get when CLs fail to apply for other reasons). I'll take a crack at it.
,
Nov 4 2017
,
Nov 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/c122454c32f0af15a391377d9419c5f5cc69e7a6 commit c122454c32f0af15a391377d9419c5f5cc69e7a6 Author: Aviv Keshet <akeshet@chromium.org> Date: Mon Nov 06 08:40:08 2017 validation_pool: add plumbing of failures from CreateDisjoint... This is a plumbing-only change, with no change to logic. A follow-up CL will use the set of failed patches returned to pre-cq-launcher to mark these changes are pre-cq-failed. BUG= chromium:781481 TEST=None Change-Id: Ifede89d5fcc4edd65d8656010d862ced9fc7817b Reviewed-on: https://chromium-review.googlesource.com/754347 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: David James <davidjames@chromium.org> [modify] https://crrev.com/c122454c32f0af15a391377d9419c5f5cc69e7a6/cbuildbot/validation_pool.py [modify] https://crrev.com/c122454c32f0af15a391377d9419c5f5cc69e7a6/cbuildbot/validation_pool_unittest.py [modify] https://crrev.com/c122454c32f0af15a391377d9419c5f5cc69e7a6/cbuildbot/stages/sync_stages.py
,
Nov 7 2017
ValidationPool.RemoveReady called GerritHelper.RemoveReady (https://cs.corp.google.com/chromeos_public/chromite/lib/gerrit.py?l=410) to remove the 'Commit-Queue' and 'Trybot-Ready' bits and updated CL Pre-CQ status to "failed" (https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/validation_pool.py?l=1538) every time Pre-CQ-Launcher failed to apply it. But since the CL was marked +2 'Code-Review' bit, Pre-Cq-Launcher kept retrying picking it up in a grace period. I tried a fix to reset the 'Code-Review' bit on these CLs, but our bot didn't seem to have the privilege to reset 'C-R' bit. see crbug.com/663605 .
,
Nov 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/313fa494bfef873e984afd8c3762b05acae4e9db commit 313fa494bfef873e984afd8c3762b05acae4e9db Author: Aviv Keshet <akeshet@chromium.org> Date: Wed Nov 08 13:32:08 2017 pre-cq: mark failed-to-apply changes as pre-cq-failed so they don't loop BUG= chromium:781481 TEST=None Change-Id: I51df023c780c6ed2891784ba4c669c5a8e5fdba4 Reviewed-on: https://chromium-review.googlesource.com/756175 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/313fa494bfef873e984afd8c3762b05acae4e9db/cbuildbot/validation_pool.py [modify] https://crrev.com/313fa494bfef873e984afd8c3762b05acae4e9db/cbuildbot/stages/sync_stages.py
,
Nov 8 2017
,
Jan 17 2018
Issue 663605 has been merged into this issue. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by jwer...@chromium.org
, Nov 4 2017