New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 781481 link

Starred by 6 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Pre-CQ should stop retrying broken changes

Project Member Reported by jwer...@chromium.org, Nov 4 2017

Issue description

I like to use "invalid" CQ-DEPEND lines to remind myself that I still need to add a CQ-DEPEND target (e.g. when trying to establish a circular dependency, or waiting for an upstream submission to sync). I uploaded a CL like that at https://chromium-review.googlesource.com/754283, and since it's otherwise fine it has already received a +2 (which is nice because that would stick around when I later update the CQ-DEPEND line in the commit message). This caused the Pre-CQ to try it, which failed obviously, but for some reason this didn't cause it to wait until the CL got updated again to retry. It just keeps going again and again every few minutes.

IIRC if a normal patch fails the Pre-CQ will wait with the retry until it gets updated again. So I assume this error is somehow handled incorrectly (maybe it treats it as an infra flake that should be retried automatically?). It didn't even stop when I explicitly marked the CL -2 (which intuitively should stop any bot from doing anything with it).
 
Cc: akes...@chromium.org dgarr...@chromium.org
Not sure who this bug should go to, +Aviv/Don for triage.
Yeah, this should have been marked as failed (with a pre-cq-failed claction) and then it would not have been picked up again by speculative-pre-cq.

CR-2 also should have blocked it, but apparently did not.
pre-cq logs show it is choking with 

15:55:26: INFO: Failed creating transaction for jwerner:754283:9e5b0aa5: CL:754283 has a malformed CQ-DEPEND target: Cannot parse the dependency: TODO(https://review.coreboot.org/22323)

This is probably failing in such a way that it doesn't even get far enough to mark as pre-cq-failed.
pre-cq sync is calling into validation_pool.py's version of CreateDisjointTrasactions. This seems to be the last point at which we still have a failure handle. We are calling FilterDependencyErrors and then HandleApplyFailure.

My guess -- this is being considered a DependencyError and thus not getting handled...

  def CreateDisjointTransactions(self, manifest, changes, max_txn_length=None):                   
    """Create a list of disjoint transactions from the changes in the pool.
                                                                                                  
    Args:
      manifest: Manifest to use.                                                                  
      changes: List of changes to use.
      max_txn_length: The maximum length of any given transaction.  By default,                   
        do not limit the length of transactions.                                                  
        
    Returns:
      A list of disjoint transactions. Each transaction can be tried                              
      independently, without involving patches from other transactions.                           
      Each change in the pool will included in exactly one of transactions,                       
      unless the patch does not apply for some reason.                                            
    """   
    patches = patch_series.PatchSeries(                                                           
        self.build_root, forced_manifest=manifest)                                                
    plans, failed = patches.CreateDisjointTransactions(
        changes, max_txn_length=max_txn_length)
    failed = self._FilterDependencyErrors(failed)
    if failed:
      self._HandleApplyFailure(failed)                                                            
    return plans
Actually no, we are calling _HandleApplyFailure but it also isn't doing anything to mark the cl as pre-cq-failed

  def _HandleApplyFailure(self, failures):
    """Handles changes that were not able to be applied cleanly.                                  
    
    Args:                                                                                         
      failures: List of cros_patch.PatchException instances to handle.
    """
    for failure in failures:                                                                      
      logging.info('Change %s did not apply cleanly.', failure.patch)                             
      if self.is_master:
        self._HandleCouldNotApply(failure)                                                        

  def _HandleCouldNotApply(self, failure):
    """Handler for when Paladin fails to apply a change.                                          

    This handler notifies set CodeReview-2 to the review forcing the developer                    
    to re-upload a rebased change.

    Args:
      failure: cros_patch.PatchException instance to operate upon.                                
    """
    msg = ('%(queue)s failed to apply your change in %(build_log)s .'                             
           ' %(failure)s')
    self.SendNotification(failure.patch, msg, failure=failure)                                    
    self.RemoveReady(failure.patch)
Labels: -Pri-3 Pri-2
Owner: akes...@chromium.org
(I bet this is also responsible for the pre-cq retry loops we get when CLs fail to apply for other reasons).

I'll take a crack at it.
Status: Started (was: Untriaged)
Project Member

Comment 8 by bugdroid1@chromium.org, Nov 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/c122454c32f0af15a391377d9419c5f5cc69e7a6

commit c122454c32f0af15a391377d9419c5f5cc69e7a6
Author: Aviv Keshet <akeshet@chromium.org>
Date: Mon Nov 06 08:40:08 2017

validation_pool: add plumbing of failures from CreateDisjoint...

This is a plumbing-only change, with no change to logic.

A follow-up CL will use the set of failed patches returned to
pre-cq-launcher to mark these changes are pre-cq-failed.

BUG= chromium:781481 
TEST=None

Change-Id: Ifede89d5fcc4edd65d8656010d862ced9fc7817b
Reviewed-on: https://chromium-review.googlesource.com/754347
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: David James <davidjames@chromium.org>

[modify] https://crrev.com/c122454c32f0af15a391377d9419c5f5cc69e7a6/cbuildbot/validation_pool.py
[modify] https://crrev.com/c122454c32f0af15a391377d9419c5f5cc69e7a6/cbuildbot/validation_pool_unittest.py
[modify] https://crrev.com/c122454c32f0af15a391377d9419c5f5cc69e7a6/cbuildbot/stages/sync_stages.py

Comment 9 by nxia@chromium.org, Nov 7 2017

ValidationPool.RemoveReady called GerritHelper.RemoveReady (https://cs.corp.google.com/chromeos_public/chromite/lib/gerrit.py?l=410)  to remove the 'Commit-Queue' and 'Trybot-Ready' bits and updated CL Pre-CQ status to "failed" (https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/validation_pool.py?l=1538) every time Pre-CQ-Launcher failed to apply it.

But since the CL was marked +2 'Code-Review' bit, Pre-Cq-Launcher kept retrying picking it up in a grace period. 

I tried a fix to reset the 'Code-Review' bit on these CLs, but our bot didn't seem to have the privilege to reset 'C-R' bit. see  crbug.com/663605 .


Project Member

Comment 10 by bugdroid1@chromium.org, Nov 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/313fa494bfef873e984afd8c3762b05acae4e9db

commit 313fa494bfef873e984afd8c3762b05acae4e9db
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Nov 08 13:32:08 2017

pre-cq: mark failed-to-apply changes as pre-cq-failed so they don't loop

BUG= chromium:781481 
TEST=None

Change-Id: I51df023c780c6ed2891784ba4c669c5a8e5fdba4
Reviewed-on: https://chromium-review.googlesource.com/756175
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/313fa494bfef873e984afd8c3762b05acae4e9db/cbuildbot/validation_pool.py
[modify] https://crrev.com/313fa494bfef873e984afd8c3762b05acae4e9db/cbuildbot/stages/sync_stages.py

Status: Fixed (was: Started)

Comment 12 by nxia@chromium.org, Jan 17 2018

Cc: pho...@chromium.org nxia@chromium.org shuqianz@chromium.org vapier@chromium.org
 Issue 663605  has been merged into this issue.

Sign in to add a comment