New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 705023 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on: View detail
issue 706896
issue 706930



Sign in to add a comment

Innocent CLs got rejected because their dependent CLs were randomly filtered out by throttled CQ

Project Member Reported by nxia@chromium.org, Mar 24 2017

Issue description

master-paladin/14059 didn't pick up #399 and #437, but #398 depends on #399 and #436 depends on #437, so they all got rejected. Needs more investigation.


https://chromium-review.googlesource.com/c/457437

https://chromium-review.googlesource.com/c/457399
 

Comment 1 by nxia@chromium.org, Mar 25 2017

>>>> If I were you, I'd just chalk that up to a snafu induced by the lab
>>>> outage last night, and subsequently heavy CL load. It might have prevented
>>>> all your CLs from being picked up together, even though CQ-DEPENDs should
>>>> have required it. I'd bet that if you manage to resubmit today, it'll go
>>>> through.
>>>>
>>>> There could arguably be a CQ bug in there somewhere, since it really
>>>> shouldn't try half the patchset like that.

Brian could be right here. As the tree is still in throttle, CQ picks a random subset of candidate changes and tests. We can 1) add logging for the changes that are thrown away by the filter. 2) fix FilterChangesForThrottledTree to excude changes and their dependencies at the same time.
 

https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/validation_pool.py?type=cs&q=FilterChangesForThrottledTree&sq=package:%5Echromeos_(internal%7Cpublic)$&l=714

Comment 2 by nxia@chromium.org, Mar 28 2017

context from Patrick <pberny@chromium.org>

"
For what it's worth, I went through a similar exercise/experience not too long ago. I wanted to submit changes to about 6 or 7 repos at once and was not  able to get them through with the same " was not eligible (wrong manifest branch, wrong labels, or otherwise filtered from eligible set)." message for a long time (close to 2 weeks). Luckily there was no absolute requirement for all to go in at once (just duplicate USE flag definitions if breaking it up), so eventually I removed the circular dependency for things to go through, but I don't know what I would have done if it had to be there without the ability to loosen the depencency. For those interested in digging, https://chromium-review.googlesource.com/#/c/450892/ .
"
Project Member

Comment 3 by bugdroid1@chromium.org, Mar 29 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/c77918c64397afaf7ca0024359a1ef74b4625c3a

commit c77918c64397afaf7ca0024359a1ef74b4625c3a
Author: Ningning Xia <nxia@chromium.org>
Date: Wed Mar 29 00:16:58 2017

Add logging for changes not picked up by validation_pool when tree is throttled

BUG= chromium:705023 
TEST=run_tests

Change-Id: Ic2c8d96821632967540af6490459dc919897f177
Reviewed-on: https://chromium-review.googlesource.com/458969
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/c77918c64397afaf7ca0024359a1ef74b4625c3a/cbuildbot/validation_pool.py

Comment 4 by nxia@chromium.org, Mar 29 2017

One example from Patrick



https://chromium-review.googlesource.com/c/450892/

March 17 7:16 pm


ChromeOS Commit Bot
Patch Set 6:

The Pre-Commit Queue failed to apply your change in https://luci-milo.appspot.com/buildbot/chromeos/pre-cq-launcher/8825 . CL:450892 depends on CL:*333889, which was not eligible (wrong manifest branch, wrong labels, or otherwise filtered from eligible set).

Commit queue documentation: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium-os/commit-queue-overview

Comment 5 by nxia@chromium.org, Mar 29 2017

Following #4:

One CQ kicked out by filtering out CLs not passed Pre-CQs


https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F13986%2F%2B%2Frecipes%2Fsteps%2FCommitQueueSync%2F0%2Fstdout


20:24:17: INFO: Raw changes: CL:*332267 CL:*333888 CL:*333889 CL:*333910 CL:*333964 CL:*333987 CL:*337992 CL:*338404 CL:442835 CL:443407 CL:444250 CL:448112 CL:448396 CL:450872 CL:450892 CL:455281 CL:455369 CL:455370 CL:455371 CL:456442 CL:456482 CL:456483 CL:456789 CL:456861 CL:456900 CL:456923 CL:456979 CL:457120
20:24:17: INFO: Filtered changes: CL:*333964 CL:*337992 CL:*338404 CL:443407 CL:444250 CL:448112 CL:448396 CL:450872 CL:450892 CL:455281 CL:455369 CL:455370 CL:455371 CL:456442 CL:456482 CL:456483 CL:456789 CL:456861 CL:456900 CL:456923 CL:456979 CL:457120

Comment 6 by nxia@chromium.org, Mar 30 2017

Follow up the CL:*333889

https://chrome-internal-review.googlesource.com/c/333889

At 7:14:15 pm March 17, CL:*333889 was reset.

"The pre-cq verification for this change expired after 10080 minutes. No action is required on your part.

In order to protect the CQ from picking up stale changes, the pre-cq status for changes are cleared after a generous timeout. This change will be re-tested by the pre-cq before the CQ picks it up."

A CLAction (pre_cq_reset) was inserted 

| 11159456 |  1390931 |        333889 |            2 | internal      | pre_cq_reset              | NULL                          | 2017-03-18 02:14:15 | NULL  


progress_map was generated before the CL was screened again. 
http://shortn/_NxZaFGGWnA


Then this CL was screened and corresponding CLActions were inserted
http://shortn/_Lb1oxPZA1P

| 11159461 |  1390931 |        333889 |            2 | internal      | validation_pending_pre_cq | binhost-pre-cq                | 2017-03-18 02:15:37 | NULL                |
| 11159462 |  1390931 |        333889 |            2 | internal      | validation_pending_pre_cq | sand-pre-cq                   | 2017-03-18 02:15:37 | NULL                |
| 11159463 |  1390931 |        333889 |            2 | internal      | screened_for_pre_cq       | NULL                          | 2017-03-18 02:15:38 | NULL  

Which means this CL wasn't in the progress_map when it's generated. So when the PreCQLauncher tries to GetDisjointTransactionsToTest, CL:*333889 wasn't in the launchable_progress_map, so the error was thrown by the Pre-CQ-Launcher.

19:15:47: INFO: Failed creating transaction for pberny:*333888:*4aaf33c2: CL:*333888 depends on CL:450892, which depends on CL:*333889, which was not eligible (wrong manifest branch, wrong labels, or otherwise filtered from eligible set).
 

Comment 7 Deleted

Comment 8 by nxia@chromium.org, Mar 30 2017

This bug was originally filed because CLs got filtered out when Tree was throttled. Comment #2 was caused by two different issues, tracked them separately at crbug.com/706896 and crbug.com/706930
Blockedon: 706930 706896

Comment 10 by nxia@chromium.org, Apr 24 2017

Summary: Innocent CLs got rejected because their dependent CLs were randomly filtered out by throttled CQ (was: Innocent CLs got rejected by CQ when tree was throttled.)

Comment 11 by nxia@chromium.org, Apr 24 2017

An example with logs:
https://chromium-review.googlesource.com/c/483925/

12:14:19: INFO: Raw changes:
CL:*338304 CL:*338343 CL:*348204 CL:*354603 CL:*355043 CL:*355107 CL:*355569 CL:*356024 CL:*356206 CL:*356743 CL:*356825 CL:*357069 CL:*357404 CL:*357804 CL:*358331 CL:*359152 CL:*359528 CL:414532 CL:437727 CL:447869 CL:453260 CL:456931 CL:456958 CL:458236 CL:459836 CL:463826 CL:468108 CL:469286 CL:469611 CL:470088 CL:470846 CL:471348 CL:471876 CL:472628 CL:474051 CL:474884 CL:475313 CL:476110 CL:476116 CL:476132 CL:477437 CL:477471 CL:477613 CL:478311 CL:478553 CL:478775 CL:479032 CL:479037 CL:479180 CL:479466 CL:479468 CL:479493 CL:479546 CL:479582 CL:479592 CL:479693 CL:479696 CL:479703 CL:479772 CL:479785 CL:479953 CL:479975 CL:480819 CL:480906 CL:480934 CL:480968 CL:480985 CL:480987 CL:480996 CL:480997 CL:481022 CL:481032 CL:481074 CL:481179 CL:481222 CL:481439 CL:481462 CL:481565 CL:481619 CL:481961 CL:481983 CL:482185 CL:482388 CL:482408 CL:482411 CL:482459 CL:482465 CL:482466 CL:482523 CL:482599 CL:482700 CL:482993 CL:483281 CL:483400 CL:483560 CL:483659 CL:483661 CL:483819 CL:483842 CL:483861 CL:483921 CL:483922 CL:483923 CL:483924 CL:483925 CL:483927 CL:483928 CL:483929 CL:483931 CL:483968 CL:484121 CL:484122 CL:484123 CL:484203 CL:484359 CL:484659
12:14:19: INFO: Filtered changes:
CL:*338304 CL:*338343 CL:*348204 CL:*354603 CL:*355043 CL:*355107 CL:*355569 CL:*356024 CL:*356206 CL:*356743 CL:*356825 CL:*357069 CL:*357404 CL:*357804 CL:*358331 CL:*359152 CL:414532 CL:437727 CL:453260 CL:456931 CL:456958 CL:458236 CL:459836 CL:463826 CL:468108 CL:469286 CL:469611 CL:470088 CL:470846 CL:471348 CL:471876 CL:472628 CL:474051 CL:474884 CL:476110 CL:476116 CL:476132 CL:477437 CL:477471 CL:477613 CL:478311 CL:478553 CL:478775 CL:479032 CL:479037 CL:479180 CL:479466 CL:479468 CL:479493 CL:479546 CL:479582 CL:479592 CL:479693 CL:479696 CL:479703 CL:479772 CL:479785 CL:479953 CL:479975 CL:480819 CL:480906 CL:480934 CL:480968 CL:480985 CL:480987 CL:480996 CL:480997 CL:481022 CL:481032 CL:481074 CL:481179 CL:481222 CL:481439 CL:481462 CL:481565 CL:481619 CL:481961 CL:481983 CL:482185 CL:482388 CL:482408 CL:482411 CL:482459 CL:482465 CL:482466 CL:482599 CL:482700 CL:482993 CL:483281 CL:483400 CL:483560 CL:483659 CL:483661 CL:483819 CL:483861 CL:483921 CL:483922 CL:483923 CL:483924 CL:483925 CL:483927 CL:483928 CL:483929 CL:483931 CL:484203 CL:484359


12:15:17: INFO: As the tree is throttled, it only picks a random subset of candidate changes. Changes not picked up in this run: CL:*338304 CL:*338343 CL:*348204 CL:*355043 CL:*355107 CL:*356024 CL:*356206 CL:*356743 CL:*357069 CL:*357804 CL:*358331 CL:*359152 CL:414532 CL:453260 CL:456931 CL:456958 CL:458236 CL:459836 CL:468108 CL:469286 CL:469611 CL:470088 CL:470846 CL:471348 CL:471876 CL:472628 CL:474051 CL:474884 CL:476110 CL:476116 CL:476132 CL:477437 CL:477471 CL:478311 CL:478775 CL:479032 CL:479037 CL:479180 CL:479466 CL:479468 CL:479493 CL:479546 CL:479582 CL:479693 CL:479696 CL:479703 CL:479772 CL:479785 CL:479953 CL:479975 CL:480819 CL:480934 CL:480968 CL:480985 CL:480987 CL:480996 CL:481022 CL:481074 CL:481179 CL:481222 CL:481439 CL:481462 CL:481619 CL:481961 CL:481983 CL:482185 CL:482388 CL:482408 CL:482411 CL:482459 CL:482465 CL:482466 CL:482700 CL:483281 CL:483560 CL:483659 CL:483819 CL:483861 CL:483921 CL:483922 CL:483923 CL:483927 CL:483928 CL:483929 CL:483931 CL:484203 


Comment 12 by nxia@chromium.org, Apr 24 2017

Cc: nxia@chromium.org
 Issue 713508  has been merged into this issue.

Comment 13 by nxia@chromium.org, Apr 24 2017

Labels: -Pri-2 Pri-1
I thought we turned off filtering when throttling, which would be the easiest fix.

Comment 15 by nxia@chromium.org, Apr 25 2017

When tree is throttled, we'll have CLs in backlog and the number can goes up quickly, which can make the system more unstable. I think we still need filtering in throttling.
Project Member

Comment 16 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/4e1d7ea995f8bd8c5d7bb0c8b31f16e2dfec63b5

commit 4e1d7ea995f8bd8c5d7bb0c8b31f16e2dfec63b5
Author: Ningning Xia <nxia@chromium.org>
Date: Wed May 03 03:24:22 2017

Do not reject CLs if their dependent CLs were filtered out by throttling

When the tree is throttled, CQ SyncStage will filter out some changes to
keep the change pool small. In this case, CLs shouldn't be rejected
because their dependent CLs are filtered out by throttled tree.

BUG= chromium:705023 
TEST=unit_tests

Change-Id: Ia009ef904091118cfc57e6bb14e5fdc7b31db572
Reviewed-on: https://chromium-review.googlesource.com/489083
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/4e1d7ea995f8bd8c5d7bb0c8b31f16e2dfec63b5/cbuildbot/validation_pool.py
[modify] https://crrev.com/4e1d7ea995f8bd8c5d7bb0c8b31f16e2dfec63b5/lib/patch_unittest.py
[modify] https://crrev.com/4e1d7ea995f8bd8c5d7bb0c8b31f16e2dfec63b5/cbuildbot/validation_pool_unittest.py
[modify] https://crrev.com/4e1d7ea995f8bd8c5d7bb0c8b31f16e2dfec63b5/lib/patch.py

Comment 17 by nxia@chromium.org, May 10 2017

Status: Fixed (was: Untriaged)

Sign in to add a comment