Issue 644916 link

Issue metadata

Status: Duplicate
Merged: issue 843640
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Feature



CLs shouldn't be marked not-ready in response to failures to launch pre-CQ trybots

Project Member Reported by derat@chromium.org, Sep 7 2016

Issue description

My change at https://chromium-review.googlesource.com/c/379075/ just had its Commit-Queue and Trybot-Ready bits unset with messages like the following:

----

We were not able to launch a mixed-c-pre-cq trybot for your change within 30 minutes.

This problem can happen if the trybot waterfall is very busy, or if there is an infrastructure issue. Please notify the sheriff and mark your change as ready again. If this problem occurs multiple times in a row, please file a bug.

Commit queue documentation: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium-os/commit-queue-overview

----

Why do I need to mark my change ready again in this case? Please keep the ready bits set and try the change automatically when the pre-CQ load is lighter or the infrastructure issues are fixed.
 

Comment 1 by autumn@chromium.org, Sep 13 2016

Labels: -Type-Bug Type-Feature
Owner: akes...@chromium.org
+ Aviv - do you know why we have this behavior? 

Comment 2 by derat@chromium.org, Sep 23 2016

Just saw this again on https://chromium-review.googlesource.com/c/388758/ :

---

We were not able to launch a rambi-pre-cq trybot for your change within 30 minutes.

This problem can happen if the trybot waterfall is very busy, or if there is an infrastructure issue. Please notify the sheriff and mark your change as ready again. If this problem occurs multiple times in a row, please file a bug.
Cc: nxia@chromium.org
Two reasons this exists:

1) CLs to some repositories can actually cause the pre-cq to not launch correctly (in particular, CLs to chromite, which can cause the builder to crash before it can report that it started). In that case, if we didn't unset this bit, they would cycle permanently in the pre-cq. We might be able to address this by using buildbucket rather than cidb to determine whether tryjobs have launched.

2) If there is a capacity or downtime problem, rejecting CLs will help to fix it. If we leave all CLs un-rejected and accumulate a huge backlog due to downtime, the builder pool might be unable to keep up once it comes back.

Comment 4 by derat@chromium.org, Sep 23 2016

Thanks for the reply!

1) Can those repositories be flagged so that changes to repositories that can't prevent the pre-cq from launching correctly (which is probably almost all of them, right?) don't get dropped?

2) I don't follow this reasoning. Could the pre-cq rate-limit introducing the already-+1-ed changes back into the queue rather than making developers manually mark them again? That seems like it'd have the same effect while cutting down on dev drudgery and preventing changes from still sitting around in the non-+1-ed state if/when the backlog is fixed. X% of devs going in and setting the bit again should be essentially the same as automatically retrying X% of the changes (and then the rest later).
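
A minimal sketch of how suggestions 1) and 2) above could fit together, not the actual pre-cq implementation. Every name here (`LAUNCH_RISKY_REPOS`, `Change`, `RetryQueue`, `max_per_tick`) is hypothetical and chosen for illustration only. On a launch timeout, changes to flagged repositories still get their ready bit unset, while all other changes keep it and are re-queued at a limited rate:

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical flag list: repositories whose CLs can stop a pre-cq builder
# from launching at all (the thread names chromite as one such repository).
LAUNCH_RISKY_REPOS = {"chromite"}


@dataclass
class Change:
    number: int
    repo: str
    ready: bool = True


class RetryQueue:
    """Holds changes whose trybot launch timed out and releases them slowly."""

    def __init__(self, max_per_tick: int):
        self.max_per_tick = max_per_tick
        self._pending = deque()

    def on_launch_timeout(self, change: Change) -> None:
        if change.repo in LAUNCH_RISKY_REPOS:
            # The change itself may have broken the launch, so unset the
            # ready bit to keep it from cycling forever.
            change.ready = False
        else:
            # Keep the ready bit set and retry automatically later.
            self._pending.append(change)

    def tick(self) -> list:
        """Return at most max_per_tick changes to relaunch in this pass."""
        count = min(self.max_per_tick, len(self._pending))
        return [self._pending.popleft() for _ in range(count)]
```

Calling `tick()` once per scheduling pass drains the backlog in arrival order at a bounded rate, which is intended to have the same load-shedding effect as waiting for developers to mark their changes ready again.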

I'll also note that in the changes I linked above, there doesn't seem to have been a backlog:

https://chromium-review.googlesource.com/c/379075/
16:25: Trybot-Ready and Commit-Queue automatically unset (I don't know why gerrit attributes this to me; I didn't do it)
16:28: I manually set the bits again
16:31: pre-cq picks up the change

https://chromium-review.googlesource.com/c/388758/
12:26: Trybot-Ready and Commit-Queue unset
12:33: I manually set the bits again
12:35: pre-cq picks up the change
Status: WontFix (was: Untriaged)
These sound like nice to haves, but aren't so simple to implement and don't rise to the level where they preempt other important work.

Comment 6 by derat@chromium.org, May 17 2018

Cc: shapiroc@chromium.org
I still see this periodically, e.g. on https://crrev.com/c/1063012 just now:

"We were not able to launch a cyan-no-vmtest-pre-cq trybot for your change within 90 minutes.

This problem can happen if the trybot waterfall is very busy, or if there is an infrastructure issue. Please notify the sheriff and mark your change as ready again. If this problem occurs multiple times in a row, please file a bug."

Since there seem to be ongoing efforts lately to cut down on the need to re-+1 changes (e.g. the CL Exonerator Bot), could the WontFix here be reconsidered?
Cc: -shapiroc@chromium.org jclinton@chromium.org
We're removing the 90-minute timeout in crbug.com/843640.
(So this won't be happening at all very soon now.)

Comment 10 by derat@chromium.org, May 17 2018

Mergedinto: 843640
Status: Duplicate (was: WontFix)
Thanks!
