CL changes during a CQ run should not fail the CQ run.
Reported by
jrbarnette@chromium.org,
Jan 10 2018
|
|||||
Issue description
If a CL changes after the CQ picks it up for testing, and before
the CL is finally merged, the CommitQueueHandleChanges stage will
detect this, and refuse to submit the CL. That produces an error
message like this:
Could not submit ejcaruso:857590:899e18dd, error: CL:857590 was modified while the CQ was in the middle of testing it. Patch set 2 was uploaded.
This will turn the stage red, which in turn makes the CQ run be
red. However, the event is _not_ a CQ failure, and if the CQ run
was otherwise successful, any unchanged CLs will be committed.
Since all red CQ runs normally must be annotated, the fact that
this event turns the stage red is a nuisance, and source of
confusion to the uninitiated (which means, almost any dev stuck
with Sheriff duty).
It would be helpful if the CQ/system behavior were different, e.g.
* Don't turn the CQ red for the event (a potential problem, since
these events aren't entirely harmless, since they block us from
creating prebuilts).
* Change go/som so that these events don't show up there.
* Some other change so that either a) sheriffs intuitively know
what to do, or b) sheriffs never see this, and so don't _need_
to know.
,
Jan 10 2018
> What's the proposed behaviour? I'm not married to any specific behavior. The high-level requirement is as stated: Either a) make it obvious for sheriffs what they should do, or b) hide the event from sheriffs well enough that they don't ordinarily encounter it.
,
Jan 10 2018
This keeps coming up from time to time, and I keep switching sides :D There is a reason the CQ is red in this case -- since what the CQ tested (all the CLs in the run) -- is not what was actually submitted, the CQ will not publish prebuilts (it'll update the prebuilts, but then not update the binhost, so developers will be using stale binhosts). This is not an exceptional condition in itself, but if you get this behavior for too long, developers will be left with really stale binhosts. This situation now happens more often due to self abort of the CQ, and it is important to distinguish it from a green run, where the prebuilts are published.
,
Jan 10 2018
,
Jan 10 2018
The thoughts in c#3, kinda resonate with me. If everything was submitted then I think we're all in agreement as it being green (pass). Similarly, if nothing gets submitted, then it's clear to be a red (fail) build. When we have something in between, should it be: - red* - green* - yellow - something else I personally think it should be red*. The build wasn't totally successful, but there's some annotation/notes to indicate that there was at least a partial success and the sheriffs investigations should be directed and/or limited. The reason I vote for red* is the reason I gave in c#1, is because if it keeps happening, I'd like someone to be aware of that fact, even if they choose to ignore it. And by someone, I mean sheriffs, since it's their explicit job to be on top of such things.
,
Jan 10 2018
> And by someone, I mean sheriffs
In this particular case, the sheriffs are the wrong people: It's not
realistic to require them to have the expertise to know what to do
with "prebuilts haven't been published". That's the responsibility of
Cros Infra, who _are_ required to understand, because we have to maintain
the code that makes prebuilts happen.
Additionally, requiring annotation in this specific case is a fool's errand:
the system knows and reports exactly what went wrong, which is that specific
CLs weren't committed. The report from the system is accurate enough that
we shouldn't require any human being to go and enter the very same
information in the annotation tool.
There's a case to be made that the absence of prebuilts is a problem that
should get surfaced and, when necessary, dealt with. *This bug is not for
that discussion*. This bug is about solving a more limited problem:
* Don't confuse the sheriffs with stuff they don't need to know or
understand.
* Don't ask anyone to engage in make-work annotation.
,
Jan 10 2018
So, actions (2 separate bugs?): - Don't emit s-o-m tickets for runs that only failed due to change-was-modified - Have master-paladin insert the cidb annotation on its own for change-was-modified sparing the need for deputy to do so
,
Jan 16 2018
> So, actions (2 separate bugs?): Potentially that's two separate bugs. However, it's not clear that we should do both; depending on our specific choices, one fix might obviate the other. So, my preference would be to hold this open until someone takes it with a commitment to doing _something_. Then, once we see what we did, we can decide whether to file feature requests for other work, or just close this as "now it's good enough."
,
Aug 15
It should be safe to let the CQ master pass, even if some CLs were updated during the CQ run, and thus can't be submitted. I've pushed back against that in the past because of binary prebuilt handling. However, I was taught a bit more about prebuilts and have changed my opinion.
,
Aug 16
I just had a conversation with dgarrett@, and there's some
enhancements to make regarding his statement:
* As implemented, failing the CQ causes pre-builts not
to be uploaded. Apparently, failing the CQ is the only
blessed way to stop the prebuilts from being uploaded.
* Once upon a time, in a universe that no longer exists,
it was necessary to block uploading prebuilts if the
code that built them wasn't committed to the tree.
* So long as we lived in that other universe, it was
therefore necessary that the CQ fail when any CL failed
to be committed.
dgarrett@ has learned that the universe we once lived in ceased
to exist some time ago. It is now safe to upload prebuilts even
if some code changes don't get committed.
So, the proposed fix is: Don't turn the CQ red because one or more
CLs fail to be committed.
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by davidri...@chromium.org
, Jan 10 2018