
Issue 897607

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




[Infra/CQ] Allow developers to explicitly request no retries

Project Member Reported by gab@chromium.org, Oct 22

Issue description

I often find myself modifying code in base that *may* introduce flakes across the entire test suite (e.g. in oddly written old tests). Currently the retries are hiding any flakes which will in turn make the whole test suite flakier...

It would be nice if I could explicitly add NORETRIES=True or something to tell the CQ not to retry (especially not within a single run -- i.e. never turn the bot green if there were failures).

I'd even like a mode where I can not only disable retries but also force, say, 3 runs, so I can manually look at the errors and decide for myself whether I caused them.
 
Case in point: I just landed a CL introducing flakiness @ r601600 (a few tests went from 100% to ~60% success). Thankfully FindIt identified it as the culprit and I was able to revert/reland w/ a fix. Looking at the flakiness dashboard though, flakes for these tests had been identified in patch sets prior to the one I landed, but the CQ never made me aware of them... https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=remoting_unittests&tests=SecurityKeyMessageReaderImplTest.MultipleMessages

Knowing this was a tricky change, a NORETRY mode would have saved me a day.
Cc: liaoyuke@chromium.org dpranke@chromium.org st...@chromium.org jbudorick@chromium.org
+ stgao, jbudorick, dpranke, liaoyuke

Unfortunately, this is a bit trickier than you suggest. I don't think a global NORETRIES=True is the right approach.

webkit_layout_tests is an example of a test suite that has many flaky tests. If we had a NORETRY mode, then the CL would likely fail in webkit_layout_tests, regardless of whether the change was related.

For example, in your CL, the first shard of webkit_layout_tests I clicked on:
https://chromium-swarm.appspot.com/task?id=40b492ce92f10410&refresh=10&show_raw=1

had failures that only passed on retry, e.g. "fast/text/basic/generic-family-changes.html". I'm guessing these failures are unrelated to your CL.

If we had a database of expected flakiness of individual tests, then we could evaluate the change in flakiness of individual tests to determine whether or not the CL is likely causing an increase in flakiness. 
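To make the idea concrete, here is a minimal sketch of what such an evaluation could look like, assuming a hypothetical baseline database that maps test names to expected failure rates. The function name, data shapes, and the 5% margin are all illustrative, not real CQ infrastructure:

```python
# Hypothetical sketch: flag tests whose observed failure rate on a patch set
# significantly exceeds the historical baseline. Names and thresholds are
# illustrative only.

def flake_regressions(observed, baseline, margin=0.05):
    """Return tests whose observed failure rate exceeds baseline + margin.

    observed: {test_name: (failures, runs)} for this patch set
    baseline: {test_name: expected_failure_rate} from historical data
    """
    suspects = {}
    for test, (failures, runs) in observed.items():
        if runs == 0:
            continue  # no data for this test on this patch set
        rate = failures / runs
        expected = baseline.get(test, 0.0)  # unknown tests assumed stable
        if rate > expected + margin:
            suspects[test] = (expected, rate)
    return suspects
```

For example, a test that failed 4 out of 10 times but is 0% flaky historically would be flagged, while a test failing at its usual 10% rate would not.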

Until then, I'm not sure what else we can do on this front. stgao, liaoyuke -- thoughts?
agree w/ erik -- I don't think this is something we'd want to provide.
I agree that a global database of known flakes would help.

Imagining I had NORETRY=True today, though: I'd manually investigate the failures and remove the NORETRY bit before the final submission. My point is that even when I know I'm potentially introducing flakiness and am willing to investigate all the flakes manually, I can't...

Exposing a list of flakes that happened on a given patch set would also help. Any mechanism that lets me manually look into all flakes, short of opening the stdout of every single "green" run, would be better than the status quo.
Thanks for the feedback. Let me see if I understand correctly.

You want a mechanism which clearly exposes all flakes so that you can investigate in more depth. 

Whether the build is red or green is orthogonal -- it only appears relevant because right now a red build means there are likely flakes, whereas we can't tell whether there are flakes in a green build.

This is a reasonable request. I'm not sure how to prioritize, but probably low relative to all the other work that CCI is doing [I think this falls in their domain?].
Components: Infra>Client>Chrome
#5: yeah, that'd be CCI, and it would indeed be relatively low priority.
This is some sort of combination of CCI and the CATS team (i.e., stgao@'s team / the FindIt and FlakeAnalyzer folks).

The current flakiness tooling is not far from making it easier to see all the flakes in a given CL, so that seems relatively doable. I'd expect this part to fall more in their domain than CCI's.

As to how we'd support something like this in tryjobs and in the CQ, there are a couple of ideas.

First, we have talked a bunch about wanting to turn off retries by default for many test suites (ones which are more like unit tests than integration tests and/or aren't expected to be flaky). I think we do want to support this, and if gab@'s tests were in one such suite, you'd kinda get this for "free" (not sure if remoting_unittests would clear that bar or not offhand).

Second, we also want to expose the sorts of hooks and methods the FindIt/FlakeAnalyzer tooling is using (e.g., to run tests N times) to users to make it easier for them to do their own investigating. 
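A "run tests N times" hook of the kind described above could look roughly like the following. This is a hypothetical sketch, not FindIt/FlakeAnalyzer code; the helper name and interface are assumptions:

```python
# Hypothetical "run N times" helper: run a test command repeatedly and
# report the observed failure rate, so a developer can estimate flakiness
# without relying on CQ retries. Illustrative only.
import subprocess

def repeat_run(cmd, n=20):
    """Run `cmd` n times; return the fraction of runs that failed."""
    failures = 0
    for _ in range(n):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures += 1
    return failures / n
```

A developer could point this at a single test filter (e.g. a gtest binary with --gtest_filter) and run it a few dozen times to get a rough flake-rate estimate.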
Cc: erikc...@chromium.org
Owner: ----
Status: Available (was: Assigned)
Going to mark this as available.
Labels: -Pri-3 Pri-1
Another instance where I might be introducing flakes (while trying to fix one...) but have no way to ask CQ to make flakes obvious...

https://chromium-review.googlesource.com/c/chromium/src/+/1392178/15#message-26235655a51bc459e56582af938a968263df9ab3

Seems P1 to me in the world of CQ Infra.
Are you referring to win10_chromium_x64_rel_ng [which fails, and then succeeds] on PS#15? If so, I'm actively working on reducing frequency of that retry layer right now. See Issue 915319.

There are currently 3 retry layers that can mask newly introduced flakes. Unfortunately, the trade-off is a very large number of false positives [e.g. if we removed all retry layers, or had a flag that disables them, pretty much every CQ run would fail]. Our goal is to progressively remove more of the expensive retry layers. There is currently no plan to remove *all* retry layers, or to add a flag that does so.
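The masking effect is easy to quantify. A test that fails with probability p per run is reported green unless every attempt fails, so even a couple of retries hide most flakes. A small illustrative calculation (not CQ code):

```python
# Illustrative arithmetic: a test with per-run failure probability p is
# reported green after `attempts` total tries unless every attempt fails.
def masked_green_probability(p, attempts):
    """Probability the retry layer reports green overall."""
    return 1 - p ** attempts

# For the ~60% success case from the issue description (p = 0.4):
# one attempt is green 60% of the time, but with 3 attempts the build is
# green about 93.6% of the time -- the flake is almost always hidden.
```

This is why removing retry layers trades masked flakes for false-positive red builds: the same math that hides a new flake also keeps pre-existing flaky tests from failing unrelated CLs.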

For more background, see: https://docs.google.com/document/d/1O9nzVMA6rEe2-rhjsni_8wS9Lw4OnFpK2NE0wWB5VaY/edit
Ah, I misread your ask -- you're asking for a mechanism to make the flakes more obvious. That still seems like a reasonable UI request [not sure if we should expose through CQ or flakiness tooling -- maybe the latter...]
Basically, I know I might be introducing new flakes but have no way to tell (short of digging in logs of green try jobs).

It's going to be virtually impossible to go after flakes if devs that explicitly want to check for potential flakes caused by a tricky CL can't even do that intentionally (I'd be happier for now to have a mode where I see all flakes in a single run without auto-retries; I can manually make my opinion on each).

Thanks for working on this, I understand it's a tough problem, I'm just pointing out that I can't tell if I'm making it worse while trying to fix a flake source and that feels like a bad state to be in.

Comment 13 by gab@chromium.org, Jan 16 (6 days ago)

Here's another instance of this (and this time I'm out of options...).

In https://chromium-review.googlesource.com/c/chromium/src/+/1403059 I'm trying to document assumptions I've uncovered while trying to fix flaky interactive_ui_tests (oh, how ironic!)

In patch set 12, despite all bots being green according to dry-run, I inspected the logs and found a flake I introduced on https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win10_chromium_x64_rel_ng/176412 => https://chromium-swarm.appspot.com/task?id=426bbd2451a54810&refresh=10&show_raw=1

Failing to repro this flake locally, I need CQ to do it for me, so in patch set 13 I uploaded extra logs on the check, but that time it didn't flake...

Now I can't re-run the dry-run, as it merely skips all the steps because they passed recently (how clever..!)

What I would need in this case is a way to tell CQ to retry interactive_ui_tests on win10_chromium_x64_rel_ng as many times as it needs to encounter a flake (with some max iteration of course). And also a way to manually do this as many times as I want on the same CL.
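For a "retry until it flakes, with a max" mode, the run budget follows from the flake rate: to observe at least one failure of a test with per-run flake probability p with confidence c, you need about ln(1-c)/ln(1-p) runs. A hypothetical helper (the function and cap are assumptions, not existing tooling):

```python
import math

# Hypothetical budget calculation for a "retry until flake" mode: how many
# independent runs are needed to see at least one failure of a test with
# per-run flake rate p, with the given confidence? Illustrative only.
def runs_to_catch_flake(p, confidence=0.95, max_runs=1000):
    if p >= 1:
        return 1  # fails every run; one try suffices
    if p <= 0:
        return max_runs  # never flakes; give up at the cap
    needed = math.ceil(math.log(1 - confidence) / math.log(1 - p))
    return min(needed, max_runs)
```

For a 5%-flaky test, that's about 59 runs for 95% confidence, which shows why such a mode needs a generous but bounded iteration cap.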

For now, I guess I'll upload a PS14 with a dummy diff, run CQ (on all the bots...) and cross my fingers that it flakes an hour from now on that bot in that test...

I guess I could RDP to the bot too (though I've never done that), but from experience interactive_ui_tests don't play nicely with RDP.

Comment 14 by gab@chromium.org, Jan 16 (6 days ago)

Sigh, and swarming is clever too, I tried making a dummy string change to force a re-run in PS14 but it coalesced to yesterday's run results instead of running again :(.

Since fixing flakes inherently risks introducing other flakes (when tweaking entire test frameworks that is), it's rather impossible to work on flakiness while CQ hides it...

Comment 15 by erikc...@chromium.org, Jan 16 (6 days ago)

gab: Can you manually trigger the swarming tasks rather than going through the CQ? This seems to be the command the recipe is issuing to swarming:
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8924161469012858016/+/steps/test_pre_run__with_patch_/0/steps/s__trigger__interactive_ui_tests__with_patch_/0/stdout

There's another flag to set idempotent=False; with that, you can trigger it as many times as you want.
