Investigate source of increased CQ false positives
Issue description

CLs that introduce deterministic or flaky test failures have been landing more frequently over the last month. Some of these are caught by FindIt; others are not. This may be related to the newly introduced 'retry with patch' recipe step, or to other recent changes to the chromium test recipe. Two requests for stgao: 1) Can you provide a link to the dashboard that shows the number of FindIt reverts? 2) Can you provide a link to the 20 most recent FindIt-reverted CLs? I will investigate and determine the root cause.
Nov 9
Actually, the first "next step" in c#1 is not correct. The desired behavior of preventing "obvious false positives from landing" is already in effect: if 'without patch' ever fails, we immediately mark the build as a success, and if 'with patch' or 'retry with patch' ever succeeds, we also mark the build as a success. This means that any build in which 'retry with patch' causes SUCCESS is, by definition, of the same form as the "next step" I listed, and there's nothing to do.

I think the real problem is that "build 2" flakily passed. If a test flakily passes in the 'with patch' step, we need to run it through 'without patch' to see whether it flakily fails there too. Unfortunately, this will significantly increase cycle time. This is where a live database of expected flakiness for each test would be incredibly useful. We can try the proposal I listed here, but I'm concerned it will significantly increase cycle time: we have flaky tests in pretty much every build, which means we'd need to add a 'without patch' step to every build.
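For clarity, the status rules described above can be sketched as follows. This is a hypothetical illustration of the decision logic, not the actual chromium recipe code; the function and parameter names are invented for this sketch.

```python
def should_mark_success(with_patch_ok, without_patch_ok=None,
                        retry_with_patch_ok=None):
    """Illustrative sketch of the build-status rules described above.

    with_patch_ok:       did the test pass in the 'with patch' step?
    without_patch_ok:    result of the 'without patch' step, if it ran.
                         False means the failure reproduces without the CL,
                         so the CL is exonerated.
    retry_with_patch_ok: result of the 'retry with patch' step, if it ran.
    """
    if with_patch_ok:
        # 'with patch' succeeded: nothing to blame on the CL.
        return True
    if without_patch_ok is False:
        # 'without patch' failed: the failure pre-exists the CL, so we
        # immediately mark the build as a success.
        return True
    if retry_with_patch_ok:
        # 'retry with patch' succeeded: also marked as a success, which is
        # exactly the case where a flaky pass can slip through.
        return True
    return False
```

Under these rules, a test that flakily passes in 'with patch' (the "build 2" case) returns success on the first branch and never triggers the 'without patch' comparison.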
Nov 9
There are always going to be some flakes that get through; most obviously, any flaky test that happens to pass initially currently won't be retried, and to avoid that we'd have to re-run *every* test some arbitrary number of times. So I'd be pretty reluctant to do anything that causes something like build 2 to run a 'without patch' pass. We've long talked about wanting to re-run all *new* tests, which might help. Otherwise I think we just need to get better about handling flakiness that gets through the CQ and/or already exists. The live database of expected flakiness is also something we've long talked about, and it would help with that as well.
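The "re-run every test some arbitrary number of times" idea above amounts to simple repeated-run flake classification. A minimal sketch, with invented names and no connection to the real recipe code:

```python
def classify(run_test, attempts=10):
    """Run a zero-argument test callable `attempts` times and classify it.

    Returns 'pass' if every run passed, 'fail' if every run failed, and
    'flaky' if the results were mixed. This is the brute-force approach
    whose cost (attempts x every test) is the objection raised above.
    """
    results = {bool(run_test()) for _ in range(attempts)}
    if results == {True}:
        return "pass"
    if results == {False}:
        return "fail"
    return "flaky"
```

A live database of expected per-test flakiness would let the CQ skip this repeated-run cost for tests whose flake rate is already known.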
Comment 1 by erikc...@chromium.org, Nov 9
Labels: -Pri-3 Pri-1