New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 900370 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Investigate source of increased CQ false positives

Project Member Reported by erikc...@chromium.org, Oct 30

Issue description

CLs that introduce deterministic/flaky test failures are being landed more frequently in the last month. Some of these are being caught by FindIt, others are not.

This is potentially related to the newly introduced 'retry with patch' recipe step, or to other recent changes to the chromium test recipe.

Two requests for stgao:
1) Can you provide a link to the dashboard which shows # of FindIt reverts?

2) Can you provide a link to the 20 most recent Find-it reverted CLs? I will investigate and determine the root cause.
 
Cc: chanli@chromium.org liaoyuke@chromium.org
Labels: -Pri-3 Pri-1
While investigating false negatives: https://bugs.chromium.org/p/chromium/issues/detail?id=903495#c1. I discovered a false positive.

CL: https://chromium-review.googlesource.com/c/chromium/src/+/1307445/2

Symptoms. 
Build 1 'with patch': test fails 4 times
Build 1 'without patch': test passes 10 times
Build 1 'retry with patch': Test fails 40 times.
Build 2 'with patch': Test fails 3 times, passes once.

Next steps:
* Prevent obvious false positives from landing. If 'with patch' and 'retry with patch' appear to deterministically fail, and 'without patch' appears to deterministically pass, then mark the build as a failure.
* 'retry with patch' should be running tests 10 times, not 40. For some reason, browser_tests is spawning 4 shards and running each of them 10 times. Investigate and fix.
Actually, the first "next step" in c#1 is not correct. The desired behavior of preventing "obvious false positives from landing" is already in effect. If 'without patch' ever fails, we immediately mark the build as a success. If 'retry with patch' or 'with patch' ever succeeds, we also mark the build as a success.

This means that any build in which 'retry with patch' causes SUCCESS is by definition of the same form factor as the "next step" I listed, and there's nothing to do.

I think that the real problem is that "build 2" flakily passed. If a test flakily passes in the 'with patch' step, I think that we need to run it through 'without patch' to see if it flakily fails there. Unfortunately, this will significantly increase cycle time.

This is where a live database of expected flakiness for each test would be incredibly useful. We can try the proposal I listed here, but I'm concerned that it will significantly increase cycle time, since we have flaky tests in pretty much every build, which means we'd need to add 'without patch' step to every build.
There are always going to be some flakes that get through; perhaps most obviously, any flaky test that happens to pass initially currently won't be retried, and to avoid this we'd have to re-run *every* test some arbitrary number of times. So, I'd probably be pretty reluctant to do anything to cause something like build 2 to do a 'without patch` pass.

We've long talked in the past about wanting to re-run all *new* tests, which might help. Otherwise I think we just need to get better about handling flakiness that gets through the CQ and/or already exists.

The live database of expected flakiness is also something we've long talked about and will help w/ that as well.

Sign in to add a comment