
Issue 908534


Issue metadata

Status: Untriaged
Owner: ----
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Feature




Automatically run, and then disable flaky tests.

Project Member Reported by erikc...@chromium.org, Nov 26

Issue description

I suspect this is already on the roadmap for the Find-It team, but I haven't seen a crbug specifically targeting this. Please dupe if this feature request already exists.

In this crbug:
https://bugs.chromium.org/p/chromium/issues/detail?id=908517#c3

Find-It detected a flaky test, but no one followed up [probably because the flake is reasonably infrequent]. I independently found the test. I ran the test locally and discovered that it deterministically fails when run by itself. Presumably it passes when run in a batch due to state being carried over from previous tests. 

These failures are very likely to cause false rejects and to significantly increase CQ run time.

Example series of events [matching the events from the above crbug]. Numbers are fictional.

(1) Developer introduces a test. It passes 99% of the time when run as part of a batch. It fails 100% of the time when run by itself.
(2) The test is run once, in a batch, and passes the CQ. It is landed.
(3) Subsequently, ~1% of CLs fail the test when run in a batch. Both 'with patch' retries and 'retry with patch' will deterministically fail, since they run the test by itself.
(4) We then retry the build at the CQ layer. Since we're retrying all tests and the bad test is run in a batch, it most likely passes.

In the example above, I manually ran the test locally, confirmed that it deterministically fails when run standalone, and then disabled the test. Ideally we could automate this process.
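
For concreteness, here is a rough sketch of what the automated standalone re-run could look like [assumptions: a gtest-style test binary, --gtest_filter to run a single test in isolation, and placeholder paths/names; this is not a description of anything Find-It does today]:

  # Run a single test by itself several times; if it never passes, it is a
  # candidate for auto-disabling. Binary path and test name are placeholders.
  import subprocess

  def fails_deterministically(test_binary, test_name, attempts=5):
      for _ in range(attempts):
          result = subprocess.run(
              [test_binary, '--gtest_filter=' + test_name],
              capture_output=True)
          if result.returncode == 0:
              # Passed at least once when run standalone, so it is not the
              # class of flake described above.
              return False
      return True

  if fails_deterministically('./out/Default/unit_tests', 'Suite.FlakyTest'):
      print('Fails deterministically when run by itself; candidate for disabling.')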
 
Labels: -Type-Bug -Pri-3 Pri-1 Type-Feature
Cc: lijeffrey@chromium.org
+lijeffrey@ since this is related to flake analysis and auto actions.

Found a flake analysis on the test that was mentioned in bug 908517: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVy3wELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCKoAWNocm9taXVtLm1hYy9NYWMxMC4xMyBUZXN0cy82Nzc3L3dlYmtpdF9sYXlvdXRfdGVzdHMgb24gSW50ZWwgR1BVIG9uIE1hYyBvbiBNYWMtMTAuMTIuNi9hSFIwY0M5MFpYTjBjeTlrWlhaMGIyOXNjeTl6YjNWeVkyVnpMMlJsWW5WbloyVnlMMnhwZG1VdFpXUnBkQzF1YnkxeVpYWmxZV3d1YW5NPQwLEhNNYXN0ZXJGbGFrZUFuYWx5c2lzGAEM

As this analysis shows, the test was considered to be constantly failing around commit 609234. Not sure if that matches erikchen@'s finding that the test deterministically fails when run by itself. If so, I'm not sure how Findit could handle this particular case.
Unfortunately, the find-it analysis was incorrect in this case [I notice it only has 20% confidence].

The root cause is that this CL [10-02] changed the test expectations: https://chromium-review.googlesource.com/c/chromium/src/+/1258249/2/third_party/WebKit/LayoutTests/TestExpectations

Previously, the test was failing and the failures were being ignored. This CL attempted to fix the test and changed the expectations so that layout tests would no longer ignore its failures.

My suggestion: instead of trying to identify a root cause and revert that CL [in this case that's really hard, because the CL in question re-enabled a test that flakes <1% of the time, but when the test does flake, it always fails on retries], we independently retry all flaky tests by themselves, and if a test deterministically fails, we disable it and assign a bug to the OWNERS of the code. So instead of culprit identification, we purely try to mitigate.
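
A minimal sketch of that mitigation step, with the disable and bug-filing pieces as hypothetical stubs [the real versions would need to touch filter/expectations files and the issue tracker; again, not existing Find-It behavior]:

  # Hypothetical mitigation flow: no culprit finding, just disable and hand
  # off to the code's OWNERS. disable_test() and file_bug_for_owners() are
  # stubs standing in for real tooling.

  def disable_test(test_name):
      # In practice: add an entry to a filter file or TestExpectations.
      print('Would disable ' + test_name)

  def file_bug_for_owners(test_name):
      # In practice: look up OWNERS for the test's directory and file a bug.
      print('Would file a bug against the OWNERS for ' + test_name)

  def mitigate(test_name, fails_standalone):
      # fails_standalone: outcome of the deterministic re-run check sketched
      # in the issue description.
      if fails_standalone:
          disable_test(test_name)
          file_bug_for_owners(test_name)
      # Otherwise the flake is genuinely intermittent and is left to the
      # existing flake analysis.

  mitigate('Suite.FlakyTest', fails_standalone=True)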
