
Issue 850710

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 837855




Flake detection idea: re-run to differentiate a consistent failure from a flaky test

Project Member Reported by liaoyuke@chromium.org, Jun 7 2018

Issue description

One thing that prevents Flake Detection from being applied to other projects such as webrtc is that Flake Detection relies on the "without patch" step to differentiate a consistent failure from a flaky test.

For those projects without the "without patch" mechanism, one idea is to re-run the test to achieve the same goal. Here is how it works:

Imagine a world without the "without patch" rerun.

CL1: cq attempt1 build1: test t failed.
CL1: cq attempt2 build2: all tests passed.

Flake Detection caught t as a flaky test, so it reruns t 30 times at the failed hash to decide whether it is a consistent failure (which may or may not be related to the patch).

CL2: cq attempt1 build3: test t failed.
CL2: cq attempt2 build4: all tests passed.

Flake Detection caught t as a flaky test, so it reruns t 30 times at the failed hash to decide whether it is a consistent failure (which may or may not be related to the patch).

CL3: cq attempt1 build5: test t failed.
CL3: cq attempt2 build6: all tests passed.

Flake Detection caught t as a flaky test, so it reruns t 30 times at the failed hash to decide whether it is a consistent failure (which may or may not be related to the patch).

There are no code changes between the two attempts for each CL (except for a rebase). Because test t flaked on 3 different CLs, it is considered a flaky test. Before filing a bug, Flake Detection checks whether all three occurrences above are consistent failures: if yes, DON'T file a bug; otherwise, file a bug.

Why does it work?
1. If t is a consistent failure that was committed to the code, then the re-runs for all three occurrences would fail consistently, so a bug WON'T be filed.
2. If t is a consistent failure caused by a specific patch, the above situation is unlikely to happen because we look at 3 different CLs.
3. If t is a flaky test, the re-runs won't fail consistently, so a bug WILL be filed.
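
The decision rule above could be sketched roughly as follows. This is only an illustration, not Flake Detection's actual code: the names `Occurrence`, `is_consistent_failure` and `should_file_bug` are made up, and treating "consistent failure" as "every one of the re-runs failed" is an assumption, since the exact threshold is not decided here.

```python
from dataclasses import dataclass
from typing import List

MIN_DISTINCT_CLS = 3  # the example above looks at 3 different CLs


@dataclass
class Occurrence:
    """One flaky occurrence of test t (hypothetical structure)."""
    cl: str                   # identifier of the CL whose CQ attempt failed
    rerun_passed: List[bool]  # result of each re-run at the failed hash


def is_consistent_failure(occ: Occurrence) -> bool:
    # Assumption: "consistent" means no re-run passed at all.
    return len(occ.rerun_passed) > 0 and not any(occ.rerun_passed)


def should_file_bug(occurrences: List[Occurrence]) -> bool:
    # Require occurrences on enough different CLs before considering a bug.
    distinct_cls = {occ.cl for occ in occurrences}
    if len(distinct_cls) < MIN_DISTINCT_CLS:
        return False
    # File a bug unless every occurrence turned out to be a consistent failure.
    return not all(is_consistent_failure(occ) for occ in occurrences)
```

For example, three occurrences whose 30 re-runs all fail would not produce a bug (case 1), while a single passing re-run in any of them would (case 3).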

Chatted with Roberto: the build_index makes it possible to quickly find the hash given a build configuration and isolate target name.

Will explore more later.
 

Comment 1 by st...@chromium.org, Jun 7 2018

By "the failed hash", do you mean the git hash of the checkout?

Comment 2 by st...@chromium.org, Jun 7 2018

This sounds like a working idea! One factor to consider is VM capacity.
Labels: flake-detection
