Alerts for flaky tests disappear if the last run passed
Issue description
Often, when I'm the perfbot health sheriff, I'll find myself triaging an especially difficult alert (one that might take an hour or so to diagnose), only to find that, when I create a bug for the alert or find an existing bug, the alert no longer exists to link it to. This happened today while I was triaging a flaky failure of certain tests on the Nexus6 Webview Perf bot. As you can see in pass_fail.png, battor.steady_state is flaky. I started investigating the failure around 3:30pm, but a new passing run at 4:35pm caused the alert to disappear, leaving me with no alert to link the bug to and no context to provide to a future sheriff who might encounter the same problem.
Aug 10 2017
Yeah: I think that, if a failure only happens ~33% of the time (pretty common among our tests), it's not reasonable to have that alert go away after 2 passing runs in a row, for example. This is actually a pretty similar problem to what we deal with when doing bisects for flakiness: after how many repetitions are you reasonably confident that a test is passing at a given CL? It might be worth reaching out to dtu@ (who's the TL of bisect) to get some ideas. I think that anything you do will have to be probabilistic, but getting rid of the alert on the first passing run probably isn't the right behavior.
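To put rough numbers on the "how many repetitions" question, here is a minimal sketch of the arithmetic. It assumes runs are independent and the failure rate is fixed, which real bots may not satisfy. The function names and the 5% threshold are hypothetical, chosen for illustration; they are not part of the perf dashboard or the bisect tooling.

```python
import math


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Chance that a test failing `failure_rate` of the time passes `runs` times in a row.

    Assumes each run is independent (an assumption, not a measured property).
    """
    return (1.0 - failure_rate) ** runs


def passes_needed(failure_rate: float, max_false_clear: float) -> int:
    """Smallest number of consecutive passes n such that a still-flaky test
    would produce that streak with probability <= max_false_clear."""
    if not (0.0 < failure_rate < 1.0 and 0.0 < max_false_clear < 1.0):
        raise ValueError("both arguments must be strictly between 0 and 1")
    # Solve (1 - failure_rate)**n <= max_false_clear for n.
    n = math.ceil(math.log(max_false_clear) / math.log(1.0 - failure_rate))
    # Guard against floating-point rounding at the boundary.
    while prob_all_pass(failure_rate, n) > max_false_clear:
        n += 1
    return n


# A test that fails ~33% of the time still passes twice in a row
# about 45% of the time, so two passes say very little.
print(round(prob_all_pass(0.33, 2), 4))  # 0.4489

# Hypothetical policy: clear the alert only when a still-flaky test
# would produce the passing streak at most 5% of the time.
print(passes_needed(0.33, 0.05))  # 8
```

Under these assumptions, a ~33% flake would need about 8 consecutive passes before clearing the alert is wrong at most 5% of the time, which is why clearing on the first or second pass loses so many alerts.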
Aug 13
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Dec 18
Comment 1 by seanmccullough@chromium.org, Aug 9 2017
Labels: Milestone-Flakiness Milestone-UX
Status: Available (was: Untriaged)