
Issue 753109

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Alerts for flaky tests disappear if the last run passed

Project Member Reported by charliea@chromium.org, Aug 7 2017

Issue description

Often, when I'm the perfbot health sheriff, I'll find myself triaging an especially difficult alert (one that might take an hour or so to diagnose), only to find that, by the time I create a bug for the alert or find an existing bug, the alert no longer exists to link the bug to.

This happened today while triaging a flaky failure of certain tests on the Nexus6 Webview Perf bot. As you can see in pass_fail.png, battor.steady_state is flaky. I started investigating the failure around 3:30pm, but a new passed run at 4:35pm caused the alert to disappear, leaving me with no alert to link the bug to and no context to provide to a future sheriff who might encounter the same problem.
 
pass_fail.png (189 KB)
Cc: seanmccullough@chromium.org
Labels: Milestone-Flakiness Milestone-UX
Status: Available (was: Untriaged)
The "recently resolved" list used by CrOS might be useful for this particular case (the alert would still be visible after a passing build), but it really looks like we need to deal with the flakiness at the root of it.
Yeah: I think that, if a failure only happens ~33% of the time (pretty common among our tests), it's not reasonable to have that alert go away after 2 passing runs in a row, for example. This is actually a pretty similar problem to what we deal with when doing bisects for flakiness: after how many repetitions are you reasonably confident that a test is passing at a given CL? It might be worth reaching out to dtu@ (who's the TL of bisect) to get some ideas. I think that anything you do will have to be probabilistic, but getting rid of the alert on the first passing run probably isn't the right behavior.
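To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes runs are independent and that the test fails at a fixed rate, which real bots may not satisfy. The function names and the 5% threshold are hypothetical and for illustration only; they are not part of any existing sheriffing or bisect tool.

```python
def chance_of_streak(failure_rate: float, passes: int) -> float:
    """Probability that a still-flaky test passes `passes` times in a row.

    Assumes independent runs with a constant failure rate (an assumption,
    not something measured on the bots).
    """
    return (1.0 - failure_rate) ** passes


def passes_needed(failure_rate: float, max_false_clear: float) -> int:
    """Smallest streak of passes whose chance of occurring while the test
    is still flaky is at most `max_false_clear`."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_clear < 1.0:
        raise ValueError("max_false_clear must be strictly between 0 and 1")
    passes = 1
    while chance_of_streak(failure_rate, passes) > max_false_clear:
        passes += 1
    return passes


if __name__ == "__main__":
    rate = 1 / 3  # the ~33% failure rate mentioned above
    print(f"2 passes in a row by chance: {chance_of_streak(rate, 2):.0%}")
    print(f"passes needed for <=5% false clears: {passes_needed(rate, 0.05)}")
```

Under those assumptions, a test that fails a third of the time still passes twice in a row about 44% of the time, and it takes 8 consecutive passes before the chance of wrongly clearing the alert drops to 5% or below.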
Project Member

Comment 3 by sheriffbot@chromium.org, Aug 13

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Available (was: Untriaged)
