
Issue 685833


Issue metadata

Status: WontFix
Owner: ----
Closed: Oct 25
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug-Regression




Flaky system health benchmarks?

Project Member Reported by zh...@chromium.org, Jan 26 2017

Issue description

During my perf bot health sheriff rotation, I noticed that the system health benchmarks on Android have a long failure history. The usual advice is to disable the failing story, but that does not help here because a different story fails on each run.

That means:
1. The system health benchmark fails on almost every run.
2. But the failing story is different from run to run.

For example:
system_health.common_mobile failing on chromium.perf/Android Nexus5 Perf (2)
https://sheriff-o-matic.appspot.com/chromium.perf/examine/chromium.perf.Android%20Nexus5%20Perf%20(2).system_health.common_mobile.
In the five most recent runs (4940-4944), the failing stories were, respectively:
search:portal:google
browse:news:qq
background:news:nytimes
browse:news:reddit
browse:media:youtube
There is not even any repetition. :(

On sheriff-o-matic, every long-failing system health benchmark shows this pattern.

This is probably acceptable, because our goal is to collect performance data, and as long as no story fails consistently we still get it. But from the perf bot health sheriff's point of view, it encourages the mindset that a long-failing system health benchmark is safe to ignore. That is potentially dangerous: with other failures always demanding attention, no one would pay attention to system health benchmark redness.

I am not sure it is feasible to eliminate this flakiness completely, but at least some signal showing whether any story is failing consistently would be helpful.
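The signal asked for above could be as simple as counting, per story, how many recent builds it failed in. This is a minimal Python sketch, not how sheriff-o-matic or the flakiness dashboard actually compute anything: the `consistently_failing` helper and its threshold are hypothetical, and it assumes the five stories listed above map to builds 4940-4944 in the order given.

```python
from collections import Counter

# Failing stories per build on Android Nexus5 Perf (2), taken from the list above.
# Assumption: the stories are listed in build order (4940 first).
FAILURES_BY_BUILD = {
    4940: ["search:portal:google"],
    4941: ["browse:news:qq"],
    4942: ["background:news:nytimes"],
    4943: ["browse:news:reddit"],
    4944: ["browse:media:youtube"],
}


def consistently_failing(failures_by_build, min_failures):
    """Return, sorted, the stories that failed in at least `min_failures` builds.

    Hypothetical helper. A story is counted at most once per build, so a
    duplicate entry within a single build still counts as one failure.
    """
    counts = Counter()
    for stories in failures_by_build.values():
        counts.update(set(stories))
    return sorted(story for story, n in counts.items() if n >= min_failures)


if __name__ == "__main__":
    # No story failed more than once in these five builds, so nothing is flagged.
    print(consistently_failing(FAILURES_BY_BUILD, min_failures=2))
```

With a threshold of 2, the five builds above flag nothing, which matches "There is not even any repetition": the benchmark is red on every run, yet no single story is a consistent failure worth disabling.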
 

Comment 1 by zh...@chromium.org, Jan 26 2017

Hmm, sheriff-o-matic sometimes does not show any alert. Here is the link to the bot mentioned in #1:
https://luci-milo.appspot.com/buildbot/chromium.perf/Android%20Nexus5%20Perf%20%282%29/
Cc: martiniss@chromium.org
We've talked about sharding by story, which I think should help here. Ideally, a single failing story would not turn a bot red. But sharding is a long-term effort that we still need to design, prototype, and plan for.

However, would it be possible to keep a bot red if a single system health story fails? 
Components: Speed>Metrics>SystemHealthRegressions
Status: WontFix (was: Untriaged)
Marking this WontFix since it's pretty obsolete: sheriff-o-matic and the flakiness dashboard now show story history, so this essentially works the way we want.
