During my perf bot health sheriff rotation, I noticed that system-health-related benchmarks on Android have a long failure history. Usually, the advice is to disable the failing story, but that does not help here because the failing story is different in each run.
That means:
1. The system health benchmark fails in almost every run.
2. But the failing story differs from run to run.
For example:
system_health.common_mobile failing on chromium.perf/Android Nexus5 Perf (2)
https://sheriff-o-matic.appspot.com/chromium.perf/examine/chromium.perf.Android%20Nexus5%20Perf%20(2).system_health.common_mobile.
In the five most recent runs (4940-4944), the failing stories were, respectively:
search:portal:google
browse:news:qq
background:news:nytimes
browse:news:reddit
browse:media:youtube
Not a single story repeats. :(
On sheriff-o-matic, every long-failing system health benchmark is in this situation.
This is probably acceptable, since our goal is to collect performance benchmark data, and as long as no story fails consistently, the data is still usable. But from the perf bot health sheriff's point of view, it creates a mindset that any long-failing system health benchmark is safe to ignore. This is potentially dangerous: with other failures always demanding attention, no one will care about system health benchmarks staying red.
I am not sure it is feasible to remove this flakiness completely, but at least some signal on whether any story is failing consistently would be helpful.
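A minimal sketch of what such a signal could look like, assuming the names of the failing stories can be collected per run. The `failures_by_run` input shape and the `consistently_failing` helper are hypothetical, not part of any existing perf bot tooling. The idea is to count how many recent runs each story failed in and flag stories that cross a threshold.

```python
from collections import Counter


def consistently_failing(failures_by_run, min_failures):
    """Return stories that failed in at least `min_failures` of the given runs.

    `failures_by_run` maps a build number to the set of story names that
    failed in that run (hypothetical input shape, not an existing bot API).
    """
    counts = Counter()
    for stories in failures_by_run.values():
        # Count each story at most once per run.
        counts.update(set(stories))
    return sorted(story for story, n in counts.items() if n >= min_failures)


# Failing stories from the five runs listed above (4940-4944).
recent_runs = {
    4940: {"search:portal:google"},
    4941: {"browse:news:qq"},
    4942: {"background:news:nytimes"},
    4943: {"browse:news:reddit"},
    4944: {"browse:media:youtube"},
}

print(consistently_failing(recent_runs, min_failures=2))  # → []
```

On the five runs above, no story reaches a threshold of two failures, which matches the observation that nothing repeats. A story that showed up in, say, three of the last five runs would stand out as a candidate for disabling.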
Comment 1 by zh...@chromium.org, Jan 26 2017