
Issue 754707

Starred by 3 users

Issue metadata

Status: Archived
Owner:
Closed: Oct 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




New perf dashboard alerts link to /group_report with old ones

Project Member Reported by sullivan@chromium.org, Aug 11 2017

Issue description

This happened a few times yesterday:

https://bugs.chromium.org/p/chromium/issues/detail?id=754252
https://chromeperf.appspot.com/group_report?sid=889f864b9ef112c634c7c0dc9a9c0e180efc3ae9aea26c0776461d22d2b012c4
Here are the timestamps and test paths from the alert json:
2017-08-05T02:33:45.513630 - ChromiumPerf/chromium-rel-mac11-air/system_health.common_desktop/cpu_time_percentage_avg/multitab_misc/multitab_misc_typical24
2017-07-04T13:09:49.935860 - ChromiumPerf/chromium-rel-mac11-air/thread_times.tough_scrolling_cases/thread_raster_cpu_time_per_frame
2017-07-01T09:05:09.314070 - ChromiumPerf/android-nexus7v2/thread_times.tough_scrolling_cases/thread_raster_cpu_time_per_frame
2017-06-30T00:41:40.590400 - ChromiumPerf/chromium-rel-mac11/thread_times.tough_scrolling_cases/thread_total_all_cpu_time_per_frame
Why did an alert from August get lumped in with 3 from July? Why doesn't it show on the page? Had the ones from July been previously triaged?

Same thing in
https://bugs.chromium.org/p/chromium/issues/detail?id=754238
https://chromeperf.appspot.com/group_report?sid=4c0d9a6cf68ae7b729b8a76dcc2533226114ac8399ccb481dc890d491a4c2ea5
One alert at 2017-08-05T02:33:45.513630 lumped in with several from early July, and the one from August is not visible.

Same thing in https://bugs.chromium.org/p/chromium/issues/detail?id=754236
https://chromeperf.appspot.com/group_report?sid=31252432481f650a55ba1da8ac023c409c8cdcc9f313b9aa3e97691e970eb159

Ben, could this have anything to do with recent grouping work?

This is pretty high priority as it appears to be causing alerts to be mis-triaged. I'll do some initial triage.

 
Summary: New perf dashboard alerts link to /group_report with old ones (was: Perf dashboard alerts firing late)
I don't see that alert json when I click those links.

The revision ranges in all of those reports overlap, so that aspect of the grouping algorithm appears to be WAI.

In 754252, the Tests differ but the Test Suites are the same, and they are not memory metrics, so it appears to be WAI.
In 754238, the Test Suites and measurement names are the same, so it appears to be WAI.
In 754236, there's one alert from system_health.memory_mobile and 11 from memory.top_10_mobile, but they are all related because they are all system_memory:native_heap:proportional_resident_size_avg, so it appears to be WAI.

AFAIK, neither the old nor the new alert grouping algorithm has ever considered when alerts were generated. Should they?
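To make the question concrete, here is a minimal sketch of grouping purely by overlapping revision ranges. It is an illustration under stated assumptions, not the dashboard's actual code: the `Alert` class, its field names, and `group_by_revision_overlap` are all hypothetical, and the test suite / measurement checks mentioned above are left out. Note that the timestamp is carried along but never consulted, which is how an August alert can land in the same group as alerts from July.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    # Hypothetical fields for illustration; not the dashboard's real schema.
    test_path: str
    start_rev: int
    end_rev: int
    timestamp: datetime  # when the alert was generated (unused by grouping)


def ranges_overlap(a: Alert, b: Alert) -> bool:
    """True when the two inclusive revision ranges share at least one revision."""
    return a.start_rev <= b.end_rev and b.start_rev <= a.end_rev


def group_by_revision_overlap(alerts):
    """Greedily put each alert in the first group whose members all overlap it."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.start_rev):
        for group in groups:
            if all(ranges_overlap(alert, member) for member in group):
                group.append(alert)
                break
        else:
            # No existing group fits, so this alert starts a new one.
            groups.append([alert])
    return groups
```

If the algorithm should consider when alerts were generated, one option in a sketch like this would be an extra condition in the `all(...)` check that compares `timestamp` values against some maximum gap.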

The alerts page still hides triaged alerts by default. I haven't yet changed how alert states are managed.

Maybe sheriffs are just now getting around to triaging alerts from July whose revision ranges overlap with new alerts?

I can start digging in to see if the old backend grouping algorithm is still affecting some things, but please let me know if you find any group_reports whose revision ranges don't overlap, or if the algorithm should consider when alerts were generated.

Owner: simonhatch@chromium.org
Hmm, I can't reproduce what I saw Friday morning either; the "invisible" later alert in the list is lost. However, I took a look at the graph: https://chromeperf.appspot.com/report?sid=9ba5b131be8ccce63d31acafe34fb8f417421e31e0f469be8e92a6990892a60a

That alert is from an improvement, so maybe this is some sort of artifact of where we filter out improvements. It seems to be a red herring.

Simon, do you think you could take a look at the dupes from August 10 or later on bug 750870:
* Why were the alerts showing up so late? They seem to be timestamped from July.
* Did everything go correctly? Spot-checking I think that the alert was on the same test_path that the bug title/bots list, and the bisect also chose one from the correct group.
It looks like the first of these alerts started getting triaged around the end of July (crbug.com/750870 was filed on July 31, 2017), which would make the alerts nearly a month old at that time. In fact, alerts that fired 1-2 weeks later were triaged just fine (crbug.com/741697).


I've been trying to speculate on how this might happen, but don't have any good ideas at the moment. The /alerts page still shows a few internal alerts from June 28 for system_health.

The alerts were generated at the seemingly correct time, back in June/July. There are no errors in the logs or the error console that seem related to this. There's plenty of data after each alert, so there's no chance of these just suddenly getting turned back on after some extended period.

One possibility is that these simply went untriaged for various reasons, but that seems unlikely. There have been several sheriffs since those alerts *should* have fired, and sullivan@ has reported having a clean alert page. This was several shifts after majidvp@ (who filed crbug.com/750870) and before primiano@, who filed a few more of these. As for the conditions for having the "cat" up on the alerts page: the alerts query has to come back empty, and this query specifies filters for improvements, triaged, and recovered. I'm not sure there are any conditions under which that query can miss untriaged alerts.
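For reference, the "clean page" condition described above can be sketched roughly like this. This is an in-memory illustration only, assuming the three filters behave as stated; `AlertRow`, its fields, and both function names are hypothetical and are not taken from the dashboard code or its datastore query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRow:
    # Hypothetical fields for illustration; not the dashboard's real schema.
    test_path: str
    is_improvement: bool = False
    bug_id: Optional[int] = None  # None means the alert has not been triaged
    recovered: bool = False


def untriaged_alerts(alerts):
    """Keep alerts that are not improvements, not triaged, and not recovered."""
    return [
        a for a in alerts
        if not a.is_improvement and a.bug_id is None and not a.recovered
    ]


def alerts_page_is_clean(alerts):
    """The 'cat' is shown only when the filtered query comes back empty."""
    return not untriaged_alerts(alerts)
```

Under these assumptions, an alert can only be hidden from the page if one of the three fields is set, so a month-old untriaged regression should have kept the page from ever looking clean.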

There was no new dashboard deployed between July 13 and Aug 9.

Another possibility is some sort of side-effect of the client-side grouping. I recall there needing to be numerous fixes to alerts-table when the Polymer 1.0 migration happened, so it's possible it's exposed some other bug there but that wouldn't explain why we had a report of a clean alerts page between filing crbug.com/750870 and the latest round of alerts associated with it.
Cc: pmeenan@chromium.org

Comment 6 by pwang@chromium.org, Sep 14 2017

Cc: pwang@chromium.org
Is there any update on this problem? Our team recently got lots of email alerts from July pointing to graphs like this:
https://chromeperf.appspot.com/report?sid=d3e6cc5f75490a2f262a7ad62a845ca91ef4420295b852cff2db5c11cdebfeaa&start_rev=30550000942500000&end_rev=32120000993900000
If this is a side-effect of the grouping mechanism, is there a setting that we, as users of the chrome perf dashboard, can use to turn off the alert grouping on our side?
re: #c6

Unfortunately no, I haven't looked into this beyond the initial investigation. I can try again tomorrow. Which team is this?
Cc: -pmeenan@chromium.org
Status: Archived (was: Untriaged)
I can't recall seeing this issue anytime recently.
