New perf dashboard alerts link to /group_report with old ones

Issue description

This happened a few times yesterday:
https://bugs.chromium.org/p/chromium/issues/detail?id=754252
https://chromeperf.appspot.com/group_report?sid=889f864b9ef112c634c7c0dc9a9c0e180efc3ae9aea26c0776461d22d2b012c4

Here are the timestamps and test paths from the alert json:
2017-08-05T02:33:45.513630 - ChromiumPerf/chromium-rel-mac11-air/system_health.common_desktop/cpu_time_percentage_avg/multitab_misc/multitab_misc_typical24
2017-07-04T13:09:49.935860 - ChromiumPerf/chromium-rel-mac11-air/thread_times.tough_scrolling_cases/thread_raster_cpu_time_per_frame
2017-07-01T09:05:09.314070 - ChromiumPerf/android-nexus7v2/thread_times.tough_scrolling_cases/thread_raster_cpu_time_per_frame
2017-06-30T00:41:40.590400 - ChromiumPerf/chromium-rel-mac11/thread_times.tough_scrolling_cases/thread_total_all_cpu_time_per_frame

Why did an alert from August get lumped in with 3 from July? Why doesn't it show on the page? Had the ones from July been previously triaged?

Same thing in:
https://bugs.chromium.org/p/chromium/issues/detail?id=754238
https://chromeperf.appspot.com/group_report?sid=4c0d9a6cf68ae7b729b8a76dcc2533226114ac8399ccb481dc890d491a4c2ea5
One alert at 2017-08-05T02:33:45.513630 is lumped in with several from early July, and the one from August is not visible.

Same thing in:
https://bugs.chromium.org/p/chromium/issues/detail?id=754236
https://chromeperf.appspot.com/group_report?sid=31252432481f650a55ba1da8ac023c409c8cdcc9f313b9aa3e97691e970eb159

Ben, could this have anything to do with recent grouping work? This is pretty high priority as it appears to be causing alerts to be mis-triaged. I'll do some initial triage.
,
Aug 11 2017
I don't see that alert json when I click those links.

The revision ranges in all of those reports overlap, so that aspect of the grouping algorithm appears to be WAI.
In 754252, the Tests differ but the Test Suites are the same, and they are not memory metrics, so it appears to be WAI.
In 754238, the Test Suites and measurement names are the same, so it appears to be WAI.
In 754236, there's one alert from system_health.memory_mobile and 11 from memory.top_10_mobile, but they are all related because they are all system_memory:native_heap:proportional_resident_size_avg, so it appears to be WAI.

AFAIK, neither the old nor new alert grouping algorithms have ever considered when alerts were generated. Should it?

The alerts page still hides triaged alerts by default. I haven't yet changed how alert states are managed. Maybe sheriffs are just now getting around to triaging alerts from July whose revision ranges overlap with new alerts?

I can start digging in to see if the old backend grouping algorithm is still affecting some things, but please let me know if you find any group_reports whose revision ranges don't overlap, or if the algorithm should consider when alerts were generated.
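For readers following along, the grouping behavior described in this comment can be sketched roughly as below. This is an illustrative approximation, not the dashboard's actual code: the `Alert` class, the helper names, the "memory" substring heuristic, the bot names and all revision numbers are assumptions made up for the example. The point it shows is that the `generated` timestamp plays no part in the decision, so an August alert can land in the same group as July alerts when their revision ranges overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alert:
    # Hypothetical record; test_path is master/bot/suite/measurement/...
    test_path: str
    start_rev: int
    end_rev: int
    generated: str  # kept for display only; never consulted when grouping


def suite(alert: Alert) -> str:
    return alert.test_path.split("/")[2]


def measurement(alert: Alert) -> str:
    return alert.test_path.split("/")[3]


def is_memory_metric(alert: Alert) -> bool:
    # Assumed heuristic for this sketch only.
    return "memory" in measurement(alert)


def ranges_overlap(a: Alert, b: Alert) -> bool:
    return a.start_rev <= b.end_rev and b.start_rev <= a.end_rev


def related(a: Alert, b: Alert) -> bool:
    # Memory metrics are related by measurement name, even across suites;
    # everything else is related by test suite.
    if is_memory_metric(a) and is_memory_metric(b):
        return measurement(a) == measurement(b)
    return suite(a) == suite(b)


def group_alerts(alerts: List[Alert]) -> List[List[Alert]]:
    groups: List[List[Alert]] = []
    for alert in alerts:
        for group in groups:
            # Join the first group whose members all overlap and are related.
            if all(ranges_overlap(alert, m) and related(alert, m) for m in group):
                group.append(alert)
                break
        else:
            groups.append([alert])
    return groups


# Made-up revisions and dates: a July and an August alert with overlapping ranges.
july = Alert("ChromiumPerf/chromium-rel-mac11-air/thread_times.tough_scrolling_cases/"
             "thread_raster_cpu_time_per_frame", 100, 200, "2017-07-04")
august = Alert("ChromiumPerf/android-nexus7v2/thread_times.tough_scrolling_cases/"
               "thread_raster_cpu_time_per_frame", 150, 250, "2017-08-05")
later = Alert("ChromiumPerf/chromium-rel-mac11-air/thread_times.tough_scrolling_cases/"
              "thread_raster_cpu_time_per_frame", 300, 400, "2017-08-05")

groups = group_alerts([july, august, later])
print([[a.generated for a in g] for g in groups])
```

If the algorithm were changed to consider when alerts were generated, the natural place in a sketch like this would be an extra condition inside `group_alerts`, for example a maximum gap between `generated` values.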
,
Aug 14 2017
Hmm, I can't reproduce what I saw Friday morning either; the "invisible" later alert in the list is lost. However, I took a look at the graph: https://chromeperf.appspot.com/report?sid=9ba5b131be8ccce63d31acafe34fb8f417421e31e0f469be8e92a6990892a60a

That alert is from an improvement, so maybe this is some sort of artifact of where we filter out improvements. It seems to be a red herring.

Simon, do you think you could take a look at the dupes from August 10 or later on bug 750870:
* Why were the alerts showing up so late? They seem to be timestamped from July.
* Did everything go correctly? Spot-checking, I think that the alert was on the same test_path that the bug title/bots list, and the bisect also chose one from the correct group.
,
Aug 14 2017
It looks like the first of these alerts started getting triaged around the end of July (crbug.com/750870 was filed on July 31, 2017), which would make the alerts nearly a month old at that time. In fact, alerts fired 1-2 weeks after were triaged just fine (crbug.com/741697). I'm trying to speculate on how this might happen, but don't have any good ideas at the moment.

Visiting /alerts shows a few internal alerts still around from June 28 for system_health. The alerts were generated at the seemingly correct time, back in June/July. There are no errors in the logs or the error console that seem related to this. There's plenty of data after each alert, so there's no chance of these just suddenly getting turned back on after some extended period.

One possibility is that these simply went untriaged for various reasons, but that seems unlikely. There have been several sheriffs since those alerts *should* have fired, and sullivan@ has reported having a clean alert page. This was several shifts after majidvp@ (who filed crbug.com/750870), and before primiano@, who filed a few more of these. Checking the conditions for having the "cat" up on the alerts page: the alerts query has to come back empty, and this query specifies filters for improvements, triaged, and recovered. I'm not sure there are any conditions under which that query can miss untriaged alerts. There was no new dashboard deployed between July 13 and Aug 9.

Another possibility is some sort of side effect of the client-side grouping. I recall numerous fixes being needed to alerts-table when the Polymer 1.0 migration happened, so it's possible it has exposed some other bug there, but that wouldn't explain why we had a report of a clean alerts page between the filing of crbug.com/750870 and the latest round of alerts associated with it.
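To make the "clean alerts page" condition above concrete, here is a minimal sketch of the filter logic as described in this comment. The `AlertRow` class, its field names and the function names are hypothetical stand-ins, not the dashboard's real schema or query code; the bug id is taken from this thread only as sample data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertRow:
    # Hypothetical fields standing in for the three filters named above.
    test_path: str
    is_improvement: bool = False
    recovered: bool = False
    bug_id: Optional[int] = None  # None is taken to mean "not yet triaged"


def untriaged_alerts(alerts: List[AlertRow]) -> List[AlertRow]:
    # Keep only regressions that have neither recovered nor been triaged.
    return [
        a for a in alerts
        if not a.is_improvement and not a.recovered and a.bug_id is None
    ]


def page_is_clean(alerts: List[AlertRow]) -> bool:
    # The "cat" is shown only when the filtered query comes back empty.
    return not untriaged_alerts(alerts)


rows = [
    AlertRow("suite_a/metric_1", is_improvement=True),
    AlertRow("suite_a/metric_2", recovered=True),
    AlertRow("suite_b/metric_3", bug_id=750870),
]
print(page_is_clean(rows))

rows.append(AlertRow("suite_b/metric_4"))
print(page_is_clean(rows))
```

Under these assumptions, any alert that was a non-recovered regression with no bug attached would have prevented a clean page, which is why a clean report from a sheriff is hard to reconcile with July alerts surfacing later.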
,
Aug 29 2017
,
Sep 14 2017
Is there any update on this problem? Our team recently got lots of email alerts from July pointing to graphs like this: https://chromeperf.appspot.com/report?sid=d3e6cc5f75490a2f262a7ad62a845ca91ef4420295b852cff2db5c11cdebfeaa&start_rev=30550000942500000&end_rev=32120000993900000

If this is a side effect of the grouping mechanism, is there a setting that we, as users of the Chrome perf dashboard, can use to turn off alert grouping on our side?
,
Sep 14 2017
re: #c6 Unfortunately no, I haven't looked into this beyond the initial investigation. I can try again tomorrow. Which team is this?
,
Jul 30
,
Oct 4
I can't recall seeing this issue anytime recently.
Comment 1 by sullivan@chromium.org
, Aug 11 2017