Sheriff-o-matic lists no alerts for chromium.perf
Issue description: The waterfall looks pretty red, but I see no alerts on https://sheriff-o-matic.appspot.com/chromium.perf (waterfall link: https://build.chromium.org/p/chromium.perf/waterfall)
,
Feb 6 2017
Should we be defaulting to useMilo for perf?
,
Feb 6 2017
Not yet. ?useMilo will display *some* alerts, but they're not the same ones you'd see without it. We have a metric tracking the edit distance between the alerts-dispatcher (a-d) alerts and the cron alerts (for staging, not live on prod yet) here: https://pcon.corp.google.com/p#chrome-infra/queryplayground?duration=86400&heatmapColorScale=viceroy&legendtable=false&names=Requests%20by%20app%20version&oldHeatmap=false&outputPoints=900&showEditor=true&stacked=true&title=Requests%20by%20app%20version&yAxisLabel=QPM&yAxisMin=0&query=mash&mash=Fetch(Raw('monarch.acquisitions.Task',%20'/chrome/infra/analyzer/cron_alert_diffs'),%20%7B'data_center':%20'appengine',%20'service_name':%20'sheriff-o-matic-staging'%7D)%0A%7C%20Window(Rate('20m'))%0A%7C%20Point(VAL%20*%2060)%0A%7C%20GroupBy(%5B'metric:tree'%5D)

The cron (useMilo) alerts for the chromium.perf tree are still pretty far off from the a-d alerts. Chromium is the closest, and chromium.perf and android are the furthest off. We'll need to dig into that to see what's causing the difference.

The alerting logic is the same for cron and a-d, but cron is fetching the build extract from milo, while a-d is still getting it from CBE.
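
For anyone trying to reason about how far apart the two alert sources are, here is a minimal sketch of a per-tree diff count. It is only an illustration: the function names and the idea of identifying each alert by a string key are assumptions, and the real cron_alert_diffs metric may well be computed differently.

```python
def alert_diff_count(a_d_alerts, cron_alerts):
    """Count alerts that appear in exactly one of the two sources.

    Both arguments are iterables of alert keys (hypothetical string IDs).
    """
    a_d_keys = set(a_d_alerts)
    cron_keys = set(cron_alerts)
    # Symmetric difference: alerts one source reports and the other does not.
    return len(a_d_keys ^ cron_keys)


def diffs_by_tree(a_d_by_tree, cron_by_tree):
    """Return {tree: diff count} for every tree seen in either source."""
    trees = set(a_d_by_tree) | set(cron_by_tree)
    return {
        tree: alert_diff_count(
            a_d_by_tree.get(tree, []), cron_by_tree.get(tree, [])
        )
        for tree in sorted(trees)
    }
```

A count of 0 for a tree would mean both pipelines agree on that tree's alerts; a large count, as described for chromium.perf and android, means they diverge.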
,
Feb 6 2017
Got it. Since this is crucial for the perf sheriff workflow, this really is a P1 and should not go unowned. Sean is working on spam issues in monorail right now. martiniss@, do you have any extra cycles?
,
Feb 7 2017
I can look at making the milo alerts correct in SoM. I'm not sure how soon it'll get done though.
,
Feb 7 2017
,
Feb 11 2017
Will work on this starting Monday.
,
Feb 15 2017
martiniss@, were you able to start on this Monday as planned? Any updates?
,
Feb 15 2017
I haven't had a chance to :( I've been pretty busy. Looking now.
,
Feb 15 2017
I dug into this somewhat. It looks like we're running into OOM issues with the cron job, which is killing it? It looks to me like the data is really out of date, but I'm not sure exactly why. I'll look at this more.
,
Mar 8 2017
There are workarounds for this. I can run a cron on my machine when the master json is too large. Lowering priority.
,
Jun 15 2017
seanmccullough@ actually migrated the chromium.perf tree to run on app engine, so this problem shouldn't happen anymore! Hopefully. Anyways, closing this issue.
Comment 1 by martiniss@chromium.org
, Feb 6 2017