
Issue 689144

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Sheriff-o-matic lists no alerts for chromium.perf

Project Member Reported by sullivan@chromium.org, Feb 6 2017

Issue description

The waterfall looks pretty red, but I see no alerts on https://sheriff-o-matic.appspot.com/chromium.perf (waterfall link: https://build.chromium.org/p/chromium.perf/waterfall)
 
Cc: seanmccullough@chromium.org
You can use https://sheriff-o-matic.appspot.com/chromium.perf?useMilo (note the useMilo at the end) to see alerts.

The problem is in how we generate alerts. The current method downloads the master JSON from a caching service hosted on App Engine, and chromium.perf's master JSON is large enough that it sometimes exceeds the App Engine request size limit.

We're working on a new way to generate alerts that uses a different service, one that gzips the master JSON so it stays under the App Engine size limit. I think Sean is working on rolling this out, but AFAIK he got pulled into some work on Monorail spam.
Should we be defaulting to useMilo for perf?
Not yet. ?useMilo will display *some* alerts but they're not the same ones you'd see without it. 

We have a metric tracking the edit distance between the a-d alerts and the cron alerts (for staging, not live on prod yet) here: https://pcon.corp.google.com/p#chrome-infra/queryplayground?duration=86400&heatmapColorScale=viceroy&legendtable=false&names=Requests%20by%20app%20version&oldHeatmap=false&outputPoints=900&showEditor=true&stacked=true&title=Requests%20by%20app%20version&yAxisLabel=QPM&yAxisMin=0&query=mash&mash=Fetch(Raw('monarch.acquisitions.Task',%20'/chrome/infra/analyzer/cron_alert_diffs'),%20%7B'data_center':%20'appengine',%20'service_name':%20'sheriff-o-matic-staging'%7D)%0A%7C%20Window(Rate('20m'))%0A%7C%20Point(VAL%20*%2060)%0A%7C%20GroupBy(%5B'metric:tree'%5D)

The cron (useMilo) alerts for the chromium.perf tree are still pretty far off from the alerts-dispatcher alerts. Chromium is the closest, and chromium.perf and android are the furthest off. We'll need to dig into that to see what's causing the difference.
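The `cron_alert_diffs` metric is described as an edit distance between the two sets of alerts. The thread does not say how it is computed, so the following is only a sketch under an assumption: each alert is reduced to a string key (the key format below is hypothetical), both lists are sorted so ordering alone is not counted as a difference, and the Levenshtein distance between the two key lists is reported. `alertEditDistance` is an illustrative name, not Sheriff-o-Matic's API.

```go
package main

import (
	"fmt"
	"sort"
)

// alertEditDistance returns the Levenshtein distance between two lists of
// alert keys after sorting copies of both: the minimum number of insertions,
// deletions and substitutions needed to turn one list into the other.
func alertEditDistance(a, b []string) int {
	sa := append([]string(nil), a...)
	sb := append([]string(nil), b...)
	sort.Strings(sa)
	sort.Strings(sb)

	// Two-row dynamic programming table; prev[j] is the distance between
	// the first i-1 keys of sa and the first j keys of sb.
	prev := make([]int, len(sb)+1)
	curr := make([]int, len(sb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(sa); i++ {
		curr[0] = i
		for j := 1; j <= len(sb); j++ {
			cost := 1
			if sa[i-1] == sb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(sb)]
}

func min3(x, y, z int) int {
	m := x
	if y < m {
		m = y
	}
	if z < m {
		m = z
	}
	return m
}

func main() {
	// Hypothetical alert keys for one tree from the two generators.
	dispatcher := []string{"chromium.perf/Linux Perf/step:telemetry", "chromium.perf/Mac Perf/step:compile"}
	cron := []string{"chromium.perf/Mac Perf/step:compile"}
	fmt.Println("edit distance:", alertEditDistance(dispatcher, cron))
}
```

A distance of 0 would mean the cron (useMilo) path reproduces the alerts-dispatcher alerts exactly; larger values for chromium.perf and android are what the comment above describes.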

The alerting logic is the same for cron and a-d, but cron is fetching the build extract from milo, while a-d is still getting it from CBE.
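One way to picture "same alerting logic, different data source" is a shared analysis function that depends only on an interface for fetching the build extract. The sketch below is hypothetical throughout: `BuildExtract` is a drastically simplified model (builder name to latest result), `ExtractSource`, `staticSource` and `failingBuilders` are illustrative names, and the canned data merely stands in for Milo- and CBE-backed clients.

```go
package main

import (
	"fmt"
	"sort"
)

// BuildExtract is a hypothetical, heavily simplified build extract:
// builder name -> result of its most recent build.
type BuildExtract struct {
	Master   string
	Builders map[string]string
}

// ExtractSource abstracts where the build extract comes from (for example
// Milo for cron, CBE for alerts-dispatcher).
type ExtractSource interface {
	Fetch(master string) (*BuildExtract, error)
}

// staticSource is a fake source returning canned data.
type staticSource struct {
	data map[string]*BuildExtract
}

func (s staticSource) Fetch(master string) (*BuildExtract, error) {
	be, ok := s.data[master]
	if !ok {
		return nil, fmt.Errorf("no build extract for master %q", master)
	}
	return be, nil
}

// failingBuilders is the shared alerting logic. It only sees the extract,
// so any difference between two sources' alerts comes from their data.
func failingBuilders(src ExtractSource, master string) ([]string, error) {
	be, err := src.Fetch(master)
	if err != nil {
		return nil, err
	}
	var alerts []string
	for builder, result := range be.Builders {
		if result == "failure" {
			alerts = append(alerts, builder)
		}
	}
	sort.Strings(alerts) // map iteration order is random; sort for stable output
	return alerts, nil
}

func main() {
	milo := staticSource{data: map[string]*BuildExtract{
		"chromium.perf": {Master: "chromium.perf", Builders: map[string]string{"Linux Perf": "failure", "Mac Perf": "success"}},
	}}
	// Hypothetical stale copy of the same master, as if from another cache.
	cbe := staticSource{data: map[string]*BuildExtract{
		"chromium.perf": {Master: "chromium.perf", Builders: map[string]string{"Linux Perf": "success", "Mac Perf": "success"}},
	}}
	for name, src := range map[string]ExtractSource{"milo": milo, "cbe": cbe} {
		alerts, err := failingBuilders(src, "chromium.perf")
		if err != nil {
			panic(err)
		}
		fmt.Println(name, alerts)
	}
}
```

With the logic held fixed, tracking down the cron vs. a-d discrepancy reduces to comparing what the two sources return for the same master.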


Got it. Since this is crucial to the perf sheriff workflow, this really is a P1 and should not go unowned.

Sean is working on spam issues in monorail right now - martiniss@ do you have any extra cycles?
Owner: martiniss@chromium.org
Status: Assigned (was: Untriaged)
I can look at making the milo alerts correct in SoM. I'm not sure how soon it'll get done though.
Labels: Milestone-Reliability
Will work on this starting Monday.
martiniss@, were you able to start on this Monday as planned? Any updates?
I haven't had a chance to :( I've been pretty busy. 

looking now.
I dug into this somewhat.

It looks like the cron job is running into OOM issues, which is killing it? The data looks really out of date to me, but I'm not sure exactly why.

I'll look at this more.
Labels: -Pri-1 Pri-2
There are workarounds for this: I can run a cron on my machine when the master JSON is too large. Lowering priority.
Status: Fixed (was: Assigned)
seanmccullough@ actually migrated the chromium.perf tree to run on App Engine, so this problem shouldn't happen anymore (hopefully). Anyway, closing this issue.
