
Issue 761435


Issue metadata

Status: Fixed
Owner: ----
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression




sheriff-o-matic alerts are out of date

Project Member Reported by charliea@chromium.org, Sep 1 2017

Issue description

(CCing altimin@, the current perfbot health sheriff)

I noticed on the flakiness dashboard that there's a failure occurring on the waterfall that isn't showing up on sheriff-o-matic.

flakiness_dashboard.png clearly shows that browse:media:tumblr has failed in the last five runs on Mac 10.11 Perf.

Milo corroborates this story in milo.png.

However, when I go to sheriff-o-matic and filter by system_health.common_desktop, I see only one system_health.common_desktop failure: browse:social:facebook_infinite_scroll is reported as flakily failing on Mac Air 10.11 Perf. Yet Milo and the flakiness dashboard make it clear that this failure was resolved a while ago (see mac_air_flakiness_dashboard.png).

It looks like these alerts are several days out of date. Any idea what's going on?
 
mac_air_flakiness_dashboard.png (109 KB)
milo.png (223 KB)
flakiness_dashboard.png (77.8 KB)
Cc: martiniss@chromium.org
https://screenshot.googleplex.com/pURiMc2tcy3

The chromium.perf analyzer has been OOMing for the past three days.

We should set up alerting for this.

Big picture: perf has a lot more failures to examine than the other trees, so the analyzer frequently hits the memory limit for GAE instances, and the instances get killed.

martiniss@, is this at all related to the extra test-results perf is uploading now?
I don't think it's related to that, but maybe? Let me take a look.
Also https://chromium-review.googlesource.com/c/chromium/src/+/639577 hasn't landed yet, so that couldn't have caused it.
jojwang@'s deployed version is "11124-2a2d3d0". The previous version is "11211-7fa4298". The leading number of the deployed version is smaller, which indicates that it was built from out-of-date code. https://chromium.googlesource.com/infra/infra/+/2a2d3d0 is from August 8th. I think that's probably what caused this; IIRC I landed some code this month that helped with perf OOMs.

I'm reverting to the previous version on SOM. That should help with these issues.
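
For anyone who wants to check this before deploying, the version comparison above can be sketched roughly as follows. This is only an illustration, not part of the infra tooling: it assumes version strings have the form "<number>-<short commit hash>" and that a smaller leading number means older code, as inferred from the two versions quoted in this thread. The helper names `parse_version` and `is_out_of_date` are hypothetical.

```python
def parse_version(version: str) -> tuple[int, str]:
    """Split a version like "11124-2a2d3d0" into (11124, "2a2d3d0").

    Assumes the "<number>-<short hash>" format seen in this thread.
    """
    number, _, commit = version.partition("-")
    return int(number), commit


def is_out_of_date(deployed: str, previous: str) -> bool:
    """Return True if the deployed version's leading number is smaller
    than the previous version's, i.e. it was likely built from older code."""
    return parse_version(deployed)[0] < parse_version(previous)[0]


if __name__ == "__main__":
    # The two versions mentioned in this issue.
    print(is_out_of_date("11124-2a2d3d0", "11211-7fa4298"))
```

With the two versions quoted here, the check flags the deployed version as older than the one it replaced.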
Cc: jojwang@chromium.org
Status: Fixed (was: Untriaged)
Alerts are back. 
