
Issue 750250

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




perf builder failure should be tree blocker

Project Member Reported by nedngu...@google.com, Jul 28 2017

Issue description

In issue 748509, the perf builder bots were failing for 3 days without being acted on. This caused all perf bots to stop functioning due to the lack of new builds.

I propose that all perf builder bots be tree blockers and sheriffed by the Chromium sheriffs.

Dirk: does this sound ok to you?
 
Summary: perf builder failure should be tree blocker (was: perf builder failing for 3 days without being acted on)
We should get CQ coverage for these bots if we want them to be tree blockers, right? I'm not sure how hard that is to do.
Ideally, yes, we'd have coverage in the CQ. In practice, though, at least the Win builders might be too slow, so we'd have to look at it.

Perhaps more importantly, the Perf builders are doing full official builds (w/ src-internal checked out), and that means that non-Googlers can't access the build logs and wouldn't be able to see why their builds are failing, and I don't think we can put them on the CQ as a result.

I think, historically, we've also been unwilling to put them on the main waterfall for that reason, but we've given up on the idea that sheriffs will be non-Googlers, so I think having them on the main waterfall and not the CQ is probably an acceptable tradeoff. I'll double-check with a few others to see if they strongly disagree.

Regardless, why were the builds failing for 3 days on the perf waterfall; why didn't the perf bot health sheriff deal with it?
* why didn't the perf bot health sheriff deal with it: I think this is partly because they are overwhelmed by the number of failures the perf waterfall has.

In this particular instance, the perf bot health sheriffs did file the bug (issue 748509), but they didn't assess the bug priority correctly. I think this is where we can try to improve SOM to be clearer about priority. +martiniss
> why didn't the perf bot health sheriff deal with it: I think this is
> partly because they are overwhelmed by the number of failures the perf
> waterfall has.

Fair enough. And, to be clear: I really want to shift as much of the load as we can to the main sheriff pool, so I do want us to do whatever we can here.
Owner: dpranke@chromium.org
Thanks Dirk. I'm temporarily assigning this to you so you can check whether others agree. Once the action items are clear, feel free to reassign it to me so I can triage this bug to the benchmarking team.
Status: Assigned (was: Untriaged)
Cc: benhenry@chromium.org
Just a note that I am currently investigating this heavily from the perf side. My guiding mission for the work I took over from Ben (pruning the current configurations on the perf waterfall) is to actually get rid of the perf bot health sheriff rotation (i.e., move it to the main sheriff pool).

Dirk, please keep me in the look with any action items you see from your end. A brief doc will hopefully be distributed from our side by early next week to start discussing exactly this.
sorry keep me in the loop...
Dirk: has there been any reply from the others in #3? A P0 bug about perf builders going down just happened again today (issue 786368).
I've gotten no objections. Pending any issues w/ the tooling (which I'm investigating now), I'll move the builders over to be tree-closers and monitored by the chromium sheriffs.
How much traffic are we going to see? Have all of the sheriffs been informed?
We shouldn't see much; if we do, then that's all for the better, since this is the sort of thing I want the main sheriffs worrying about, not the perf bot health sheriffs. 

All of the sheriffs have not been informed yet, but I will publicize this as part of making the changes.
Thanks for driving this, Dirk! 
Cc: nednguyen@chromium.org simonhatch@chromium.org estaab@chromium.org fmea...@chromium.org no...@chromium.org dtu@chromium.org jam@chromium.org phajdan@google.com
Issue 618544 has been merged into this issue.
Cc: -nednguyen@chromium.org
Cc: seanmccullough@chromium.org
Labels: -Pri-2 Pri-1
Status: Started (was: Assigned)
CL: https://crrev.com/c/830830

(Not sure if I need anything else as well ...)
Project Member

Comment 18 by bugdroid1@chromium.org, Dec 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/b1a453a9c1e0ddf97b29ad7ba6cd6bd54631a4c2

commit b1a453a9c1e0ddf97b29ad7ba6cd6bd54631a4c2
Author: Dirk Pranke <dpranke@chromium.org>
Date: Mon Dec 18 23:45:01 2017

Make chromium.perf builders main-tree-closers.

This CL changes the gatekeeper configuration so that
compile failures on the perf builders will close the
main Chromium tree and notify the Chromium sheriffs,
rather than just affecting the bot health sheriffs.

R=seanmccullough@chromium.org, nednguyen@google.com
BUG=750250

Change-Id: I4e1a9d56843c7de18b2f45c5d439b8bcb2f2f8ad
Reviewed-on: https://chromium-review.googlesource.com/830830
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Sean McCullough <seanmccullough@chromium.org>

[modify] https://crrev.com/b1a453a9c1e0ddf97b29ad7ba6cd6bd54631a4c2/scripts/slave/gatekeeper.json
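
For readers unfamiliar with the idea, the behavior described in the commit message (a compile failure on a chromium.perf builder closes the main tree and notifies the Chromium sheriffs) can be sketched roughly as below. This is only an illustration under stated assumptions: the dictionary layout, the key names (`tree`, `closing_steps`, `notify`), the `chromium-sheriffs` label, and the `evaluate_build` helper are all hypothetical and are not the actual schema of gatekeeper.json or the real gatekeeper code.

```python
# Hypothetical config shape for illustration only.
# This is NOT the real gatekeeper.json schema.
GATEKEEPER_CONFIG = {
    "chromium.perf": {
        "tree": "chromium",                # tree to close on a qualifying failure
        "closing_steps": ["compile"],      # only these failed steps close the tree
        "notify": ["chromium-sheriffs"],   # hypothetical notification target
    },
}


def evaluate_build(config, builder_group, failed_steps):
    """Return (close_tree, tree_name, notify_list) for one finished build.

    The tree is closed only if at least one failed step is listed as a
    closing step for the given builder group.
    """
    rule = config.get(builder_group)
    if rule is None:
        # Builder group is not watched at all.
        return (False, None, [])
    closing = set(rule["closing_steps"]) & set(failed_steps)
    if not closing:
        # Failures outside the closing steps (e.g. a benchmark) are ignored here.
        return (False, rule["tree"], [])
    return (True, rule["tree"], list(rule["notify"]))
```

Under these assumptions, a failed `compile` step on a chromium.perf builder closes the tree, while a failure in any other step leaves it open, which matches the intent of limiting tree closure to compile failures.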

Status: Fixed (was: Started)
I think this is done, though I haven't actually seen a compile failure on one of the bots, so I don't know if it's working or not. 

If it isn't, please re-open.
