perf builder failure should be tree blocker
Issue description
In issue 748509, perf builder bots were failing for 3 days without anyone acting on it. This caused all perf bots to stop functioning due to the lack of new builds. I propose that all perf builder bots should be tree blockers and be sheriffed by the Chromium sheriffs. Dirk: does this sound OK to you?
,
Jul 28 2017
We should get CQ coverage for these bots if we want them to be tree blockers, right? I'm not sure how hard that is to do
,
Jul 28 2017
Ideally, yes, we'd have coverage in the CQ. In practice, though, at least the Win builders might be too slow, so we'd have to look at it. Perhaps more importantly, the Perf builders are doing full official builds (w/ src-internal checked out), and that means that non-Googlers can't access the build logs and wouldn't be able to see why their builds are failing, and I don't think we can put them on the CQ as a result. I think, historically, we've also been unwilling to put them on the main waterfall because of that reason, but we've given up on the idea that sheriffs will be non-Googlers, and so I think having them be on the main waterfall and not the CQ is probably an acceptable tradeoff. I'll double-check with a few others to see if they strongly disagree. Regardless, why were the builds failing for 3 days on the perf waterfall; why didn't the perf bot health sheriff deal with it?
,
Jul 28 2017
* Why didn't the perf bot health sheriff deal with it: I think this is partly because they are overwhelmed by the number of failures the perf waterfall has. In this particular instance, the perf bot health sheriffs did file the bug (issue 748509), but they didn't assess the bug's priority correctly. I think this is where we can try to improve SOM to be clearer about priority. +martiniss
,
Jul 28 2017
> why didn't the perf bot health sheriff deal with it: I think this is partly because they are overwhelmed by the number of failures perf waterfall have.
Fair enough. And, to be clear: I really want to shift as much of the load as we can to the main sheriff pool, so I do want us to do whatever we can here.
,
Jul 31 2017
Thanks, Dirk. I'm temporarily assigning this to you so you can check whether the others agree. Once the action items are clear, feel free to reassign it to me so I can triage this bug to the benchmarking team.
,
Jul 31 2017
,
Aug 2 2017
Just a note that I am currently investigating this heavily from the perf side, as the guiding mission for the work I took over from Ben (pruning the current configurations on the perf waterfall) is to actually get rid of the perf bot health sheriff rotation (i.e., move it to the main sheriff pool). Dirk, please keep me in the look with any action items you see from your end. A brief doc will hopefully be distributed from our side by early next week to start discussing exactly this.
,
Aug 2 2017
sorry keep me in the loop...
,
Nov 17 2017
Dirk: has there been any reply from the others mentioned in #3? A P0 bug about perf builders going down just happened again today (issue 786368).
,
Nov 20 2017
I've gotten no objections. Pending any issues w/ the tooling (which I'm investigating now), I'll move the builders over to be tree-closers and monitored by the chromium sheriffs.
,
Nov 20 2017
How much traffic are we going to see? Have all of the sheriffs been informed?
,
Nov 20 2017
We shouldn't see much; if we do, then that's all for the better, since this is the sort of thing I want the main sheriffs worrying about, not the perf bot health sheriffs. All of the sheriffs have not been informed yet, but I will publicize this as part of making the changes.
,
Nov 20 2017
Thanks for driving this, Dirk!
,
Dec 11 2017
Issue 618544 has been merged into this issue.
,
Dec 11 2017
,
Dec 15 2017
CL: https://crrev.com/c/830830 (Not sure if I need anything else as well ...)
,
Dec 18 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/b1a453a9c1e0ddf97b29ad7ba6cd6bd54631a4c2

commit b1a453a9c1e0ddf97b29ad7ba6cd6bd54631a4c2
Author: Dirk Pranke <dpranke@chromium.org>
Date: Mon Dec 18 23:45:01 2017

Make chromium.perf builders main-tree-closers.

This CL changes the gatekeeper configuration so that compile failures on the perf builders will close the main Chromium tree and notify the Chromium sheriffs, rather than just affecting the bot health sheriffs.

R=seanmccullough@chromium.org, nednguyen@google.com
BUG= 750250

Change-Id: I4e1a9d56843c7de18b2f45c5d439b8bcb2f2f8ad
Reviewed-on: https://chromium-review.googlesource.com/830830
Commit-Queue: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Sean McCullough <seanmccullough@chromium.org>

[modify] https://crrev.com/b1a453a9c1e0ddf97b29ad7ba6cd6bd54631a4c2/scripts/slave/gatekeeper.json
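
For readers unfamiliar with tree-closing rules, the behavior the commit message describes (a compile failure on a perf builder closes the main tree and notifies the Chromium sheriffs) can be sketched roughly as below. This is only an illustrative Python sketch: it is not the actual gatekeeper code and does not reproduce the real gatekeeper.json schema. The `GatekeeperRule` class, its fields, the `should_close_tree` function, and the `some_benchmark` step name are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GatekeeperRule:
    """Hypothetical rule: which failing steps on a master close which tree."""
    master: str
    closing_steps: set = field(default_factory=set)
    tree: str = "chromium"             # assumed name for the main tree
    notify: str = "chromium sheriffs"  # assumed notification target


def should_close_tree(rule, master, failed_steps):
    """Return True if any failed step on this master is a closing step."""
    if master != rule.master:
        return False
    return bool(rule.closing_steps & set(failed_steps))


# Assumed rule mirroring the commit message: only compile failures close the tree.
perf_rule = GatekeeperRule(master="chromium.perf", closing_steps={"compile"})

compile_failure = should_close_tree(perf_rule, "chromium.perf", ["compile"])
test_failure = should_close_tree(perf_rule, "chromium.perf", ["some_benchmark"])
print(compile_failure, test_failure)
```

Under these assumptions, a failing benchmark step alone would not close the tree; only a failing compile step would, which matches the scope described in the CL.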
,
Dec 20 2017
I think this is done, though I haven't actually seen a compile failure on one of the bots, so I don't know if it's working or not. If it isn't, please re-open.
Comment 1 by nedngu...@google.com
, Jul 28 2017