
Issue 736005

Issue metadata

Status: Verified
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug

Blocked on:
issue 736012



CQ slaves are starting late -- out of buildslaves?

Project Member Reported by pprabhu@chromium.org, Jun 22 2017

Issue description

Overnight, three CQ runs took 3+ hours.

Slave timelines show that some slaves are starting late:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15131
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15128
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15130

Seriously, this was never so easy to see before. +davidriley Kudos!

Pri-0 due to impact. Mitigation should be easy.
 
Blockedon: 736012
Filed go/bugatrooper; meanwhile I'm also trying to get into the slaves myself. Pri-1 response time for them is 1-2 hours, which should fall within this CQ run.
Cc: pmalani@chromium.org kirtika@chromium.org
This is going to have an impact on the current CQ run:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15132

It's been going for 3 hours 9 minutes.

And
lakitu-paladin has just started: https://uberchromegw.corp.google.com/i/chromeos/builders/lakitu-paladin/builds/6872 
cyan-paladin has only had 1 hour 11 minutes to run so far: https://uberchromegw.corp.google.com/i/chromeos/builders/cyan-paladin/builds/2966

I need to decide whether it's better to abort the late-starting slaves so we don't waste too much time.

Cc: nxia@chromium.org
Unfortunately, all 26 CLs that were picked up are in the might_submit_set.

11:13:11: INFO: will_submit set contains 0 changes: []
might_submit set contains 26 changes: [CL:*397330 CL:*397449 CL:*397528 CL:*398128 CL:*399648 CL:*399728 CL:536882 CL:536883 CL:536884 CL:536885 CL:536886 CL:536887 CL:536888 CL:536889 CL:536890 CL:540780 CL:541304 CL:543176 CL:544296 CL:544921 CL:544922 CL:544923 CL:544924 CL:544925 CL:544926 CL:544927]
will_not_submit set contains 0 changes: []
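
As a sanity check on the log excerpt above, here is a minimal sketch that parses one of these "set contains N changes" lines and confirms the declared count matches the number of CLs listed. The regex and the `parse_set` helper are hypothetical illustrations written against this log text only; they are not part of chromite or the CQ tooling.

```python
import re

# The might_submit line copied from the log excerpt in this comment.
LOG_LINE = (
    "might_submit set contains 26 changes: [CL:*397330 CL:*397449 CL:*397528 "
    "CL:*398128 CL:*399648 CL:*399728 CL:536882 CL:536883 CL:536884 CL:536885 "
    "CL:536886 CL:536887 CL:536888 CL:536889 CL:536890 CL:540780 CL:541304 "
    "CL:543176 CL:544296 CL:544921 CL:544922 CL:544923 CL:544924 CL:544925 "
    "CL:544926 CL:544927]"
)

# Hypothetical pattern: "<name> set contains <N> changes: [<CLs>]".
SET_RE = re.compile(r"(\w+) set contains (\d+) changes: \[(.*)\]")


def parse_set(line):
    """Return (set_name, list_of_cls); raise if the declared count is off."""
    match = SET_RE.search(line)
    if match is None:
        raise ValueError("not a submit-set line: %r" % line)
    name = match.group(1)
    declared = int(match.group(2))
    cls = match.group(3).split()
    if len(cls) != declared:
        raise ValueError("%s declares %d changes but lists %d"
                         % (name, declared, len(cls)))
    return name, cls


name, cls = parse_set(LOG_LINE)
starred = [cl for cl in cls if cl.startswith("CL:*")]
print(name, len(cls), len(starred))
```

The same helper accepts the `will_submit` and `will_not_submit` lines, where the bracketed list is empty and the declared count is 0.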

===========================

cyan-paladin has had 1:33 to run so far, and takes roughly 2:10 to finish. Of the 26 changes, 18 are relevant to that slave.

lakitu-paladin hasn't yet reached DetectRelevantChanges stage.

Plan of action is to wait for lakitu to reach DetectRelevantChanges. Then, if we find that it only affects a small number of CLs, abort it. If we find it affects lots of CLs, abort both, since lakitu can't possibly finish before the 4-hour timeout set by the master (of which 3:15 has already passed).
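
The time budget behind this plan can be sketched as below. The 4-hour master timeout, the 3:15 already elapsed, and cyan-paladin's 1:33 out of a typical 2:10 all come from this comment; the function names are hypothetical, and this is only an illustration of the arithmetic, not how the master actually schedules or kills slaves.

```python
from datetime import timedelta

# Figures quoted in this comment.
MASTER_TIMEOUT = timedelta(hours=4)
MASTER_ELAPSED = timedelta(hours=3, minutes=15)


def time_left(master_elapsed, timeout=MASTER_TIMEOUT):
    """Time remaining before the master times out its slaves."""
    return timeout - master_elapsed


def margin(slave_elapsed, slave_typical, master_elapsed):
    """Spare time if the slave takes its typical duration.

    A negative result means the slave would be killed before finishing.
    """
    still_needed = slave_typical - slave_elapsed
    return time_left(master_elapsed) - still_needed


# cyan-paladin: has run 1:33 and takes roughly 2:10 in total.
cyan_margin = margin(
    slave_elapsed=timedelta(hours=1, minutes=33),
    slave_typical=timedelta(hours=2, minutes=10),
    master_elapsed=MASTER_ELAPSED,
)
print(time_left(MASTER_ELAPSED), cyan_margin)
```

Under these numbers cyan-paladin needs about 37 more minutes with 45 minutes left on the master, so it has a margin of roughly 8 minutes. No typical duration for lakitu-paladin is given here, so the sketch does not attempt that case.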
In the meantime, the root cause has been dealt with on issue 736012.
Status: Fixed (was: Started)
I'm going to let the current master run to completion. There's good reason to believe that lakitu-paladin will finish successfully with about 5 minutes to spare before the master kills it. And 18/26 CLs are relevant to it.
cyan-paladin should finish before that.

So I'm keeping my hands off the stuff.
Status: Verified (was: Fixed)
Slaves are back on track: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15133
