
Issue 832754

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Reduce Auto Kill Timeout for ChromeOS builds.

Project Member Reported by dgarr...@chromium.org, Apr 13 2018

Issue description

Currently, when we restart ChromeOS waterfalls, we end up manually killing builds. This is a source of pain.

https://chrome-internal.googlesource.com/infradata/master-manager/+/master/desired_master_state.json

By editing this file, we can adjust the auto kill timeouts down from 12 hours to something reasonable (5 - 15 minutes), and not do the killing manually any more, which should make our lives easier.
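As a rough illustration of the kind of edit being proposed, here is a minimal Python sketch that lowers a per-master timeout in a JSON document from 12 hours to 10 minutes. The actual schema of desired_master_state.json is not shown in this bug, so the `master_params` and `drain_timeout_sec` keys, the `master.chromeos` name, and the `reduce_drain_timeout` helper are all assumptions made for the example.

```python
import json

TWELVE_HOURS_SEC = 12 * 60 * 60  # current timeout described in this bug
TEN_MINUTES_SEC = 10 * 60        # one value inside the proposed 5-15 minute range


def reduce_drain_timeout(state, master, timeout_sec):
    """Return a copy of `state` with the timeout for `master` set to `timeout_sec`.

    The "master_params" / "drain_timeout_sec" layout is hypothetical; the real
    file may use different key names.
    """
    updated = json.loads(json.dumps(state))  # cheap deep copy of JSON-compatible data
    params = updated.setdefault("master_params", {}).setdefault(master, {})
    params["drain_timeout_sec"] = timeout_sec
    return updated


# Hypothetical starting state with the 12 hour timeout.
example = {
    "master_params": {
        "master.chromeos": {"drain_timeout_sec": TWELVE_HOURS_SEC},
    }
}

new_state = reduce_drain_timeout(example, "master.chromeos", TEN_MINUTES_SEC)
print(json.dumps(new_state, indent=2, sort_keys=True))
```

The helper returns a modified copy rather than editing in place, so the original and updated states can be compared before anything is committed.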
 

Comment 1 by jkop@google.com, Apr 13 2018

crrev.com/i/369469 did roughly the opposite, so checking in with them might be worthwhile.
It's a good question, but what I'm suggesting is to just automate what we currently do by hand. I.e., what we say is "if it won't finish in the next 10-15 minutes, kill it".

So automatically kill anything that doesn't finish inside 10-15 minutes. No need for a person to find some button somewhere and click it.
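The rule described here can be sketched as a small Python function. This is only an illustration of the policy, not how the master restart tooling works: the `builds_to_kill` name, the build names, and the idea of having an estimate of remaining minutes per build are all hypothetical. A real timeout would simply wait out the grace period and kill whatever is still running.

```python
def builds_to_kill(remaining_minutes, grace_minutes=15):
    """Return the names of builds that would not finish within the grace period.

    `remaining_minutes` maps a build name to an estimate of how many minutes
    it still needs. Builds that need more than `grace_minutes` get killed.
    """
    return sorted(
        name
        for name, remaining in remaining_minutes.items()
        if remaining > grace_minutes
    )


# Hypothetical in-flight builds at the moment a restart is requested.
in_flight = {"almost-done": 5, "borderline": 15, "long-cq-run": 120}

print(builds_to_kill(in_flight))                    # 15 minute grace period
print(builds_to_kill(in_flight, grace_minutes=10))  # 10 minute grace period
```

With a 15 minute grace period only the long run is killed; dropping to 10 minutes also kills the borderline build.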

Comment 3 by jkop@chromium.org, Apr 13 2018

Cc: jkop@chromium.org martiniss@chromium.org
My concern is that there may be other cases where it comes into play.
Not for us.

If our CQ normally finished in 30 minutes (for example), then a timeout of 30 minutes might make sense because it would mean a restart never killed the CQ, but all of our builds are much, much longer than we want to wait for a restart.

A few minutes' grace might save the occasional run, and doesn't hurt anything. But a long delay would prevent things from starting (while draining, no new builds can start), with little chance of saving a new build.

15 minutes might already be too long, since that potentially means we delay the start of a CQ/PFQ run by that long.

Either way, it won't matter after these builders migrate to swarming, so this is just a temp fix to make our current broken process less painful until it dies.
Owner: dgarr...@chromium.org
Status: Started (was: Untriaged)
I found the file.
Project Member

Comment 6 by bugdroid1@chromium.org, Apr 14 2018

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager/+/ccd5de5ca1fb0dda253512970dde7979be86bf31

commit ccd5de5ca1fb0dda253512970dde7979be86bf31
Author: Don Garrett <dgarrett@google.com>
Date: Sat Apr 14 19:36:01 2018

Status: Fixed (was: Started)
