Reduce Auto Kill Timeout for ChromeOS builds
Issue description
Currently, when we restart ChromeOS waterfalls, we end up killing builds manually. This is a source of pain. By editing https://chrome-internal.googlesource.com/infradata/master-manager/+/master/desired_master_state.json, we can lower the auto kill timeouts from 12 hours to something reasonable (5-15 minutes) and stop doing the killing by hand, which should make our lives easier.
Apr 13 2018
It's a good question, but what I'm suggesting is simply to automate what we currently do by hand. Our rule today is "if it won't finish in the next 10-15 minutes, kill it", so we would automatically kill anything that doesn't finish within 10-15 minutes. There's no need for a person to find a button somewhere and click it.
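To make the proposed rule concrete, here is a minimal sketch of the decision it automates, assuming we know (or can estimate) how many minutes each running build has left. The `RunningBuild` record, the builder names, and `partition_for_restart` are hypothetical illustrations, not part of master-manager or desired_master_state.json.

```python
from dataclasses import dataclass


@dataclass
class RunningBuild:
    # Hypothetical record; the real master-manager state format is not shown in this thread.
    name: str
    remaining_minutes: float  # estimated time until the build finishes


def partition_for_restart(builds, timeout_minutes):
    """Split builds into those expected to finish inside the grace period
    and those that would be killed when the timeout expires."""
    finish = [b.name for b in builds if b.remaining_minutes <= timeout_minutes]
    kill = [b.name for b in builds if b.remaining_minutes > timeout_minutes]
    return finish, kill


if __name__ == "__main__":
    # Hypothetical builders and remaining times, for illustration only.
    running = [
        RunningBuild("cq-master", 95),
        RunningBuild("pfq-master", 8),
        RunningBuild("release-builder", 240),
    ]
    finish, kill = partition_for_restart(running, timeout_minutes=15)
    print("finish:", finish)
    print("kill:", kill)
```

With a 15-minute timeout, only the build that is 8 minutes from done survives; this is the same call a person makes by hand today.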
Apr 13 2018
My concern is that there may be other cases where it comes into play.
Apr 13 2018
Not for us. If our CQ normally finished in 30 minutes (for example), a 30-minute timeout might make sense, because a restart would then never kill the CQ. But all of our builds take much, much longer than we want to wait for a restart. A few minutes of grace might save the occasional run and doesn't hurt anything. A long delay, however, would prevent things from starting (while draining, no new builds can start), with little chance of saving a newly started build. 15 minutes might already be too long, since it can delay the start of a CQ/PFQ run by that much. Either way, this won't matter once these builders migrate to Swarming, so this is just a temporary fix to make our current broken process less painful until it goes away.
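The trade-off above can be sketched numerically. This assumes, as a simplification not confirmed in the thread, that the master restarts as soon as every running build has finished or the timeout expires, whichever comes first, and that no new build starts while draining. The remaining-time values and `drain_summary` are hypothetical.

```python
def drain_summary(remaining_minutes, timeout_minutes):
    """Return (saved, killed, delay_minutes) for one restart.

    Assumed model: draining ends when all builds finish or the timeout
    expires, whichever is first; new builds are blocked until then.
    """
    saved = sum(1 for r in remaining_minutes if r <= timeout_minutes)
    killed = len(remaining_minutes) - saved
    # New builds wait for the drain: the shorter of the timeout and the
    # longest remaining build.
    delay = min(timeout_minutes, max(remaining_minutes, default=0))
    return saved, killed, delay


if __name__ == "__main__":
    remaining = [95, 8, 240]  # hypothetical minutes left on each running build
    for timeout in (5, 15, 12 * 60):
        saved, killed, delay = drain_summary(remaining, timeout)
        print(f"timeout={timeout:>4} min: saved={saved} killed={killed} delay={delay} min")
```

Under these assumptions, the current 12-hour timeout saves every build but blocks new CQ/PFQ runs for 4 hours, while a 15-minute timeout costs at most 15 minutes and still saves builds that were nearly done.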
Apr 14 2018
I found the file.
Apr 14 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/master-manager/+/ccd5de5ca1fb0dda253512970dde7979be86bf31
commit ccd5de5ca1fb0dda253512970dde7979be86bf31
Author: Don Garrett <dgarrett@google.com>
Date: Sat Apr 14 19:36:01 2018
Apr 14 2018
Comment 1 by jkop@google.com, Apr 13 2018