LUCI Scheduler Abort took 25 minutes to take effect.
Issue description
With the previous waterfall, the team could actively halt any builder in progress, and the halt was immediate. Today one can still halt a builder, but only after navigating through several layers of UI: starting in legoland, then milo, then the actual task. This is not intuitive for new team members (it should have been surfaced in legoland as part of the swarming migration, IMO). And when one does try to halt the builder, it takes over 20 minutes to actually happen. For example, when cancelling https://chrome-swarming.appspot.com/task?id=41249bc12b882110&refresh=10&show_raw=1&wide_logs=true the cancel button was pressed around 9:15 am Pacific, but the builder did not reach a killed state that allowed the master 71 builder to restart until 9:40. What can we do to get back the ability to halt builders in seconds?
Nov 14
I was thinking that we had to stop these as we would on the waterfall, i.e. we stop all the child builds, then the master stops itself and starts the next build. If the proper procedure is to abort the master and let the new master come through and restart any child builds, and that is reasonably fast, then maybe this bug is a non-issue?
Nov 14
Yep, we should generally only start or abort master builds now. Starting children directly isn't supported, and will probably become more and more problematic over time. Re-running a release build for a specific board is possible, but it will overwrite any existing build artifacts from previous runs, and it is still not really supported.
Nov 14
What happens if we run a production trybot against a release branch for a specific board? Will it still generate a new version like it used to, or will this break things?
Comment 1 by dgarr...@chromium.org, Nov 14