builder has pending builds and idle tasks
Issue description
Today at around 10:45 AM PST, https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel had, according to Milo, 25 idle builders and 15 pending builds. After watching the builder for a bit, I noticed that a few of the idle machines got deleted, probably by Machine Provider. John and I hypothesized that the leases for these machines are close to expiring, so no tasks are being scheduled on them. This information isn't shown in any of the UIs that display bot information, though, so it's very confusing for the user. Could this be displayed in Milo or Swarming?
Nov 15
Ah, true. I didn't notice that. But even so, I would (maybe naively) think that tasks would be scheduled up until that point. To me, a bot being live on swarming means it can run tasks, so seeing the bot live on swarming but not running tasks looks like a bug.
Nov 15
It's a function of the hard_timeout. The longer the hard_timeout, the larger the "dead zone" at the end of the bot's lifetime is.
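A minimal sketch of that relationship, assuming (this is not taken from the Swarming source) that a bot is skipped whenever its remaining lease is shorter than the task's hard_timeout. The names `Bot`, `lease_remaining_s`, and `can_schedule` are hypothetical and only illustrate the idea:

```python
from dataclasses import dataclass


@dataclass
class Bot:
    # Hypothetical field: seconds left before the bot's lease expires.
    lease_remaining_s: int


def can_schedule(bot: Bot, hard_timeout_s: int) -> bool:
    """Assumed rule: only schedule if the task could run for its full
    hard_timeout before the lease ends."""
    return bot.lease_remaining_s >= hard_timeout_s


def dead_zone_s(hard_timeout_s: int) -> int:
    """Under the assumed rule, the bot sits idle for the final
    hard_timeout seconds of its lease."""
    return hard_timeout_s


if __name__ == "__main__":
    bot = Bot(lease_remaining_s=1200)
    for timeout in (600, 1800, 3600):
        print(timeout, can_schedule(bot, timeout), dead_zone_s(timeout))
```

Under this assumption, a bot with 20 minutes of lease left looks idle to any task whose hard_timeout exceeds 20 minutes, which would match the behavior described in the issue.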
Nov 15
There's issue 894201 about this, but when considering things like preemptible VMs, which cannot extend their lease, we may have to decide how to handle this in a non-surprising way.
Nov 15
The new design at go/remove-mp would avoid this issue entirely. The GCE service would decide when to reclaim a VM. From Swarming's perspective, it would always schedule the task with no concern for how long the task could run or how long is left in the VM's lifetime.
Comment 1 by jbudorick@chromium.org, Nov 15