Add way to quarantine a bot from "Swarming Bot Page"
Issue description
On the page at https://chromium-swarm.appspot.com/bot?id=vm91-m4&sort_stats=total%3Adesc I can "restart a bot". It would also be good if I could quarantine a bot using the UI. Then, when I find bots that are causing failures like in https://bugs.chromium.org/p/chromium/issues/detail?id=718707, I can just quarantine them myself.
May 8 2017
The goal is to prevent the bot from accepting any more tasks until it has been manually fixed. Shutting down a bot doesn't prevent it just starting back up again with the same misconfiguration. I believe machine provider shuts down and starts up bots regularly? A message explaining the quarantine would be good.
May 8 2017
> Shutting down a bot doesn't prevent it just starting back up again with the same misconfiguration. I believe machine provider shuts down and starts up bots regularly?

I was unaware of that. I now see why a shutdown would not be what you want.

Our Skia bot_config has a bit of logic that tells the bot to shut down if it gets two BOT_DIED in a row. Since we don't have MP, a human is required to fix and restart them. It does this by writing a few files to the local bot disk to "remember" what happened, since Swarming tries to be stateless. I can envision at least part of a system that writes the message to ~/manual_quarantined or something. The bot can see this file and know to be quarantined. However, I'm a bit sketchy on the details of how to remove this state from the API/UI. M-A would have better ideas on that part.
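A minimal sketch of the marker-file idea described above, in Python. This is not the actual Skia bot_config or any Swarming API: the function names, the JSON state file, and the choice to quarantine (rather than shut down) after two BOT_DIED results are all assumptions made for illustration. Only the ~/manual_quarantined path and the "two BOT_DIED in a row" threshold come from the comment.

```python
import json
import os

# Path suggested in the comment above; hypothetical, not a Swarming convention.
MARKER = os.path.expanduser("~/manual_quarantined")
# Threshold mentioned in the comment above.
MAX_CONSECUTIVE_BOT_DIED = 2


def quarantine(message, marker=MARKER):
    """Persist a quarantine message so it survives a bot restart."""
    with open(marker, "w") as f:
        f.write(message)


def quarantine_message(marker=MARKER):
    """Return the stored message, or None if the bot is not quarantined."""
    try:
        with open(marker) as f:
            return f.read().strip() or "quarantined (no message given)"
    except FileNotFoundError:
        return None


def record_task_result(result, state_file, marker=MARKER):
    """Count consecutive BOT_DIED results and quarantine at the threshold.

    The counter lives in a small JSON file (hypothetical format) because
    the bot process itself is assumed to keep no state between tasks.
    """
    try:
        with open(state_file) as f:
            count = json.load(f)["consecutive_bot_died"]
    except (FileNotFoundError, KeyError, ValueError):
        count = 0
    # Any other result resets the streak.
    count = count + 1 if result == "BOT_DIED" else 0
    with open(state_file, "w") as f:
        json.dump({"consecutive_bot_died": count}, f)
    if count >= MAX_CONSECUTIVE_BOT_DIED:
        quarantine("%d BOT_DIED results in a row" % count, marker)
    return count
```

A bot could call quarantine_message() before polling for work and refuse tasks while it returns a string. How that string would reach the server or the UI is the open question raised in the comment and is not covered here.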
May 8 2017
I care more about putting things into quarantine than getting them back out at the moment. I would be happy with a CLI method for putting them back into the pool. Getting them out of the pool quickly is something anyone should be able to do, while putting them back in is something only troopers should really be doing. maruel@ - Thoughts?
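One way the "CLI method for putting them back into the pool" could look, assuming the quarantine state is a local marker file such as the ~/manual_quarantined file floated earlier in this thread. This is a sketch only: the --marker flag and the function names are hypothetical, it is not an existing Swarming tool, and it does nothing to restrict the command to troopers.

```python
import argparse
import os


def release(marker):
    """Remove the marker file; return True if the bot was quarantined."""
    try:
        os.remove(marker)
        return True
    except FileNotFoundError:
        return False


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Return a manually quarantined bot to the pool.")
    # Hypothetical flag; the default is the path suggested in the thread.
    parser.add_argument(
        "--marker", default=os.path.expanduser("~/manual_quarantined"))
    args = parser.parse_args(argv)
    if release(args.marker):
        print("Removed %s; the bot may accept tasks again." % args.marker)
        return 0
    print("No marker at %s; the bot was not quarantined." % args.marker)
    return 1
```

Calling main(["--marker", "/path/to/marker"]) returns 0 when a marker was removed and 1 when there was nothing to remove, so a wrapper script can pass the result to sys.exit().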
May 9 2017
I'm fine with the idea; I filed https://github.com/luci/luci-py/issues/123 a long time ago. I even started a branch locally. I just never made this a priority.
May 10 2017
Jun 5 2017
Aug 22 2017
May 3 2018
Comment 1 by kjlubick@google.com, May 8 2017