Automatic quarantine on repeated failure
Issue description: When a bot has N failures in a row, run a known-good short step and, if that fails too, forcibly quarantine the bot.
Nov 3 2017
It just occurred to me that this is trivially implementable on the bot side via bot_config.py on_after_task():
- If BOT_DIED occurred in two successive tasks, self-quarantine (this could eventually be upgraded to the first occurrence).
- If tasks fail N times in a row, self-quarantine.
This can be done with two global variables. The state resets whenever bot_main is restarted:
- bot_config update (new luci-config push)
- bot_code update (new server push)
- host reboot
This also gives an easy way to "mass reset" the state, which is a good thing. This will help with bugs like issue 766877. Removing issue 757931 as a blocker.
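A minimal sketch of the two-counter idea, not the code that actually landed. Several things here are assumptions: the hook signatures `on_after_task(bot, failure, internal_failure, dimensions, summary)` and `get_state(bot)`, the use of a `quarantined` key in the returned state to request quarantine, the use of `internal_failure` as a bot-side stand-in for BOT_DIED, and the value chosen for N. The helper `quarantine_reason()` is hypothetical.

```python
# Sketch only: hook names, signatures and the 'quarantined' state key are
# assumptions about bot_config.py, not confirmed by this thread.

MAX_INTERNAL_FAILURES = 2  # "two successive tasks" from the comment above
MAX_TASK_FAILURES = 5      # hypothetical value for N

# The two global variables. They live in the bot process, so they reset
# whenever bot_main restarts (config push, code push, host reboot).
_successive_internal_failures = 0
_successive_task_failures = 0


def on_after_task(bot, failure, internal_failure, dimensions, summary):
    """Updates both failure streaks after a task completes."""
    global _successive_internal_failures, _successive_task_failures
    # internal_failure is used here as a proxy for BOT_DIED (assumption).
    if internal_failure:
        _successive_internal_failures += 1
    else:
        _successive_internal_failures = 0
    if failure:
        _successive_task_failures += 1
    else:
        _successive_task_failures = 0


def quarantine_reason():
    """Returns a message if a streak hit its limit, else None (hypothetical helper)."""
    if _successive_internal_failures >= MAX_INTERNAL_FAILURES:
        return '%d successive internal failures' % _successive_internal_failures
    if _successive_task_failures >= MAX_TASK_FAILURES:
        return '%d successive task failures' % _successive_task_failures
    return None


def get_state(bot):
    """Reports the self-quarantine request as part of the bot state."""
    state = {}
    reason = quarantine_reason()
    if reason:
        state['quarantined'] = reason
    return state
```

Because a quarantined bot stops receiving tasks, the counters would not clear on their own in this sketch; the restart cases listed above are what reset them.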
Nov 3 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/8c58dd05c19ef920ddbf7dc02211e0ab6f859f29
commit 8c58dd05c19ef920ddbf7dc02211e0ab6f859f29
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Nov 03 20:24:30 2017
Nov 10 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/fdbdf8fa9895c9aa66cc989eeaf7a25fb2d54cd6
commit fdbdf8fa9895c9aa66cc989eeaf7a25fb2d54cd6
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Nov 10 19:07:00 2017
Nov 6
Comment 1 by kbr@chromium.org, Aug 23 2017