
Issue 757932


Issue metadata

Status: Duplicate
Merged: issue 851212
Owner: ----
Closed: Nov 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Feature




Automatic quarantine on repeated failure

Project Member Reported by mar...@chromium.org, Aug 22 2017

Issue description

When a bot has N failures in a row, run a known-good short task on it; if that task fails too, forcibly quarantine the bot.
 

Comment 1 by kbr@chromium.org, Aug 23 2017

More details on situations where this has been needed for certain types of bots are in https://github.com/luci/luci-py/issues/277 .

Blockedon: -757931
Labels: -Pri-3 Pri-1
It just occurred to me that this is trivially implementable on the bot side via the on_after_task() hook in bot_config.py.

- If BOT_DIED occurred in two successive tasks, self-quarantine (this could eventually be tightened to the first occurrence)
- If N successive tasks failed, self-quarantine

This can be done with two global variables.

The state will reset whenever bot_main is restarted:
- bot_config update (new luci-config push)
- bot_code update (new server push)
- host rebooted

This also gives an easy way to "mass reset" the state, which is a good thing. This will help with bugs like issue 766877.
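A minimal sketch of what the two-counter approach could look like in bot_config.py. The thresholds, counter names, and the quarantine_reason() helper are hypothetical. The hook signatures and the 'quarantined' state key follow my understanding of the Swarming bot_config.py interface and should be checked against the deployed version. The internal_failure flag is used here only as a stand-in for BOT_DIED, which is an assumption.

```python
# Sketch only. Module-level counters live as long as the bot_main process,
# so they reset on bot_config update, bot_code update, or host reboot.
MAX_BOT_DIED = 2       # hypothetical threshold for successive BOT_DIED
MAX_TASK_FAILURES = 5  # hypothetical value for N

_bot_died_streak = 0
_failure_streak = 0


def on_after_task(bot, failure, internal_failure, dimensions, summary):
    """Updates both streak counters after each task finishes."""
    global _bot_died_streak, _failure_streak
    # Assumption: internal_failure approximates a BOT_DIED outcome.
    _bot_died_streak = _bot_died_streak + 1 if internal_failure else 0
    _failure_streak = _failure_streak + 1 if failure else 0


def quarantine_reason():
    """Returns a message if either streak hit its threshold, else None."""
    if _bot_died_streak >= MAX_BOT_DIED:
        return 'BOT_DIED in %d successive tasks' % _bot_died_streak
    if _failure_streak >= MAX_TASK_FAILURES:
        return '%d successive task failures' % _failure_streak
    return None


def get_state(bot):
    """Reports self-quarantine through the bot state dict."""
    state = {}
    reason = quarantine_reason()
    if reason:
        # Assumption: a truthy 'quarantined' value quarantines the bot.
        state['quarantined'] = reason
    return state
```

Since a quarantined bot receives no further tasks, the counters would stay at their threshold until bot_main restarts, which matches the reset behavior described above.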

Removing issue 757931 as a blocker.
Project Member

Comment 3 by bugdroid1@chromium.org, Nov 3 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/8c58dd05c19ef920ddbf7dc02211e0ab6f859f29

commit 8c58dd05c19ef920ddbf7dc02211e0ab6f859f29
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Nov 03 20:24:30 2017

Project Member

Comment 4 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/fdbdf8fa9895c9aa66cc989eeaf7a25fb2d54cd6

commit fdbdf8fa9895c9aa66cc989eeaf7a25fb2d54cd6
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Nov 10 19:07:00 2017

Mergedinto: 851212
Status: Duplicate (was: Available)
