Issue 922451

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



swarming_bot: when the bot sends a SIGTERM to a task, discard the named cache

Project Member Reported by mar...@chromium.org, Jan 16 (6 days ago)

Issue description

A SIGTERM is sent under these conditions:
- execution hard timeout
- I/O timeout
- client request to kill a task

Actual:
If run_isolated is able to terminate properly, the named cache is kept.

Expected:
Whenever a SIGTERM is sent to the task, discard the named cache.

Rationale:
In practice, when sending a SIGTERM, we can't assume the task handles it correctly, so it is safer to discard the named cache.
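
The difference between the actual and expected behavior can be sketched as two small policy functions. This is only an illustration: `KillReason`, `current_keep_cache` and `proposed_keep_cache` are hypothetical names, not swarming_bot or run_isolated APIs, and the sketch assumes the cache is dropped today when run_isolated does not terminate properly.

```python
import enum


class KillReason(enum.Enum):
    """Why the bot sent a SIGTERM (hypothetical names for the three cases above)."""
    NONE = "none"                  # no SIGTERM was sent
    HARD_TIMEOUT = "hard_timeout"  # execution hard timeout
    IO_TIMEOUT = "io_timeout"      # I/O timeout
    CLIENT_KILL = "client_kill"    # client request to kill the task


def current_keep_cache(kill_reason, terminated_properly):
    """Actual behavior: the cache is kept whenever run_isolated terminated properly.

    Assumption: the cache is not kept otherwise.
    """
    return terminated_properly


def proposed_keep_cache(kill_reason, terminated_properly):
    """Expected behavior: any SIGTERM causes the named cache to be discarded."""
    return terminated_properly and kill_reason is KillReason.NONE
```

Under this sketch, a task killed by a hard timeout that still shuts down cleanly keeps its cache today but would lose it with the proposed change.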
 

Comment 1 by vadimsh@chromium.org, Jan 16 (6 days ago)

Cc: jbudorick@chromium.org
As John mentioned in the chat, this may have a destabilizing effect on the fleet:

1. Imagine a compilation task takes slightly longer than the allowed time and is killed due to a timeout.
2. Its cache is deleted.
3. Next time it runs, it will take even longer, hitting a timeout for sure.
4. Goto 2.
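
The loop in steps 1 to 4 can be illustrated with a toy simulation. Everything here is an assumption made for the sake of the example: the durations are invented, `simulate` is a hypothetical helper, and a task killed on timeout is assumed not to repopulate a discarded cache.

```python
def simulate(extra_minutes, timeout, warm_time, cold_time, discard_on_timeout):
    """Return "ok" or "timeout" for each run of the same task on one bot.

    extra_minutes: per-run slowdown added on top of the base duration.
    warm_time / cold_time: base duration with and without a populated cache.
    """
    cache_warm = True
    outcomes = []
    for extra in extra_minutes:
        duration = (warm_time if cache_warm else cold_time) + extra
        timed_out = duration > timeout
        outcomes.append("timeout" if timed_out else "ok")
        if timed_out:
            # The task was killed; with the discard policy the cache is lost
            # and the killed task never gets to rebuild it.
            if discard_on_timeout:
                cache_warm = False
        else:
            # A completed run leaves a populated cache behind.
            cache_warm = True
    return outcomes


# One slow run (15 extra minutes) among otherwise normal runs.
with_discard = simulate([0, 15, 0, 0], timeout=60, warm_time=50,
                        cold_time=90, discard_on_timeout=True)
without_discard = simulate([0, 15, 0, 0], timeout=60, warm_time=50,
                           cold_time=90, discard_on_timeout=False)
```

With these made-up numbers, `with_discard` is `["ok", "timeout", "timeout", "timeout"]`, while `without_discard` is `["ok", "timeout", "ok", "ok"]`: a single slow run becomes a permanent failure only when the cache is discarded.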

In another scenario, if timeouts are caused by something shared and external (e.g. Swarming outage, Goma outage), many bots will remove their caches all at once, making the recovery slow (or overloading something else).

So it's not clear when we should delete caches automatically (if ever) :(

Comment 2 by jbudorick@chromium.org, Jan 16 (6 days ago)

Is it possible to create a distinction between regular named caches and "disposable" named caches s.t. we could dispose of the latter but not the former in the event of a SIGTERM?

(e.g., we might want to dispose of the builder cache automatically, but we probably don't want to entirely dispose of the git cache, so the former would be disposable while the latter would not be)
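
A minimal sketch of what such a distinction could look like, assuming a hypothetical per-cache `disposable` flag (no such field is known to exist in Swarming; `NamedCache` and `caches_to_discard` are illustrative names only):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedCache:
    name: str
    disposable: bool  # hypothetical flag: may be dropped after a SIGTERM


def caches_to_discard(caches, sigterm_sent):
    """Return the names of caches to drop once the task has ended."""
    if not sigterm_sent:
        return []
    # Only caches explicitly marked disposable are removed.
    return [cache.name for cache in caches if cache.disposable]


caches = [
    NamedCache("builder", disposable=True),
    NamedCache("git", disposable=False),
]
```

Here `caches_to_discard(caches, sigterm_sent=True)` returns `["builder"]`, leaving the git cache in place, and returns `[]` when no SIGTERM was sent.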

Comment 3 by mar...@chromium.org, Jan 16 (6 days ago)

More knobs mean the next Noogler will be surprised, and in general recipes will start to drift and become misconfigured*, so we must trade off carefully.

* Especially since this kind of misconfiguration would only take effect under exceptional circumstances.
