
Issue 816601

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 813412
issue 821723

Blocking:
issue 812896
issue 816602




Swarming: correctly surface post-task failure

Project Member Reported by mar...@chromium.org, Feb 26 2018

Issue description

Failures can still happen after a task has completed successfully and is already considered a success. Examples of post-success failures:
- Named cache cleanup; like the source of issue 812896
- Zombie processes outliving the main process; issue 808836
- Leaks, like leaking temporary files; issue 684070

This requires clearly defining:
- The separation between completing and sealing a task; issue 813412

Which raises questions:
- Do we need to keep a window of time to "forcibly mark a task as failed" after the task was completed?
- Do we keep this "post-success failure" as a secondary success bit, or do we want to always wait for all cleanup before marking the task as terminated?
- Do we want to just support "exit_code==0 && internal_failure=True"?
  - A post-task failure is not necessarily an internal_failure, so this looks incorrect.
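
To make the options above easier to compare, here is a minimal Python sketch of the "secondary bit" variant, with completion and sealing as separate steps. It is an illustration only and not how Swarming implements task results. Only `exit_code` and `internal_failure` come from the discussion above; `Phase`, `TaskResult`, `post_task_failures`, `complete`, `report_post_task_failure` and `seal` are hypothetical names.

```python
import enum
from dataclasses import dataclass, field
from typing import List, Optional


class Phase(enum.Enum):
    """Hypothetical task phases; not Swarming's real state names."""
    RUNNING = "running"
    COMPLETED = "completed"  # main process exited; cleanup may still run
    SEALED = "sealed"        # cleanup finished; the result is final


@dataclass
class TaskResult:
    exit_code: Optional[int] = None
    internal_failure: bool = False
    # Assumption: post-task failures are recorded separately from
    # internal_failure, since they are not necessarily internal.
    post_task_failures: List[str] = field(default_factory=list)
    phase: Phase = Phase.RUNNING

    def complete(self, exit_code: int) -> None:
        """Record the exit code of the main process."""
        if self.phase is not Phase.RUNNING:
            raise ValueError("task already completed")
        self.exit_code = exit_code
        self.phase = Phase.COMPLETED

    def report_post_task_failure(self, reason: str) -> None:
        """Accept a failure only in the window between complete() and seal()."""
        if self.phase is not Phase.COMPLETED:
            raise ValueError("task is not in the completed-but-unsealed window")
        self.post_task_failures.append(reason)

    def seal(self) -> None:
        """Close the window; no further failures can be attached."""
        if self.phase is not Phase.COMPLETED:
            raise ValueError("only a completed task can be sealed")
        self.phase = Phase.SEALED

    @property
    def succeeded(self) -> bool:
        """Success is only known once the task is sealed."""
        return (
            self.phase is Phase.SEALED
            and self.exit_code == 0
            and not self.internal_failure
            and not self.post_task_failures
        )


# Usage: the command exits 0, then named cache cleanup fails before sealing.
result = TaskResult()
result.complete(exit_code=0)
result.report_post_task_failure("named cache cleanup failed")
result.seal()
```

In this sketch the command's own outcome (`exit_code == 0`) stays visible, `internal_failure` stays unset, and the overall verdict is withheld until the task is sealed. The trade-off is that callers must wait for sealing before trusting `succeeded`.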

This definitely needs more brainstorming.
 

Comment 1 by mar...@chromium.org, Feb 26 2018

Blocking: 816602

Comment 2 by mar...@chromium.org, Feb 26 2018

Blocking: 812896

Comment 3 by mar...@chromium.org, Mar 14 2018

Blockedon: 821723
