Sometimes a failure happens after the task has already completed successfully and been considered a success. Examples of post-success failures:
- Named cache cleanup; like the source of issue 812896
- Zombie processes outliving the main process; issue 808836
- Leaks, like leaking temporary files; issue 684070
This requires clearly defining:
- The separation between completing and sealing a task; issue 813412
Which raises questions:
- Do we need to keep a window of time to "forcibly mark a task as failed" after the task was completed?
- Do we keep this "post success failure" as a secondary success bit or do we want to always wait for all cleanup before marking the task as terminated?
- Do we want to just support "exit_code==0 && internal_failure=True"?
  - Post task failure is not necessarily an internal_failure, so this looks incorrect.
This definitely needs more brainstorming.
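
To make the questions above more concrete, here is a minimal sketch of one possible shape: a task that is "completed" when the main process exits and "sealed" once cleanup is done, with post-task failures recorded separately from exit_code and from internal_failure. Every name in it (TaskState, TaskResult, report_post_task_failure, seal) is hypothetical and for illustration only; none of them is an existing API, and this is not a proposal for the final design.

```python
import enum
from dataclasses import dataclass, field
from typing import List, Optional


class TaskState(enum.Enum):
    """Hypothetical lifecycle states; not an existing enum."""
    RUNNING = "running"
    COMPLETED = "completed"  # main process exited; cleanup may still fail
    SEALED = "sealed"        # cleanup finished; the result can no longer change


@dataclass
class TaskResult:
    state: TaskState = TaskState.RUNNING
    exit_code: Optional[int] = None
    # Kept separate from any internal_failure flag, since a post-task
    # failure is not necessarily an internal failure.
    post_task_failures: List[str] = field(default_factory=list)

    def complete(self, exit_code: int) -> None:
        """Records the exit code of the main process."""
        if self.state is not TaskState.RUNNING:
            raise ValueError("task is not running")
        self.exit_code = exit_code
        self.state = TaskState.COMPLETED

    def report_post_task_failure(self, reason: str) -> None:
        """Records a failure seen between completion and sealing."""
        if self.state is not TaskState.COMPLETED:
            raise ValueError("failures are only accepted before sealing")
        self.post_task_failures.append(reason)

    def seal(self) -> None:
        """Closes the window in which the result may still change."""
        if self.state is not TaskState.COMPLETED:
            raise ValueError("only a completed task can be sealed")
        self.state = TaskState.SEALED

    @property
    def has_post_task_failure(self) -> bool:
        return bool(self.post_task_failures)


result = TaskResult()
result.complete(exit_code=0)
result.report_post_task_failure("named cache cleanup failed")
result.seal()
```

In this sketch the window of time from the first question is simply the interval between complete() and seal(). Whether a sealed task with exit_code 0 and a recorded post-task failure counts as a success is left to the caller, which is exactly the open question.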
Comment 1 by mar...@chromium.org, Feb 26 2018