Swarming bot dies when cache cleanup fails.
Issue description

When a Swarming bot attempts to clean up its caches after a build, the cleanup may fail. Potential causes include:
1) File handles held open by orphaned tasks from the build (*).
2) File permission issues on Windows.

An example of such a failure is here: https://chromium-swarm.appspot.com/task?id=372fc2340c856b10&refresh=10&show_raw=1&wide_logs=true

Proposal:
1) Determine which caches should be purged by dropping a purge manifest JSON file.
2) Continue to clean up caches at the end of the build. If successful, delete the manifest from (1).
3) When the Swarming server starts, if the manifest from (1) exists, purge the caches.
4) If (2) fails for any reason, reboot the system.

(*) While cause (1) may not be Swarming's fault, Swarming should still handle it.
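The proposed manifest flow could look roughly like the sketch below. This is only an illustration of the idea, not Swarming code: the file name `purge_manifest.json`, the helpers `write_purge_manifest` and `purge_from_manifest`, and the manifest layout are all hypothetical.

```python
import json
import os
import shutil

MANIFEST_NAME = "purge_manifest.json"  # hypothetical file name


def write_purge_manifest(root, cache_dirs):
    """Proposal step 1: record which cache directories should be purged."""
    path = os.path.join(root, MANIFEST_NAME)
    with open(path, "w") as f:
        json.dump({"purge": sorted(cache_dirs)}, f)
    return path


def purge_from_manifest(root):
    """Proposal steps 2 and 3: purge listed caches, drop the manifest on success.

    Returns True if nothing is left to purge, False if any deletion failed.
    On False the manifest stays in place, so the purge can be retried on the
    next start (or the caller can reboot, per step 4).
    """
    path = os.path.join(root, MANIFEST_NAME)
    if not os.path.exists(path):
        return True
    with open(path) as f:
        cache_dirs = json.load(f)["purge"]
    ok = True
    for name in cache_dirs:
        try:
            shutil.rmtree(os.path.join(root, name))
        except FileNotFoundError:
            pass  # already gone, nothing to do
        except OSError:
            ok = False  # e.g. an open handle or a permission problem
    if ok:
        os.remove(path)
    return ok
```

Leaving the manifest on disk after a failed purge is what lets the same function serve both the end-of-build cleanup and the purge at startup.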
Jul 7 2017
The following revision refers to this bug:
https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/bd3cbc5ca8f6fd82345bb9073364c56bf2b73130

commit bd3cbc5ca8f6fd82345bb9073364c56bf2b73130
Author: dnj <dnj@google.com>
Date: Fri Jul 07 18:16:44 2017

[run_isolated] Tolerate cache uninstall errors.

If a named cache cannot be uninstalled, the Swarming bot will fail with an unfriendly code path and the task will terminate as BOT_DIED. This can happen if a zombie process lingers from a task and retains a handle to the named cache.

Swarming already has code paths to handle zombie processes and task space purge errors. This patch makes it so that named cache deletion failures fall through to standard cleanup code instead of raising an exception.

BUG= chromium:740109
TEST=None
R=maruel@chromium.org, vadimsh@chromium.org
Review-Url: https://codereview.chromium.org/2973113003

[modify] https://crrev.com/bd3cbc5ca8f6fd82345bb9073364c56bf2b73130/client/run_isolated.py
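The "fall through instead of raising" behaviour described in the commit message can be pictured with a small sketch. This is not the actual change to run_isolated.py; the helper `uninstall_all` and its `uninstall` callback are hypothetical and only show the pattern of logging a failed uninstall and continuing.

```python
import logging


def uninstall_all(uninstall, names):
    """Call uninstall(name) for each cache; collect failures instead of raising.

    `uninstall` is a hypothetical callback that removes one named cache and
    raises OSError on failure (for example, when a zombie process still holds
    a handle). The returned list lets later cleanup code deal with leftovers.
    """
    failed = []
    for name in names:
        try:
            uninstall(name)
        except OSError as e:
            logging.warning("could not uninstall named cache %r: %s", name, e)
            failed.append(name)
    return failed
```

With this shape, one stuck cache no longer aborts the whole task; the caller sees which caches remain and can hand them to whatever general cleanup path already exists.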
Jul 11 2017
This should be fixed now.
Comment 1 by d...@chromium.org, Jul 7 2017