Swarming test runner leaves confusing error message on Linux about test leaving behind files
Issue description

This week I've noticed that a WebRTC test, "modules_unittests", occasionally fails with this log output:

  [485/485] CommonFormats/AudioProcessingTest.Formats/45 (15556 ms)
  Failed to delete /b/s/w/itrLbA_n (1 files remaining). Maybe the test has a subprocess outliving it. Sleeping 2 seconds.
  Failed to delete /b/s/w/itrLbA_n (2 files remaining). Maybe the test has a subprocess outliving it. Sleeping 4 seconds.
  Failed to delete /b/s/w/itrLbA_n. The following files remain:
  - /b/s/w/itrLbA_n
  - /b/s/w/itrLbA_n
  11707 2017-06-07 16:04:11.977 E: Failure with [Errno 2] No such file or directory: '/b/s/w/itrLbA_n'
  Failed to delete the temp directory, thus failing the task. This may be due to a subprocess outliving the main task process, holding on to resources. Please fix the task so that it releases resources and cleans up subprocesses.

Examples:
https://build.chromium.org/p/client.webrtc/builders/Linux64%20Debug/builds/13446
https://build.chromium.org/p/client.webrtc/builders/Linux%20UBSan/builds/5922

This log strikes me as odd for a few reasons:
* The directory can't be deleted even though it's empty? The only file listed "remaining" is the directory itself.
* It's also listed twice, and the second time through the loop, "files remaining" becomes 2.
* After the timeout, the actual error is revealed to be "no such file or directory"? Then why did "fs.isdir(tmp_dir)" return true earlier? When did it actually get deleted?

This test may actually be leaving behind files, but the log message doesn't provide any useful information. Looking at other bugs, such as https://bugs.chromium.org/p/chromium/issues/detail?id=724588, it appears that modules_unittests isn't the only test that can produce this behavior.

So, what could be going on? Is it possible there's a bug in run_isolated.py or file_path.py? I just don't understand what could produce those error messages.
Jun 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/70b369823ed763667b503adde008eb765d55846a

commit 70b369823ed763667b503adde008eb765d55846a
Author: maruel <maruel@chromium.org>
Date: Thu Jun 08 15:20:04 2017

Improve zombie process error message to be actionable.

Give more background information about what is happening and why this is bad, as the previous message was not actionable at all.

R=tandrii@chromium.org
BUG= 730969

Review-Url: https://codereview.chromium.org/2924283002

[modify] https://crrev.com/70b369823ed763667b503adde008eb765d55846a/client/run_isolated.py
Jun 8 2017
Will deploy the new wording soon. Assigning back to reporter for closing or appropriate follow up.
Jun 8 2017
I didn't have an issue with the "failed to delete temp directory" message. My confusion is with this part:

  Failed to delete /b/s/w/itrLbA_n. The following files remain:
  - /b/s/w/itrLbA_n
  - /b/s/w/itrLbA_n
  11707 2017-06-07 16:04:11.977 E: Failure with [Errno 2] No such file or directory: '/b/s/w/itrLbA_n'

This differs from what happens on Windows:

  Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
  - \\?\e:\b\swarm_slave\w\ir\out\Release
  - \\?\e:\b\swarm_slave\w\ir\out
  - \\?\e:\b\swarm_slave\w\ir
  3848 2017-05-19 18:40:06.483 E: Failure with [Error 32] The process cannot access the file because it is being used by another process: u'\\\\?\\e:\\b\\swarm_slave\\w\\ir\\out\\Release'
Jun 8 2017
My hypothesis is that this can happen due to a race condition: while run_isolated is enumerating files and directories and then deleting them, a zombie process could still be creating files in the tree (e.g. a log file upon shutdown). POSIX and Windows behave very differently in how this surfaces.
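
To make the hypothesized race concrete, here is a minimal Python sketch. It is not the code in run_isolated.py or file_path.py; `delete_tree_snapshot`, the `straggler` hook, and the file names are all hypothetical, and the hook only stands in for a subprocess that outlives the test. It assumes a flat directory and a POSIX-like filesystem.

```python
import errno
import os
import tempfile


def delete_tree_snapshot(root, straggler=lambda: None):
    """Delete a flat directory `root` from a one-time listing.

    Returns None on success, or the errno of the failed rmdir.
    `straggler` is a hypothetical hook standing in for a subprocess
    that is still writing into the tree while cleanup is running.
    """
    # Step 1: enumerate what is in the directory right now.
    snapshot = [os.path.join(root, name) for name in os.listdir(root)]

    # A lingering process may create new files at this point.
    straggler()

    # Step 2: delete only what the snapshot knew about.
    for path in snapshot:
        os.remove(path)

    # Step 3: remove the directory itself; this fails if anything
    # appeared after the snapshot was taken.
    try:
        os.rmdir(root)
    except OSError as e:
        return e.errno
    return None


def demo():
    """Run the cleanup with a simulated late writer and report the result."""
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "output.txt"), "w") as f:
        f.write("test output")

    def write_late_log():
        # Simulates a log file written upon shutdown.
        with open(os.path.join(root, "shutdown.log"), "w") as f:
            f.write("written during shutdown")

    err = delete_tree_snapshot(root, straggler=write_late_log)
    leftover = sorted(os.listdir(root))

    # Clean up for real now that the simulated straggler is done.
    for name in leftover:
        os.remove(os.path.join(root, name))
    os.rmdir(root)
    return err, leftover
```

In this sketch the cleanup fails even though every file it knew about was deleted, and the only file left is one it never saw. A listing of "remaining" files taken after the failure therefore describes the tree at a later moment than the one at which the delete failed, which may be one reason the two can disagree in the log.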
Jun 8 2017
The following revision refers to this bug:
https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/58d87da66eaf1e54b37ec09bd3fcf76bdf6fafd0

commit 58d87da66eaf1e54b37ec09bd3fcf76bdf6fafd0
Author: maruel <maruel@chromium.org>
Date: Thu Jun 08 20:06:40 2017

Fix wording changed in 70b369823ed763667b.

Got caught copy-pasting!

R=tandrii@chromium.org
BUG= 730969

Review-Url: https://codereview.chromium.org/2931913002

[modify] https://crrev.com/58d87da66eaf1e54b37ec09bd3fcf76bdf6fafd0/client/run_isolated.py
Jun 13 2017
Filed issue 732811 and issue 732808 as ideas to help with debugging. Marking this issue as closed, as there are no AIs remaining.
Comment 1 by mar...@chromium.org, Jun 8 2017
Owner: mar...@chromium.org
Status: Assigned (was: Untriaged)