
Issue 730969

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature




Swarming test runner leaves confusing error message on Linux about test leaving behind files

Project Member Reported by deadbeef@chromium.org, Jun 8 2017

Issue description

This week I've noticed that a WebRTC test, "modules_unittests", occasionally fails with this log output:

[485/485] CommonFormats/AudioProcessingTest.Formats/45 (15556 ms)
Failed to delete /b/s/w/itrLbA_n (1 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 2 seconds.
Failed to delete /b/s/w/itrLbA_n (2 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 4 seconds.
Failed to delete /b/s/w/itrLbA_n. The following files remain:
- /b/s/w/itrLbA_n
- /b/s/w/itrLbA_n
11707 2017-06-07 16:04:11.977 E: Failure with [Errno 2] No such file or directory: '/b/s/w/itrLbA_n'
Failed to delete the temp directory, thus failing the task.
This may be due to a subprocess outliving the main task
process, holding on to resources. Please fix the task so
that it releases resources and cleans up subprocesses.

Examples:
https://build.chromium.org/p/client.webrtc/builders/Linux64%20Debug/builds/13446
https://build.chromium.org/p/client.webrtc/builders/Linux%20UBSan/builds/5922

This log strikes me as odd for a few reasons:
* The directory can't be deleted even though it's empty? The only file listed "remaining" is the directory itself.
* It's also listed twice, and the second time through the loop, "files remaining" becomes 2.
* After the timeout, the actual error is revealed to be "no such file or directory"? Then why did "fs.isdir(tmp_dir)" return true earlier? When did it actually get deleted?

This test may actually be leaving behind files, but the log message doesn't provide any useful information. Looking at other bugs, such as https://bugs.chromium.org/p/chromium/issues/detail?id=724588, it appears that modules_unittests isn't the only test that can produce this behavior.

So, what could be going on? Is it possible there's a bug in run_isolated.py or file_path.py? I just don't understand what could produce those error messages.
 
Labels: -Type-Bug Type-Feature
Owner: mar...@chromium.org
Status: Assigned (was: Untriaged)
I got multiple reports since the message was toned down. Sent https://codereview.chromium.org/2924283002 to make it more actionable for users.
Project Member

Comment 2 by bugdroid1@chromium.org, Jun 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-py.git/+/70b369823ed763667b503adde008eb765d55846a

commit 70b369823ed763667b503adde008eb765d55846a
Author: maruel <maruel@chromium.org>
Date: Thu Jun 08 15:20:04 2017

Improve zombie process error message to be actionable.

Give more background information about what is happening and why this is bad, as
the previous message was not actionable at all.

R=tandrii@chromium.org
BUG= 730969 

Review-Url: https://codereview.chromium.org/2924283002

[modify] https://crrev.com/70b369823ed763667b503adde008eb765d55846a/client/run_isolated.py

Owner: deadbeef@chromium.org
Will deploy the new wording soon. Assigning back to reporter for closing or appropriate follow up.
I didn't have an issue with the "failed to delete temp directory" message. My confusion is with this part:

Failed to delete /b/s/w/itrLbA_n. The following files remain:
- /b/s/w/itrLbA_n
- /b/s/w/itrLbA_n
11707 2017-06-07 16:04:11.977 E: Failure with [Errno 2] No such file or directory: '/b/s/w/itrLbA_n'

This differs from what happens on Windows:

Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir
3848 2017-05-19 18:40:06.483 E: Failure with [Error 32] The process cannot access the file because it is being used by another process: u'\\\\?\\e:\\b\\swarm_slave\\w\\ir\\out\\Release'
My hypothesis is that this happens due to a race condition: while run_isolated is enumerating files and directories and then deleting them, a zombie process could still be creating files in the tree (e.g. a log file upon shutdown).

POSIX and Windows surface this in wildly different ways.
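
To make the hypothesized race concrete, here is a minimal Python sketch. It is not the code in run_isolated.py or file_path.py; the function name delete_tree_from_snapshot and the between_steps hook are hypothetical, and the hook merely stands in for a zombie process that touches the tree after it has been enumerated but before it is deleted. The point is only that deleting from a stale listing can report a directory that "remains" or a path that no longer exists.

```python
import errno
import os


def delete_tree_from_snapshot(root, between_steps=None):
    """Deletes `root` using a listing taken up front (illustrative only).

    `between_steps` is a hypothetical hook simulating a zombie process that
    runs after enumeration and before deletion.
    Returns a list of (path, errno_name) pairs for deletions that failed.
    """
    # Step 1: snapshot the tree, children before parents.
    snapshot = []
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        snapshot.extend(os.path.join(dirpath, f) for f in filenames)
        snapshot.append(dirpath)

    # Step 2: the simulated zombie acts; the snapshot is now stale.
    if between_steps:
        between_steps()

    # Step 3: delete strictly from the stale snapshot and record failures.
    failures = []
    for path in snapshot:
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                os.rmdir(path)
            else:
                os.remove(path)
        except OSError as e:
            name = errno.errorcode.get(e.errno, str(e.errno))
            failures.append((path, name))
    return failures
```

If the hook creates a new file inside root, the final os.rmdir(root) fails because the directory is no longer empty, so the only path reported as remaining is the directory itself. If the hook deletes something that was already listed, the later removal fails with ENOENT instead, which resembles the "[Errno 2] No such file or directory" line in the Linux log.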
Project Member

Comment 6 by bugdroid1@chromium.org, Jun 8 2017

Cc: mbonadei@chromium.org

Comment 8 by maruel@google.com, Jun 13 2017

Cc: deadbeef@chromium.org
Owner: mar...@chromium.org
Status: Fixed (was: Assigned)
Filed issue 732811 and issue 732808 as ideas to help with debugging. Marking this issue as closed since there are no action items remaining.
