Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of joedow@chromium.org
content_browsertests failing on chromium.win/Win 7 Tests x64 (1)
Builders failed on:
- Win 7 Tests x64 (1):
https://build.chromium.org/p/chromium.win/builders/Win%207%20Tests%20x64%20%281%29
Looking back through this builder's history (for at least several days), I see numerous cases where the tests all pass but the run is marked as failed because a child subprocess is still running:
Enumerating processes:
- pid 5104; Handles: 3; Exe: None; Cmd: "e:\b\s\w\ir\out\Release_x64\content_browsertests.exe" --type=gpu-process --field-trial-handle=896,12648372179796321696,3462781468841023858,131072 --disable-gl-drawing-for-tests --override-use-software-gl-for-tests --gpu-preferences=KAAAAAAAAAAAJwAAAQAAAAAAAAAAAGAAAQAAAAAAAAAIAAAAAAAAACgAAAAEAAAAIAAAAAAAAAAoAAAAAAAAADAAAAAAAAAAOAAAAAAAAAAQAAAAAAAAAAAAAAAKAAAAEAAAAAAAAAAAAAAACwAAABAAAAAAAAAAAQAAAAoAAAAQAAAAAAAAAAEAAAALAAAA --gpu-vendor-id=0xffff --gpu-device-id=0xffff --gpu-driver-vendor=swiftshader --gpu-driver-version --gpu-driver-date --ipc-connection-timeout=30 --service-request-channel-token=65091B429CEF16DF24DDAF2609B793B9 --mojo-platform-channel-handle=912 --ignored=" --type=renderer " /prefetch:2
Terminating 1 processes:
- 5104 killed
*** Swarming tried multiple times to delete the run directory and failed ***
*** Hard failing the task ***
Swarming detected that your testing script ran an executable, which may have
started a child executable, and the main script returned early, leaving the
children executables playing around unguided.
You don't want to leave children processes outliving the task on the Swarming
bot, do you? The Swarming bot doesn't.
How to fix?
- For any process that starts children processes, make sure all children
processes terminated properly before each parent process exits. This is
especially important in very deep process trees.
- This must be done properly both in normal successful task and in case of
task failure. Cleanup is very important.
- The Swarming bot sends a SIGTERM in case of timeout.
- You have 30.0 seconds to comply after the signal was sent to the process
before the process is forcibly killed.
- To achieve not leaking children processes in case of signals on timeout, you
MUST handle signals in each executable / python script and propagate them to
children processes.
- When your test script (python or binary) receives a signal like SIGTERM or
CTRL_BREAK_EVENT on Windows), send it to all children processes and wait for
them to terminate before quitting.
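For reference, the fix the Swarming message describes could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the actual test launcher code: `stop_children` and `run_with_cleanup` are hypothetical names, and the 30-second grace period is taken from the message above. It handles direct children only; a leaked grandchild such as the gpu-process in this log would need the intermediate process to do the same thing, or an OS-level mechanism such as a Windows job object.

```python
import signal
import subprocess
import sys

# Grace period quoted in the Swarming message above.
GRACE_SECONDS = 30.0


def stop_children(children, grace_seconds=GRACE_SECONDS):
    """Ask every child to exit, wait for it, then force-kill stragglers."""
    for child in children:
        if child.poll() is None:
            # SIGTERM on POSIX, TerminateProcess() on Windows.
            child.terminate()
    for child in children:
        try:
            child.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            child.kill()
            child.wait()


def run_with_cleanup(cmd):
    """Hypothetical wrapper: run cmd and never let it outlive this script."""
    child = subprocess.Popen(cmd)
    children = [child]

    def on_signal(signum, frame):
        # Stop the children first, then exit ourselves.
        stop_children(children)
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, on_signal)
    if hasattr(signal, "SIGBREAK"):
        # Windows only: raised when the process gets CTRL_BREAK_EVENT.
        signal.signal(signal.SIGBREAK, on_signal)

    try:
        return child.wait()
    finally:
        # Covers normal success and failure paths as well as signals.
        stop_children(children)
```

Calling `run_with_cleanup([...])` from the main thread of a wrapper script would return the child's exit code, and the `finally` clause keeps the cleanup on the success path too, which is the case this bug is about (tests pass, child still running).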
Comment 1 by joedow@chromium.org, Jan 23 2018