
Issue 804937

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: ----




ServiceManagerContextTest.TerminateOnServiceQuit causing content_browsertests failures

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Jan 23 2018

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of joedow@chromium.org

content_browsertests failing on chromium.win/Win 7 Tests x64 (1)

Builders failed on: 
- Win 7 Tests x64 (1): 
  https://build.chromium.org/p/chromium.win/builders/Win%207%20Tests%20x64%20%281%29

Looking back through this builder's history (for at least several days), I see numerous cases where all of the tests pass but the run is marked as failed because a child subprocess is still running:
Enumerating processes:
- pid 5104; Handles: 3; Exe: None; Cmd: "e:\b\s\w\ir\out\Release_x64\content_browsertests.exe" --type=gpu-process --field-trial-handle=896,12648372179796321696,3462781468841023858,131072 --disable-gl-drawing-for-tests --override-use-software-gl-for-tests --gpu-preferences=KAAAAAAAAAAAJwAAAQAAAAAAAAAAAGAAAQAAAAAAAAAIAAAAAAAAACgAAAAEAAAAIAAAAAAAAAAoAAAAAAAAADAAAAAAAAAAOAAAAAAAAAAQAAAAAAAAAAAAAAAKAAAAEAAAAAAAAAAAAAAACwAAABAAAAAAAAAAAQAAAAoAAAAQAAAAAAAAAAEAAAALAAAA --gpu-vendor-id=0xffff --gpu-device-id=0xffff --gpu-driver-vendor=swiftshader --gpu-driver-version --gpu-driver-date --ipc-connection-timeout=30 --service-request-channel-token=65091B429CEF16DF24DDAF2609B793B9 --mojo-platform-channel-handle=912 --ignored=" --type=renderer " /prefetch:2
Terminating 1 processes:
- 5104 killed
*** Swarming tried multiple times to delete the run directory and failed ***
*** Hard failing the task ***
Swarming detected that your testing script ran an executable, which may have
started a child executable, and the main script returned early, leaving the
children executables playing around unguided.
You don't want to leave children processes outliving the task on the Swarming
bot, do you? The Swarming bot doesn't.
How to fix?
- For any process that starts children processes, make sure all children
  processes terminated properly before each parent process exits. This is
  especially important in very deep process trees.
  - This must be done properly both in normal successful task and in case of
    task failure. Cleanup is very important.
- The Swarming bot sends a SIGTERM in case of timeout.
  - You have 30.0 seconds to comply after the signal was sent to the process
    before the process is forcibly killed.
- To achieve not leaking children processes in case of signals on timeout, you
  MUST handle signals in each executable / python script and propagate them to
  children processes.
  - When your test script (python or binary) receives a signal like SIGTERM or
    CTRL_BREAK_EVENT on Windows), send it to all children processes and wait for
    them to terminate before quitting.
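
For illustration only, the cleanup pattern that the Swarming message describes could be sketched in Python as below. This is not the content_browsertests launcher or Swarming code; the names terminate_children, run_and_clean_up, and GRACE_SECONDS are hypothetical. The sketch only tracks direct children, so it assumes each child cleans up its own descendants in the same way (the leaked gpu-process above was started further down the process tree).

```python
import signal
import subprocess
import sys
import time

# Hypothetical constant; mirrors the 30.0 second grace period quoted above.
GRACE_SECONDS = 30.0


def terminate_children(children, grace_seconds=GRACE_SECONDS):
    """Ask every live child to exit, wait, then force-kill any stragglers."""
    for child in children:
        if child.poll() is None:
            # SIGTERM on POSIX; TerminateProcess (an immediate kill) on Windows.
            child.terminate()
    # One shared deadline so the total wait stays within the grace period.
    deadline = time.monotonic() + grace_seconds
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            child.kill()
            child.wait()


def run_and_clean_up(commands):
    """Run commands in parallel and do not return while a child is alive."""
    children = [subprocess.Popen(cmd) for cmd in commands]

    def on_signal(signum, frame):
        # Propagate the shutdown request before this process quits.
        terminate_children(children)
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, on_signal)
    if hasattr(signal, "SIGBREAK"):
        # Python on Windows reports CTRL_BREAK_EVENT as SIGBREAK.
        signal.signal(signal.SIGBREAK, on_signal)

    try:
        return [child.wait() for child in children]
    finally:
        # Runs on success and on failure, so no child outlives the parent.
        terminate_children(children)
```

The try/finally covers the "normal successful task and task failure" case from the message, and the signal handlers cover the timeout case.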

 

Comment 1 by joedow@chromium.org, Jan 23 2018

Components: Tests>Flaky UI>Browser
Labels: Test-Flaky OS-Windows

Comment 2 by joedow@chromium.org, Jan 23 2018

Labels: Sheriff-Chromium
Adding to tracking list to find an owner

Comment 3 by joedow@chromium.org, Jan 24 2018

Issue 804938 has been merged into this issue.

Comment 4 by joedow@chromium.org, Jan 24 2018

I have found one commonality in the runs which fail.  This test is included in the retry list:
ServiceManagerContextTest.TerminateOnServiceQuit

In the successful runs, this test passes the first time.

Note that in both the failure and success scenarios, there can be 4-15 tests which require a rerun to pass. However, the only test which showed up in the failure scenario and not in the passing scenario was this one.
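
The comparison amounts to a set difference over the retry lists. A rough sketch, assuming the retry lists have already been extracted from the logs by hand; suspect_tests is a hypothetical helper, and every test name in the sample data other than TerminateOnServiceQuit is made up:

```python
def suspect_tests(failed_runs, passed_runs):
    """Return tests retried in every failed run but in no passing run."""
    if not failed_runs:
        return set()
    in_all_failures = set.intersection(*(set(run) for run in failed_runs))
    in_any_pass = set().union(*(set(run) for run in passed_runs))
    return in_all_failures - in_any_pass


# Made-up retry lists for illustration; only the first name is from this bug.
failed_runs = [
    ["ServiceManagerContextTest.TerminateOnServiceQuit", "FakeSuite.FlakyA"],
    ["ServiceManagerContextTest.TerminateOnServiceQuit", "FakeSuite.FlakyB"],
]
passed_runs = [
    ["FakeSuite.FlakyA"],
    ["FakeSuite.FlakyB", "FakeSuite.FlakyC"],
]
print(suspect_tests(failed_runs, passed_runs))
```

Intersecting across the failed runs first keeps the ordinary flaky tests, which vary from run to run, out of the result.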

Comment 5 by joedow@chromium.org, Jan 24 2018

Components: -UI>Browser Internals>Services>ServiceManager
I'm confident this test is to blame.  It looks like the test times out (i.e. the service doesn't quit), and then the swarming bot kills the process and marks the whole suite as failed (even though a follow-up test succeeds).

Comment 6 by joedow@chromium.org, Jan 24 2018

Summary: ServiceManagerContextTest.TerminateOnServiceQuit causing content_browsertests failures (was: flaky: content_browsertests marked as failed due to running child subprocess)

Comment 7 by joedow@chromium.org, Jan 24 2018

Labels: -Sheriff-Chromium
Removing the sheriff tag since this one has a workaround now.

Comment 8 by bugdroid1@chromium.org, Jan 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/01a9285084115732acecede498d535ea4cfffe83

commit 01a9285084115732acecede498d535ea4cfffe83
Author: Joe Downing <joedow@chromium.org>
Date: Wed Jan 24 21:45:36 2018

Disabling ServiceManagerContextTest.TerminateOnServiceQuit for Windows

Over the last few days we've seen several cases of
{viz_}content_browsertests failing even when all of the retries pass.
The reason for the failure is that the swarming bot cannot clean up the
directory due to child process(es) still running.  The one thing all of
the failures have in common is that the initial run of
TerminateOnServiceQuit timed out.  Perhaps there is some additional
clean up the test framework needs to do in this case.

TBR=rockot@chromium.org

BUG=804937

Change-Id: I6937c038f91f7b254ab7ad2b8005c94ca064a4ee
Reviewed-on: https://chromium-review.googlesource.com/884230
Reviewed-by: Joe Downing <joedow@chromium.org>
Commit-Queue: Joe Downing <joedow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#531694}
[modify] https://crrev.com/01a9285084115732acecede498d535ea4cfffe83/content/browser/service_manager/service_manager_context_browsertest.cc
