
Issue 852796

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug


Unsandboxed GPU data-gathering processes are causing layout test failures

Reported by jam@chromium.org, Jun 14 2018

Issue description

While debugging bug 820996, I noticed that layout tests sometimes fail because a GPU process is not killed. See https://chromium-swarm.appspot.com/task?id=3e15d2241cc7a210&refresh=10&show_raw=1 for one example; the relevant text is at the bottom of that log and is copied below.


Over the last 200 builds of https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Win7/, I see this happening at least 6 times. The lingering process was started with --disable-gpu-sandbox, which points to the GpuDataManagerImplPrivate code: as far as I can tell from code search, that is the only path that launches unsandboxed GPU processes.

Assigning to Zhenyao as initial owner, please redirect as necessary.




Failed to delete e:\b\swarm_slave\w\ir (4 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 2 seconds.
Failed to delete e:\b\swarm_slave\w\ir (4 files remaining).
  Maybe the test has a subprocess outliving it.
  Sleeping 4 seconds.
Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out\Release\content_shell.exe
Enumerating processes:
- pid 8152; Handles: 2; Exe: None; Cmd: "e:\b\swarm_slave\w\ir\out\Release\content_shell.exe" --type=gpu-process --field-trial-handle=888,13561354560919848494,4039284332152666673,131072 --enable-features=OutOfBlinkCORS --disable-gpu-sandbox --disable-gpu-rasterization --disable-skia-runtime-opts --enable-logging --run-web-tests --enable-crash-reporter --crash-dumps-dir="e:\b\swarm_slave\w\ir\out\Release\crash-dumps\reports" --register-font-files="e:\b\swarm_slave\w\ir\out\Release\/test_fonts/Ahem.ttf" --gpu-preferences=KAAAAAAAAACAAwBgAQAAAAAAAAAAAGAAEAAAAAAAAAAAAAAAAAAAACgAAAAEAAAAIAAAAAAAAAAoAAAAAAAAADAAAAAAAAAAOAAAAAAAAAAQAAAAAAAAAAAAAAAKAAAAEAAAAAAAAAAAAAAACwAAABAAAAAAAAAAAQAAAAoAAAAQAAAAAAAAAAEAAAALAAAA --use-gl=swiftshader --run-web-tests --enable-crash-reporter --crash-dumps-dir="e:\b\swarm_slave\w\ir\out\Release\crash-dumps\reports" --register-font-files="e:\b\swarm_slave\w\ir\out\Release\/test_fonts/Ahem.ttf" --enable-logging --service-request-channel-token=5698221134764946719 --mojo-platform-channel-handle=2040 /prefetch:2
Terminating 1 processes:
- 8152 killed
*** Swarming tried multiple times to delete the run directory and failed ***
*** Hard failing the task ***
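The log shows how one lingering process becomes a hard failure. Swarming tries to delete the run directory, sleeps 2 seconds, retries, sleeps 4 seconds, and retries again. It then lists the leftovers, kills the remaining process, and fails the task. On Windows, a running executable's file cannot be deleted, which is consistent with content_shell.exe being among the remaining files.

Below is a minimal sketch of that retry schedule. The function name, the injected callbacks, and the retry count are assumptions for illustration; this is not Swarming's actual code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical model of the cleanup loop seen in the log: try to delete the
// run directory and, on failure, sleep for a doubling delay before retrying.
struct CleanupResult {
  bool deleted = false;
  std::vector<std::chrono::seconds> delays;  // sleeps actually performed
};

CleanupResult DeleteWithBackoff(
    const std::function<bool()>& try_delete,  // true once the directory is gone
    const std::function<void(std::chrono::seconds)>& sleep_for,
    int max_retries,
    std::chrono::seconds initial_delay) {
  assert(max_retries >= 0 && initial_delay.count() > 0);
  CleanupResult result;
  std::chrono::seconds delay = initial_delay;
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    if (try_delete()) {
      result.deleted = true;
      return result;
    }
    if (attempt == max_retries) break;  // out of retries: caller hard-fails
    sleep_for(delay);
    result.delays.push_back(delay);
    delay *= 2;  // 2 s, then 4 s, ...
  }
  return result;
}
```

The delete and sleep operations are injected as callbacks. That keeps the policy testable without touching the filesystem or actually sleeping. A real runner would follow the final failure with the process enumeration and termination shown in the log.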


Comment 1 by zmo@chromium.org, Jun 14 2018

Cc: zmo@chromium.org
Owner: magchen@chromium.org
We have two cases that may launch an unsandboxed GPU process:

1) The about:gpu page is opened and requests full GPU info.

I don't think this is relevant here.

2) Some time after Chrome launches, an unsandboxed GPU process is launched to query Vulkan/D3D12 support.

If layout tests keep running long enough (I think they do), they will trigger this one.

Fix proposal: add a command-line switch to bypass this info collection, and pass the switch to layout tests (and maybe browser tests, content browser tests, telemetry tests, etc., where Chrome isn't relaunched for each test).
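Below is a minimal sketch of the proposed gate. It assumes a plain argv vector rather than Chromium's own command-line class. HasSwitch and ShouldLaunchInfoCollectionProcess are hypothetical names; only the switch string comes from the fix that eventually landed for this bug.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Switch name taken from the fix that later landed for this bug.
constexpr char kDisableInfoCollectionSwitch[] =
    "--disable-gpu-process-for-dx12-vulkan-info-collection";

// Hypothetical helper: true if `name` appears as a bare switch or as
// `name=value` in the argument list.
bool HasSwitch(const std::vector<std::string>& argv, const std::string& name) {
  assert(name.rfind("--", 0) == 0);  // expects the full "--name" form
  for (const std::string& arg : argv) {
    if (arg == name || arg.rfind(name + "=", 0) == 0) return true;
  }
  return false;
}

// Hypothetical browser-side gate: decide whether to schedule the delayed,
// unsandboxed DX12/Vulkan info-collection GPU process.
bool ShouldLaunchInfoCollectionProcess(const std::vector<std::string>& argv) {
  return !HasSwitch(argv, kDisableInfoCollectionSwitch);
}
```

Matching the whole argument, not a prefix, keeps a longer switch that merely starts with the same text from disabling collection by accident.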

Comment 2 by zmo@chromium.org, Jun 14 2018

What surprises me is that this process doesn't exit by itself; by design, it should.

Regardless, tests don't need to launch it in the first place, so the fix proposed above is still valid.

Comment 3 by jam@chromium.org, Jun 14 2018

Yeah, I'm surprised the process isn't getting killed. Child processes generally kill themselves when they notice the browser process has died, but that depends on their IO and main threads not being blocked.

Could this process be doing something on the main thread that takes more than 5 seconds or so?
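Comment 3's point is that self-termination is only as prompt as the thread that has to act on it. The toy model below considers only the main thread. It runs tasks back to back, and a quit request posted when the browser dies is handled only after the current task returns. The task durations in the usage example are made up, not measurements from this bug.

```cpp
#include <cassert>
#include <vector>

// Toy model: the main thread runs tasks back to back starting at t = 0.
// When the browser dies at `death_time`, a quit task is posted, but it only
// runs once the task currently occupying the main thread finishes.
// Returns the time at which the child process actually exits.
double ExitTime(const std::vector<double>& task_durations, double death_time) {
  assert(death_time >= 0.0);
  double t = 0.0;
  for (double d : task_durations) {
    double end = t + d;
    if (death_time < end) return end;  // blocked until this task returns
    t = end;
  }
  return death_time;  // main thread was idle: quit is handled right away
}
```

Take tasks of 1, 1, 30, and 1 seconds, with the browser dying at t = 2.5. The child exits only at t = 32, long after the roughly 6 seconds (2 + 4) of retries visible in the log above.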

Comment 4 by zmo@chromium.org, Jun 14 2018

The process collects Vulkan and D3D12 driver support on the main thread. On a slow machine this could take more than 5 s, and some early Vulkan drivers are very buggy. That said, our bots should not have these drivers installed, so in theory the process should just fail to load the DLL and exit. It's still a mystery to me.

Comment 5 by jam@chromium.org, Jun 14 2018

The swarming job in the first comment ran on this VM: https://chromium-swarm.appspot.com/bot?id=vm87-m4&sort_stats=total%3Adesc, which reports that it has no GPU.

Comment 6 by kbr@chromium.org, Jun 20 2018

Cc: -kbr@chromium.org
Components: Internals>GPU>Internals
A new command line option, --disable-gpu-process-for-dx12-vulkan-info-collection, will be created to skip this GPU process.

Comment 8 by kbr@chromium.org, Jun 26 2018

Cc: sugoi@chromium.org magchen@chromium.org piman@chromium.org capn@chromium.org
Issue 856398 has been merged into this issue.

Comment 9 by kbr@chromium.org, Jun 26 2018

Cc: kbr@chromium.org
I'm concerned about adding a flag to handle this situation. This subordinate GPU process used for info gathering should be more robust, and shouldn't hang indefinitely.
Instead of adding a new flag, can we use the GPU watchdog to avoid hanging?
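Here is one way to read the watchdog suggestion, sketched generically. The probing thread arms the watchdog before the DX12/Vulkan queries, pets it as it makes progress, and disarms it when done. A monitor that finds it expired would terminate the process. This is not Chromium's GPU watchdog. The class, its methods, and the 10-second timeout in the usage example are assumptions. Time is passed in explicitly so the logic can be checked without sleeping.

```cpp
#include <cassert>
#include <chrono>

// Generic watchdog sketch with an injected clock. The monitored thread calls
// Pet() as it makes progress; a separate monitor periodically calls Expired()
// and, if it returns true, would kill the hung process.
class Watchdog {
 public:
  using Clock = std::chrono::steady_clock;

  explicit Watchdog(Clock::duration timeout) : timeout_(timeout) {
    assert(timeout > Clock::duration::zero());
  }

  void Arm(Clock::time_point now) { last_pet_ = now; armed_ = true; }
  void Pet(Clock::time_point now) { last_pet_ = now; }
  void Disarm() { armed_ = false; }

  // True if armed and no progress was reported within the timeout.
  bool Expired(Clock::time_point now) const {
    return armed_ && now - last_pet_ > timeout_;
  }

 private:
  Clock::duration timeout_;
  Clock::time_point last_pet_{};
  bool armed_ = false;
};
```

Disarming after the probe completes means a slow but finished collection is never treated as a hang.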

Comment 10 by piman@chromium.org, Jun 26 2018

Do we know that it's hanging? It sounded like it was just not joined at Chrome termination. Agreed, we should investigate.
However, I think the flag is useful: the collection serves no purpose when running, e.g., layout tests, and may actively add randomness or otherwise harm them.
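Comment 10's alternative explanation is a child that is simply never joined at shutdown. That is usually handled by having the parent track its children and kill and reap any survivors on exit. The sketch below shows the pattern with POSIX fork/waitpid and uses `sleep` as a stand-in child; the class name is hypothetical. On Windows, where this bug occurred, a Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE can provide a comparable guarantee.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// POSIX sketch of "joining" helper processes at shutdown: the parent records
// every child it launches and, on exit, kills and reaps any that remain, so
// none can outlive it.
class ChildProcessRegistry {
 public:
  // Launches a stand-in child that would otherwise run for 30 seconds.
  pid_t Launch() {
    pid_t pid = fork();
    if (pid == 0) {
      execlp("sleep", "sleep", "30", static_cast<char*>(nullptr));
      _exit(127);  // only reached if exec failed
    }
    if (pid > 0) children_.push_back(pid);
    return pid;  // -1 if fork failed
  }

  // Kills and reaps every recorded child; returns how many were reaped.
  int ShutdownAll() {
    int reaped = 0;
    for (pid_t pid : children_) {
      kill(pid, SIGKILL);  // no effect if it already exited but is unreaped
      int status = 0;
      if (waitpid(pid, &status, 0) == pid) ++reaped;
    }
    children_.clear();
    return reaped;
  }

 private:
  std::vector<pid_t> children_;
};
```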

Comment 11 by kbr@chromium.org, Jun 26 2018

It does sound possible that the child processes for the GPU info collection aren't being joined properly. I doubt that they are racing with a short-lived main test harness. In Issue 856398 and in comment #1, the entire shard ran to completion (which takes minutes). Yet the data-gathering GPU process, which should have been launched at the beginning of the shard, was still lingering.

I support adding a flag to disable this info collection, but it sounds like there is an underlying bug in the product. All things considered, I think we should focus on fixing that bug rather than just disabling the info collection in this case.

Comment 12 by kbr@chromium.org, Jun 26 2018

...but, to be sure, stabilizing webkit_layout_tests on the commit queue is the most crucial thing right now, so adding and using the flag in content_shell SGTM.


Comment 13 by bugdroid1@chromium.org, Jun 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c92718e66ebccd305fba542190cc242be5b7f44b

commit c92718e66ebccd305fba542190cc242be5b7f44b
Author: Maggie Chen <magchen@chromium.org>
Date: Wed Jun 27 02:25:17 2018

Create a new command line option --disable-gpu-process-for-dx12-vulkan-info-collection

This new command line option is created to disable the non-sandboxed
gpu process for collecting DX12/Vulkan information. Although this process only
exists for a very short period of time, it can sometimes interfere with
layout tests or performance tests. With this option, those tests can run
without any interference.

BUG=852796,856398

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I43582a15cd2451da9081111b79adec396c9acd4f
Reviewed-on: https://chromium-review.googlesource.com/1112463
Commit-Queue: Maggie Chen <magchen@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#570636}
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/content/browser/browser_main_loop.cc
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/content/shell/browser/layout_test/layout_test_content_browser_client.cc
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/gpu/config/gpu_switches.cc
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/gpu/config/gpu_switches.h


Comment 14 by bugdroid1@chromium.org, Jul 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a

commit f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a
Author: Maggie Chen <magchen@chromium.org>
Date: Mon Jul 09 22:59:36 2018

Disable non-sandboxed gpu process for layout and browser tests

The command line switch --disable-gpu-process-for-dx12-vulkan-info-collection
is added to layout tests and browser tests so the non-sandboxed gpu process,
which is used for DX12 and Vulkan info collection and histograms, can be
disabled to avoid interference. The info collection process is not needed
for these tests.

BUG=852796,856398

Change-Id: I7fac317987996aeb4a5ea37e473468f69ef3d89a
Reviewed-on: https://chromium-review.googlesource.com/1123167
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Maggie Chen <magchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#573501}
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/public/test/browser_test_base.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/public/test/test_launcher.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/shell/app/shell_main_delegate.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/shell/browser/layout_test/layout_test_browser_main.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/shell/browser/layout_test/layout_test_content_browser_client.cc
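On the test side, the second change passes the switch to layout and browser tests. The helper below is a hypothetical sketch of that idea, not the code in browser_test_base.cc or test_launcher.cc. It appends the switch to a test's command line only if it is not already present.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical test-harness helper: append `sw` unless the caller already
// passed it, so repeated wrapping never duplicates the switch.
void AppendSwitchIfMissing(std::vector<std::string>& argv, const std::string& sw) {
  assert(sw.rfind("--", 0) == 0);  // expects the full "--name" form
  if (std::find(argv.begin(), argv.end(), sw) == argv.end()) argv.push_back(sw);
}

// Builds the command line for a test-launched browser process.
std::vector<std::string> BuildTestCommandLine(std::vector<std::string> argv) {
  AppendSwitchIfMissing(argv, "--disable-gpu-process-for-dx12-vulkan-info-collection");
  return argv;
}
```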

Status: Fixed (was: Assigned)
