Unsandboxed GPU data-gathering processes are causing layout test failures
Issue description

While debugging bug 820996, I noticed that layout tests sometimes fail because of a GPU process that isn't killed. See https://chromium-swarm.appspot.com/task?id=3e15d2241cc7a210&refresh=10&show_raw=1 as one example, with the relevant text at the bottom. In the last 200 builds of https://ci.chromium.org/buildbot/chromium.webkit/WebKit%20Win7/, I see this happening at least 6 times. The leftover process was launched with the --disable-gpu-sandbox flag, which points towards the GpuDataManagerImplPrivate code, as that's the only path that launches unsandboxed GPU processes as far as I can tell from code search.

Assigning to Zhenyao as initial owner; please redirect as necessary.

Failed to delete e:\b\swarm_slave\w\ir (4 files remaining). Maybe the test has a subprocess outliving it. Sleeping 2 seconds.
Failed to delete e:\b\swarm_slave\w\ir (4 files remaining). Maybe the test has a subprocess outliving it. Sleeping 4 seconds.
Failed to delete e:\b\swarm_slave\w\ir. The following files remain:
- \\?\e:\b\swarm_slave\w\ir
- \\?\e:\b\swarm_slave\w\ir\out
- \\?\e:\b\swarm_slave\w\ir\out\Release
- \\?\e:\b\swarm_slave\w\ir\out\Release\content_shell.exe
Enumerating processes:
- pid 8152; Handles: 2; Exe: None; Cmd: "e:\b\swarm_slave\w\ir\out\Release\content_shell.exe" --type=gpu-process --field-trial-handle=888,13561354560919848494,4039284332152666673,131072 --enable-features=OutOfBlinkCORS --disable-gpu-sandbox --disable-gpu-rasterization --disable-skia-runtime-opts --enable-logging --run-web-tests --enable-crash-reporter --crash-dumps-dir="e:\b\swarm_slave\w\ir\out\Release\crash-dumps\reports" --register-font-files="e:\b\swarm_slave\w\ir\out\Release\/test_fonts/Ahem.ttf" --gpu-preferences=KAAAAAAAAACAAwBgAQAAAAAAAAAAAGAAEAAAAAAAAAAAAAAAAAAAACgAAAAEAAAAIAAAAAAAAAAoAAAAAAAAADAAAAAAAAAAOAAAAAAAAAAQAAAAAAAAAAAAAAAKAAAAEAAAAAAAAAAAAAAACwAAABAAAAAAAAAAAQAAAAoAAAAQAAAAAAAAAAEAAAALAAAA --use-gl=swiftshader --run-web-tests --enable-crash-reporter --crash-dumps-dir="e:\b\swarm_slave\w\ir\out\Release\crash-dumps\reports" --register-font-files="e:\b\swarm_slave\w\ir\out\Release\/test_fonts/Ahem.ttf" --enable-logging --service-request-channel-token=5698221134764946719 --mojo-platform-channel-handle=2040 /prefetch:2
Terminating 1 processes:
- 8152 killed
*** Swarming tried multiple times to delete the run directory and failed ***
*** Hard failing the task ***
,
Jun 14 2018
What surprises me is that this process doesn't exit by itself, because by design it should. Regardless, in tests we don't need to launch it in the first place, so the fix proposed above is still valid.
,
Jun 14 2018
Yeah, I'm surprised the process isn't getting killed. Child processes generally kill themselves when they notice the browser process has died. That does depend on the IO and main threads not being blocked. Could this process be doing something on the main thread that takes more than 5 seconds or so?
,
Jun 14 2018
The process collects Vulkan driver support and D3D12 driver support on the main thread. On a slow machine this can take more than 5 seconds. Some early Vulkan drivers are very buggy. That said, our bots should not have these drivers installed, so in theory we should just fail to load the DLL and exit, so it's still a mystery to me.
,
Jun 14 2018
The swarming job in the first comment is on this vm: https://chromium-swarm.appspot.com/bot?id=vm87-m4&sort_stats=total%3Adesc which says it has no gpu.
,
Jun 20 2018
,
Jun 26 2018
A new command line option, --disable-gpu-process-for-dx12-vulkan-info-collection, will be created to skip this GPU process.
,
Jun 26 2018
Issue 856398 has been merged into this issue.
,
Jun 26 2018
I'm concerned about adding a flag to handle this situation. This subordinate GPU process used for info gathering should be more robust, and shouldn't hang indefinitely. Instead of adding a new flag, can we use the GPU watchdog to avoid hanging?
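As a rough illustration of the watchdog idea (not Chromium's actual GPU watchdog), the sketch below runs a collection task on a worker thread and reports whether it finished before a deadline. `RunWithDeadline` and its parameters are hypothetical names chosen for this example; a real watchdog would terminate the process on timeout rather than just return false.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper, not a Chromium API.
// Runs `collect` on a detached worker thread and waits up to `timeout`.
// Returns true if `collect` finished in time, false if the caller should
// treat the task as hung (for example, by exiting the process).
bool RunWithDeadline(std::function<void()> collect,
                     std::chrono::milliseconds timeout) {
  // The promise is shared so it outlives this function if the worker is slow.
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();

  std::thread([collect = std::move(collect), done]() {
    collect();          // e.g. the DX12/Vulkan driver probing
    done->set_value();  // signal completion to the waiting thread
  }).detach();

  return finished.wait_for(timeout) == std::future_status::ready;
}
```

With something like this, the info-collection process could give up after a bounded time instead of lingering until the test harness tries to delete its directory.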
,
Jun 26 2018
Do we know that it's hanging? It sounded like it was just not joined at chrome termination. Agreed we should investigate. However I think the flag is useful, as the collection is not useful, and possibly actively randomizing/harmful when running e.g. layout tests.
,
Jun 26 2018
It does sound possible that the child processes for the GPU info collection aren't being joined properly. I doubt that they are racing with a short-lived main test harness. In Issue 856398 and in comment #1, the entire shard ran to completion (which takes minutes), and the data gathering GPU process – which should have been launched at the beginning of the shard – was still lingering. I support adding a flag to disable this info collection, but it sounds like there is an underlying bug in the product. All things considered I think we should focus on fixing that bug rather than just disabling the info collection in this case.
,
Jun 26 2018
...but, to be sure, stabilizing webkit_layout_tests on the commit queue is the most crucial thing right now, so adding and using the flag in content_shell SGTM.
,
Jun 27 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c92718e66ebccd305fba542190cc242be5b7f44b

commit c92718e66ebccd305fba542190cc242be5b7f44b
Author: Maggie Chen <magchen@chromium.org>
Date: Wed Jun 27 02:25:17 2018

Create a new command line option --disable-gpu-process-for-dx12-vulkan-info-collection

This new command line option is created to disable the non-sandboxed gpu process for collecting DX12/Vulkan information. Although this process only exists for a very short period of time, it can sometimes interfere with the layout test or the performance tests. With this option, those tests can run without any interfering.

BUG=852796, 856398
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I43582a15cd2451da9081111b79adec396c9acd4f
Reviewed-on: https://chromium-review.googlesource.com/1112463
Commit-Queue: Maggie Chen <magchen@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#570636}

[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/content/browser/browser_main_loop.cc
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/content/shell/browser/layout_test/layout_test_content_browser_client.cc
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/gpu/config/gpu_switches.cc
[modify] https://crrev.com/c92718e66ebccd305fba542190cc242be5b7f44b/gpu/config/gpu_switches.h
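For readers unfamiliar with how such a switch is typically used, here is a minimal, self-contained sketch of gating a process launch on a command line option. It is not the code from this commit: `HasSwitch` and `ShouldLaunchInfoCollectionProcess` are hypothetical names, and only the switch string is taken from the commit message.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Switch name as given in the commit message.
const char kDisableInfoCollection[] =
    "--disable-gpu-process-for-dx12-vulkan-info-collection";

// Hypothetical helper: true if `name` appears verbatim in `args`.
bool HasSwitch(const std::vector<std::string>& args, const std::string& name) {
  return std::find(args.begin(), args.end(), name) != args.end();
}

// Hypothetical decision point: launch the unsandboxed info-collection
// process only when the switch is absent.
bool ShouldLaunchInfoCollectionProcess(const std::vector<std::string>& args) {
  return !HasSwitch(args, kDisableInfoCollection);
}
```

A test harness would append the switch to the browser command line so that the check returns false and the extra GPU process is never started.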
,
Jul 9
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a

commit f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a
Author: Maggie Chen <magchen@chromium.org>
Date: Mon Jul 09 22:59:36 2018

Disable non-sandboxed gpu process for layout and browser tests

The command line switch --disable-gpu-process-for-dx12-vulkan-info-collection is added to layout tests and browser test so the non-sandboxed gpu process, which is used for DX12 and Vulkan info collection and histograms, can be disabled to avoid interference. The info collection process is not needed for these tests.

BUG=852796, 856398
Change-Id: I7fac317987996aeb4a5ea37e473468f69ef3d89a
Reviewed-on: https://chromium-review.googlesource.com/1123167
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Maggie Chen <magchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#573501}

[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/public/test/browser_test_base.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/public/test/test_launcher.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/shell/app/shell_main_delegate.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/shell/browser/layout_test/layout_test_browser_main.cc
[modify] https://crrev.com/f58fd11a72fd8ab08da30a0666be3cdf2efb8e7a/content/shell/browser/layout_test/layout_test_content_browser_client.cc
,
Jul 13
Comment 1 by zmo@chromium.org, Jun 14 2018
Owner: magchen@chromium.org