WebglConformance tests don't seem to robustly detect GPU process context loss
Issue description

In issue 830046, a crash was seen in the WebGL conformance test runs. In particular, in one shard:

test 11 is conformance2/rendering/blitframebuffer-size-overflow.html
test 12 is conformance2/rendering/draw-buffers-driver-hang.html

In local testing on Win/NVIDIA/OpenGL, blitframebuffer-size-overflow.html caused a GPU process crash. However, this test reports itself as passing, and the next test (draw-buffers-driver-hang.html) somehow ends up hitting a DCHECK in the GPU process, causing it to fail instead.
Apr 7 2018
It's difficult to detect this reliably. Detecting GPU process crashes is inherently racy.

There is an attempt to detect any GPU process crashes which produced minidumps and to symbolize them into the log output here: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/gpu_integration_test.py?q=SymbolizeUnsymbolizedMinidumps&sq=package:chromium&l=193 If that mechanism should have caught this but is broken, then we should fix it.

If you can think of some fairly cheap check that could be written in JavaScript and run at the end of each WebGL conformance test in our harness to reliably detect whether the GPU process is still working, we could add it to the conformance harness script here and call it at the end of each test: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py

For example, maybe we could try creating a WebGL context against a dummy 2x2 canvas, clearing it to red, reading it back and ensuring it rendered red. However, this is likely to be very expensive, so we would need to do some dry runs to see how much slower it makes the tests.

We could also plausibly add an API to Telemetry to return the current GPU process crash count. That might actually already be available through the browser.GetSystemInfo() API, used here for example: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/gpu_test_expectations.py?q=GetSystemInfo&l=118 The GetSystemInfo API is expensive, too. That's why the result of the call is cached in the test expectations.

More testing is needed to figure out the best solution.
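The 2x2 canvas check suggested above could look roughly like the following. This is only a sketch, not existing harness code: the function name gpuStillRenders and its doc parameter are made up for illustration, and it uses only standard WebGL 1.0 calls. Its cost per test is unmeasured. Since the browser may restart the GPU process after a crash, a passing check would only show that rendering works now, not that no crash happened earlier.

```javascript
// Hypothetical helper; not part of the existing conformance harness.
// Returns true if a fresh WebGL context on a 2x2 canvas can be cleared
// to opaque red and the same color is read back for every pixel.
function gpuStillRenders(doc) {
  const canvas = doc.createElement('canvas');
  canvas.width = 2;
  canvas.height = 2;

  const gl = canvas.getContext('webgl');
  if (!gl || gl.isContextLost()) {
    return false;
  }

  // Clear the whole drawing buffer to opaque red.
  gl.clearColor(1.0, 0.0, 0.0, 1.0);
  gl.clear(gl.COLOR_BUFFER_BIT);

  // Read back all four pixels as RGBA bytes.
  const pixels = new Uint8Array(2 * 2 * 4);
  gl.readPixels(0, 0, 2, 2, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  if (gl.getError() !== gl.NO_ERROR) {
    return false;
  }

  for (let i = 0; i < pixels.length; i += 4) {
    if (pixels[i] !== 255 || pixels[i + 1] !== 0 ||
        pixels[i + 2] !== 0 || pixels[i + 3] !== 255) {
      return false;
    }
  }
  return true;
}
```

In a page, this would be called as gpuStillRenders(document) after a test finishes. Taking the document as a parameter is just a convenience so the function can be exercised against a stub outside a browser.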
Jul 11
Here's another case, from bug 859998, where the wrong test was reported as failing after a GPU process crash: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/1171
Jul 13
Comment 1 by kainino@chromium.org, Apr 6 2018