
Issue 830051


Issue metadata

Status: Available
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 859998




WebglConformance tests don't seem to robustly detect GPU process context loss

Project Member Reported by kainino@chromium.org, Apr 6 2018

Issue description

In issue 830046, a crash was seen in the WebGL conformance test runs.
In particular, in one shard:

test 11 is conformance2/rendering/blitframebuffer-size-overflow.html
test 12 is conformance2/rendering/draw-buffers-driver-hang.html

In local testing on Win/NVIDIA/OpenGL, blitframebuffer-size-overflow.html caused a GPU process crash.
However, that test reports itself as passing, and the next test (draw-buffers-driver-hang.html) somehow hits a DCHECK in the GPU process and is reported as the failure instead.
 
Summary: WebglConformance tests don't seem to robustly detect GPU process context loss (was: WebglConformance tests don't seem to robustly detect GPU process crashes)

Comment 2 by kbr@chromium.org, Apr 7 2018

It's difficult to detect this reliably, because detecting GPU process crashes is inherently racy. There is already an attempt to detect any GPU process crashes that produced minidumps and to symbolize them into the log output here:

https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/gpu_integration_test.py?q=SymbolizeUnsymbolizedMinidumps&sq=package:chromium&l=193

If that mechanism would have caught this but is otherwise broken, then we should fix it.
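One way to attribute a crash to the test that caused it is to snapshot the set of minidump files before each test and diff it afterward. The following is a minimal Python sketch of that idea under stated assumptions; it is not the existing SymbolizeUnsymbolizedMinidumps mechanism. The dump directory, the `.dmp` extension and the `run_test` callable are all hypothetical, and the check remains racy, because a dump may be written only after the test has finished.

```python
from pathlib import Path
from typing import Callable, Set


def snapshot_minidumps(dump_dir: Path) -> Set[str]:
    """Return the names of the minidump files currently in dump_dir.

    Assumption: crash dumps are written as *.dmp files into one directory.
    The real location depends on the browser's crash handler configuration.
    """
    if not dump_dir.is_dir():
        return set()
    return {p.name for p in dump_dir.glob("*.dmp")}


def run_and_collect_minidumps(dump_dir: Path, run_test: Callable[[], None]) -> Set[str]:
    """Run one test (hypothetical callable) and return the dumps it left behind.

    A non-empty result suggests a crash occurred while this test ran, but a
    dump written late could still be attributed to the following test.
    """
    before = snapshot_minidumps(dump_dir)
    run_test()
    return snapshot_minidumps(dump_dir) - before
```

Diffing per test, rather than scanning once at the end of a shard, is what would let the harness blame blitframebuffer-size-overflow.html instead of draw-buffers-driver-hang.html, provided the dump is flushed before the next test starts.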

If you can think of a fairly cheap check, written in JavaScript, that reliably detects whether the GPU process is still working, we could add it to the conformance harness script here and call it at the end of each WebGL conformance test:

https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py

For example, we could try creating a WebGL context against a dummy 2x2 canvas, clearing it to red, reading it back and ensuring it rendered red. However, this is likely to be very expensive, so we would need to do some dry runs to see how much it slows down the tests.
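The 2x2 red-clear probe could look roughly like the minimal Python sketch below: the harness evaluates a JavaScript snippet in the page and then checks the returned pixels. `evaluate_js` is a hypothetical stand-in for whatever the harness uses to run a script in the tab and return its value, and `GPU_HEALTH_PROBE_JS` and the helper names are likewise illustrative, not existing harness code.

```python
from typing import Callable, Optional, Sequence

# Hypothetical probe: clear a 2x2 WebGL canvas to opaque red and read it back.
# Returns null if no context can be created or it is already lost; otherwise
# a flat RGBA byte array (2 * 2 * 4 = 16 values).
GPU_HEALTH_PROBE_JS = """
(function() {
  var canvas = document.createElement('canvas');
  canvas.width = 2;
  canvas.height = 2;
  var gl = canvas.getContext('webgl');
  if (!gl || gl.isContextLost()) return null;
  gl.clearColor(1, 0, 0, 1);
  gl.clear(gl.COLOR_BUFFER_BIT);
  var pixels = new Uint8Array(2 * 2 * 4);
  gl.readPixels(0, 0, 2, 2, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  // Release the context so repeated probes do not accumulate live contexts.
  var ext = gl.getExtension('WEBGL_lose_context');
  if (ext) ext.loseContext();
  return Array.from(pixels);
})()
"""


def pixels_are_red(pixels: Optional[Sequence[int]], width: int = 2, height: int = 2) -> bool:
    """True if every RGBA pixel is exactly opaque red (255, 0, 0, 255)."""
    if pixels is None or len(pixels) != width * height * 4:
        return False
    return all(
        tuple(pixels[i:i + 4]) == (255, 0, 0, 255)
        for i in range(0, len(pixels), 4)
    )


def gpu_process_looks_healthy(
        evaluate_js: Callable[[str], Optional[Sequence[int]]]) -> bool:
    """Run the probe via the (hypothetical) evaluate_js hook and check the result."""
    return pixels_are_red(evaluate_js(GPU_HEALTH_PROBE_JS))
```

The pixel check itself is trivial; most of the cost would come from creating and tearing down a WebGL context after every test, which is what the dry runs would need to measure. If the GPU process dies mid-probe, readPixels typically yields non-red data or the context reports as lost, so either path fails the check.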

We could also plausibly add an API to Telemetry that returns the current GPU process crash count. That count might already be available through the browser.GetSystemInfo() API, which is used here, for example:
https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/gpu_test_expectations.py?q=GetSystemInfo&l=118

The GetSystemInfo API is expensive, too. That's why the result of the call is cached in the test expectations.
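If such a crash count were available, the harness could read it once after each test and blame the test during which it increased. The minimal Python sketch below assumes a hypothetical `get_gpu_crash_count` callable standing in for that API (possibly a wrapper around browser.GetSystemInfo(), if the count turns out to be exposed there, which is unverified). It reads the counter only once per test to limit the cost of the expensive call.

```python
from typing import Callable, List


class GpuCrashTracker:
    """Attribute GPU process crashes to tests by diffing a crash counter.

    get_gpu_crash_count is a hypothetical callable returning the number of
    GPU process crashes the browser has recorded so far.
    """

    def __init__(self, get_gpu_crash_count: Callable[[], int]) -> None:
        self._get_count = get_gpu_crash_count
        # Baseline read once, before the first test runs.
        self._last_count = get_gpu_crash_count()
        self.crashed_tests: List[str] = []

    def check_after_test(self, test_name: str) -> bool:
        """Read the counter once and report whether it grew during test_name."""
        current = self._get_count()
        crashed = current > self._last_count
        if crashed:
            self.crashed_tests.append(test_name)
        # Always advance the baseline so each crash is blamed on one test only.
        self._last_count = current
        return crashed
```

This does not remove the race: if the browser records a crash only after the next test has started, the counter still grows during the wrong test. It does, however, turn a silent pass into an explicit crash report for whichever test was running when the count changed.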

More testing is needed to figure out the best solution.

Here's another case, from bug 859998, where the wrong test was reported as failing after a GPU process crash: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/1171
Blockedon: 859998
