
Issue 920033

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows
Pri: 1
Type: Bug

Blocked on:
issue 739604

Blocking:
issue 920265




oes-vertex-array-object.html flaky on ANGLE's GL backend

Project Member Reported by jdarpinian@chromium.org, Jan 9

Issue description

WebGL conformance test conformance/extensions/oes-vertex-array-object.html became flaky just recently. Here are a few example failures:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/13944
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/13947

On the flakiness dashboard it looks like this may have started after https://chromium-review.googlesource.com/c/chromium/src/+/1389615 landed.
 
Blocking: 739604
Labels: -Type-Bug Type-Bug-Regression
Cc: geoffl...@chromium.org
Components: Internals>GPU>ANGLE
All recent failures:

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/13950
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/13947
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/13944
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/13943

Failure mode:

[87/469] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_extensions_oes_vertex_array_object failed unexpectedly 6.0965s:
  
  Traceback (most recent call last):
    _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:155
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:199
      getattr(self, test_name)(test_path, *args[1:])
    _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:288
      self._CheckTestCompletion()
    _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:284
      self.fail(self._WebGLTestMessages(self.tab))
    fail at .swarming_module/lib/python2.7/unittest/case.py:410
      raise self.failureException(msg)
  AssertionError: Drawing with VAO that has the color array disabled should pass
  at (25, 25) expected: 128,128,128,255 was 0,0,0,255
  FAIL Drawing with VAO that has the color array disabled should pass
  at (25, 25) expected: 128,128,128,255 was 0,0,0,255
  Drawing with VAO that has the color array disabled should pass
  at (20, 20) expected: 128,128,128,255 was 0,0,0,255
  FAIL Drawing with VAO that has the color array disabled should pass
  at (20, 20) expected: 128,128,128,255 was 0,0,0,255
  Drawing with VAO that has the color array disabled should pass
  at (15, 15) expected: 128,128,128,255 was 0,0,0,255
  FAIL Drawing with VAO that has the color array disabled should pass
  at (15, 15) expected: 128,128,128,255 was 0,0,0,255
  Drawing with VAO that has the color array disabled should pass
  at (10, 10) expected: 128,128,128,255 was 0,0,0,255
  FAIL Drawing with VAO that has the color array disabled should pass
  at (10, 10) expected: 128,128,128,255 was 0,0,0,255
  

The failures are happening in webgl_conformance_gl_passthrough_tests on both NVIDIA and Intel GPUs.

geofflang@: since these failures are intermittent, I suspect there's a bug in ANGLE's state management on the OpenGL backend. I actually saw a flake of this test in the tryjobs for my CL but thought it might be unrelated. How could we get to the bottom of this? I wasn't immediately able to reproduce it locally, even when rerunning just the failing shard.

Project Member

Comment 3 by bugdroid1@chromium.org, Jan 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c5b459baa54d6f183d1d4c5b50fcb388035cd15f

commit c5b459baa54d6f183d1d4c5b50fcb388035cd15f
Author: Kenneth Russell <kbr@chromium.org>
Date: Wed Jan 09 01:20:08 2019

Suppress oes-vertex-array-object on Linux/passthrough.

conformance/extensions/oes-vertex-array-object.html is occasionally
rendering incorrect results for one sub-test on this configuration.

Bug: 920033
No-Try: True
Change-Id: I6db494ed4aff0f259e76c76c31048ad5e4097c7d
Reviewed-on: https://chromium-review.googlesource.com/c/1400818
Reviewed-by: James Darpinian <jdarpinian@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#620978}
[modify] https://crrev.com/c5b459baa54d6f183d1d4c5b50fcb388035cd15f/content/test/gpu/gpu_tests/webgl_conformance_expectations.py

Labels: -Pri-1 -Type-Bug-Regression Pri-2 Type-Bug
Summary: oes-vertex-array-object.html flaky on Linux with passthrough command decoder (was: oes-vertex-array-object.html became flaky on Linux)
Suppression landed. Downgrading to P2 bug.

Blockedon: 739604
Blocking: -739604
Turning around blocking/blocked on relationship to be able to close parent bug.

It will be difficult without a reliable repro. If it's a specific sub-section of the test that fails, we may be able to analyze the code and try to make a repro or a speculative fix. The fact that it fails on Linux only may also be a clue.
Blocking: 920265
Cc: jmad...@chromium.org
Labels: OS-Windows
Owner: geoffl...@chromium.org
Summary: oes-vertex-array-object.html flaky on ANGLE's GL backend (was: oes-vertex-array-object.html flaky on Linux with passthrough command decoder)
The test fails intermittently with:
  --use-gl=angle --use-angle=gl
with or without the passthrough command decoder, on both Windows and Linux. (NVIDIA GPUs in both, but this has been seen on the Intel bots too, so I don't think it's an NVIDIA driver bug.)

To reproduce, just load sdk/tests/conformance/extensions/oes-vertex-array-object.html and reload the browser over and over again. At some point, a few of the tests will fail. I have seen failures in about 3 different places in the test. It doesn't happen with the native GL driver, nor with ANGLE's D3D backend.
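The manual repro loop above could be scripted roughly as follows. This is only a sketch under assumptions: `browser_path` and `test_url` are placeholders for a local Chrome binary and wherever the conformance test is served from, and `build_command` and `run_trials` are hypothetical helper names. The two ANGLE flags are the ones quoted in this comment; `--user-data-dir` is added so each trial starts from a fresh profile. Each trial launches a fresh browser process, and checking the result is still done by eye.

```python
import subprocess
import tempfile

# Flags quoted in this bug: run Chrome on ANGLE with its OpenGL backend.
ANGLE_GL_FLAGS = ["--use-gl=angle", "--use-angle=gl"]


def build_command(browser_path, test_url, profile_dir):
    """Return the argv for one trial. All three arguments are placeholders."""
    return [
        browser_path,
        *ANGLE_GL_FLAGS,
        f"--user-data-dir={profile_dir}",  # fresh profile per trial (assumption)
        test_url,
    ]


def run_trials(browser_path, test_url, trials=20):
    """Launch a fresh browser per trial; the user inspects the page and closes it."""
    for trial in range(1, trials + 1):
        with tempfile.TemporaryDirectory() as profile_dir:
            print(f"trial {trial}/{trials}: check the result, then close the browser")
            subprocess.run(build_command(browser_path, test_url, profile_dir), check=False)
```

Calling `run_trials` with a local Chrome path and the URL of the test page would then step through the trials one browser launch at a time.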

I have looked through ContextGL, StateManagerGL and VertexArrayGL for places where dirty bits might not be propagated correctly to the driver. In StateManagerGL::syncState, in the gl::State::DIRTY_BIT_PROGRAM_EXECUTABLE case, it looked strange that propagateProgramToVAO was only being called if program was either null or didn't have a compute shader attached, but now I realize that's the common case.

Couldn't see any other obvious missing state syncing in VertexArrayGL::syncClientSideData or syncDrawState.

The most common failure mode is:

FAIL Drawing with VAO that has the color array disabled should pass
at (25, 25) expected: 128,128,128,255 was 0,0,0,255
FAIL Drawing with VAO that has the color array disabled should pass
at (20, 20) expected: 128,128,128,255 was 0,0,0,255
FAIL Drawing with VAO that has the color array disabled should pass
at (15, 15) expected: 128,128,128,255 was 0,0,0,255
FAIL Drawing with VAO that has the color array disabled should pass
at (10, 10) expected: 128,128,128,255 was 0,0,0,255
PASS Drawing with VAO that has the color array disabled should pass
PASS Drawing with VAO that has the color array disabled should pass

The attached version of the conformance test is cut down a bit. The failure only ever seems to happen if at least runUnboundDeleteTests() is included, and it happens more often as more of the tests are included.

I'm wondering at this point if the bug could be related to the deletion of VAOs; could that leave ANGLE's context in a broken state? The check "if (mVertexArray && mVertexArray->id() == vertexArray)" in State::removeVertexArrayBinding looked a little suspicious - what if the previously-deleted VAO's name was returned from a subsequent allocation? I tried taking out handle reuse in HandleAllocator::allocate() - no change.
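To make the handle-reuse worry concrete, here is a toy model. It is not ANGLE's HandleAllocator; `ToyHandleAllocator` and its methods are hypothetical names used only for illustration. It shows how a freed name can come back from a later allocation, so that any check that compares names alone cannot tell the old object from the new one. As noted above, disabling reuse did not change the flake, so this only illustrates the hypothesis.

```python
class ToyHandleAllocator:
    """Toy allocator (hypothetical): hands out integer names, optionally reusing freed ones."""

    def __init__(self, reuse=True):
        self._next = 1
        self._free = []
        self._reuse = reuse

    def allocate(self):
        # With reuse enabled, a released name is handed out again first.
        if self._reuse and self._free:
            return self._free.pop()
        handle = self._next
        self._next += 1
        return handle

    def release(self, handle):
        self._free.append(handle)


def names_alias_after_delete(reuse):
    """Return True if a new object gets the same name as a deleted one."""
    allocator = ToyHandleAllocator(reuse=reuse)
    old_name = allocator.allocate()
    allocator.release(old_name)
    new_name = allocator.allocate()
    return old_name == new_name
```

With `reuse=True` the deleted name is handed out again, so a comparison by name would match the new object; with `reuse=False` the names never collide.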

Geoff, Jamie, could I ask for some help tracking this down?

Attachment: oes-vertex-array-object.html (25.8 KB)
Owner: ynovikov@chromium.org
Yuly, interested in looking at this?

Looking at the code that Ken mentioned would be good.  We should also try marking the vertex array as dirty every draw call to see if it's a synchronization edge case.
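The "mark it dirty every draw" diagnostic can be pictured with a toy model. This is a sketch, not ANGLE code: `ToyDriver`, `ToyVertexArray`, and the `force_dirty` switch are hypothetical. The front end tracks state and pushes it to the driver only when a dirty flag is set. If some code path changes state without setting the flag, the driver draws with stale state; forcing the flag on every draw hides that bug, which is why it is useful for confirming a synchronization edge case.

```python
class ToyDriver:
    """Stands in for the state held by the real GL driver."""

    def __init__(self):
        self.color_array_enabled = False


class ToyVertexArray:
    """Front-end VAO state that is synced to the driver lazily via a dirty flag."""

    def __init__(self, driver, force_dirty=False):
        self.driver = driver
        self.color_array_enabled = False
        self.dirty = False
        self.force_dirty = force_dirty  # diagnostic: resync on every draw

    def set_color_array_enabled(self, enabled, mark_dirty=True):
        self.color_array_enabled = enabled
        # mark_dirty=False simulates a code path that forgets to set the bit.
        if mark_dirty:
            self.dirty = True

    def draw(self):
        if self.force_dirty:
            self.dirty = True
        if self.dirty:
            self.driver.color_array_enabled = self.color_array_enabled
            self.dirty = False
        # Return what the driver would actually use for this draw.
        return self.driver.color_array_enabled
```

In this model, a state change with a missed dirty bit leaves `draw()` using the old driver value, while the same sequence with `force_dirty=True` picks up the new value.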

Comment 10 by kbr@chromium.org, Today (7 hours ago)

Labels: -Pri-2 Pri-1
Could we please make progress on this bug? It sounds like a general bug in ANGLE's OpenGL backend and we can't advocate to partners to pick up that backend until it's 100% reliable from our standpoint.

Thanks.

Comment 11 by jmadill@google.com, Today (6 hours ago)

I reduced this a bit further (attached).

Note that you need to actually restart the browser between trials to repro the flake. Could be a dirty bits thing. I'll try a bisect.


Attachment: oes-vertex-array-object.html (25.9 KB)

Comment 12 by jmad...@chromium.org, Today (5 hours ago)

I went back to Wed Nov 21 2018 (https://crrev.com/c/1352591) and the test was still flaking. After that I had trouble building the code because the step that copies the compiler DLL was failing.

We might have to just diagnose the test rather than finding the culprit. But bisecting Chrome instead of ANGLE or using other tricks might work as well.
