conformance2/textures/canvas_sub_rectangle tests flaky on Linux NVIDIA
Issue description

Telemetry's browser_test_runner (which is the harness that runs the GPU tests) currently runs its tests via Python's unittest runner. It's being switched to typ in Issue 636153 in order to pick up many key features, including uploading to the flakiness dashboard. It looks like the switch has changed the order in which the tests run, or something similar.

There are two consecutive try jobs on Linux NVIDIA attempting to switch over to typ:
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel/builds/6193
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel/builds/6179
which failed the same test:
gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_r11f_g11f_b10f_rgb_half_float

This is:
conformance2/textures/canvas_sub_rectangle/tex-2d-r11f_g11f_b10f-rgb-half_float.html
in the conformance tests repository. Full stdout attached.
It looks like the tests ran in this order:
[1/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_context_constants_and_properties_2 passed 2.0174s
[2/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_glsl3_array_complex_indexing passed 0.2812s
[3/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_glsl3_vector_dynamic_indexing passed 0.4391s
[4/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_glsl3_vector_dynamic_indexing_nv_driver_bug passed 5.9751s
[5/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_rendering_attrib_type_match passed 0.9397s
[6/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_rendering_blitframebuffer_filter_srgb passed 0.1386s
[7/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_rendering_blitframebuffer_test passed 0.1116s
[8/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_samplers_samplers passed 0.0934s
[9/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_r11f_g11f_b10f_rgb_half_float failed unexpectedly 9.0866s:

Not sure whether this failure will be reproducible with the current harness, running just a subset of the tests. For the record, this is why it's important to be able to reproduce exactly how one shard executes, including the tests which ran and the order in which they ran.

Will try to suppress this failure to allow the harness cutover to proceed.
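For anyone wanting to recover a shard's execution order from the attached stdout, here is a minimal sketch. It assumes result lines look exactly like the ones quoted above (`[n/total] name passed|failed unexpectedly N.NNNNs`); any other status strings the runner may print are ignored. The names `RESULT_LINE`, `parse_run_order` and `SAMPLE` are made up for this illustration and are not part of the harness.

```python
import re

# Matches result lines of the form quoted in this issue, e.g.
#   [9/181] some.test.Name failed unexpectedly 9.0866s:
# Only the two statuses seen in the log above are recognized (an assumption).
RESULT_LINE = re.compile(
    r"\[(\d+)/(\d+)\]\s+(\S+)\s+(passed|failed unexpectedly)\s+([\d.]+)s"
)


def parse_run_order(stdout_text):
    """Return (index, test_name, status, seconds) tuples sorted by run index."""
    results = []
    for match in RESULT_LINE.finditer(stdout_text):
        index, _total, name, status, seconds = match.groups()
        results.append((int(index), name, status, float(seconds)))
    return sorted(results)


# Two lines copied from the log in the issue description.
SAMPLE = """
[8/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_samplers_samplers passed 0.0934s
[9/181] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_r11f_g11f_b10f_rgb_half_float failed unexpectedly 9.0866s:
"""

for index, name, status, seconds in parse_run_order(SAMPLE):
    # Print only the last component of the dotted test name to keep lines short.
    print(index, name.rsplit(".", 1)[-1], status, seconds)
```

The resulting list gives the tests that ran and their order, which is the information needed to try replaying one shard.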
,
Feb 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a6c107d90f66410de1d944109e2cce930142fdd7

commit a6c107d90f66410de1d944109e2cce930142fdd7
Author: kbr <kbr@chromium.org>
Date: Tue Feb 21 04:30:02 2017

Suppress two flaking or failing tests on Linux NVIDIA.

conformance2/textures/canvas_sub_rectangle/tex-2d-r11f_g11f_b10f-rgb-half_float.html : Failing after typ cutover.
conformance2/textures/image_bitmap_from_image_data/tex-2d-srgb8-rgb-unsigned_byte.html : Flaky in try runs.

BUG=694354, 694359
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
TBR=zmo@chromium.org

Review-Url: https://codereview.chromium.org/2703123003
Cr-Commit-Position: refs/heads/master@{#451699}

[modify] https://crrev.com/a6c107d90f66410de1d944109e2cce930142fdd7/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Feb 28 2017
,
Feb 28 2017
,
Mar 17 2017
,
Apr 7 2017
,
Apr 7 2017
Observed WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb16f_rgb_half_float starting to fail in this WebGL conformance roll:
https://codereview.chromium.org/2798083005/

WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb16f_rgb_half_float failed reliably:
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel/builds/7546
https://chromium-swarm.appspot.com/task?id=3560121a77996b10&refresh=10&show_raw=1

[13/177] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb16f_rgb_half_float failed unexpectedly 8.2114s:
Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:73
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203
    getattr(self, test_name)(test_path, *args[1:])
  _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217
    self._CheckTestCompletion()
  _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:213
    self.fail(self._WebGLTestMessages(self.tab))
  fail at /usr/lib/python2.7/unittest/case.py:412
    raise self.failureException(msg)
AssertionError: should be 255,0,0 at (0, 0) expected: 255,0,0 was 0,0,0
FAIL should be 255,0,0 at (0, 0) expected: 255,0,0 was 0,0,0 should be 255,0,0 ...

I wonder whether this might be caused by the recent enabling of EXT_color_buffer_half_float in Chrome's rendering pipeline and/or updates to CopyTextureCHROMIUM.

Removing Telemetry and Infra folks from CC: list.
,
Apr 11 2017
conformance2/textures/canvas_sub_rectangle/tex-2d-rgb565-rgb-unsigned_byte.html also seen flaky here:
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel/builds/7687
https://chromium-swarm.appspot.com/task?id=3573ec88a9007a10&refresh=10&show_raw=1

[7/172] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb565_rgb_unsigned_byte failed unexpectedly 9.4864s:
Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:73
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:203
    getattr(self, test_name)(test_path, *args[1:])
  _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:217
    self._CheckTestCompletion()
  _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:213
    self.fail(self._WebGLTestMessages(self.tab))
  fail at /usr/lib/python2.7/unittest/case.py:412
    raise self.failureException(msg)
AssertionError: should be 255,0,0 at (0, 0) expected: 255,0,0 was 0,0,0
FAIL should be 255,0,0 at (0, 0) expected: 255,0,0 was 0,0,0 should be 255,0,0 at (16, 0) expected: 255,0,0 was 0,0,0
FAIL should be 255,0,0 ...

Marking flaky. I think there might be missing synchronization between the hardware-accelerated 2D canvas's OpenGL context and the consuming WebGL context.
,
Apr 12 2017
Sorry, I forgot to reference this bug in https://codereview.chromium.org/2806313004/ when marking the test flaky.
,
Apr 19 2017
,
Apr 28 2017
,
Apr 28 2017
Looking through the logs from #7 and #8 above:
https://chromium-swarm.appspot.com/task?id=3560121a77996b10&refresh=10&show_raw=1
https://chromium-swarm.appspot.com/task?id=3573ec88a9007a10&refresh=10&show_raw=1

They both were caused by Issue 713127.

There may still be actually flaky failures in these tests, perhaps caused by context scheduling in the GPU process. I can imagine bugs where switching between WebGL 2.0 contexts and the compositor's "OpenGL ES 2.0" contexts in a certain order might cause the context virtualization to do the wrong thing.

ANGLE's context virtualization seems to be more thorough than the command buffer's at this point and we should push to use the pass-through command buffer and MANGLE on Linux as soon as possible.
,
Apr 29 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/66dcadde6f847545a20911b4ae5dd641466b920b

commit 66dcadde6f847545a20911b4ae5dd641466b920b
Author: kbr <kbr@chromium.org>
Date: Sat Apr 29 00:18:08 2017

Remove canvas_sub_rectangle failure suppressions on Linux.

After more testing on top-of-tree, it seems that the fix for Issue 713127 should have addressed all of these. Remove the suppressions in order to show any remaining issues on the waterfall.

BUG=694359, 715696
NOTRY=true
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2852623003
Cr-Commit-Position: refs/heads/master@{#468190}

[modify] https://crrev.com/66dcadde6f847545a20911b4ae5dd641466b920b/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
Apr 29 2017
I'll monitor these tests for flakes for a few days; if none show up, I'll mark this as a duplicate of Issue 713127.
,
May 1 2017
Unfortunately there are still quite a lot of flakes of these tests on the bots:
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48484
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48481
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48476
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48475
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48474
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48473
https://build.chromium.org/p/chromium.gpu.fyi/builders/Linux%20Release%20%28NVIDIA%29/builds/48470

All of these are the same root cause: the test WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb16f_rgb_float failing immediately after the test WebglConformance_conformance2_samplers_samplers ran.

So far I haven't been able to reproduce this locally; it must have something to do with scheduling of contexts on the service side. Will mark these tests flaky again.
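The "fails immediately after test X" pattern can be checked mechanically across several bot logs. The sketch below assumes each run has already been reduced to an ordered list of `(test_name, status)` pairs; the function name `predecessors_of_failures` is hypothetical and the sample data only restates the two tests named above.

```python
def predecessors_of_failures(ordered_results):
    """Map each failed test to the test that ran right before it (or None)."""
    previous = None
    preceding = {}
    for name, status in ordered_results:
        if status.startswith("failed"):
            preceding[name] = previous
        previous = name
    return preceding


# Illustrative run: the two tests named in the comment above, in that order.
run = [
    ("WebglConformance_conformance2_samplers_samplers", "passed"),
    ("WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb16f_rgb_float",
     "failed unexpectedly"),
]
print(predecessors_of_failures(run))
```

If the same predecessor shows up in every failing build, that points at state leaking from the earlier test rather than at the failing test itself.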
,
May 1 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5d2506ab023f35332c5a46417f8173f2439b7d90

commit 5d2506ab023f35332c5a46417f8173f2439b7d90
Author: kbr <kbr@chromium.org>
Date: Mon May 01 22:04:43 2017

Mark conformance2/textures/canvas_sub_rectangle/* flaky on Linux NVIDIA.

These tests are still flaky when virtualized GL contexts are used in the command buffer. So far no luck reproducing these failures locally.

BUG=694359
TBR=zmo@chromium.org
NOTRY=true
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2853133002
Cr-Commit-Position: refs/heads/master@{#468440}

[modify] https://crrev.com/5d2506ab023f35332c5a46417f8173f2439b7d90/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
,
May 1 2017
Ran the following up to and including WebglConformance_conformance2_textures_canvas_sub_rectangle_tex_2d_rgb16f_rgb_float multiple times:

./content/test/gpu/run_gpu_integration_test.py webgl_conformance --show-stdout --browser=release --passthrough -v --extra-browser-args="--enable-logging=stderr --js-flags=--expose-gc" --webgl-conformance-version=2.0.1 --read-abbreviated-json-results-from=content/test/data/gpu/webgl2_conformance_tests_output.json --total-shards=15 --shard-index=10 > output.txt 2>&1

On the 9th iteration I finally saw the failure. Not sure why this is less flaky on my workstation than on the bots.
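Re-running by hand until a flake appears gets tedious, so a small loop can do it. This is only a sketch: it assumes the command exits with a non-zero status when a test fails unexpectedly, and `run_until_failure` is a made-up helper, not part of the GPU test harness. The demo at the bottom uses a trivial Python child process in place of the real command line.

```python
import subprocess
import sys


def run_until_failure(command, max_iterations):
    """Run `command` repeatedly; return the 1-based iteration that first
    exits non-zero, or None if every iteration succeeds."""
    for iteration in range(1, max_iterations + 1):
        # Output is captured and discarded here; a real script would likely
        # save it per iteration so the failing log can be inspected.
        completed = subprocess.run(
            command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if completed.returncode != 0:
            return iteration
    return None


# Stand-in commands so the sketch is runnable without a Chromium checkout.
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
always_passes = [sys.executable, "-c", "pass"]
print(run_until_failure(always_fails, 3))
print(run_until_failure(always_passes, 2))
```

In practice `command` would be the argument list for the run_gpu_integration_test.py invocation above, with a `max_iterations` large enough to catch a failure that shows up roughly once in nine runs.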
,
May 2 2017
Issue 717588 has been merged into this issue.
,
May 2 2017
Have been thinking about how this could possibly happen. It seems to me that it's likely related to context destruction and mismanagement of virtual context state in that situation. sunnyps@ mentioned that Issue 715997 sounds similar. I added a DCHECK ensuring that we don't try to use a ContextState as "prev_state" once its associated GLES2CmdDecoderImpl has been destroyed. The bug was triggered, but the assert didn't fire, so that doesn't seem to be the cause.
,
May 2 2017
It seems to me that there must be some rarely-taken code path in which the "next" GLES2Decoder starts processing commands without its context being made current.
,
May 2 2017
Doesn't seem to be that an incorrect context is current. Here's a patch and output from a failing run which didn't hit the new DCHECK in GLES2DecoderImpl::DoCommandsImpl.
,
May 3 2017
The bug appears to be this (all using virtual contexts):

1) A new command buffer is created. The previous virtual context state has some things set in it.
2) GpuCommandBufferStub::Initialize calls MakeCurrent on the GLContextVirtual. Because the GLES2Decoder isn't initialized yet, GLContext::MakeVirtuallyCurrent takes the shortcut where it doesn't call RestoreState.
3) After initialization, GpuCommandBufferStub::Initialize leaves this virtual context current.
4) In some relatively rare situations, the next work to be done will be on this exact command decoder. GLES2DecoderImpl::MakeCurrent calls down into GLContextVirtual::MakeCurrent and GLContext::MakeVirtuallyCurrent, which *again* short-circuits the state restoration because it thinks it's already current.

At this point the command decoder does a *lot* of work with the real GL context state mostly set to the previous virtual context's state.

The fix is to call GLContext::ReleaseCurrent at the end of GpuCommandBufferStub::Initialize if virtual GL contexts are being used.

Attached is a stack trace and a patch showing the sampler object on texture unit 0 being non-null in some situations per the above command line. I found it very timing-dependent to catch this. Making GetContextState() virtual in GLStateRestorer changed the timing enough that it never triggered. It was necessary to add a GetContextStateForTesting to catch it. Also, this stack trace doesn't really prove this hypothesis, since it would be the next set of WebGL calls that would break, but I'm pretty sure this is the bug.
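The sequence above can be illustrated with a toy model. This is not Chromium's code: `RealContext`, `VirtualContext`, `initialize` and `simulate` are hypothetical stand-ins, the only tracked state is a single "sampler on texture unit 0" value, and saving state when a context loses currency is deliberately left out. It only shows how the two shortcuts combine to leave stale state, and how releasing the context after initialization avoids that.

```python
class RealContext:
    """Stand-in for the one real GL context shared by all virtual contexts."""

    def __init__(self):
        self.state = {}              # real GL state, e.g. {"sampler_unit0": 7}
        self.current_virtual = None  # which virtual context is "current"


class VirtualContext:
    def __init__(self, real):
        self.real = real
        self.decoder_initialized = False
        self.state = {"sampler_unit0": 0}  # fresh context: no sampler bound

    def make_current(self):
        if self.real.current_virtual is self:
            return  # shortcut: already current, so no state restoration
        if self.decoder_initialized:
            self.real.state = dict(self.state)  # restore this context's state
        # Otherwise the decoder is not ready and the restore step is skipped.
        self.real.current_virtual = self

    def release_current(self):
        self.real.current_virtual = None


def initialize(ctx, release_after_init):
    ctx.make_current()             # step 2: restore skipped, decoder not ready
    ctx.decoder_initialized = True
    if release_after_init:
        ctx.release_current()      # the fix proposed in the comment above


def simulate(release_after_init):
    real = RealContext()
    old = VirtualContext(real)
    old.decoder_initialized = True
    old.state["sampler_unit0"] = 7  # step 1: previous context left a sampler bound
    old.make_current()
    new = VirtualContext(real)
    initialize(new, release_after_init)
    new.make_current()              # step 4: next work lands on the new decoder
    return real.state["sampler_unit0"]


print("without fix:", simulate(release_after_init=False))  # stale value 7
print("with fix:   ", simulate(release_after_init=True))   # expected value 0
```

Without the release, the second `make_current` returns early and the new context runs against the old context's sampler binding, which mirrors the non-null sampler on texture unit 0 mentioned above.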
,
May 3 2017
Up for review:

Add more strict DCHECKs around context state.
https://codereview.chromium.org/2852353003

Fix bug in virtualized GL context state management upon creation.
https://codereview.chromium.org/2862443002
,
May 3 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/1d44ab0e3d0a7cd05ce1da956bb87a85a8b88174

commit 1d44ab0e3d0a7cd05ce1da956bb87a85a8b88174
Author: kbr <kbr@chromium.org>
Date: Wed May 03 16:54:08 2017

Fix bug in virtualized GL context state management upon creation.

Because the first MakeCurrent call against a virtual GL context leaves the real context state indeterminate (since the GLES2Decoder isn't initialized and by definition neither is the GLStateRestorer), force it to be made current again after initialization.

This fixes a longstanding bug where if the next GLES2Decoder to do significant work was the same one that was just created, it would do this work with the majority of the real GL context state set to that of the previous virtual GL context.

BUG=694359
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2862443002
Cr-Commit-Position: refs/heads/master@{#469007}

[modify] https://crrev.com/1d44ab0e3d0a7cd05ce1da956bb87a85a8b88174/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
[modify] https://crrev.com/1d44ab0e3d0a7cd05ce1da956bb87a85a8b88174/gpu/ipc/service/gpu_command_buffer_stub.cc
[modify] https://crrev.com/1d44ab0e3d0a7cd05ce1da956bb87a85a8b88174/gpu/ipc/service/gpu_command_buffer_stub.h
,
May 3 2017
The above CL was run through the linux_optional_gpu_tests_rel tryserver multiple times and it came back green every time, so I think that it fixes the root cause of the flakiness. Closing as fixed.
,
May 3 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a21cb49e77bf727d692bbe8dd8eda85a9d089e93

commit a21cb49e77bf727d692bbe8dd8eda85a9d089e93
Author: kbr <kbr@chromium.org>
Date: Wed May 03 21:00:28 2017

Add more strict DCHECKs around context state.

Upon processing commands in the GLES2Decoder, assert that the context is current.

Make virtual contexts' definition of "current" more strict. This required moving a DCHECK inside the base GLContext class.

These aren't needed, but were useful during recent debugging to confirm that the bug wasn't in these areas.

BUG=694359
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2852353003
Cr-Commit-Position: refs/heads/master@{#469115}

[modify] https://crrev.com/a21cb49e77bf727d692bbe8dd8eda85a9d089e93/gpu/command_buffer/service/gl_context_virtual.cc
[modify] https://crrev.com/a21cb49e77bf727d692bbe8dd8eda85a9d089e93/gpu/command_buffer/service/gles2_cmd_decoder.cc
[modify] https://crrev.com/a21cb49e77bf727d692bbe8dd8eda85a9d089e93/ui/gl/gl_context.cc
,
May 3 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/84c6c1ca98f4352c0d487e414df24a5e7ef8a4df

commit 84c6c1ca98f4352c0d487e414df24a5e7ef8a4df
Author: kbr <kbr@chromium.org>
Date: Wed May 03 23:07:44 2017

Revert of Add more strict DCHECKs around context state. (patchset #1 id:1 of https://codereview.chromium.org/2852353003/ )

Reason for revert:
Breaks an optimization needed on some Android devices for performance.

Original issue's description:
> Add more strict DCHECKs around context state.
>
> Upon processing commands in the GLES2Decoder, assert that the context
> is current.
>
> Make virtual contexts' definition of "current" more strict. This
> required moving a DCHECK inside the base GLContext class.
>
> These aren't needed, but were useful during recent debugging to
> confirm that the bug wasn't in these areas.
>
> BUG= 694359
> CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
>
> Review-Url: https://codereview.chromium.org/2852353003
> Cr-Commit-Position: refs/heads/master@{#469115}
> Committed: https://chromium.googlesource.com/chromium/src/+/a21cb49e77bf727d692bbe8dd8eda85a9d089e93

TBR=jbauman@chromium.org,zmo@chromium.org,piman@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=694359

Review-Url: https://codereview.chromium.org/2859963002
Cr-Commit-Position: refs/heads/master@{#469185}

[modify] https://crrev.com/84c6c1ca98f4352c0d487e414df24a5e7ef8a4df/gpu/command_buffer/service/gl_context_virtual.cc
[modify] https://crrev.com/84c6c1ca98f4352c0d487e414df24a5e7ef8a4df/gpu/command_buffer/service/gles2_cmd_decoder.cc
[modify] https://crrev.com/84c6c1ca98f4352c0d487e414df24a5e7ef8a4df/ui/gl/gl_context.cc
,
May 4 2017
,
May 16 2017
,
May 16 2017
,
May 16 2017
Comment 1 by kbr@chromium.org
, Feb 21 2017