Multiple webgl conformance tests failing on Linux FYI Release (AMD R7 240)
Issue description

Failing tests:
WebglConformance_conformance_canvas_draw_static_webgl_to_multiple_canvas_test
WebglConformance_conformance_canvas_draw_webgl_to_canvas_test
WebglConformance_conformance_rendering_draw_webgl_to_canvas_2d_repeatedly
WebglConformance_conformance_textures_misc_tex_image_webgl

First failing build: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/2466

The test failures seem to be due to pixels not matching, which is indicative of a race.

The CL to use SharedImage is in the regression range: https://chromium-review.googlesource.com/c/1348974

The ANGLE range consists of these commits, none of which look suspect:
88faa69 ES31: Add unsized array length support in SSBO by Qin Jiajia · 2 days ago
a48f26f ES31: Use deepCopy to make sure that every node being used only once by Qin Jiajia · 2 days ago
3a25622 Update ANGLE_multiview validation. by Jamie Madill · 2 days ago
611bbaa Vulkan: Convert vertex attributes in compute by Shahbaz Youssefi · 2 days ago chromium/3638

I discussed a possible data race in SharedImageInterfaceProxy with ericrk@, although I think it only matters when GMBs are used (CL on the way).
Sample log:

Log: [98/138] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_textures_misc_tex_image_webgl failed unexpectedly 6.2201s:
Traceback (most recent call last):
  _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:155
    self.RunActualGpuTest(url, *args)
  RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:199
    getattr(self, test_name)(test_path, *args[1:])
  _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:288
    self._CheckTestCompletion()
  _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:284
    self.fail(self._WebGLTestMessages(self.tab))
  fail at .swarming_module/lib/python2.7/unittest/case.py:410
    raise self.failureException(msg)
AssertionError: Canvas should be green
at (0, 0) expected: 0,255,0,255 was 255,0,0,255
FAIL Canvas should be green
at (0, 0) expected: 0,255,0,255 was 255,0,0,255
Locals:
  msg : u'Canvas should be green\nat (0, 0) expected: 0,255,0,255 was 255,0,0,255\nFAIL Canvas should be green\nat (0, 0) expected: 0,255,0,255 was 255,0,0,255\n'
,
Dec 13
Thanks Sunny for tracking that down. This particular configuration has an old GPU and driver and it's not currently feasible to upgrade it. Is it possible to blacklist the use of GMBs or similar on this hardware in order to fall back to a safer configuration?
,
Dec 14
We don't use GMBs on this configuration. The tests affected seem to involve copying from webgl to 2d canvas. kbr@ and I reviewed DrawingBuffer::CopyToPlatformTexture (used in texImage2D and drawImage from webgl to 2d canvas) and didn't find anything obviously wrong there. One theory is that it happens because we now allocate texture storage on the raster decoder context instead of the webgl context and these bots run with native GL contexts instead of virtualized contexts. It's possible we need explicit synchronization between the raster decoder context and webgl context.
,
Dec 14
We're going to try enabling virtualized contexts on this bot and see if the problem goes away.
,
Dec 14
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6a2c0a87208cdc837f960c96a9c188dfdfa9ee34

commit 6a2c0a87208cdc837f960c96a9c188dfdfa9ee34
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 14 00:08:01 2018

Enqueue destroy shared image message under lock

Fix a potential bug where last_flush_id_ could be set out of order if two threads call DestroySharedImage.

R=ericrk@chromium.org
TBR=piman@chromium.org
Bug: 870116, 914976, 882591
Change-Id: I974bff506211cafdc49440306203d6523cf614e5
Reviewed-on: https://chromium-review.googlesource.com/c/1376852
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616512}

[modify] https://crrev.com/6a2c0a87208cdc837f960c96a9c188dfdfa9ee34/gpu/ipc/client/shared_image_interface_proxy.cc
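For readers unfamiliar with the hazard this commit describes, here is a minimal sketch of the general pattern, not the actual Chromium change. It assumes that a flush id is taken, the destroy message is enqueued, and last_flush_id_ is recorded as one step; FakeSharedImageProxy, next_flush_id_, queue_ and Message are hypothetical names used only for illustration. If the id were taken outside the lock, two threads could record their ids in the opposite order to the one in which they obtained them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the client-side proxy. Only last_flush_id_ and
// DestroySharedImage are names taken from the commit message.
class FakeSharedImageProxy {
 public:
  // Takes the next flush id, enqueues the message, and records the id, all
  // under one lock, so concurrent callers cannot reorder these steps.
  uint32_t DestroySharedImage(uint32_t image_id) {
    std::lock_guard<std::mutex> guard(lock_);
    const uint32_t flush_id = next_flush_id_++;
    queue_.push_back(Message{image_id, flush_id});
    // Invariant the lock protects: recorded flush ids only ever increase.
    assert(flush_id > last_flush_id_);
    last_flush_id_ = flush_id;
    return flush_id;
  }

  uint32_t last_flush_id() const {
    std::lock_guard<std::mutex> guard(lock_);
    return last_flush_id_;
  }

  std::size_t queued_messages() const {
    std::lock_guard<std::mutex> guard(lock_);
    return queue_.size();
  }

 private:
  struct Message {
    uint32_t image_id;
    uint32_t flush_id;
  };

  mutable std::mutex lock_;
  uint32_t next_flush_id_ = 1;
  uint32_t last_flush_id_ = 0;
  std::vector<Message> queue_;
};
```

Because the id allocation and the write to last_flush_id_ sit in the same critical section, the assertion holds no matter how many threads call DestroySharedImage concurrently.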
,
Dec 14
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4876ff14f9e7a4f97a38ef7c90406d5427b51573

commit 4876ff14f9e7a4f97a38ef7c90406d5427b51573
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 14 02:07:28 2018

Enable virtualized contexts on Linux AMD

A recent change to use SharedImage for WebGL has caused conformance tests that copy from WebGL backbuffer to 2d canvas to fail on Linux AMD bots which use native GL contexts. It is speculated that this is due to flush ordering not working and missing any other form of synchronization. SharedImages are created on the raster decoder context instead of the WebGL context which might be just enough to change context scheduling and cause these issues.

Bug: 914976
Change-Id: I3c4fb1dfe4d5839a647dde5dc9ca047c6d630a7f
Reviewed-on: https://chromium-review.googlesource.com/c/1377729
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616559}

[modify] https://crrev.com/4876ff14f9e7a4f97a38ef7c90406d5427b51573/gpu/config/gpu_driver_bug_list.json
,
Dec 15
Doesn't look like virtualized contexts helped :-(
,
Dec 15
Oops, it looks like different passthrough tests are now failing - so virtualized contexts might have helped after all.
,
Dec 15
The passthrough tests fail due to a timeout: https://chromium-swarm.appspot.com/task?id=41c485dbb8228d10&refresh=10&show_raw=1
,
Dec 15
Timeout is due to all tests crashing with this failure over and over again:

[16281:16281:1213/235316.976271:FATAL:gl_context.cc(292)] Check failed: static_bindings_initialized_.
#0 0x55df5d02ddaf base::debug::StackTrace::StackTrace()
#1 0x55df5cf7329b logging::LogMessage::~LogMessage()
#2 0x55df5e103257 gl::GLContext::InitializeDynamicBindings()
#3 0x55df5e103162 gl::GLContext::ReinitializeDynamicBindings()
#4 0x55df5e7e8864 gpu::gles2::GLES2DecoderPassthroughImpl::Initialize()
#5 0x55df5e8b5102 gpu::GLES2CommandBufferStub::Initialize()
#6 0x55df5e8ab676 gpu::GpuChannel::OnCreateCommandBuffer()
#7 0x55df5e8aaee3 _ZN3IPC8MessageTI38GpuChannelMsg_CreateCommandBuffer_MetaNSt3__15tupleIJ28GPUCreateCommandBufferConfigiN4base24UnsafeSharedMemoryRegionEEEENS3_IJN3gpu13ContextResultENS8_12CapabilitiesEEEEE8DispatchINS8_10GpuChannelESE_vMSE_FvRKS4_iS6_PS9_PSA_EEEbPKNS_7MessageEPT_PT0_PT1_T2_
#8 0x55df5e8aaca8 gpu::GpuChannel::OnControlMessageReceived()
#9 0x55df5e8ac1d9 gpu::GpuChannel::HandleMessageHelper()
#10 0x55df59d4b3fd _ZN4base8internal7InvokerINS0_9BindStateIMN3net14MDnsClientImpl4CoreEFvRKNSt3__14pairINS6_12basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEEtEEEJNS_7WeakPtrIS5_EESE_EEEFvvEE3RunEPNS0_13BindStateBaseE
#11 0x55df5cf7c929 base::debug::TaskAnnotator::RunTask()
#12 0x55df5cf7bbe0 base::MessageLoopImpl::RunTask()
#13 0x55df5cf7c2d2 base::MessageLoopImpl::DoWork()
#14 0x55df5cf7edaf base::(anonymous namespace)::WorkSourceDispatch()
#15 0x7ff2b4f5c197 g_main_context_dispatch
#16 0x7ff2b4f5c3f0 <unknown>
#17 0x7ff2b4f5c49c g_main_context_iteration
#18 0x55df5cf7eb62 base::MessagePumpGlib::Run()
#19 0x55df5cf7b6b5 base::MessageLoopImpl::Run()
#20 0x55df5cfa5e96 base::RunLoop::Run()
#21 0x55df61665aa4 content::GpuMain()
#22 0x55df5cab1729 content::ContentMainRunnerImpl::Run()
#23 0x55df5cae4230 service_manager::Main()
#24 0x55df5caaf8a1 content::ContentMain()
#25 0x55df599a21b3 ChromeMain
#26 0x7ff2b0cbd830 __libc_start_main
#27 0x55df599a202a _start
,
Dec 15
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ca64cbccb12ebc8a09d25aff7a9fa4ca053497d8

commit ca64cbccb12ebc8a09d25aff7a9fa4ca053497d8
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Sat Dec 15 04:46:10 2018

Initialize GL static bindings before dynamic bindings

Currently static bindings are initialized as side-effect of calling MakeCurrent on the native GLContext implementation. When virtualized contexts were enabled on Linux AMD, passthrough tests started failing because of uninitialized static bindings presumably because MakeCurrent is being skipped somehow due to interaction between ANGLE and GLContextEGL.

Bug: 914976
Change-Id: I7366176a9f0a74e0f5a379eef3c230f0152b7310
Reviewed-on: https://chromium-review.googlesource.com/c/1378975
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616953}

[modify] https://crrev.com/ca64cbccb12ebc8a09d25aff7a9fa4ca053497d8/ui/gl/gl_context.cc
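The diff to ui/gl/gl_context.cc is not shown in this thread, so the following is only a guess at the shape of such a fix: make the dynamic-bindings path ensure static bindings itself instead of relying on MakeCurrent having run first. FakeGLContext and all of its members are hypothetical and do not reproduce the real gl::GLContext interface.

```cpp
#include <cassert>

// Hypothetical model of a GL context; not the real gl::GLContext.
class FakeGLContext {
 public:
  // Idempotent, so it is safe to call from several entry points.
  void InitializeStaticBindings() {
    if (static_bindings_initialized_) return;
    static_bindings_initialized_ = true;
    ++static_init_count_;
  }

  // Models the behaviour described in the commit message: static bindings
  // get set up as a side effect of making the context current.
  void MakeCurrent() { InitializeStaticBindings(); }

  // Sketch of the fix: do not assume MakeCurrent has already been called.
  void InitializeDynamicBindings() {
    InitializeStaticBindings();
    assert(static_bindings_initialized_);  // Mirrors the check that failed.
    dynamic_bindings_initialized_ = true;
  }

  bool static_bindings_initialized() const {
    return static_bindings_initialized_;
  }
  bool dynamic_bindings_initialized() const {
    return dynamic_bindings_initialized_;
  }
  int static_init_count() const { return static_init_count_; }

 private:
  bool static_bindings_initialized_ = false;
  bool dynamic_bindings_initialized_ = false;
  int static_init_count_ = 0;
};
```

With this ordering, calling InitializeDynamicBindings on a context that was never made current no longer trips the check, and a later MakeCurrent does not repeat the static initialization.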
,
Dec 17
Current failures: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29?limit=200 e.g.: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/2493 Having a hard time getting the logs from one of the shards to even load; the size must be really large. https://chromium-swarm.appspot.com/task?id=41d4f1db9733ee10&refresh=10&show_raw=1
,
Dec 17
Ok, looks like just fixing the crash wasn't enough. Maybe we should disable virtualized contexts if passthrough command decoder is used? Is there any other configuration where we run virtualized contexts with passthrough?
,
Dec 17
There are different crashes now:

[28433:28433:1217/044624.464818:FATAL:gles2_cmd_decoder_passthrough.cc(1226)] Check failed: api() == gl::g_current_gl_context_tls->Get()->Api (0x2695a05e1f50 vs. 0x2695a05e1070)
#0 0x5584f48e495f base::debug::StackTrace::StackTrace()
#1 0x5584f482a75b logging::LogMessage::~LogMessage()
#2 0x5584f60c1bdd gpu::gles2::GLES2DecoderPassthroughImpl::MakeCurrent()
#3 0x5584f6184bab gpu::CommandBufferStub::MakeCurrent()
#4 0x5584f6184a02 gpu::CommandBufferStub::OnMessageReceived()
#5 0x5584f61839bf IPC::MessageRouter::RouteMessage()
#6 0x5584f6181e11 gpu::GpuChannel::HandleMessageHelper()
#7 0x5584f617f79f gpu::GpuChannel::HandleMessage()
#8 0x5584f15e1a6d _ZN4base8internal7InvokerINS0_9BindStateIMN3net14MDnsClientImpl4CoreEFvRKNSt3__14pairINS6_12basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEEtEEEJNS_7WeakPtrIS5_EESE_EEEFvvEE3RunEPNS0_13BindStateBaseE
#9 0x5584f5e10924 gpu::Scheduler::RunNextTask()
#10 0x5584f15c8b74 _ZN4base8internal7InvokerINS0_9BindStateIMN3net16HostResolverImpl8ProcTaskEFvvEJNS_7WeakPtrIS5_EEEEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#11 0x5584f4833a39 base::debug::TaskAnnotator::RunTask()
#12 0x5584f4832cf0 base::MessageLoopImpl::RunTask()
#13 0x5584f48333e2 base::MessageLoopImpl::DoWork()
#14 0x5584f4835ebf base::(anonymous namespace)::WorkSourceDispatch()
#15 0x7f9fa94ec197 g_main_context_dispatch
#16 0x7f9fa94ec3f0 <unknown>
#17 0x7f9fa94ec49c g_main_context_iteration
#18 0x5584f4835c72 base::MessagePumpGlib::Run()
#19 0x5584f48327c5 base::MessageLoopImpl::Run()
#20 0x5584f485c646 base::RunLoop::Run()
#21 0x5584f8f5fc34 content::GpuMain()
#22 0x5584f4360be9 content::ContentMainRunnerImpl::Run()
#23 0x5584f43938e0 service_manager::Main()
#24 0x5584f435ef21 content::ContentMain()
#25 0x5584f123a1b3 ChromeMain
#26 0x7f9fa524d830 __libc_start_main
#27 0x5584f123a02a _start

Another symptom of MakeCurrent() not working correctly in passthrough + virtualized context configuration.
,
Dec 17
Ah, yes, we should not use virtualized GL contexts with the passthrough command decoder. ANGLE already virtualizes GL contexts internally. Can we easily make that change to the blacklist entry?
,
Dec 17
That would require adding an extra condition dimension to the blacklist entries.
,
Dec 18
Ah. Rather than go that route, should we change the use_virtualized_gl_contexts workaround so it is ignored if the passthrough command decoder is in use?
,
Dec 18
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/11278a3777db2c1cba694871953c24a43ee853eb

commit 11278a3777db2c1cba694871953c24a43ee853eb
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Tue Dec 18 02:26:19 2018

Disable virtualized contexts with passthrough command decoder

Using virtualized contexts with passthrough command decoder causes crashes during MakeCurrent() due to inconsistent state.

Bug: 914976
Change-Id: I1c8398d4a9539e8165c32594897ea434d2c7b54e
Reviewed-on: https://chromium-review.googlesource.com/c/1381265
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617351}

[modify] https://crrev.com/11278a3777db2c1cba694871953c24a43ee853eb/gpu/ipc/service/gles2_command_buffer_stub.cc
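The approach settled on in the comments above, ignoring the use_virtualized_gl_contexts workaround whenever the passthrough command decoder is in use, can be sketched as a small gating function. This is an illustration only: the Workarounds struct, ContextMode enum and SelectContextMode function are hypothetical, and the real logic in gles2_command_buffer_stub.cc is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical container for driver-bug workarounds. Only the field name
// use_virtualized_gl_contexts comes from the discussion above.
struct Workarounds {
  bool use_virtualized_gl_contexts = false;
};

enum class ContextMode { kNative, kVirtualized };

// Chooses the context mode. The workaround is honoured only when the
// validating decoder is used; with passthrough, ANGLE already virtualizes
// GL contexts internally, so the workaround is ignored.
ContextMode SelectContextMode(const Workarounds& workarounds,
                              bool use_passthrough_decoder) {
  const bool virtualized =
      workarounds.use_virtualized_gl_contexts && !use_passthrough_decoder;
  // The combination that crashed in MakeCurrent() can never be selected.
  assert(!(virtualized && use_passthrough_decoder));
  return virtualized ? ContextMode::kVirtualized : ContextMode::kNative;
}
```

Keeping the check next to the point where the context is created means the driver bug list does not need a separate condition for the decoder type.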
,
Dec 19
From build 2497 onwards, which contained the above CL, only the dawn_end2end_tests fail; those are being tracked in another bug: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/2497
Comment 1 by sunn...@chromium.org
, Dec 13