Issue metadata

Status: Fixed
Owner:
Closed: Dec 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: Bug

Blocking:
issue 882591



Issue 914976: Multiple webgl conformance tests failing on Linux FYI Release (AMD R7 240)

Reported by sunn...@chromium.org, Dec 13 Project Member

Issue description

Failing tests:
WebglConformance_conformance_canvas_draw_static_webgl_to_multiple_canvas_test
WebglConformance_conformance_canvas_draw_webgl_to_canvas_test
WebglConformance_conformance_rendering_draw_webgl_to_canvas_2d_repeatedly
WebglConformance_conformance_textures_misc_tex_image_webgl

First failing build: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/2466

The test failures appear to come from pixel mismatches, which is indicative of a race.

CL to use SharedImage is in the regression range: https://chromium-review.googlesource.com/c/1348974

The ANGLE range contains these commits, none of which look suspect:
88faa69 ES31: Add unsized array length support in SSBO by Qin Jiajia · 2 days ago
a48f26f ES31: Use deepCopy to make sure that every node being used only once by Qin Jiajia · 2 days ago
3a25622 Update ANGLE_multiview validation. by Jamie Madill · 2 days ago
611bbaa Vulkan: Convert vertex attributes in compute by Shahbaz Youssefi · 2 days ago chromium/3638

I discussed a possible data race in SharedImageInterfaceProxy with ericrk@ although I think it only matters when GMBs are used (CL on the way).

Sample log:
Log: [98/138] gpu_tests.webgl_conformance_integration_test.WebGLConformanceIntegrationTest.WebglConformance_conformance_textures_misc_tex_image_webgl failed unexpectedly 6.2201s:
  
  Traceback (most recent call last):
    _RunGpuTest at content/test/gpu/gpu_tests/gpu_integration_test.py:155
      self.RunActualGpuTest(url, *args)
    RunActualGpuTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:199
      getattr(self, test_name)(test_path, *args[1:])
    _RunConformanceTest at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:288
      self._CheckTestCompletion()
    _CheckTestCompletion at content/test/gpu/gpu_tests/webgl_conformance_integration_test.py:284
      self.fail(self._WebGLTestMessages(self.tab))
    fail at .swarming_module/lib/python2.7/unittest/case.py:410
      raise self.failureException(msg)
  AssertionError: Canvas should be green
  at (0, 0) expected: 0,255,0,255 was 255,0,0,255
  FAIL Canvas should be green
  at (0, 0) expected: 0,255,0,255 was 255,0,0,255
  
  Locals:
    msg : u'Canvas should be green\nat (0, 0) expected: 0,255,0,255 was 255,0,0,255\nFAIL Canvas should be green\nat (0, 0) expected: 0,255,0,255 was 255,0,0,255\n'
 

Comment 1 by sunn...@chromium.org, Dec 13

Cc: kbr@chromium.org

Comment 2 by kbr@chromium.org, Dec 13

Thanks Sunny for tracking that down. This particular configuration has an old GPU and driver and it's not currently feasible to upgrade it.

Is it possible to blacklist the use of GMBs or similar on this hardware in order to fall back to a safer configuration?

Comment 3 by sunn...@chromium.org, Dec 14

We don't use GMBs on this configuration.

The tests affected seem to involve copying from webgl to 2d canvas.

kbr@ and I reviewed DrawingBuffer::CopyToPlatformTexture (used in texImage2D and drawImage from webgl to 2d canvas) and didn't find anything obviously wrong there.

One theory is that it happens because we now allocate texture storage on the raster decoder context instead of the webgl context, and these bots run with native GL contexts instead of virtualized contexts. It's possible we need explicit synchronization between the raster decoder context and the webgl context.
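
To make the theory concrete, here is a minimal Python sketch of the kind of explicit synchronization it refers to. It is a toy model, not Chromium code: the class and function names are hypothetical, a `threading.Event` stands in for a GPU fence or sync point, and the assumption that the texture starts out red is taken only from the failing pixel value in the log.

```python
import threading

RED = (255, 0, 0, 255)    # assumed initial contents (matches the failing pixel)
GREEN = (0, 255, 0, 255)  # what the test expects after the draw


class TextureSketch:
    """Hypothetical stand-in for storage shared by two GL contexts."""

    def __init__(self):
        self.pixel = RED
        self.ready = threading.Event()  # stands in for a fence / sync point


def producer_draw(texture):
    # The "webgl context" renders, then signals that the contents are valid.
    texture.pixel = GREEN
    texture.ready.set()


def consumer_copy(texture, timeout=5.0):
    # The "2d canvas" side waits for the signal before reading.
    # Without this wait it could observe the stale red pixel.
    if not texture.ready.wait(timeout):
        raise TimeoutError("producer never signaled")
    return texture.pixel
```

With the wait in place, the consumer reads green regardless of which thread is scheduled first. Without it, the result depends on scheduling, which is the kind of intermittent mismatch the failing tests show.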

Comment 4 by sunn...@chromium.org, Dec 14

We're going to try enabling virtualized contexts on this bot and see if the problem goes away.

Comment 5 by bugdroid1@chromium.org, Dec 14

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a2c0a87208cdc837f960c96a9c188dfdfa9ee34

commit 6a2c0a87208cdc837f960c96a9c188dfdfa9ee34
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 14 00:08:01 2018

Enqueue destroy shared image message under lock

Fix a potential bug where last_flush_id_ could be set out of order if
two threads call DestroySharedImage.

R=ericrk@chromium.org
TBR=piman@chromium.org

Bug: 870116, 914976, 882591
Change-Id: I974bff506211cafdc49440306203d6523cf614e5
Reviewed-on: https://chromium-review.googlesource.com/c/1376852
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616512}
[modify] https://crrev.com/6a2c0a87208cdc837f960c96a9c188dfdfa9ee34/gpu/ipc/client/shared_image_interface_proxy.cc
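
The ordering problem this commit describes can be modeled in a few lines. The following is a minimal Python sketch, not the Chromium C++ code: the class, method, and field names are hypothetical, and it assumes that enqueuing a message returns an increasing flush ID. Because the enqueue and the update of the last flush ID happen inside one critical section, two threads cannot record their IDs out of order.

```python
import threading


class SharedImageProxySketch:
    """Toy model of a proxy shared by several threads (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_flush_id = 0
        self.last_flush_id = 0
        self.sent = []  # (flush_id, mailbox) pairs in send order

    def _enqueue(self, mailbox):
        # Stands in for handing a message to the channel, which is assumed
        # to return a monotonically increasing flush ID.
        self._next_flush_id += 1
        self.sent.append((self._next_flush_id, mailbox))
        return self._next_flush_id

    def destroy_shared_image(self, mailbox):
        # Enqueue and record the flush ID under the same lock, so the
        # recorded value can only ever increase.
        with self._lock:
            self.last_flush_id = self._enqueue(mailbox)
```

If the enqueue ran before the lock was taken, a thread holding a smaller ID could acquire the lock second and overwrite the larger value.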

Comment 6 by bugdroid1@chromium.org, Dec 14

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4876ff14f9e7a4f97a38ef7c90406d5427b51573

commit 4876ff14f9e7a4f97a38ef7c90406d5427b51573
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 14 02:07:28 2018

Enable virtualized contexts on Linux AMD

A recent change to use SharedImage for WebGL has caused conformance
tests that copy from WebGL backbuffer to 2d canvas to fail on Linux
AMD bots which use native GL contexts.

It is speculated that this is due to flush ordering not working and
missing any other form of synchronization.  SharedImages are created on
the raster decoder context instead of the WebGL context which might be
just enough to change context scheduling and cause these issues.

Bug: 914976
Change-Id: I3c4fb1dfe4d5839a647dde5dc9ca047c6d630a7f
Reviewed-on: https://chromium-review.googlesource.com/c/1377729
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616559}
[modify] https://crrev.com/4876ff14f9e7a4f97a38ef7c90406d5427b51573/gpu/config/gpu_driver_bug_list.json

Comment 7 by sunn...@chromium.org, Dec 15

Cc: zmo@chromium.org
Labels: Hotlist-PixelWrangler
Doesn't look like virtualized contexts helped :-(

Comment 8 by sunn...@chromium.org, Dec 15

Oops, it looks like different passthrough tests are now failing - so virtualized contexts might have helped after all.

Comment 10 by sunn...@chromium.org, Dec 15

Timeout is due to all tests crashing with this failure over and over again:

[16281:16281:1213/235316.976271:FATAL:gl_context.cc(292)] Check failed: static_bindings_initialized_. 
#0 0x55df5d02ddaf base::debug::StackTrace::StackTrace()
#1 0x55df5cf7329b logging::LogMessage::~LogMessage()
#2 0x55df5e103257 gl::GLContext::InitializeDynamicBindings()
#3 0x55df5e103162 gl::GLContext::ReinitializeDynamicBindings()
#4 0x55df5e7e8864 gpu::gles2::GLES2DecoderPassthroughImpl::Initialize()
#5 0x55df5e8b5102 gpu::GLES2CommandBufferStub::Initialize()
#6 0x55df5e8ab676 gpu::GpuChannel::OnCreateCommandBuffer()
#7 0x55df5e8aaee3 _ZN3IPC8MessageTI38GpuChannelMsg_CreateCommandBuffer_MetaNSt3__15tupleIJ28GPUCreateCommandBufferConfigiN4base24UnsafeSharedMemoryRegionEEEENS3_IJN3gpu13ContextResultENS8_12CapabilitiesEEEEE8DispatchINS8_10GpuChannelESE_vMSE_FvRKS4_iS6_PS9_PSA_EEEbPKNS_7MessageEPT_PT0_PT1_T2_
#8 0x55df5e8aaca8 gpu::GpuChannel::OnControlMessageReceived()
#9 0x55df5e8ac1d9 gpu::GpuChannel::HandleMessageHelper()
#10 0x55df59d4b3fd _ZN4base8internal7InvokerINS0_9BindStateIMN3net14MDnsClientImpl4CoreEFvRKNSt3__14pairINS6_12basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEEtEEEJNS_7WeakPtrIS5_EESE_EEEFvvEE3RunEPNS0_13BindStateBaseE
#11 0x55df5cf7c929 base::debug::TaskAnnotator::RunTask()
#12 0x55df5cf7bbe0 base::MessageLoopImpl::RunTask()
#13 0x55df5cf7c2d2 base::MessageLoopImpl::DoWork()
#14 0x55df5cf7edaf base::(anonymous namespace)::WorkSourceDispatch()
#15 0x7ff2b4f5c197 g_main_context_dispatch
#16 0x7ff2b4f5c3f0 <unknown>
#17 0x7ff2b4f5c49c g_main_context_iteration
#18 0x55df5cf7eb62 base::MessagePumpGlib::Run()
#19 0x55df5cf7b6b5 base::MessageLoopImpl::Run()
#20 0x55df5cfa5e96 base::RunLoop::Run()
#21 0x55df61665aa4 content::GpuMain()
#22 0x55df5cab1729 content::ContentMainRunnerImpl::Run()
#23 0x55df5cae4230 service_manager::Main()
#24 0x55df5caaf8a1 content::ContentMain()
#25 0x55df599a21b3 ChromeMain
#26 0x7ff2b0cbd830 __libc_start_main
#27 0x55df599a202a _start

Comment 11 by bugdroid1@chromium.org, Dec 15

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ca64cbccb12ebc8a09d25aff7a9fa4ca053497d8

commit ca64cbccb12ebc8a09d25aff7a9fa4ca053497d8
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Sat Dec 15 04:46:10 2018

Initialize GL static bindings before dynamic bindings

Currently static bindings are initialized as side-effect of calling
MakeCurrent on the native GLContext implementation.  When virtualized
contexts were enabled on Linux AMD, passthrough tests started failing
because of uninitialized static bindings presumably because MakeCurrent
is being skipped somehow due to interaction between ANGLE and
GLContextEGL.

Bug: 914976
Change-Id: I7366176a9f0a74e0f5a379eef3c230f0152b7310
Reviewed-on: https://chromium-review.googlesource.com/c/1378975
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616953}
[modify] https://crrev.com/ca64cbccb12ebc8a09d25aff7a9fa4ca053497d8/ui/gl/gl_context.cc
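
One way to read this commit is that the static bindings are now initialized explicitly rather than as a side effect of MakeCurrent. The Python sketch below models that idea only. It is not the code in ui/gl/gl_context.cc, all names are hypothetical, and the `ensure_static` flag exists purely to contrast the old and new behavior.

```python
class GLContextSketch:
    """Toy model of binding initialization order (hypothetical names)."""

    def __init__(self):
        self.static_bindings_initialized = False
        self.dynamic_bindings_initialized = False

    def initialize_static_bindings(self):
        # Idempotent, so calling it from several places is harmless.
        self.static_bindings_initialized = True

    def make_current(self):
        # Old behavior: static bindings were set up only as a side effect here.
        self.initialize_static_bindings()

    def initialize_dynamic_bindings(self, ensure_static=True):
        if ensure_static:
            # Initialize static bindings first instead of assuming that
            # make_current() has already run.
            self.initialize_static_bindings()
        if not self.static_bindings_initialized:
            # Mirrors the failed check seen in the crash log.
            raise RuntimeError("Check failed: static_bindings_initialized_")
        self.dynamic_bindings_initialized = True
```

Calling `initialize_dynamic_bindings(ensure_static=False)` on a fresh object reproduces the failure mode from the crash log, while the default path succeeds even if `make_current()` was skipped.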

Comment 13 by sunn...@chromium.org, Dec 17

Cc: geoffl...@chromium.org
OK, it looks like just fixing the crash wasn't enough. Maybe we should disable virtualized contexts if the passthrough command decoder is used? Is there any other configuration where we run virtualized contexts with passthrough?

Comment 14 by sunn...@chromium.org, Dec 17

There are different crashes now:

[28433:28433:1217/044624.464818:FATAL:gles2_cmd_decoder_passthrough.cc(1226)] Check failed: api() == gl::g_current_gl_context_tls->Get()->Api (0x2695a05e1f50 vs. 0x2695a05e1070)
#0 0x5584f48e495f base::debug::StackTrace::StackTrace()
#1 0x5584f482a75b logging::LogMessage::~LogMessage()
#2 0x5584f60c1bdd gpu::gles2::GLES2DecoderPassthroughImpl::MakeCurrent()
#3 0x5584f6184bab gpu::CommandBufferStub::MakeCurrent()
#4 0x5584f6184a02 gpu::CommandBufferStub::OnMessageReceived()
#5 0x5584f61839bf IPC::MessageRouter::RouteMessage()
#6 0x5584f6181e11 gpu::GpuChannel::HandleMessageHelper()
#7 0x5584f617f79f gpu::GpuChannel::HandleMessage()
#8 0x5584f15e1a6d _ZN4base8internal7InvokerINS0_9BindStateIMN3net14MDnsClientImpl4CoreEFvRKNSt3__14pairINS6_12basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEEtEEEJNS_7WeakPtrIS5_EESE_EEEFvvEE3RunEPNS0_13BindStateBaseE
#9 0x5584f5e10924 gpu::Scheduler::RunNextTask()
#10 0x5584f15c8b74 _ZN4base8internal7InvokerINS0_9BindStateIMN3net16HostResolverImpl8ProcTaskEFvvEJNS_7WeakPtrIS5_EEEEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#11 0x5584f4833a39 base::debug::TaskAnnotator::RunTask()
#12 0x5584f4832cf0 base::MessageLoopImpl::RunTask()
#13 0x5584f48333e2 base::MessageLoopImpl::DoWork()
#14 0x5584f4835ebf base::(anonymous namespace)::WorkSourceDispatch()
#15 0x7f9fa94ec197 g_main_context_dispatch
#16 0x7f9fa94ec3f0 <unknown>
#17 0x7f9fa94ec49c g_main_context_iteration
#18 0x5584f4835c72 base::MessagePumpGlib::Run()
#19 0x5584f48327c5 base::MessageLoopImpl::Run()
#20 0x5584f485c646 base::RunLoop::Run()
#21 0x5584f8f5fc34 content::GpuMain()
#22 0x5584f4360be9 content::ContentMainRunnerImpl::Run()
#23 0x5584f43938e0 service_manager::Main()
#24 0x5584f435ef21 content::ContentMain()
#25 0x5584f123a1b3 ChromeMain
#26 0x7f9fa524d830 __libc_start_main
#27 0x5584f123a02a _start

This is another symptom of MakeCurrent() not working correctly in the passthrough + virtualized context configuration.

Comment 15 by kbr@chromium.org, Dec 17

Ah, yes, we should not use virtualized GL contexts with the passthrough command decoder. ANGLE already virtualizes GL contexts internally. Can we easily make that change to the blacklist entry?

Comment 16 by zmo@google.com, Dec 17

That would be an extra dimension of conditions that we would need to add to the blacklisting.

Comment 17 by kbr@chromium.org, Dec 18

Ah.

Rather than go that route, should we change the use_virtualized_gl_contexts workaround so it is ignored if the passthrough command decoder is in use?

Comment 18 by bugdroid1@chromium.org, Dec 18

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/11278a3777db2c1cba694871953c24a43ee853eb

commit 11278a3777db2c1cba694871953c24a43ee853eb
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Tue Dec 18 02:26:19 2018

Disable virtualized contexts with passthrough command decoder

Using virtualized contexts with passthrough command decoder causes
crashes during MakeCurrent() due to inconsistent state.

Bug: 914976
Change-Id: I1c8398d4a9539e8165c32594897ea434d2c7b54e
Reviewed-on: https://chromium-review.googlesource.com/c/1381265
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617351}
[modify] https://crrev.com/11278a3777db2c1cba694871953c24a43ee853eb/gpu/ipc/service/gles2_command_buffer_stub.cc
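
The rule this commit applies reduces to a single predicate. The sketch below is a Python illustration with hypothetical function and parameter names, not the code in gles2_command_buffer_stub.cc: the use_virtualized_gl_contexts workaround takes effect only when the passthrough command decoder is not in use.

```python
def use_virtualized_contexts(workaround_enabled: bool,
                             passthrough_decoder: bool) -> bool:
    """Return whether virtualized GL contexts should be used.

    workaround_enabled: the driver bug workaround is active for this GPU.
    passthrough_decoder: the passthrough command decoder is in use, in
        which case ANGLE already virtualizes contexts internally.
    """
    return workaround_enabled and not passthrough_decoder
```

Putting the check at the point of use avoids adding a decoder condition to every blacklist entry, which is the alternative discussed in comments 16 and 17.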

Comment 19 by sunn...@chromium.org, Dec 19

Status: Fixed (was: Assigned)
From build 2497 onwards, which contains the above CL, only the dawn_end2end_tests fail, and those are being tracked in another bug: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/2497
