Crash in in_process_command_buffer.cc breaks Linux GPU NVIDIA FYI bot
Issue description

On https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20(NVIDIA) there are a number of failures:

6329 WebglConformance_conformance2_rendering_draw_with_integer_texture_base_level passthrough, regular
6330 WebglConformance_conformance2_rendering_multisampling_fragment_evaluation, passthrough only
6331 WebglConformance_conformance2_rendering_multisampling_fragment_evaluation, passthrough only
etc.

backer@, maybe you can take a look? They are all downstream from InProcessCommandBuffer. The cause is not obvious to me.

Trace:
0 libc-2.19.so + 0x36c37 rax = 0x0000000000000000 rdx = 0x0000000000000006 rcx = 0xffffffffffffffff rbx = 0x0000000000000015 rsi = 0x0000000000007308 rdi = 0x00000000000072bc rbp = 0x00007ff439d55090 rsp = 0x00007ff439d54f58 r8 = 0x000036b1e2d148d7 r9 = 0x00000000000003f8 r10 = 0x0000000000000008 r11 = 0x0000000000000202 r12 = 0x00007ff439d55938 r13 = 0x00007ff439d55928 r14 = 0x00007ff439d55930 r15 = 0x000036b1e2d148d7 rip = 0x00007ff4464f6c37 Found by: given as instruction pointer in context
1 chrome!~LogMessage [callback.h : 129 + 0x7] rbp = 0x00007ff439d55910 rsp = 0x00007ff439d550a0 rip = 0x00007ff452aea914 Found by: previous frame's frame pointer
2 chrome!viz::GetContextLostReason(gpu::error::Error, gpu::error::ContextLostReason) + 0xcd rbx = 0x00007ff439d55928 rbp = 0x00007ff439d55a60 rsp = 0x00007ff439d55920 r12 = 0x00007ff439d55ac8 r13 = 0x000036b1dff9cb90 r14 = 0x00007ff439d55aa0 r15 = 0x00007ff44de711fb rip = 0x00007ff453d87fcd Found by: call frame info
3 chrome!OnContextLost [viz_process_context_provider.cc : 261 + 0x5] rbx = 0x000036b1dfe4fe00 rbp = 0x00007ff439d55c20 rsp = 0x00007ff439d55a70 r12 = 0x00007ff439d55ac8 r13 = 0x000036b1dff9cb90 r14 = 0x00007ff439d55aa0 r15 = 0x00007ff44de711fb rip = 0x00007ff4541de879 Found by: call frame info
4 chrome!OnGpuControlLostContext [callback.h : 99 + 0x3] rbx = 0x000036b1dfe4d800 rbp = 0x00007ff439d55d70 rsp = 0x00007ff439d55c30 r12 = 0x0000000000000000 r13 = 0x00007ff4585efb3c r14 = 0x000036b1e323d770 r15 = 0x00007ff454343850 rip = 0x00007ff453d9787b Found by: call frame info
5 chrome!OnContextLost [in_process_command_buffer.cc : 731 + 0x5] rbx = 0x000036b1dfe52800 rbp = 0x00007ff439d55ec0 rsp = 0x00007ff439d55d80 r12 = 0x0000000000000000 r13 = 0x00007ff4585efb3c r14 = 0x000036b1e323d770 r15 = 0x00007ff454343850 rip = 0x00007ff45434391f Found by: call frame info
6 chrome!Run [bind_internal.h : 516 + 0x14] rbx = 0x000036b1e323d740 rbp = 0x00007ff439d56020 rsp = 0x00007ff439d55ed0 r12 = 0x0000000000000000 r13 = 0x00007ff4585efb3c r14 = 0x000036b1e323d770 r15 = 0x00007ff454343850 rip = 0x00007ff44fa38134 Found by: call frame info
7 chrome!RunTask [callback.h : 99 + 0x3] rbx = 0x00007ff4585ed358 rbp = 0x00007ff439d56230 rsp = 0x00007ff439d56030 r12 = 0x00007ff439d56410 r13 = 0x00007ff4585efb3c r14 = 0x0000000000000000 r15 = 0x000036b1df89e808 rip = 0x00007ff452af358f Found by: call frame info
8 chrome!RunTask [message_loop_impl.cc : 404 + 0xf] rbx = 0x0000000000000000 rbp = 0x00007ff439d563f0 rsp = 0x00007ff439d56240 r12 = 0x0000000000000000 r13 = 0x00007ff4585efb3c r14 = 0x000036b1df850c80 r15 = 0x00007ff439d56410 rip = 0x00007ff452af2a6f Found by: call frame info
9 chrome!DoWork [message_loop_impl.cc : 415 + 0xb] rbx = 0x000036b1df850c80 rbp = 0x00007ff439d565e0 rsp = 0x00007ff439d56400 r12 = 0x00007ff439d564e0 r13 = 0x00007ff439d56478 r14 = 0x00007ff439d56410 r15 = 0x000036b1dfd90380 rip = 0x00007ff452af3002 Found by: call frame info
10 chrome!Run [message_pump_default.cc : 39 + 0x9] rbx = 0x000036b1df850c80 rbp = 0x00007ff439d56630 rsp = 0x00007ff439d565f0 r12 = 0x000036b1dfd90370 r13 = 0x0000000000000000 r14 = 0x000036b1dfd90360 r15 = 0x000036b1dfd90380 rip = 0x00007ff452af5306 Found by: call frame info
11 chrome!Run [message_loop_impl.cc : 356 + 0x9] rbx = 0x000036b1df850c80 rbp = 0x00007ff439d56790 rsp = 0x00007ff439d56640 r12 = 0x00007ff439d56a80 r13 = 0x000036b1df807000 r14 = 0x0000000000000001 r15 = 0x000036b1df807010 rip = 0x00007ff452af2551 Found by: call frame info
12 chrome!Run [run_loop.cc : 102 + 0x9] rbx = 0x00007ff439d56a80 rbp = 0x00007ff439d568f0 rsp = 0x00007ff439d567a0 r12 = 0x00007ff439d56a80 r13 = 0x000036b1df807000 r14 = 0x00007ff439d56a98 r15 = 0x000036b1df807010 rip = 0x00007ff452b1c656 Found by: call frame info
13 chrome!Run [thread.cc : 257 + 0x8] rbx = 0x0000000000007308 rbp = 0x00007ff439d56a50 rsp = 0x00007ff439d56900 r12 = 0x00007ff439d56a80 r13 = 0x000036b1df807000 r14 = 0x00007ff439d56a80 r15 = 0x000036b1df807010 rip = 0x00007ff452b6e86a Found by: call frame info
14 chrome!ThreadMain [thread.cc : 353 + 0xd] rbx = 0x000036b1df807018 rbp = 0x00007ff439d56be0 rsp = 0x00007ff439d56a60 r12 = 0x00007ff439d56a80 r13 = 0x000036b1df807000 r14 = 0x000036b1df7d88c0 r15 = 0x000036b1df807010 rip = 0x00007ff452b6ec46 Found by: call frame info
15 chrome!ThreadFunc [platform_thread_posix.cc : 81 + 0x8] rbx = 0x000036b1dfd9fe20 rbp = 0x00007ff439d56c10 rsp = 0x00007ff439d56bf0 r12 = 0x00007ff439d57700 r13 = 0x0000000000000000 r14 = 0x000036b1df807000 r15 = 0x000036b1df7c5bd0 rip = 0x00007ff452bb5208 Found by: call frame info
16 libpthread-2.19.so + 0x8184 rbx = 0x0000000000000000 rbp = 0x0000000000000000 rsp = 0x00007ff439d56c20 r12 = 0x0000000000000000 r13 = 0x0000000000000000 r14 = 0x00007ff439d579c0 r15 = 0x00007ff439d57700 rip = 0x00007ff44c38c184 Found by: call frame info
17 libc-2.19.so + 0xfa37d rsp = 0x00007ff439d56cc0 rip = 0x00007ff4465ba37d Found by: stack scanning

Thread 0
0 libc-2.19.so + 0xecfdd rax = 0xfffffffffffffdfc rdx = 0x0000000000000011 rcx = 0xffffffffffffffff rbx = 0x000036b1df7c3c00 rsi = 0x0000000000000003 rdi = 0x000036b1dfd4d680 rbp = 0x0000000000000003 rsp = 0x00007ffcd0336b10 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x000036b1df852f28 r11 = 0x0000000000000293 r12 = 0x000036b1dfd4d680 r13 = 0x0000000000000011 r14 = 0x00007ff44a4dcb40 r15 = 0x0000000000000003 rip = 0x00007ff4465acfdd Found by: given as instruction pointer in context
1 libglib-2.0.so.0.4002.0 + 0x48fe4 rsp = 0x00007ffcd0336b20 rip = 0x00007ff44a4cdfe4 Found by: stack scanning
2 libglib-2.0.so.0.4002.0 + 0x490ec rsp = 0x00007ffcd0336b80 rip = 0x00007ff44a4ce0ec Found by: stack scanning
3 chrome!base::MessagePumpGlib::Run(base::MessagePump::Delegate*) + 0xd2 rsp = 0x00007ffcd0336ba0 rip = 0x00007ff452af5802 Found by: stack scanning
4 chrome!Run [message_loop_impl.cc : 356 + 0x9] rsp = 0x00007ffcd0336bf0 rip = 0x00007ff452af2551 Found by: stack scanning
5 chrome!do_free_with_callback [thread_cache.h : 201 + 0x8] rsp = 0x00007ffcd0336c20 rip = 0x00007ff44f6ec70c Found by: stack scanning
6 chrome!CalledOnValidSequence [lock.h : 55 + 0x8] rsp = 0x00007ffcd0336c50 rip = 0x00007ff452b20fe7 Found by: stack scanning
7 chrome!Unlock [lock_impl.h : 70 + 0x5] rsp = 0x00007ffcd0336c80 rip = 0x00007ff44fa1b532 Found by: stack scanning
8 chrome!CalledOnValidThread [lock.h : 55 + 0x8] rsp = 0x00007ffcd0336cd0 rip = 0x00007ff452b6ee66 Found by: stack scanning
9 chrome!DetachFromSequence [lock.h : 54 + 0x8] rsp = 0x00007ffcd0336d20 rip = 0x00007ff452b21053 Found by: stack scanning
10 chrome!Run [run_loop.cc : 102 + 0x9] rsp = 0x00007ffcd0336d50 rip = 0x00007ff452b1c656 Found by: stack scanning
11 chrome!operator new [allocator_shim.cc : 159 + 0xa] rsp = 0x00007ffcd0336d70 rip = 0x00007ff452bb5b2e Found by: stack scanning
12 chrome!AddAsyncEnabledStateObserver [lock.h : 55 + 0x8] rsp = 0x00007ffcd0336de0 rip = 0x00007ff452b89d06 Found by: stack scanning
13 chrome!OnMessageLoopStarted [tracing_sampler_profiler.cc : 184 + 0x9] rsp = 0x00007ffcd0336e50 rip = 0x00007ff45458e0d0 Found by: stack scanning
14 chrome!GpuMain [gpu_main.cc : 356 + 0x5] rsp = 0x00007ffcd0336eb0 rip = 0x00007ff45716fa7e Found by: stack scanning
15 chrome!do_free_with_callback [thread_cache.h : 201 + 0x8] rsp = 0x00007ffcd0336ed0 rip = 0x00007ff44f6ec70c Found by: stack scanning
16 chrome!GetBuildTime [time.h : 656 + 0x15] rsp = 0x00007ffcd0336ef0 rip = 0x00007ff452accba9 Found by: stack scanning
17 .org.chromium.Chromium.ch4N8Z (deleted) + 0x7a00 rsp = 0x00007ffcd0336fe0 rip = 0x00007ff44c9ada00 Found by: stack scanning
Nov 15
I think I got the test running, but I can't repro the crash:

$ content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=exact --browser-executable=$(pwd)/out/linux_rel/chrome --test-filter=rendering_blitframebuffer_size_overflow --webgl-conformance-version=2 --show-stdout
(WARNING) 2018-11-15 10:40:52,312 desktop_browser_finder.FindAllAvailableBrowsers:278 Chrome build location for linux_x86_64 not found. Browser will be run without Flash.
DevTools listening on ws://127.0.0.1:45335/devtools/browser/39b2b76a-4dbd-417d-8515-c61375cadebb
(ERROR) 2018-11-15 10:40:54,329 linux_platform_backend._GetOSVersion:32 Unrecognizable OS version: None. Will fallback to 0.0
[0/1] gpu_tests.webgl_conformance_integration_test.WebGLConforman...lConformance_conformance2_rendering_blitframebuffer_size_overflow
[175242:175242:1115/104057.059748:ERROR:gles2_cmd_decoder.cc(8772)] [.WebGL-0x2a9776e61a00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
[175242:175242:1115/104057.062206:ERROR:gles2_cmd_decoder.cc(8772)] [.WebGL-0x2a9776e61a00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
[175242:175242:1115/104057.064188:ERROR:gles2_cmd_decoder.cc(8772)] [.WebGL-0x2a9776e61a00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
[175242:175242:1115/104057.066147:ERROR:gles2_cmd_decoder.cc(8772)] [.WebGL-0x2a9776e61a00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
[175242:175242:1115/104057.068114:ERROR:gles2_cmd_decoder.cc(8772)] [.WebGL-0x2a9776e61a00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
[175242:175242:1115/104057.070022:ERROR:gles2_cmd_decoder.cc(8772)] [.WebGL-0x2a9776e61a00]GL ERROR :GL_INVALID_VALUE : glBlitFramebufferCHROMIUM: the width or height of src or dst region overflowed
[175242:175242:1115/104057.107803:ERROR:raster_decoder.cc(1390)] RasterDecoder context lost via ARB/EXT_robustness. Reset status = GL_GUILTY_CONTEXT_RESET_KHR
[175242:175242:1115/104057.107877:ERROR:raster_decoder.cc(1109)] RasterDecoderImpl: Context reset detected after MakeCurrent.
[175242:175242:1115/104057.107907:ERROR:command_buffer_stub.cc(345)] Context lost because MakeCurrent failed.
[175242:175242:1115/104057.107950:ERROR:gpu_channel_manager.cc(217)] Exiting GPU process because some drivers cannot recover from problems.
[175242:175242:1115/104057.108083:ERROR:gpu_channel_manager.cc(217)] Exiting GPU process because some drivers cannot recover from problems.
[175242:175242:1115/104057.193494:ERROR:shared_image_stub.cc(191)] SharedImageStub: context already lost
[175205:175205:1115/104057.240125:ERROR:command_buffer_proxy_impl.cc(123)] ContextResult::kTransientFailure: Failed to send GpuChannelMsg_CreateCommandBuffer.
[175205:175205:1115/104057.240168:ERROR:context_provider_command_buffer.cc(141)] GpuChannelHost failed to create command buffer.
[1:1:1115/104057.240151:ERROR:command_buffer_proxy_impl.cc(104)] ContextResult::kTransientFailure: Shared memory region is not valid
[1:1:1115/104057.240314:ERROR:context_provider_command_buffer.cc(141)] GpuChannelHost failed to create command buffer.
Nov 15
I did repro it, and it's a race. We get a callback from here on the "client side" of InProcCmdBuffer (run on the display compositor thread):
https://cs.chromium.org/chromium/src/components/viz/service/display_embedder/viz_process_context_provider.cc?rcl=b731a1c8effd445c10a1daaf9ee2aac434a4730d&l=105
But we read the error status from the server side here (running on GPU main):
https://cs.chromium.org/chromium/src/components/viz/service/display_embedder/viz_process_context_provider.cc?rcl=0282b7ca5cb57f9af303368f858d9a79fe5bf935&l=259
The read from GPU main is thread safe because it's behind a lock, but I believe we're racing between the reading and the writing of the error value.
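A minimal C++ sketch of the ordering problem described above, assuming a simplified model rather than the actual InProcessCommandBuffer code. Every name here is hypothetical: ErrorState, LoseContextOnServer and SimulateContextLoss are invented for illustration, and the two enums only stand in for gpu::error::Error and gpu::error::ContextLostReason. A lock makes each individual read or write safe, but it does not make the server's write happen before the client's read. One way to close that gap is to store the error before notifying the client and to hand the client a copy of the values, so it never re-reads shared state that may not be written yet. This is not necessarily how the actual fix works.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for gpu::error::Error and
// gpu::error::ContextLostReason (not the real enum values).
enum class Error { kNoError, kLostContext };
enum class ContextLostReason { kUnknown, kGuilty, kInnocent };

// Hypothetical holder for the error state that the server side (GPU main)
// writes and the client side (display compositor) consumes.
class ErrorState {
 public:
  struct Snapshot {
    Error error;
    ContextLostReason reason;
  };

  // Server side: write both fields in one critical section so a reader
  // never sees a new error paired with an old reason.
  void Store(Error error, ContextLostReason reason) {
    std::lock_guard<std::mutex> lock(mutex_);
    error_ = error;
    reason_ = reason;
  }

  // Any thread: read both fields together. The lock makes the read safe,
  // but on its own it does not order the read after Store().
  Snapshot Read() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return {error_, reason_};
  }

 private:
  mutable std::mutex mutex_;
  Error error_ = Error::kNoError;
  ContextLostReason reason_ = ContextLostReason::kUnknown;
};

using LostCallback = std::function<void(ErrorState::Snapshot)>;

// Server side: store first, then notify with a copy of the values, so the
// client callback never reads shared state the server has not written yet.
void LoseContextOnServer(ErrorState& state, const LostCallback& notify_client) {
  state.Store(Error::kLostContext, ContextLostReason::kGuilty);
  notify_client(state.Read());
}

// Runs the server-side loss on its own thread and returns what the client
// callback observed. For brevity the callback is invoked directly on that
// thread instead of being posted to a separate client thread.
ErrorState::Snapshot SimulateContextLoss() {
  ErrorState state;
  ErrorState::Snapshot observed{Error::kNoError, ContextLostReason::kUnknown};
  std::thread gpu_main([&] {
    LoseContextOnServer(state, [&](ErrorState::Snapshot s) { observed = s; });
  });
  gpu_main.join();  // join() makes the write to |observed| visible here.
  return observed;
}
```

Because the values are stored before the callback runs and are passed to it by value, SimulateContextLoss() reports kLostContext and kGuilty rather than the initial kNoError.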
Nov 15
WebglConformance_conformance2_rendering_blitframebuffer_size_overflow is a flaky test that has been causing these context losses on Linux NVIDIA (bug 830046). It was skipped before and later re-enabled. We're going to disable it again, so the crash you're seeing here may stop happening.
Nov 15
Up to you. I have a fix in the CQ for the race. It would land, but the tree is closed. https://chromium-review.googlesource.com/c/chromium/src/+/1338160
Nov 15
Since you already have a fix (nice, thanks!), we may not need to Skip the test, although I think we will still need to mark it as Fail.
Nov 15
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/e6f9a0890732a07a371c4521db9c6976cca4ee07

commit e6f9a0890732a07a371c4521db9c6976cca4ee07
Author: Jonathan Backer <backer@chromium.org>
Date: Thu Nov 15 19:04:55 2018

Fix race storing error on context lost

Bug: 905511
Change-Id: I1d37cb900749085384c6abcb2e0b9d8224f0a190
Reviewed-on: https://chromium-review.googlesource.com/c/1338160
Reviewed-by: Jonathan Backer <backer@chromium.org>
Reviewed-by: Robert Kroeger <rjkroege@chromium.org>
Commit-Queue: Jonathan Backer <backer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608463}
[modify] https://crrev.com/e6f9a0890732a07a371c4521db9c6976cca4ee07/gpu/ipc/in_process_command_buffer.cc
Nov 15
Thanks rjkroege@ for reporting this and backer@ for your quick fix!
Comment 1 by backer@chromium.org, Nov 15