mus: gpu context lost errors during make current on display init
Issue description
On ToT @ #526497 running cros on linux-desktop:
(1) Run chrome --mus --ash-dev-shortcuts --use-gl=egl (note: use-gl is optional, but w/o that your WM may crash)
(2) Press CTRL+SHIFT+D a few times (errors are flaky)
Expected: The secondary display (window) is shown and hidden without issue.
Actual: Errors around losing context while making current, example below. This is also affecting my work on --mus unified mode (CTRL+SHIFT+J).
[7165:7165:0102/123912.459072:ERROR:gles2_cmd_decoder.cc(16433)] Onscreen context lost via ARB/EXT_robustness. Reset status = 0x92bb
[7165:7165:0102/123912.459152:FATAL:gles2_cmd_decoder.cc(16452)] Check failed: false.
#0 0x7f546169706c base::debug::StackTrace::StackTrace()
#1 0x7f54616bdb5c logging::LogMessage::~LogMessage()
#2 0x7f545d8a2d75 gpu::gles2::GLES2DecoderImpl::CheckResetStatus()
#3 0x7f545d87eaba gpu::gles2::GLES2DecoderImpl::MakeCurrent()
#4 0x7f545d9696ce gpu::CommandBufferStub::MakeCurrent()
#5 0x7f545d969532 gpu::CommandBufferStub::OnMessageReceived()
#6 0x7f5460892fb2 IPC::MessageRouter::RouteMessage()
#7 0x7f545d975541 gpu::GpuChannel::HandleMessageHelper()
#8 0x7f545d9731f1 gpu::GpuChannel::HandleMessage()
#9 0x7f545d97676e _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu10GpuChannelEFvRKN3IPC7MessageEEJNS_7WeakPtrIS4_EENS5_8MessageTI35GpuCommandBufferMsg_AsyncFlush_MetaNSt3__15tupleIJijbEEEvEEEEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#10 0x7f545d935b0f gpu::Scheduler::RunNextTask()
#11 0x7f545d90ec87 _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu5gles229AsyncReadPixelsCompletedQueryEFvvEJNS_7WeakPtrIS5_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#12 0x7f5461697945 base::debug::TaskAnnotator::RunTask()
#13 0x7f54616c80b9 base::internal::IncomingTaskQueue::RunTask()
#14 0x7f54616cbbbb base::MessageLoop::RunTask()
#15 0x7f54616cbf53 base::MessageLoop::DeferOrRunPendingTask()
#16 0x7f54616cc1e6 base::MessageLoop::DoWork()
#17 0x7f54616ce719 base::MessagePumpLibevent::Run()
#18 0x7f54616cb4b9 base::MessageLoop::Run()
#19 0x7f5461701099 base::RunLoop::Run()
#20 0x7f545e48a319 content::GpuMain()
#21 0x7f545f33acd1 content::ContentMainRunnerImpl::Run()
#22 0x7f5461bea42b service_manager::Main()
#23 0x7f545f3396b4 content::ContentMain()
#24 0x56388c429c66 ChromeMain
#25 0x7f5454be1f45 __libc_start_main
#26 0x56388c42994a _start
,
Jan 2 2018
Oh, and would be curious what the unhandled reset status at https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gles2_cmd_decoder.cc?q=gles2_cmd_decoder.cc&sq=package:chromium&l=16452 is.
,
Jan 2 2018
Which platform is that? --use-gl=egl is not a supported configuration on Linux.
,
Jan 2 2018
Also, the DCHECK fires because the driver is returning an invalid reset status (see https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_robustness.txt), so it sounds like your driver is having issues.
,
Jan 2 2018
Looks like it's caused by https://www.khronos.org/registry/OpenGL/extensions/NV/NV_robustness_video_memory_purge.txt, which defines:
#define GL_PURGED_CONTEXT_RESET_NV 0x92BB
,
Jan 2 2018
The driver status is 37563 (0x92BB), but I don't see that in glext.h, do you know what it means? I just had an NVIDIA driver update, maybe that's problematic? But Kyle hit the same thing. I'm not sure how much of the logs you want, --enable-gpu-debugging is very verbose:
[22014:22014:0102/133752.647670:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kDeleteTexturesImmediate
[22014:22014:0102/133752.647816:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kBindTexture
[22014:22014:0102/133752.647887:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kTexParameteri
[22014:22014:0102/133752.647960:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kTexParameteri
[22014:22014:0102/133752.648029:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kDeleteTexturesImmediate
[22014:22014:0102/133752.648131:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kDeleteTexturesImmediate
[22014:22014:0102/133752.648182:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kFinish
[22014:22014:0102/133752.832697:ERROR:gles2_cmd_decoder.cc(16433)] Onscreen context lost via ARB/EXT_robustness. Reset status = 0x92bb
[22014:22014:0102/133752.832734:FATAL:gles2_cmd_decoder.cc(16452)] Check failed: false. Unhandled driver status 37563
I can try to bisect and to build debug to try the other flag if you think that'd help. Various Mus team members have told me to use --use-gl=egl for linux-desktop Chrome OS builds when running with --mus. When I don't supply --use-gl=egl I get similar errors and sometimes it crashes my Linux window manager...
,
Jan 2 2018
Ah, I missed your earlier comment, does that shed some light on what's happening?
,
Jan 2 2018
It's odd, because we never specify (EGL|GLX)_GENERATE_RESET_ON_VIDEO_MEMORY_PURGE_NV on context creation, so the driver should not be emitting this error code. Sounds like a driver bug. We could explicitly enable it when the extension is present, and treat GL_PURGED_CONTEXT_RESET_NV just like GL_INNOCENT_CONTEXT_RESET_ARB. But either way, the driver loses the context, nothing much we can do about that... What does ctrl-shift-D do? Is it trying to change screen resolution and things?
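For illustration, the suggestion above (treat GL_PURGED_CONTEXT_RESET_NV like GL_INNOCENT_CONTEXT_RESET_ARB) could look roughly like the sketch below. This is not the actual gles2_cmd_decoder.cc code: the `LostReason` enum and `ClassifyResetStatus` function are hypothetical names, and the constants are copied from the KHR_robustness and NV_robustness_video_memory_purge specs rather than pulled from GL headers.

```cpp
#include <cstdint>

// Reset status values as listed in the Khronos extension specs.
constexpr uint32_t kGlNoError = 0;
constexpr uint32_t kGuiltyContextReset = 0x8253;    // GL_GUILTY_CONTEXT_RESET
constexpr uint32_t kInnocentContextReset = 0x8254;  // GL_INNOCENT_CONTEXT_RESET
constexpr uint32_t kUnknownContextReset = 0x8255;   // GL_UNKNOWN_CONTEXT_RESET
constexpr uint32_t kPurgedContextResetNv = 0x92BB;  // GL_PURGED_CONTEXT_RESET_NV

// The log in this bug prints the same value in decimal.
static_assert(kPurgedContextResetNv == 37563, "0x92BB is 37563 in decimal");

// Hypothetical classification; not Chromium's real enum.
enum class LostReason { kNone, kGuilty, kInnocent, kUnknown };

constexpr LostReason ClassifyResetStatus(uint32_t status) {
  switch (status) {
    case kGlNoError:
      return LostReason::kNone;
    case kGuiltyContextReset:
      return LostReason::kGuilty;
    case kInnocentContextReset:
    case kPurgedContextResetNv:  // Video memory purge: not this context's fault.
      return LostReason::kInnocent;
    case kUnknownContextReset:
      return LostReason::kUnknown;
    default:
      // Unexpected driver value: still report a loss instead of ignoring it.
      return LostReason::kUnknown;
  }
}
```

The default branch is a design choice in this sketch: an unrecognized status is still treated as a lost context so that recovery runs, rather than being silently dropped.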
,
Jan 2 2018
The driver update is likely what made this happen suddenly.
,
Jan 2 2018
ctrl-shift-D toggles the presence of a second display.
,
Jan 2 2018
Should I try reinstalling my driver or loading a newer/older one? I'm not actually sure how to do this on Linux/Goobuntu, I could try this: http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux
,
Jan 3 2018
How are driver bugs like this generally handled? --mus no longer implies --viz, so how come we only hit this with --mus and not classic chromeos?
,
Jan 3 2018
I have no idea what --mus does at this point so I'm not sure why it affects one and not the other.
As to how we generally handle this kind of bug: this is Linux on a non-standard config, so we generally ignore it. If it does affect the product, we either blacklist altogether, or if it affects a large population we try to find a workaround to the extent that it makes sense.
Losing contexts is not the end of the world, we generally handle this pretty well. In this case, because the driver reports a value that it shouldn't, we go through a NOTREACHED() path, which asserts in debug but falls through in release. In practice I believe we would ignore the lost context, which sounds like we would end up in a bad state (i.e. not recover), so we can fix that (as suggested in #8).
I have no idea what causes the lost context in the first place though - here's what the documentation says:
The NVIDIA OpenGL driver architecture on Linux has a limitation:
resources located in video memory are not persistent across certain
events. VT switches, suspend/resume events, and mode switching
events may erase the contents of video memory. Any resource that
is located exclusively in video memory, such as framebuffer objects
(FBOs), will be lost. As the OpenGL specification makes no mention
of events where the video memory is allowed to be cleared, the
driver attempts to hide this fact from the application, but cannot
do it for all resources.
If you're doing modesetting on ctrl-shift-D (still not sure I understand what "toggles the presence of a second display" means when running on top of X11, but if this is something that crashes the WM, it sounds like it's doing much more than just creating/destroying X windows), then I guess that's why, and there's probably nothing we can do about it.
,
Jan 9 2018
,
Jan 10 2018
My WM just crashed again using ctrl-shift-j, is there some logging I can provide? ctrl-shift-D makes a new X window that ash uses as a second display.
,
Jan 11 2018
Kyle found that mus [sometimes?] destroys the XWindow before the WindowTreeHostMus/ui::Compositor. So, the problem is likely that the GPU is referencing something that's been deleted. Oshima may know why we delay WindowTreeHost destruction for possible re-use. We might need to delay the DisplaySynchronizer's ash/wm -> mus/ws display destruction notification. There are probably multiple defects packed into this issue, but I'll try to follow up there.
,
Jan 11 2018
Basically the ui::Compositor's LayerTreeHost should be destroyed before XDestroyWindow() is called. Here are the lines in question:
Put a log after this one: https://cs.chromium.org/chromium/src/ui/compositor/compositor.cc?l=234&rcl=a3360971dab6da9c2a28f9362dee531c1a1e2ffe
Put a log before this one: https://cs.chromium.org/chromium/src/ui/platform_window/x11/x11_window_base.cc?l=60&rcl=a3360971dab6da9c2a28f9362dee531c1a1e2ffe
If they're happening in the wrong order that will be problematic.
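To make the ordering requirement concrete, here is a minimal sketch with stand-in types. `FakeXWindow`, `FakeCompositor`, `FakeWindowTreeHost` and `TeardownOrder` are all hypothetical names, not the real ui::Compositor or X11WindowBase classes; the log strings only mimic the two log points suggested above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the platform window; its destructor plays the role of
// the XDestroyWindow() call.
struct FakeXWindow {
  explicit FakeXWindow(std::vector<std::string>* log) : log_(log) {}
  ~FakeXWindow() { log_->push_back("XDestroyWindow"); }
  std::vector<std::string>* log_;
};

// Stand-in for the compositor, which renders into the window and so
// must not outlive it.
struct FakeCompositor {
  FakeCompositor(FakeXWindow* window, std::vector<std::string>* log)
      : window_(window), log_(log) {
    assert(window_ != nullptr);  // A compositor always needs a live window.
  }
  ~FakeCompositor() { log_->push_back("~Compositor"); }
  FakeXWindow* window_;
  std::vector<std::string>* log_;
};

class FakeWindowTreeHost {
 public:
  explicit FakeWindowTreeHost(std::vector<std::string>* log)
      : window_(std::make_unique<FakeXWindow>(log)),
        compositor_(std::make_unique<FakeCompositor>(window_.get(), log)) {}

  ~FakeWindowTreeHost() {
    // Explicit order: compositor first, then the window it draws into.
    compositor_.reset();
    window_.reset();
  }

 private:
  std::unique_ptr<FakeXWindow> window_;
  std::unique_ptr<FakeCompositor> compositor_;
};

// Creates and destroys a host, returning the recorded teardown order.
std::vector<std::string> TeardownOrder() {
  std::vector<std::string> log;
  { FakeWindowTreeHost host(&log); }
  return log;
}
```

In this sketch `TeardownOrder()` records "~Compositor" before "XDestroyWindow". The failure described in this bug corresponds to the reverse order, where the window goes away while the GPU side may still reference it.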
,
Jan 11 2018
We shouldn't be destroying an X window when adding a new display ... are we doing that for some reason?
,
Jan 11 2018
This happens when removing displays (on ctrl-shift-d to remove an XWindow and simulate the destruction of a display, and *probably* on device when unplugging/disabling a display). There is also existing quirky behavior that removes the displays and recreates them as part of the configuration to toggle unified and mirror mode. Those initial changes are reported from ash to mus by DisplaySynchronizer similar to removing a display.
,
Jan 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5697939468e0812830e0362a3e6c1b44c352cfc6
commit 5697939468e0812830e0362a3e6c1b44c352cfc6
Author: Mike Wasserman <msw@chromium.org>
Date: Wed Jan 17 22:38:58 2018
mus: Fix gpu context lost errors and crashes on display removal
Destroy AshWindowTreeHostMus's compositor on shutdown. (Mus async destroys the underlying platform window)
Destroy displays on root removal, not config changes. (revert http://crrev.com/c/761764 to match cash dtor ordering)
Use a stub window for Ash's virtual unified display. (matches cash to use an offscreen surface for rendering)
Bug: 798538
Test: No errors/crashes removing displays with --mus, mirroring/unified works.
Change-Id: I064416cfc377bf19ec9afa214baa28a75846acf8
Reviewed-on: https://chromium-review.googlesource.com/862603
Commit-Queue: Michael Wasserman <msw@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529930}
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/display/mirror_window_controller.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/host/ash_window_tree_host_mus.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/system/night_light/night_light_controller.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/system/night_light/night_light_controller_unittest.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/display.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/display_manager.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/platform_display_default.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/window_tree_unittest.cc
,
Jan 17 2018
,
Feb 26 2018
Comment 1 by vmi...@chromium.org
, Jan 2 2018