
Issue 798538


Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Chrome
Pri: 2
Type: Bug

Blocking:
issue 731255




mus: gpu context lost errors during make current on display init

Project Member Reported by msw@chromium.org, Jan 2 2018

Issue description

On ToT @ #526497 running cros on linux-desktop:
(1) Run chrome --mus --ash-dev-shortcuts --use-gl=egl (note: --use-gl is optional, but without it your WM may crash)
(2) Press CTRL+SHIFT+D a few times (errors are flaky)
Expected: The secondary display (window) is shown and hidden without issue.
Actual: Errors around losing context while making current, example below.

This is also affecting my work on --mus unified mode (CTRL+SHIFT+J).

[7165:7165:0102/123912.459072:ERROR:gles2_cmd_decoder.cc(16433)] Onscreen context lost via ARB/EXT_robustness. Reset status = 0x92bb
[7165:7165:0102/123912.459152:FATAL:gles2_cmd_decoder.cc(16452)] Check failed: false. 
#0 0x7f546169706c base::debug::StackTrace::StackTrace()
#1 0x7f54616bdb5c logging::LogMessage::~LogMessage()
#2 0x7f545d8a2d75 gpu::gles2::GLES2DecoderImpl::CheckResetStatus()
#3 0x7f545d87eaba gpu::gles2::GLES2DecoderImpl::MakeCurrent()
#4 0x7f545d9696ce gpu::CommandBufferStub::MakeCurrent()
#5 0x7f545d969532 gpu::CommandBufferStub::OnMessageReceived()
#6 0x7f5460892fb2 IPC::MessageRouter::RouteMessage()
#7 0x7f545d975541 gpu::GpuChannel::HandleMessageHelper()
#8 0x7f545d9731f1 gpu::GpuChannel::HandleMessage()
#9 0x7f545d97676e _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu10GpuChannelEFvRKN3IPC7MessageEEJNS_7WeakPtrIS4_EENS5_8MessageTI35GpuCommandBufferMsg_AsyncFlush_MetaNSt3__15tupleIJijbEEEvEEEEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#10 0x7f545d935b0f gpu::Scheduler::RunNextTask()
#11 0x7f545d90ec87 _ZN4base8internal7InvokerINS0_9BindStateIMN3gpu5gles229AsyncReadPixelsCompletedQueryEFvvEJNS_7WeakPtrIS5_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#12 0x7f5461697945 base::debug::TaskAnnotator::RunTask()
#13 0x7f54616c80b9 base::internal::IncomingTaskQueue::RunTask()
#14 0x7f54616cbbbb base::MessageLoop::RunTask()
#15 0x7f54616cbf53 base::MessageLoop::DeferOrRunPendingTask()
#16 0x7f54616cc1e6 base::MessageLoop::DoWork()
#17 0x7f54616ce719 base::MessagePumpLibevent::Run()
#18 0x7f54616cb4b9 base::MessageLoop::Run()
#19 0x7f5461701099 base::RunLoop::Run()
#20 0x7f545e48a319 content::GpuMain()
#21 0x7f545f33acd1 content::ContentMainRunnerImpl::Run()
#22 0x7f5461bea42b service_manager::Main()
#23 0x7f545f3396b4 content::ContentMain()
#24 0x56388c429c66 ChromeMain
#25 0x7f5454be1f45 __libc_start_main
#26 0x56388c42994a _start
 
Cc: piman@chromium.org zmo@chromium.org
There's not too much to go on from the stack, though it's interesting that we're scheduling decoding inside an 'AsyncReadPixelsCompletedQuery' stack.

Could you try logging more, following the instructions at https://www.chromium.org/developers/how-tos/debugging-gpu-related-code, i.e. --enable-gpu-debugging --enable-gpu-service-logging (the latter works in debug builds only)?

Do you think you could bisect the issue?

Comment 3 by piman@chromium.org, Jan 2 2018

Which platform is that? --use-gl=egl is not a supported configuration on Linux.

Comment 4 by piman@chromium.org, Jan 2 2018

Also, the DCHECK fires because the driver is returning an invalid reset status (see https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_robustness.txt), so it sounds like your driver is having issues.

Comment 5 by piman@chromium.org, Jan 2 2018

Looks like it's caused by https://www.khronos.org/registry/OpenGL/extensions/NV/NV_robustness_video_memory_purge.txt:

#define GL_PURGED_CONTEXT_RESET_NV        0x92BB

Comment 6 by msw@chromium.org, Jan 2 2018

The driver status is 37563 (0x92BB), but I don't see that in glext.h, do you know what it means?
I just had an NVIDIA driver update, maybe that's problematic? But Kyle hit the same thing.

I'm not sure how much of the logs you want, --enable-gpu-debugging is very verbose:

[22014:22014:0102/133752.647670:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kDeleteTexturesImmediate
[22014:22014:0102/133752.647816:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kBindTexture
[22014:22014:0102/133752.647887:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kTexParameteri
[22014:22014:0102/133752.647960:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kTexParameteri
[22014:22014:0102/133752.648029:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kDeleteTexturesImmediate
[22014:22014:0102/133752.648131:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kDeleteTexturesImmediate
[22014:22014:0102/133752.648182:ERROR:gles2_cmd_decoder.cc(5504)] [.DisplayCompositor-0x2a3ada5afa00]cmd: kFinish
[22014:22014:0102/133752.832697:ERROR:gles2_cmd_decoder.cc(16433)] Onscreen context lost via ARB/EXT_robustness. Reset status = 0x92bb
[22014:22014:0102/133752.832734:FATAL:gles2_cmd_decoder.cc(16452)] Check failed: false. Unhandled driver status 37563

I can try to bisect and to build debug to try the other flag if you think that'd help.

Various Mus team members have told me to use --use-gl=egl for linux-desktop Chrome OS builds when running with --mus.
When I don't supply --use-gl=egl I get similar errors and sometimes it crashes my Linux window manager...

Comment 7 by msw@chromium.org, Jan 2 2018

Ah, I missed your earlier comment, does that shed some light on what's happening?

Comment 8 by piman@chromium.org, Jan 2 2018

It's odd, because we never specify (EGL|GLX)_GENERATE_RESET_ON_VIDEO_MEMORY_PURGE_NV on context creation, so the driver should not be emitting this error code. Sounds like a driver bug.

We could explicitly enable it when the extension is present, and treat GL_PURGED_CONTEXT_RESET_NV just like GL_INNOCENT_CONTEXT_RESET_ARB.

But either way, the driver loses the context, nothing much we can do about that...
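
A minimal sketch of that suggestion, assuming a standalone classifier rather than the real gles2_cmd_decoder.cc code. The enum ContextLostReason, the function ClassifyResetStatus, and the constant names are hypothetical; only the numeric values come from the KHR_robustness and NV_robustness_video_memory_purge specs.

```cpp
#include <cstdint>

// Reset-status values from the KHR_robustness and
// NV_robustness_video_memory_purge extension specs.
constexpr std::uint32_t kNoError = 0;
constexpr std::uint32_t kGuiltyContextReset = 0x8253;
constexpr std::uint32_t kInnocentContextReset = 0x8254;
constexpr std::uint32_t kUnknownContextReset = 0x8255;
constexpr std::uint32_t kPurgedContextResetNV = 0x92BB;  // 37563 in decimal

// Hypothetical enum for illustration; not Chromium's actual type.
enum class ContextLostReason { kNone, kGuilty, kInnocent, kUnknown };

// Maps a driver reset status to a handling category. A video memory purge
// is not the application's fault, so it is grouped with the innocent case.
constexpr ContextLostReason ClassifyResetStatus(std::uint32_t status) {
  switch (status) {
    case kNoError:
      return ContextLostReason::kNone;
    case kGuiltyContextReset:
      return ContextLostReason::kGuilty;
    case kInnocentContextReset:
    case kPurgedContextResetNV:
      return ContextLostReason::kInnocent;
    case kUnknownContextReset:
    default:
      // Unrecognized values are still reported as a lost context so that
      // the caller can recover instead of ignoring the reset.
      return ContextLostReason::kUnknown;
  }
}
```

With a mapping like this, the status from the log above (0x92bb) would go down the same recovery path as an innocent reset instead of reaching the unhandled-status check.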


What does ctrl-shift-D do? Is it trying to change screen resolution and things?

Comment 9 by piman@chromium.org, Jan 2 2018

The driver update is likely what made this happen suddenly.

Comment 10 by msw@chromium.org, Jan 2 2018

ctrl-shift-D toggles the presence of a second display.

Comment 11 by msw@chromium.org, Jan 2 2018

Should I try reinstalling my driver or loading a newer/older one?
I'm not actually sure how to do this on Linux/Goobuntu; I could try this:
http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux

Comment 12 by sky@chromium.org, Jan 3 2018

Blocking: 731255
How are driver bugs like this generally handled? --mus no longer implies --viz, so how come we only hit this with --mus and not classic chromeos?
I have no idea what --mus does at this point so I'm not sure why it affects one and not the other.

As to how we generally handle this kind of bug: this is Linux on a non-standard config, so we generally ignore it. If it does affect the product, we either blacklist the configuration altogether or, if it affects a large population, we try to find a workaround to the extent that it makes sense.
Losing contexts is not the end of the world; we generally handle this pretty well. In this case, because the driver reports a value that it shouldn't, we go through a NOTREACHED() path, which asserts in debug builds but falls through in release builds. In practice I believe we would ignore the lost context, which sounds like we would end up in a bad state (i.e. not recover), so we can fix that (as suggested in #8).

I have no idea what causes the lost context in the first place though - here's what the documentation says:

    The NVIDIA OpenGL driver architecture on Linux has a limitation:
    resources located in video memory are not persistent across certain
    events. VT switches, suspend/resume events, and mode switching
    events may erase the contents of video memory. Any resource that
    is located exclusively in video memory, such as framebuffer objects
    (FBOs), will be lost. As the OpenGL specification makes no mention
    of events where the video memory is allowed to be cleared, the
    driver attempts to hide this fact from the application, but cannot
    do it for all resources.

If you're doing modesetting on ctrl-shift-D, then I guess that's why, and there's probably nothing we can do about it. (I'm still not sure I understand what "toggles the presence of a second display" means when running on top of X11, but if this is something that crashes the WM, it sounds like it's doing much more than just creating/destroying X windows.)
Components: -Internals>GPU Internals>GPU>Internals
Labels: -Pri-1 OS-Linux Pri-2
Status: Available (was: Untriaged)

Comment 15 by msw@chromium.org, Jan 10 2018

My WM just crashed again using ctrl-shift-j; is there some logging I can provide?
ctrl-shift-D makes a new X window that ash uses as a second display.

Comment 16 by msw@chromium.org, Jan 11 2018

Cc: osh...@chromium.org
Owner: msw@chromium.org
Status: Assigned (was: Available)
Kyle found that mus [sometimes?] destroys the XWindow before the WindowTreeHostMus/ui::Compositor.
So, the problem is likely that the GPU is referencing something that's been deleted.
Oshima may know why we delay WindowTreeHost destruction for possible re-use.
We might need to delay the DisplaySynchronizer's ash/wm -> mus/ws display destruction notification.
There are probably multiple defects packed into this issue, but I'll try to follow up there.
Basically, the ui::Compositor's LayerTreeHost should be destroyed before XDestroyWindow() is called. Here are the lines in question:

Put a log after this one:
https://cs.chromium.org/chromium/src/ui/compositor/compositor.cc?l=234&rcl=a3360971dab6da9c2a28f9362dee531c1a1e2ffe

Put a log before this one:
https://cs.chromium.org/chromium/src/ui/platform_window/x11/x11_window_base.cc?l=60&rcl=a3360971dab6da9c2a28f9362dee531c1a1e2ffe

If they're happening in the wrong order that will be problematic.
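
To make the required ordering concrete, here is a minimal sketch using hypothetical stand-in types (EventLog, FakePlatformWindow, FakeCompositor, FakeWindowTreeHost, TearDownOrder). None of these are the real Chromium classes; the point is only that the owner tears down the compositor explicitly before the platform window, and that the compositor can assert the window still exists when it goes away.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records teardown events so the order can be inspected afterwards.
struct EventLog {
  std::vector<std::string> events;
  bool window_destroyed = false;
};

// Stand-in for the platform window; its destructor plays the role of
// XDestroyWindow().
class FakePlatformWindow {
 public:
  explicit FakePlatformWindow(EventLog* log) : log_(log) {}
  ~FakePlatformWindow() {
    log_->window_destroyed = true;
    log_->events.push_back("XDestroyWindow");
  }

 private:
  EventLog* log_;
};

// Stand-in for the compositor, which renders into the window's surface.
class FakeCompositor {
 public:
  explicit FakeCompositor(EventLog* log) : log_(log) {}
  ~FakeCompositor() {
    // The window must outlive the compositor that draws into it.
    assert(!log_->window_destroyed && "window destroyed before compositor");
    log_->events.push_back("compositor destroyed");
  }

 private:
  EventLog* log_;
};

// Owns both objects and makes the teardown order explicit.
class FakeWindowTreeHost {
 public:
  explicit FakeWindowTreeHost(EventLog* log)
      : window_(std::make_unique<FakePlatformWindow>(log)),
        compositor_(std::make_unique<FakeCompositor>(log)) {}
  ~FakeWindowTreeHost() {
    compositor_.reset();  // First: release everything that uses the window.
    window_.reset();      // Then: destroy the window itself.
  }

 private:
  std::unique_ptr<FakePlatformWindow> window_;
  std::unique_ptr<FakeCompositor> compositor_;
};

// Creates and destroys a host, returning the recorded teardown order.
std::vector<std::string> TearDownOrder() {
  EventLog log;
  { FakeWindowTreeHost host(&log); }
  return log.events;
}
```

Calling TearDownOrder() yields the compositor event followed by the window event. Swapping the two reset() calls would trip the assert in a debug build, which is the same check the two suggested log statements perform by hand.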
We shouldn't be destroying an X window when adding a new display ... are we doing that for some reason?

Comment 19 by msw@chromium.org, Jan 11 2018

This happens when removing displays (on ctrl-shift-d to remove an XWindow and simulate the destruction of a display, and *probably* on device when unplugging/disabling a display).

There is also existing quirky behavior that removes the displays and recreates them as part of the configuration change that toggles unified and mirror mode. Those initial changes are reported from ash to mus by DisplaySynchronizer in the same way as a display removal.

Comment 20 by bugdroid1@chromium.org, Jan 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5697939468e0812830e0362a3e6c1b44c352cfc6

commit 5697939468e0812830e0362a3e6c1b44c352cfc6
Author: Mike Wasserman <msw@chromium.org>
Date: Wed Jan 17 22:38:58 2018

mus: Fix gpu context lost errors and crashes on display removal

Destroy AshWindowTreeHostMus's compositor on shutdown.
(Mus async destroys the underlying platform window)

Destroy displays on root removal, not config changes.
(revert http://crrev.com/c/761764 to match cash dtor ordering)

Use a stub window for Ash's virtual unified display.
(matches cash to use an offscreen surface for rendering)

Bug:  798538 
Test: No errors/crashes removing displays with --mus, mirroring/unified works.
Change-Id: I064416cfc377bf19ec9afa214baa28a75846acf8
Reviewed-on: https://chromium-review.googlesource.com/862603
Commit-Queue: Michael Wasserman <msw@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529930}
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/display/mirror_window_controller.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/host/ash_window_tree_host_mus.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/system/night_light/night_light_controller.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/ash/system/night_light/night_light_controller_unittest.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/display.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/display_manager.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/platform_display_default.cc
[modify] https://crrev.com/5697939468e0812830e0362a3e6c1b44c352cfc6/services/ui/ws/window_tree_unittest.cc

Comment 21 by msw@chromium.org, Jan 17 2018

Status: Fixed (was: Assigned)
Components: -Internals>MUS Internals>Services>WindowService
