
Issue 863049

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug

Blocking:
issue 871763




Viz Android GL Out Of Memory Error

Project Member Reported by jonr...@chromium.org, Jul 12

Issue description

OS: Android
Bot: luci.chromium.try/android-kitkat-arm-rel
Test suite: content_browsertests
Tests: most of them

Example failing run: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-kitkat-arm-rel/33098

It appears that this bot runs out of GL memory: GL error 1285 (0x0505) is GL_OUT_OF_MEMORY.

[FATAL:gl_context.cc(323)] Check failed: error == GL_NO_ERROR || error == GL_CONTEXT_LOST_KHR. GL error was: 1285
[ERROR:test_suite.cc(303)] Currently running: DomSerializerTests.SubResourceForElementsInNonHTMLNamespace
Searching for native crashes in: /b/swarming/w/itqgQo5o/tmpOXlAgX
Unknown Android release, consider passing --packed-lib.
Reading Android symbols from: /b/swarming/w/ir
Searching for Chrome symbols from within: /b/swarming/w/ir/out/Release/lib.unstripped:/b/swarming/w/ir/out/Release
Stack Trace:
  RELADDR   FUNCTION                                                                                                                                                                                    FILE:LINE
  024bd9d9  logging::LogMessage::~LogMessage()                                                                                                                                                          ??:0:0
  024891df  gl::GLContext::MakeVirtuallyCurrent(gl::GLContext*, gl::GLSurface*)                                                                                                                         ??:0:0
  02c9797b  gpu::GLContextVirtual::MakeCurrent(gl::GLSurface*)                                                                                                                                          ??:0:0
  02cbd287  gpu::gles2::GLES2DecoderImpl::MakeCurrent()                                                                                                                                                 ??:0:0
  02da7ee7  gpu::CommandBufferStub::MakeCurrent()                                                                                                                                                       ??:0:0
  02da7d4f  gpu::CommandBufferStub::OnMessageReceived(IPC::Message const&)                                                                                                                              ??:0:0
  02dacd6d  gpu::GpuChannel::HandleMessageHelper(IPC::Message const&)                                                                                                                                   ??:0:0
  02dabf33  gpu::GpuChannel::HandleMessage(IPC::Message const&)                                                                                                                                         ??:0:0
  02c79083  gpu::Scheduler::RunNextTask()                                                                                                                                                               ??:0:0
  023aaaad  base::internal::Invoker<base::internal::BindState<void (viz::TestLayerTreeFrameSink::*)(), base::WeakPtr<viz::TestLayerTreeFrameSink> >, void ()>::RunOnce(base::internal::BindStateBase*)  ??:0:0
  024b4921  base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)                                                                                                                        ??:0:0
  024c50e1  base::MessageLoop::RunTask(base::PendingTask*)                                                                                                                                              ??:0:0
  024c534d  base::MessageLoop::DeferOrRunPendingTask(base::PendingTask)                                                                                                                                 ??:0:0
  024c5545  base::MessageLoop::DoWork()                                                                                                                                                                 ??:0:0
  024c723d  base::MessagePumpDefault::Run(base::MessagePump::Delegate*)                                                                                                                                 ??:0:0
  024c4d35  base::MessageLoop::Run(bool)                                                                                                                                                                ??:0:0
  024d6efd  base::RunLoop::Run()                                                                                                                                                                        ??:0:0
  0250659f  base::Thread::Run(base::RunLoop*)                                                                                                                                                           ??:0:0
  02506733  base::Thread::ThreadMain()                                                                                                                                                                  ??:0:0
  025284a3  base::(anonymous namespace)::ThreadFunc(void*)                                                                                                                                              ??:0:0
  0000d173  <UNKNOWN>                                                                                                                                                                                   /system/lib/libc.so
  0000d30b  <UNKNOWN>            
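
For anyone triaging similar logs: the decimal value in the FATAL line maps onto the standard GL error enums (1285 is 0x0505). The snippet below is a minimal sketch of a log-decoding helper, not part of Chromium; the function name `decode_gl_error` and the regular expression are assumptions made for illustration, and only the enum values come from the GL headers.

```python
import re

# Standard OpenGL ES error enums (values as defined in the GL headers).
GL_ERROR_NAMES = {
    0x0000: "GL_NO_ERROR",
    0x0500: "GL_INVALID_ENUM",
    0x0501: "GL_INVALID_VALUE",
    0x0502: "GL_INVALID_OPERATION",
    0x0505: "GL_OUT_OF_MEMORY",
    0x0506: "GL_INVALID_FRAMEBUFFER_OPERATION",
    0x0507: "GL_CONTEXT_LOST_KHR",
}

# Assumption: the log prints the code in decimal after "GL error was: ",
# as in the FATAL line quoted in this issue.
_GL_ERROR_RE = re.compile(r"GL error was: (\d+)")


def decode_gl_error(log_line):
    """Return 'NAME (0xXXXX)' for the GL error in log_line, or None if absent."""
    match = _GL_ERROR_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    name = GL_ERROR_NAMES.get(code, "UNKNOWN")
    return f"{name} (0x{code:04X})"
```

Passing the FATAL line from this report to `decode_gl_error` yields the GL_OUT_OF_MEMORY name together with its hex value.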
 
Labels: -Pri-3 Pri-1
Status: Started (was: Untriaged)
Taking a look - would like to understand this before launching more widely.
GL_OUT_OF_MEMORY is misleading here - it's only raised because we can't allocate a new buffer in the Android BufferQueue for our surface. The BufferQueue has already been torn down; the allocation itself did not fail.

Probably a shutdown ordering issue - it only appears on K (KitKat), so later OS versions may have become more lenient about this kind of ordering issue.

Will see if I can make the ordering more deterministic and prevent this.
Labels: Android-OOP-D-Bot-Failures
Note that this is also responsible for the chrome_public_test_vr_apk failures on the KitKat bot.
Blocking: 871763
I've spent a fair amount of time on this, and it's unfortunately a real puzzle - nothing about our GL command stream seems wrong, and changing which components shut down first/second doesn't seem to help.

The error seems to be popping up after we switch virtual contexts and restore state - maybe in restoring state we're re-binding something that's been deleted, leading to this error?

Adding glFinish at various points also makes the issue go away or become flaky, which suggests some sort of driver issue.

The cleanest repro case I've seen is:

1) We delete RenderWorker, causing the GL context to be unbound.
2) We switch to RenderCompositor, causing a real context switch and a full state restore.
3) We glFinish - no errors at this point.
4) RenderCompositor issues a glFlush - no-op
5) We switch to DisplayCompositor, causing a virtual context switch (no actual switch) and a partial state restore.
6) We glFinish, triggering OOM.
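
The bisection approach used above (finish, then poll the error flag after each step) can be sketched as a small helper. This is a toy model only: `FakeDriver` is a hypothetical stand-in rather than real GL bindings, and the fault injected at the partial-restore step simply mirrors the observation in this comment; it says nothing about the actual root cause.

```python
GL_NO_ERROR = 0x0000
GL_OUT_OF_MEMORY = 0x0505


def first_failing_step(steps, finish, get_error):
    """Run each (name, action) step, finish, then poll the error flag.

    Returns (name, error) for the first step after which an error is
    reported, or None if every step completes cleanly.
    """
    for name, action in steps:
        action()
        finish()  # force queued work to complete before checking
        error = get_error()
        if error != GL_NO_ERROR:
            return name, error
    return None


class FakeDriver:
    """Hypothetical driver model: records the first error until it is polled."""

    def __init__(self):
        self.pending_error = GL_NO_ERROR
        self.log = []

    def run(self, label, error=GL_NO_ERROR):
        self.log.append(label)
        # Like glGetError semantics: keep the first recorded error.
        if error != GL_NO_ERROR and self.pending_error == GL_NO_ERROR:
            self.pending_error = error

    def finish(self):
        self.log.append("finish")

    def get_error(self):
        # Polling returns the recorded error and clears the flag.
        error, self.pending_error = self.pending_error, GL_NO_ERROR
        return error


driver = FakeDriver()
steps = [
    ("delete RenderWorker", lambda: driver.run("delete RenderWorker")),
    ("switch to RenderCompositor (full restore)",
     lambda: driver.run("full restore")),
    ("RenderCompositor glFlush", lambda: driver.run("flush")),
    # Injected fault, for illustration only.
    ("switch to DisplayCompositor (partial restore)",
     lambda: driver.run("partial restore", GL_OUT_OF_MEMORY)),
]
result = first_failing_step(steps, driver.finish, driver.get_error)
```

Because polling clears the flag, a clean check after one step guarantees that any later error was raised by work issued after that check, which is why the window between (3) and (6) is the interesting one.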

This is really weird as we switch to the real context and finish in (3), at which point there are no errors. The only thing we do between that point and the error is flush and restore state.

I suspect something about the state restore is hitting a timing issue / driver bug in K. Maybe some external Android resource that's bound to EGL is being deleted, and we hit the issue when we try to restore various buffers? We don't appear to be restoring a framebuffer though, so I'm not quite sure.

Will keep looking on Monday.
One side point - we only ever have one CompositorImpl for now, and we *never* actually cleanly tear down Chrome in the wild (we always just kill the process), so this is really a test-only issue. It may be fine to just have the tests kill the GPU process, which would prevent (or hide) these issues.
Fix in flight.
Project Member

Comment 9 by bugdroid1@chromium.org, Aug 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7c6485d8afdd53f95b44ebc545351cafb835ecaa

commit 7c6485d8afdd53f95b44ebc545351cafb835ecaa
Author: Eric Karl <ericrk@chromium.org>
Date: Thu Aug 23 23:58:15 2018

Android OOP-D: Tear down display when going invisible

When Android goes invisible in OOP-D, it wasn't tearing down the
display, which can lead to GL issues as we continue to use GL
after the window (used to create the GL surface) is destroyed.

In order to tear down the display for Viz, we need to invalidate
our root frame sink ID. This change refactors things so that we
always invalidate the root frame sink ID on going invisible, and
re-register it on becoming visible. This allows both viz/non-viz
to share the same logic.

As registering/unregistering isn't doing much in non-viz case,
this doesn't add significant overhead there.

Bug:  863049 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel
Change-Id: I1589e402185fd9e2cdb007d3d8cd739f303ad48a
Reviewed-on: https://chromium-review.googlesource.com/1184376
Reviewed-by: Khushal <khushalsagar@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Commit-Queue: Eric Karl <ericrk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#585664}
[modify] https://crrev.com/7c6485d8afdd53f95b44ebc545351cafb835ecaa/components/viz/service/frame_sinks/root_compositor_frame_sink_impl.cc
[modify] https://crrev.com/7c6485d8afdd53f95b44ebc545351cafb835ecaa/components/viz/service/frame_sinks/root_compositor_frame_sink_impl.h
[modify] https://crrev.com/7c6485d8afdd53f95b44ebc545351cafb835ecaa/content/browser/renderer_host/compositor_impl_android.cc
[modify] https://crrev.com/7c6485d8afdd53f95b44ebc545351cafb835ecaa/content/browser/renderer_host/compositor_impl_android.h
[modify] https://crrev.com/7c6485d8afdd53f95b44ebc545351cafb835ecaa/services/viz/privileged/interfaces/compositing/display_private.mojom
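
For readers skimming the commit message, the lifecycle it describes (invalidate the root frame sink ID on going invisible, re-register it on becoming visible) can be pictured with the toy model below. This is a rough sketch only, not the code in the change: `ToyFrameSinkRegistry`, `ToyCompositor`, and every method name are hypothetical and do not correspond to the real viz or CompositorImpl interfaces.

```python
class ToyFrameSinkRegistry:
    """Hypothetical registry of valid frame sink IDs."""

    def __init__(self):
        self._registered = set()

    def register(self, frame_sink_id):
        self._registered.add(frame_sink_id)

    def invalidate(self, frame_sink_id):
        self._registered.discard(frame_sink_id)

    def is_registered(self, frame_sink_id):
        return frame_sink_id in self._registered


class ToyCompositor:
    """Hypothetical compositor whose display lives only while visible."""

    def __init__(self, registry, root_frame_sink_id):
        self.registry = registry
        self.root_frame_sink_id = root_frame_sink_id
        self.visible = False
        self.display_alive = False

    def set_visible(self, visible):
        if visible == self.visible:
            return  # no state change, nothing to do
        self.visible = visible
        if visible:
            # Becoming visible: re-register the root ID, then bring up the display.
            self.registry.register(self.root_frame_sink_id)
            self.display_alive = True
        else:
            # Going invisible: invalidate the root ID, which in this model
            # tears down the display so no GL work outlives the window.
            self.registry.invalidate(self.root_frame_sink_id)
            self.display_alive = False
```

In this model the same code path runs whether or not a separate display exists, loosely mirroring the commit's point that viz and non-viz can share one piece of logic.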

Status: Fixed (was: Started)
