New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 738443 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug
Proj-XR



Sign in to add a comment

VR: Scene freezes when presenting webVR from in Vr Shell

Project Member Reported by mthiesse@chromium.org, Jun 30 2017

Issue description

This appears to be some sort of race condition when transitioning from Vr Shell to WebVR presentation (I may also have seen it once entering presentation outside of VR).

It looks like everything is getting initialized correctly, but VrDisplay never gets VSyncs. The mojo connections all appear to be correctly bound without error, but when we fire the vsync callback from vr_shell_gl it never arrives.

I'm suspicious it's a mojo issue because I can't find anything wrong in our state.

It's really difficult to recover from this kind of issue, because we can't fire the callback again, and VrDisplay shouldn't request a second vsync.
 
Cc: sunn...@chromium.org
I got this under GDB, and the main thread (the one that runs javascript) in the renderer process is hung waiting on a completion (proxy_main.cc, ProxyMain::BeginMainFrame::commit).  This may be related to crbug/668892.

Without understanding what that code does, this may be a side effect of our stopping the compositor while presenting in vr.

(as an aside - we are also hitting some "not implemented" code when we start presenting, such as Not implemented reached in virtual void cc::DisplayScheduler::OnBeginFrameSourcePausedChanged(bool)).

After we deal with the freeze in the render process, we'll also have to harden the browser process VR code to deal with missing/delayed messages.  VRShell should render something if WebVR isn't responding.
Components: -Internals>Mojo>Bindings Blink>Scheduling
Sunny, for Android, we are disabling normal vsync/composition while WebVR is rendering fullscreen to avoid unnecessary work. Could this cause the completion in ProxyMain::BeginFrame::commit to hang forever?

Looking at comments in SchedulerStateMachine::PendingDrawsShoudlBeAborted, this appears to be a potential issue that the scheduler state machine is trying to avoid.  There appear to be several existing bugs around hangs here.
Disabling the display compositor can hang renderers. This happens because the renderer compositor gets stuck waiting on an ack from the display compositor for the previous frame.

Can you try running with the --disable-main-frame-before-activation flag? That should make the commit and raster stages of the pipeline synchronous. The renderer won't hang but it won't submit new frames until the previous frame is acked.
I tried that flag, and was unable to repro the hang.

I should clarify that we are pausing things, not disabling them outright (ie - indirectly calling OnBeginFrameSourcePausedChanged(true)).

Is this a scheduler bug that pausing can cause hangs, or is the recommendation to just no pause things?
Are you setting the paused flag on the renderer scheduler or browser scheduler or both? I don't think display scheduler is wired up to do anything with the paused flag.

Setting the paused flag on the renderer should just drain the compositing pipeline, but if it doesn't that's definitely a bug. Dumping out cc::Scheduler::AsValue when the renderer hangs would help in debugging.
Owner: billorr@chromium.org
Status: Assigned (was: Available)
In VrShellImpl.java (in the browser process), we call WindowAndroid#setVSyncPaused, which calls into native to call WindowBeginFrameSource::OnPauseChanged, which iterates through all observers to call OnBeginFrameSourcePausedChanged.

There are several observers dynamically added/removed.  Most do nothing in OnBeginFrameSourcePausedChanged (such as DisplayScheduler and RenderWidgetHostViewAndroid).  cc::Scheduler adds itself in SetupNextBeginFrameIfNeeded.

Is cc::Scheduler in the browser process the browser scheduler?  I'm not familiar enough with schedulers to know the terminology here.
Following up - We pause the scheduler in the browser process, but the renderer process doesn't know it is paused.  The render process stops receiving notifications like ClientLayerTreeFrameSink::OnBeginFrame, so ProxyMain::BeginFrame::commit can block forever.  (Note, when exiting VR this was the notification that unblocked things).

If the render process scheduler knew that things were paused, it could return ACTION_ACTIVATE_SYNC_TREE, and unblock the ProxyMain::BeginFrame::commit completion.

Let's keep the discussion here:

Offline email from billorr@:

I believe I'm starting to understand what is happening in this bug:
When we enter WebVR mode, we call WindowAndroid#setVSyncPaused, which calls WindowBeginFrameSource::OnPauseChanged, and pauses the scheduler in the browser process.  This paused state doesn't make its way over to the renderer process.
On the renderer side, OnBeginFrame is called periodically which will (among other things) flush state changes in the compositor.  This uses new viz mojo interface to push the OnBeginFrame events down from the browser process.  When the browser process vsync is paused, we stop getting these messages, and the scheduler eventually stops having state changes.
Sometimes (when the bug occurs) we don't change to the ACTION_ACTIVATE_SYNC_TREE state, but have a pending ProxyMain::BeginMainFrame::commit outstanding blocking the main thread.
For fixes, I think we need to do the following:
When the browser scheduler is paused, it should inform the renderer scheduler that it should pause.
This would allow the render process scheduler to go through the forced activation codepath, and we'd unblock ourselves.
Given what is happening, it does sound like a vsync/begin frame scheduler bug.

I'm still piecing together my understanding of how the components fit together, so any guidance or help would be appreciated.  If someone on the scheduler team could take this on, that would be great, but I can continue to look at it with some guidance.  We'd like to get this fixed for M61.
Cc: boliu@chromium.org
The paused code path was only used for android webview until this commit where it was changed to be used for WebVR: crrev.com/b10e886bab7da85506656b4bb83e707cc2170209

The paused state doesn't get automatically propagated to the renderer by RenderWidgetHostViewAndroid. See this code: https://cs.chromium.org/chromium/src/content/browser/renderer_host/render_widget_host_view_android.cc?rcl=171b4c6c16ef63a1e4488193037be0a5b6512973&l=2242

If we intend for the paused state to be propagated we should send the ViewMsg_SetBeginFramePaused IPC message to the renderer there. Furthermore, we should DCHECK(using_browser_compositor_ || !paused) there so that we can assert that code path isn't used for android webview.

+ boliu@ to confirm that using_browser_compositor_ == false means android webview.
I tried hooking into RenderWidgetHostViewAndroid, to send the message down on RenderWidgetHostViewAndroid, but hit a few snags.

First off, RenderWidgetHostViewAndroid doesn't seem to be getting notified that we are paused.  Looking at the OnBeginFrame callstack, it appears we go through ExternalBeginFrameSource, which doesn't have an existing way to plumb through paused state.  The stack seems to have some calls optimized out. We could add a way to plumb the paused state through if that is the right approach.

stack:
content::RenderWidgetHostViewAndroid::OnBeginFrame
cc::ExternalBeginFrameSource::OnBeginFrame
ui::WindowAndroid::WindowBeginFrameSource::OnVSync
ui::WindowAndroid::OnVSync
Java_org_chromium_ui_base_WindowAndroid_nativeOnVSync

Second, are there any concerns with message order vs. mojo OnBeginFrame calls?  If we send a message down that we are paused, can we already be experiencing this hang before the message is received?  Is the message received on the main thread in the render process?

Third, it appears that observers are added/removed pretty frequently, so it's possible that some of the time WindowBeginFrameSource doesn't tell any observer that we have paused. AddObserver is called and learns our paused state at some point, but my worry is that we may not call AddObserver if we have vsync paused.
Here is an example callstack where we add observers:
ui::WindowAndroid::WindowBeginFrameSource::AddObserver
cc::FrameSinkManager::RecursivelyAttachBeginFrameSource
cc::FrameSinkManager::RegisterFrameSinkHierarchy
content::CompositorImpl::DidInitializeLayerTreeFrameSink
cc::LayerTreeHost::DidInitializeLayerTreeFrameSink
cc::SingleThreadProxy::SetLayerTreeFrameSink
cc::LayerTreeHost::SetLayerTreeFrameSink
content::CompositorImpl::InitializeDisplay
content::CompositorImpl::OnGpuChannelEstablished

and another example:
ui::WindowAndroid::WindowBeginFrameSource::AddObserver
cc::ExternalBeginFrameSource::AddObserver
content::RenderWidgetHostViewAndroid::SetNeedsBeginFrames
cc::mojom::CompositorFrameSinkStubDispatch::Accept
mojo::InterfaceEndpointClient::HandleValidatedMessage


Comment 12 by boliu@chromium.org, Jul 10 2017

> + boliu@ to confirm that using_browser_compositor_ == false means android webview.

yep

(did not read this bug..) the issue pausing bfs was trying to solve also affects chrome as well. It's just not as catastrophic as webview since usually each tab in chrome is in its own process, so deadlocking one won't affect any other tab.
ExternalBFS does plumb the paused state to its observers: https://cs.chromium.org/chromium/src/cc/scheduler/begin_frame_source.cc?rcl=4908557534d516a48589f36714d91ed835683f58&l=317

I can't say why RWHVAndroid doesn't get notified of this given that it's an observer and gets the OnBeginFrame message fine.

The paused message is received on the compositor thread: https://cs.chromium.org/chromium/src/content/renderer/gpu/compositor_forwarding_message_filter.cc?l=47

So even if the main thread is busy in commit the paused message will be received and the scheduler will force activate and finish the frame.

The first AddObserver stack trace is for the browser compositor. The second one is for the renderer compositor on clank (not android webview).

That code path uses the CompositorFrameSinkSupport to plumb begin frames. DelegatedFrameHostAndroid is the client of an ExternalBeginFrameSource it owns and uses that to forward begin frames to RWHVA. It is also the CompositorFrameSinkSupportClient so it receives OnBeginFrame from there. CFSSupport ignores the pause message and CFSSupportClient has no interface for it. We should fix that if we want to support the paused message for renderers outside of android webview.

https://cs.chromium.org/chromium/src/cc/surfaces/compositor_frame_sink_support.cc?l=331
Err, in my previous comment, the last paragraph explains why RWHVAndroid doesn't get notified of the paused state. Ignore the 2nd paragraph where I say "I can't say why...".
Hmmm... looks like noone is listening for that message on the other side.  The code is there, but CompositorExternalBeginFrameSource::AddObserver and therefore CompositorExternalBeginFrameSource::OnSetBeginFrameSourcePaused are not called.

compositor_forwarding_message_filter.cc does receive the messages.

I'll continue to investigate tomorrow.
Cc: fsam...@chromium.org
We might be using viz::ClientLayerTreeFrameSink (and its ExternalBFS) in the renderer now and that doesn't handle the paused message.

+fsamuel to confirm this
Probably, yes... you need to plumb that through compositor_frame_sink.mojom.
Status: Started (was: Assigned)
Have fix built, will send out for CR after trybots succeed.
Issue 738055 has been merged into this issue.
Project Member

Comment 21 by bugdroid1@chromium.org, Jul 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/372a12674b9744c5a7742c99009358601125d0fc

commit 372a12674b9744c5a7742c99009358601125d0fc
Author: Bill Orr <billorr@chromium.org>
Date: Thu Jul 20 00:18:39 2017

Plumb VSync / BeginFrameSource paused state through RenderWidgetHostViewAndroid.

OnAndroid, VSync processing may be paused for the browser compositor,
which has the effect of not sending down OnBeginFrame events to the
render process, essentially pausing both browser and renderer
compositing.

Sometimes the renderer compositor can get into a bad state where the
main thread is blocked waiting on a completion
(ProxyMain::BeginMainFrame::commit), but we don't meet the conditions
for ACTION_ACTIVATE_SYNC_TREE, so the completion is never triggered.

The fix is to plumb paused state from the browser to the renderer
process, so ACTION_ACTIVATE_SYNC_TREE can be force activated (when
paused), and we can unblock the main thread.

BUG= 738443 

Change-Id: I401d67bdd0f1054d5594575c8800d01a53e67f2b
Reviewed-on: https://chromium-review.googlesource.com/567029
Reviewed-by: Jochen Eisinger <jochen@chromium.org>
Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org>
Reviewed-by: Bo Liu <boliu@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: Fady Samuel <fsamuel@chromium.org>
Commit-Queue: Bill Orr <billorr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488059}
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/android_webview/browser/hardware_renderer.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/android_webview/browser/hardware_renderer.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/android_webview/browser/surfaces_instance.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/android_webview/browser/surfaces_instance.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/cc/ipc/compositor_frame_sink.mojom
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/cc/test/fake_compositor_frame_sink_support_client.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/cc/test/fake_compositor_frame_sink_support_client.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/cc/test/mock_compositor_frame_sink_support_client.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/client/client_layer_tree_frame_sink.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/client/client_layer_tree_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/host/host_frame_sink_manager_unittests.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/compositor_frame_sink_support.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/compositor_frame_sink_support_client.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/compositor_frame_sink_support_unittest.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/direct_layer_tree_frame_sink.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/direct_layer_tree_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/gpu_compositor_frame_sink.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/gpu_compositor_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/gpu_root_compositor_frame_sink.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/service/frame_sinks/gpu_root_compositor_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/test/test_layer_tree_frame_sink.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/components/viz/test/test_layer_tree_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/frame_host/render_widget_host_view_child_frame.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/frame_host/render_widget_host_view_child_frame.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/renderer_host/delegated_frame_host.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/renderer_host/delegated_frame_host.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/renderer_host/offscreen_canvas_provider_impl_unittest.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/renderer_host/render_widget_host_view_android.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/browser/renderer_host/render_widget_host_view_android.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/renderer/android/synchronous_layer_tree_frame_sink.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/renderer/android/synchronous_layer_tree_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/content/test/fake_renderer_compositor_frame_sink.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/services/ui/ws/frame_generator.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/third_party/WebKit/Source/platform/graphics/OffscreenCanvasFrameDispatcherImpl.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/ui/android/delegated_frame_host_android.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/ui/android/delegated_frame_host_android.h
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/ui/aura/local/layer_tree_frame_sink_local.cc
[modify] https://crrev.com/372a12674b9744c5a7742c99009358601125d0fc/ui/aura/local/layer_tree_frame_sink_local.h

Status: Fixed (was: Started)
Status: Verified (was: Fixed)
Just to confirm, this is basically verifying that entering WebVR from Chrome VR works, correct? Assuming so, I am marking this as fixed by verifying in 62.0.3178.0
Components: Blink>WebXR

Sign in to add a comment