Surface reference problem with Android |
||||||||
Issue descriptionAn issue with surface references and Android came up in crbug.com/764460 . When CompositorImpl is hidden (eg. when the user switches apps) it destroys the LayerTreeFrameSink. When the LayerTreeFrameSink is destroyed, the CompositorFrameSinkSupport and Surface for the browser is destroyed. When the browser Surface is destroyed, the only reference to the renderer Surface is destroyed with it. The renderer CompositorFrameSinkSupport is kept alive at this point and the Surface isn't destroyed. The renderer SurfaceLayer, owned by DelegatedFrameHostAndroid, still points to the renderer Surface. Surface references operate on a DAG. The browser surface is reachable from the root, which refers to the renderer surface. Before CompositorImpl is hidden this DAG is in a good state. After the browser surface is destroyed the renderer surface becomes unreachable. This is a bad state because the renderer SurfaceLayer still points to the Surface but nothing is referencing it. The renderer Surface could be deleted at this point. When CompositorImpl is shown again, a new LayerTreeFrameSink is created along with a CompositorFrameSinkSupport. The renderer SurfaceLayer is added to CompositorImpls LayerTreeHost and the browser will embed the renderer Surface again. Once the browser submits a CompositorFrame, it adds a new surface reference to the renderer Surface and the reference DAG is in a good state again. The last surface for any CompositorFrameSinkSupport is kept alive, so the situation described in the previous paragraph usually works. When the browser submits a CompositorFrame embedding the renderer a new surface reference is added and the DAG is in a good state. I think there is a potential race though. If the renderer size has changed (eg. orientation change) then it will submit a new CompositorFrame immediately. If the renderer CompositorFrame arrives before the browser CompositorFrame arrives then the old renderer Surface will be deleted (since it has no references) and a new renderer Surface created. I'm less certain of details at this point, but if the browser CompositorFrame embeds the old renderer Surface after it's destroyed that would result in flicker until the next browser CompositorFrame is submitted that embeds the new renderer Surface.
,
Nov 11 2017
Over to cblume@ for investigation
,
Nov 17 2017
The big question I was left with when looking at this problem originally was why do we keep the render Surface in memory after the browser Surface is destroyed. I talked to danakj@ and boliu@ about why we destroy the browser Display/LayerTreeFrameSink/CompositorFrameSinkSupport/Surface and idea is to use fewer resources when Chrome isn't open. The renderer provides content for most of the screen on Android though. I believe we're keeping all textures and things for the renderer Surface in memory still. If we just destroy the renderer Surface + clear SurfaceLayer (eg. call DelegatedFrameHostAndroid::DestroyDelegatedContent()) this is no longer a problem. I don't know if that will impact time to switch back to a tab though? Maybe that was the reason it was done like this originally? I know there is some logic to save and show screenshots of tab content on Android, but I have no idea how that works and if that happens when switching back to Chrome? It might also be keeping the renderer Surface alive wasn't intended and this behaviour reintroduced at some point unnoticed. There was no issue with the behaviour with SurfaceSequences, since sequences are associated with a FrameSinkId instead of SurfaceId.
,
Nov 17 2017
There are two cases here: - background Chrome, foreground Chrome (no rotation) - background Chrome, rotate, foreground Chrome When there is no rotation, we could reuse the previous content (or even just a screenshot) to get something displayed quickly. When there is a rotation, that content will wrong (assuming non-square display). I feel like freeing resources when backgrounded is far more important than resuming slowly. IIRC Android puts more memory pressure on an app which is backgrounded. And a lot of our discard events are really happening then (again, IIRC). If we are willing to sacrifice resume time for background resource usage, resuming with either a screenshot (potentially bg-color filled if rotated) or just bg-color filled seems pretty reasonable to me. Both make good placeholders. But both would also cause a flash. If the user feels "flash" then I think it is fast enough to not need a placeholder at all. So my vote is to just destroy the renderer Surface + clear SurfaceLayer.
,
Nov 17 2017
I'm searching around but I want to ask before East Coast folks go home for the weekend: Is there a way the renderer is notified that the app was backgrounded? Said another way, is there a place I can destroy the renderer Surface + clear SurfaceLayer? The browser is somehow notified. It looks like maybe that happens inside CompositorImpl somewhere (not immediately obvious to me where). Is CompositorImpl also a piece of the renderer and thus also possibly notified?
,
Nov 17 2017
CompositorImpl is the browser, not the renderer. https://chromium-review.googlesource.com/762185 stops doing things in the renderer based on browser visibility, but its the tab not the browser. boliu@ was thinking about doing things in the renderer based on browser visibility..
,
Nov 17 2017
The CompositorFrameSinkSupport/SurfaceLayer for the renderer is part of the browser process in DelegatedFrameHostAndroid. CompositorImpl::SetVisible() is set false and that's where everything for the browser is destroyed. You would need to get the same signal somewhere in this stack of classes: RenderWidgetHostImpl RenderWidgetHostViewAndroid DelegatedFrameHost Does RenderWidgetHostImpl::WasHidden() get called? You can get the view from there and then go from the view to DelegatedFrameHostAndroid.
,
Nov 17 2017
> Does RenderWidgetHostImpl::WasHidden() get called? I don't think we should rely on browser visibility (whether the app is in the foreground or background) to always agree with tab visibility (RenderWidgetHostImpl::WasHidden). They generally work as you would expect, but there are some corner cases, like a tab that is playing music is considered visible even if the app is in the background. Plus these signals are not synchronized. In terms of memory, most of it comes from freeing the browser compositor context, and any frame resources. I don't that much about viz::surface. Is it possible to throw away all of these things without throwing away the browser viz surface?
,
Nov 18 2017
,
Nov 18 2017
I think as a first attempt I'll hijack RWH::WasHidden() just to see if I'm on the right path. Then I'll follow up by introducing a new "surface hidden" IPC to no longer hijack RWH::WasHidden(). An alternative approach might be something Fady is working on. I haven't read the docs on this yet, but IIRC we can have the browser force a new surface sync. Almost like a "Forget your past Surface, I need a new one". fsamuel@ Are there docs I can read? Does surface sync allow us to force a new renderer Surface somehow? And can we be sure it happens after the renderer knows about the rotation?
,
Nov 18 2017
I'm not totally told on the idea of a new "browser visibility" signal to renderer. Browser probably shouldn't be waiting on a renderer response before throwing away resources. But if the browser doesn't wait, then the bad state can still happen, only "transiently" now?
,
Nov 18 2017
,
Nov 20 2017
I don't think this should need any new IPC. It's a bit confusing, but all of the renderer things I mentioned are actually part of the browser process (eg. content/browser/renderer_host). The renderer process gets OnBeginFrame() IPC already when it's needed. The renderer then sends a CompositorFrame to the browser process and that CompositorFrame is passed to DelegatedFrameHostAndroid. DFHA has a CompositorFrameSinkSupport, which owns the Surface, which owns the CompositorFrame. This is the thing that isn't cleared when leaving the Chrome app, and from my understand of how resources work this means that all of the CompositorFrame resources from the renderer are not cleared either?
,
Nov 20 2017
Sure, I'm just saying RWH::WasHidden signal is still not enough, it doesn't matter if it's browser side or renderer side. Hmm.. I'm not entirely sure if it's intentional or accidental that DFHA doesn't throw away that frame. I don't know the full history here, but I can imagine the renderer frame doesn't get dropped to reduce flicker when resuming chrome again. But assuming that's not the case, I support renderer frame (and whatever else that can be cleaned up) as well.
,
Nov 20 2017
If my memory serves me right, the ReleaseLayerTreeFrameSink was added when it used to be ReleaseOutputSurface. The SetVisible in CompositorImpl is actually used when the android window the Display's onscreen context is tied to is destroyed, so the context and everything tied to it has to be cleared. Since the same context is used by browser cc LayerTreeFrameSink, that has to be destroyed as well. I don't think the intention here was to clear resources. viz::FrameEvictionManager should take care of it based on memory pressure. RWHVA does SetVisible(false) on FrameEvictor to unlock the frame, so it can be evicted if need be.
,
Jan 5 2018
Super overdue update (oops, forgot): I looked into this and struggled to make a great repro. Eric and I realized it only happens 2.5 times in a million and there are more common errors we could be working on. (See the bug linked from #1.) We talked about possibly turning this back on and seeing if the numbers are still that low in beta. If so, it would might not need to be a blocker for surface references being enabled on Android. Handing it over to kylechar@ since I believe he just enabled surface references elsewhere.
,
Jan 5 2018
Re #16, this may still be worth understanding, but I think that the current guesses as to the problem might be missing something. If we turn this back on and still see the same issues, we might try taking a temporary reference to the surface on re-foregrounding, preventing it from being dropped until it's drawn with (see here https://cs.chromium.org/chromium/src/components/viz/service/surfaces/surface_manager.cc?rcl=dee4154b5aae30e1af3fff65d3f19ed2b40766ac&l=107)... if this fixes the issue (or doesn't), it will help us narrow down the cause. Either way, it would be good to re-enable this for Android.
,
Jan 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/78d11e5577a5bdd721b658dd000e8286d09f9019 commit 78d11e5577a5bdd721b658dd000e8286d09f9019 Author: kylechar <kylechar@chromium.org> Date: Tue Jan 09 20:23:46 2018 viz: Turn on surface references again for Android. This CL makes surface references the default for Android. The --enable-surface-references flags is flipped to be --disable-surface-references instead. The webvr issue with missing surfaces no longer occurs so I don't see any obvious problems blocking this change. This will require monitoring the number of missing surfaces in canary to see if there are regressions. Bug: 676384 , 782403 Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel Change-Id: Ia2bea38c2deb2fceaa369539c6d637ed21723be5 Reviewed-on: https://chromium-review.googlesource.com/857339 Reviewed-by: Fady Samuel <fsamuel@chromium.org> Commit-Queue: kylechar <kylechar@chromium.org> Cr-Commit-Position: refs/heads/master@{#528087} [modify] https://crrev.com/78d11e5577a5bdd721b658dd000e8286d09f9019/components/viz/common/switches.cc [modify] https://crrev.com/78d11e5577a5bdd721b658dd000e8286d09f9019/components/viz/common/switches.h [modify] https://crrev.com/78d11e5577a5bdd721b658dd000e8286d09f9019/content/browser/renderer_host/compositor_impl_android.cc [modify] https://crrev.com/78d11e5577a5bdd721b658dd000e8286d09f9019/content/browser/renderer_host/render_process_host_impl.cc [modify] https://crrev.com/78d11e5577a5bdd721b658dd000e8286d09f9019/content/renderer/child_frame_compositing_helper.cc
,
Jan 15 2018
I turned surface references back on for Android in 65.0.3317.0. There is a slight increase in the number of missing surfaces but it's a tiny number. Here are some UMA stats, looking at Sat Jan 13 2018 with Channel=Canary+Platform=Android+Version=X and 7 day aggregation. Compositing.SurfaceAggregator.SurfaceDrawQuad.MissingSurface. 65.0.3315.0 = 1 (missing) / 750,535,631 (not missing) 65.0.3316.0 = 0 (missing) / 742,881,562 (not missing) 65.0.3317.0 = 9 (missing) / 720,810,370 (not missing) 65.0.3318.0 = 3 (missing) / 620,542,899 (not missing) I spent some time looking at what changed and we modified when surface garbage collection runs. Surfaces get deleted only after surface aggregation which eliminates most races adding/removing surface references. I think this is no longer an issue. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by laforge@google.com
, Nov 8 2017