New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 701988 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 1
Type: Bug



Sign in to add a comment

RendererCompositorFrameSink should not outlive DelegatedFrameHost?

Project Member Reported by fsam...@chromium.org, Mar 15 2017

Issue description

Saman noticed some interesting behavior while moving where surface ID allocation happens. 

DNSError_GoBack2AndForward and other similar navigation tests fail because the RendererCompositorFrameSink survives the navigation and navigation back whereas the RenderWidgetHostViewAura / DelegatedFrameHost get destroyed.

From Saman:

The crash happens on GoForwardAndWaitForNavigations(2); in the test.

RenderFrameHostImpl::OnSwappedOut is called after FrameTree::CreateRenderViewHost, and so in FrameTree::CreateRenderViewHost the RenderViewHost is still alive and  instead of creating a new view we return the existing one. Thus, we end up reusing the RendererCompositorFrameSink.

Currently the RendererCompositorFrameSink is lost only on gpu context lost. Perhaps LTH should lose the CompositorFrameSink on swapout too?

My worry is we don't end up returning resources to renderers that get reused during navigation, and so we end up leaking gpu resources over time.
 
I should add that this behavior is pre-existing and has nothing to do with Saman's change. Saman's change just revealed this behavior.
In particular, I don't think DelegatedFrameHost will return resources on destruction. 

Comment 3 by creis@chromium.org, Mar 15 2017

Components: Internals>Sandbox>SiteIsolation
Owner: lfg@chromium.org
Yes, that does sound like a bug.  lfg@ would probably know whether we can discard the renderer-side RWHV-related state, since I think he worked on discarding the RWHV in the browser process in these cases.  (He's OOO at the moment, but I'll list him as an owner to help triage it when he gets back.  Anyone should feel free to grab it in the meantime, though.)

Comment 4 by piman@chromium.org, Mar 15 2017

RenderWidget and RenderWidgetHostView go hand-in-hand, I don't think we can reuse one if we don't reuse the other.

Comment 5 by samans@chromium.org, Mar 17 2017

Cc: jbau...@chromium.org
Here is what seems to be happening in the unit test. I'm not entirely sure but this is the best I can come up with.
Imagine we navigate to another page and immediately navigate back. On the first navigation, the DelegatedFrameHost is destroyed and the RenderFrame is asked to swap out, but in RenderFrameImpl::OnSwapOut the RenderWidget is not destroyed. Instead, we call RenderWidget::WasSwappedOut which simply tells the RenderProcess that it’s OK to shut down. The RenderProcess sends a ChildProcessHostMsg_ShutdownRequest, and in the meantime we have navigated back to the first page, so the browser still wants to use the renderer and rejects the shutdown request (See RenderProcessHostImpl::OnShutdownRequest). So we end up using the same RenderWidget object, which still has its old RendererCompositorFrameSink. 
There are much simpler scenarios that lead to RenderWidget reuse, see https://bugs.chromium.org/p/chromium/issues/detail?id=705548.
The bug is quite easy to reproduce in real-world usage.

Comment 7 by samans@chromium.org, Mar 28 2017

lfg@: I think this CL that you landed could have caused the problem.

https://codereview.chromium.org/2496233003/

I think before that only RenderWidgetHostImpl could destroy its view, but now RenderFrameHostManager can also destroy it. The problem seems to go beyond just leaking resources. We could also show an empty page, as described in the bug that ahest@ linked. RenderWidgetHostView has a complicated lifetime and I'm not 100% sure about this, so please correct me if I'm wrong.
Cc: ah...@yandex-team.ru
 Issue 705548  has been merged into this issue.
Labels: -Pri-3 Pri-1
Status: Assigned (was: Untriaged)
I'm marking this as P1 because it also manifests as hangs on navigation

Comment 10 by lfg@chromium.org, Mar 28 2017

Status: Started (was: Assigned)

Comment 11 by lfg@chromium.org, Mar 28 2017

I looked into this issue, and it comes as a result of reusing the RenderWidget without reinitializing it properly. The proper fix is to delete the RenderWidget when a frame is swapped out and re-create it when a frame is swapped back in. Unfortunately, we can't do that yet as RenderView and RenderWidget are the same object (see issue 583347).

I was under the impression that calling RenderWidgetCompositor::setVisible(false) (done here: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/web/WebViewFrameWidget.cpp?l=28) was enough to clean up the state of the compositor, but it doesn't seem to be the case.

So I think we have two options:
1. Send ViewMsg_ReclaimCompositorResources when destroying the DelegatedFrameHost (which is essentially what ahest@'s CL does).
2. Clean up the state from the renderer when destroying the WebViewFrameWidget in WebViewFrameWidget::close().

I'm inclined to go with option 1 until we can have a more permanent solution (splitting RenderView/RenderWidget).

Any thoughts from compositor experts?

There seems to be a long tail of graphics bugs being caused here. I can only imagine what others will come up from reusing a compositor with code that assumes things are in their default states. I think you should destroy things in the renderer instead of just making them not visible.
Labels: ReleaseBlock-Beta M-57 M-58
Also, this is causing renderer hangs in stable (see https://codereview.chromium.org/2779713002/). I've marked it RBB to remind us to merge to beta if we can, but consider stable too.

Comment 14 by piman@chromium.org, Mar 28 2017

Right, we should keep RW, RWHI and RWHV 1:1. There are fairly strong assumptions in the code that they are.

Comment 15 by piman@chromium.org, Mar 28 2017

@lfg: what would be the fallout of reverting that 3-month old CL?
I commented on https://codereview.chromium.org/2496233003/ as well, but it's difficult to create a new RenderView because it holds a lot of not-compositing-related stuff as well: notably, it's how the blink::Page itself is kept alive.

The long-term solution is to split the compositing functionality wholly out of RenderView; in the short-term, is there any sort of hack we can land to reset RV to a clean-enough slate?

Comment 17 by piman@chromium.org, Mar 28 2017

> In the short-term, is there any sort of hack we can land to reset RV to a clean-enough slate?

c&p from the CL:

I'm not sure. There's more than just the compositor, there are a bunch of things such as sizing, input events, IME, accessibility, focus, mouse capture, cursor, etc. where the RWHV is what keeps the state, not the RWHI. Can we recreate all that state? Maybe? Sounds like a lot of work! Should we? I'm not sure...

Comment 18 by piman@chromium.org, Mar 28 2017

FWIW, I tried reverting https://codereview.chromium.org/2496233003 (there's conflicts, but I tried to untangle), and it fixes the repro case from  bug 705548 . I don't know what else it breaks, though.
To answer the other question of "How did it work before? RV and RW have been the same thing forever, why is it suddenly an issue?"

It became an issue when we implemented OOPIF. In this mode, the main frame can become a RemoteFrame (at which point we'd shut down the compositing bits for the main frame) and then become a LocalFrame again in the future (at which point we'd need to start those bits back up).

Comment 20 by kenrb@chromium.org, Mar 28 2017

Cc: kenrb@chromium.org

Comment 21 by lfg@chromium.org, Mar 28 2017

So let's go back to the root cause of  bug 705548 : it's a race between sending the compositor frame ACK and swapping out the RenderView (and therefore deleting the RWHV).

Now, obviously keeping the RWHV around forever will fix that bug -- since the compositor bits that send the ACK never go away, the race won't happen.

It still doesn't change the fact that we are reusing the RW. When a RVH/RWHI (and RV/RW) goes into a swapped out state, it won't receive updates to sizing, input events, IME, accessibility, etc, etc, so we already need to recreate all this state when swapping the RenderView back in. It should be similar to what happens when a RenderProcess is destroyed and recreated -- the RWHV must also be recreated with the same RWHI (though a different RW).

Like I mentioned before, the proper fix is to actually split RenderView/RenderWidget so that the latter can be destroyed and recreated when a RenderView becomes swapped out. There's still a lot of work to be done to get there, if you are interested in helping.

Keeping the old RWHV around after swapping out can cause other bugs, for example, on Mac, the RWHV is associated with a native widget, which will stick around and can cause issues ( bug 526326  for example, where the RWHV for the proxy occludes the RWHV that's drawing the frame).

That's why I mentioned I think we need to have an intermediate solution for this, which I still think it should either be by always sending the ACK (which just fixes the root cause of the issue) or having a way in the renderer to reset the compositor to a clean state.
Labels: OS-Chrome OS-Linux OS-Mac OS-Windows
Per chat with lfg@ this at least affects Chrome OS, Windows, Mac and Linux.  Re: the need to respin 57: "I believe this is a pretty rare hang, the steps involved are hard to reproduce in a real-world scenario, but it's possible."
Labels: OS-Android
Given we already have 58 beta out, should we remove the beta blocker label (perhaps ReleaseBlock-Stable and M-58 only)? It seems reasonable to put in a fix, but this should not really block a release if it is not a regression.
This is a regression in M-57 right? But stable is out so we can't block that really. Idk, adjust the labels as u think is right.

Comment 26 by piman@chromium.org, Mar 30 2017

Labels: -M-57
It's a regression in 57, but talking to release managers for it, we decided to skip fixing the bug there under the assumption that it would affect few users and the effect is medium severity. We can revisit if it ends up affecting more users than estimated and we can respin.

We should fix on trunk and 58 though.
Project Member

Comment 27 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/00cf9403c34aea924ffecd736f93e06421e10e15

commit 00cf9403c34aea924ffecd736f93e06421e10e15
Author: piman <piman@chromium.org>
Date: Mon Apr 03 17:09:20 2017

Revert "Destroy the old RenderWidgetHostView when swapping out a main frame."

It makes the assumption that a RenderWidget can talk to different
RenderWidgetHostView over its lifetime, but this assumption cannot be satisfied
by the code, and so this causes immediate problems (renderer hangs, tests
failures and flakiness) as well as fundamental consistency problems making further
changes to the code intractable.
Note: this is a manual revert because of conflicts

Original description:
>    Destroy the old RenderWidgetHostView when swapping out a main frame.
>
>    This fixes an issue where the old RenderView is reused by a new remote
>    subframe. We shouldn't be leaking resources, because now we are already
>    hiding the unused RenderView in the renderer, when the local main frame
>    gets detached and the WebViewFrameWidget is closed.
>
>    BUG= 638375 
>    CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
>
>    Review-Url: https://codereview.chromium.org/2496233003
>    Cr-Commit-Position: refs/heads/master@{#438516}

BUG= 701988 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/2786443003
Cr-Commit-Position: refs/heads/master@{#461459}

[modify] https://crrev.com/00cf9403c34aea924ffecd736f93e06421e10e15/content/browser/frame_host/render_frame_host_impl.cc
[modify] https://crrev.com/00cf9403c34aea924ffecd736f93e06421e10e15/content/browser/frame_host/render_frame_host_impl.h
[modify] https://crrev.com/00cf9403c34aea924ffecd736f93e06421e10e15/content/browser/frame_host/render_frame_host_manager.cc
[modify] https://crrev.com/00cf9403c34aea924ffecd736f93e06421e10e15/content/browser/site_per_process_browsertest.cc
[modify] https://crrev.com/00cf9403c34aea924ffecd736f93e06421e10e15/content/browser/web_contents/web_contents_impl.cc

Thanks for the revert, please ensure to verify in canary and request a merge to M58.

Beta RC cut @ 4.00 PM PST -Tuesday(04/04)
RC cut is today @4.00 PM PST.Please prioritize.
Labels: Merge-Request-58
Project Member

Comment 31 by sheriffbot@chromium.org, Apr 4 2017

Labels: -Merge-Request-58 Merge-Review-58 Hotlist-Merge-Review
This bug requires manual review: Reverts referenced in bugdroid comments after merge request.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Review-58 Merge-Approved-58
Approved for 58 branch 3029.
Project Member

Comment 33 by bugdroid1@chromium.org, Apr 4 2017

Labels: -merge-approved-58 merge-merged-3029
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/204f85e5d72d7da5c76d6a38045304c4fb594fd0

commit 204f85e5d72d7da5c76d6a38045304c4fb594fd0
Author: Antoine Labour <piman@chromium.org>
Date: Tue Apr 04 19:21:31 2017

Revert "Destroy the old RenderWidgetHostView when swapping out a main frame."

It makes the assumption that a RenderWidget can talk to different
RenderWidgetHostView over its lifetime, but this assumption cannot be satisfied
by the code, and so this causes immediate problems (renderer hangs, tests
failures and flakiness) as well as fundamental consistency problems making further
changes to the code intractable.
Note: this is a manual revert because of conflicts

Original description:
>    Destroy the old RenderWidgetHostView when swapping out a main frame.
>
>    This fixes an issue where the old RenderView is reused by a new remote
>    subframe. We shouldn't be leaking resources, because now we are already
>    hiding the unused RenderView in the renderer, when the local main frame
>    gets detached and the WebViewFrameWidget is closed.
>
>    BUG= 638375 
>    CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
>
>    Review-Url: https://codereview.chromium.org/2496233003
>    Cr-Commit-Position: refs/heads/master@{#438516}

BUG= 701988 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/2786443003
Cr-Commit-Position: refs/heads/master@{#461459}
(cherry picked from commit 00cf9403c34aea924ffecd736f93e06421e10e15)

Review-Url: https://codereview.chromium.org/2798673002 .
Cr-Commit-Position: refs/branch-heads/3029@{#575}
Cr-Branched-From: 939b32ee5ba05c396eef3fd992822fcca9a2e262-refs/heads/master@{#454471}

[modify] https://crrev.com/204f85e5d72d7da5c76d6a38045304c4fb594fd0/content/browser/frame_host/render_frame_host_impl.cc
[modify] https://crrev.com/204f85e5d72d7da5c76d6a38045304c4fb594fd0/content/browser/frame_host/render_frame_host_impl.h
[modify] https://crrev.com/204f85e5d72d7da5c76d6a38045304c4fb594fd0/content/browser/frame_host/render_frame_host_manager.cc
[modify] https://crrev.com/204f85e5d72d7da5c76d6a38045304c4fb594fd0/content/browser/site_per_process_browsertest.cc
[modify] https://crrev.com/204f85e5d72d7da5c76d6a38045304c4fb594fd0/content/browser/web_contents/web_contents_impl.cc

Comment 34 by lfg@chromium.org, Apr 4 2017

Status: Fixed (was: Started)
Thanks piman@, let's close this issue as fixed.
Project Member

Comment 35 by bugdroid1@chromium.org, Apr 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4ec47030b3d06cb198562b4647094191ba83b06e

commit 4ec47030b3d06cb198562b4647094191ba83b06e
Author: ahest <ahest@yandex-team.ru>
Date: Tue Apr 18 19:19:53 2017

Add a test checking that compositor works in a reused renderer.

The bug was happening if the renderer submitted a frame right before
navigation commit. The swap ACK got lost, and if this RenderWidget was
reused later, it was drawing nothing because it was waiting for the ACK.

BUG= 701988 , 705548 
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel

Review-Url: https://codereview.chromium.org/2779713002
Cr-Commit-Position: refs/heads/master@{#465322}

[modify] https://crrev.com/4ec47030b3d06cb198562b4647094191ba83b06e/content/browser/renderer_host/render_widget_host_view_browsertest.cc
[add] https://crrev.com/4ec47030b3d06cb198562b4647094191ba83b06e/content/test/data/page_with_animation.html

Blocking:
Components: -Internals>MUS Internals>Services>WindowService

Sign in to add a comment