New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 681077 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

FrameOwner is left without |contentFrame| after two cross-process navigations in a row

Project Member Reported by engedy@chromium.org, Jan 13 2017

Issue description

Steps to reproduce:

  1.) Assume the familiar cross-site redirection logic and test data from browsertests.
  2.) Navigate to `http://a.com/frame_tree/page_with_one_frame.html`:

<html>
<head></head>
<body>
  This page has one cross-site iframe.
  <iframe id="child0" src="/cross-site/baz.com/title1.html"></iframe>
</body>
</html>

  3.) Navigate the subframe from JS: document.getElementById("child0").src = "http://a.com/cross-site/baz.com/title2.html";
  4.) Evaluate: document.getElementById("child0").contentWindow;

Expectation: The navigation succeeds and `contentWindow` is non-null.

Reality:
 -- With both `--enable-browser-side-navigation` + `--site-per-process` enabled, the navigation succeeds but `contentWindow` becomes null. The HTMLFrameOwnerElement has no |contentFrame|. It seems that it is deleted in response to the renderer receiving a FrameMsg_Delete message (and nothing else afterwards).
 -- With only `--site-per-process`, the navigation takes place, but the loading never terminates on the browser side.

See repro: https://codereview.chromium.org/2635523002/
 

Comment 1 by engedy@chromium.org, Jan 13 2017

Cc: clamy@chromium.org nasko@chromium.org
Summary: FrameOwner is left without |contentFrame| after two cross-process navigations in a row (was: FrameOwner is left without contentFrame after two cross-process navigations in a row)
Local repro:

content_browsertests --gtest_filter=NavigationHandleImplBrowserTest.Bug* \
--disable-gpu --site-per-process --enable-browser-side-navigation

Is this a known issue?

Comment 2 by nasko@chromium.org, Jan 13 2017

Cc: dcheng@chromium.org
Adding dcheng@, as this might be Blink side issue.
Cc: alex...@chromium.org
This probably has something to do with the fact that you are using GetURL("a.com", "/cross-site/baz.com/title2.html") for allowed_subframe_url.   Using just GetURL("baz.com", "/title2.html") works fine in both modes, so I suspect this has something to do with the transfer logic not handling this case properly.

What's happening here is the subframe is at baz.com, then we navigate that to a.com, which transfers back to baz.com.  With only --site-per-process, it appears we get a DidCommit message for the redirected URL (baz.com/title2.html), but no DidStopLoading.

Comment 4 by engedy@chromium.org, Jan 13 2017

Yes, that matches my observations. Also note that this occurs with a subframe navigation sequence: [a.com -> baz.com, a.com -> foo.com] as well.

Comment 5 by engedy@chromium.org, Jan 16 2017

Owner: dcheng@chromium.org
Status: Available (was: Untriaged)
So did some investigating. Considerd the starting scenario:
 main frame: a.com
 child0: b.com

Trigger navigation in `child0`: a.com --redirect--> c.com:

 1.) The speculative RFH for the navigation in `child0` is created for site instance `a.com`.
 2.) On the renderer side, this corresponds to creating a provisional WebLocalFrame in the parent's process.
   -- This is initially created with a DummyOwner.
   -- But soon afterwards, the provisional WebLocalFrame's frame owner is set to the real HTMLIFrameElement (whose contentFrame is the real `child0` showing b.com).
 3.) After the redirect, it is discovered that the speculative RFH is no good, and is discarded.
   -- FrameMsg_Delete sent to the provisional WebLocalFrame.
   -- Frame::detach() -> Frame::disconnectOwnerElement() -> FrameOwner::clearContentFrame() on the real owner. :(
 4.) Ultimately, this turns out to be a remote -> remote subframe navigation, so no further administration is needed or done on frames/proxies on the renderer side (I think).

Comment 6 by dcheng@chromium.org, Jan 16 2017

That's unfortunate... I guess we can make clearContentFrame() become a no-op if it doesn't point back at the frame that called it, but that becomes pretty hacky.

clamy/nasko: any estimate of when we'll be able to get rid of provisional frames? =P

Comment 7 by dcheng@chromium.org, Jan 16 2017

Status: Started (was: Available)

Comment 8 by dcheng@chromium.org, Jan 17 2017

OK I've prepared a patch for the Blink-side issue, but it looks like it's still broken without --enable-browser-side-navigation... hmm...
Project Member

Comment 9 by bugdroid1@chromium.org, Jan 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0401e95afc66f0b87733d04fad5d233c04d35aa3

commit 0401e95afc66f0b87733d04fad5d233c04d35aa3
Author: dcheng <dcheng@chromium.org>
Date: Tue Jan 17 11:07:01 2017

Avoid mutating frame owner's when detaching a provisional frame.

When detaching a provisional frame, Blink was incorrectly clearing the
content frame of the provisional frame's frame owner. A provisional
frame is only partially attached to a frame tree. While it may have a
frame owner set, the frame owner's content frame will not point back
at the provisional frame. Similarly, though it may have a parent frame,
the provisional frame will not appear in the parent frame's list of
child nodes. Thus, it is important not to affect the frame tree
structure when detaching a provisional frame.

BUG= 681077 ,578349

Review-Url: https://codereview.chromium.org/2631233003
Cr-Commit-Position: refs/heads/master@{#444021}

[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/core/frame/Frame.cpp
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/core/frame/FrameOwner.h
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/core/html/HTMLFrameElementBase.h
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/core/html/HTMLFrameOwnerElement.h
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/core/html/HTMLIFrameElement.h
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/web/RemoteFrameOwner.h
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/web/tests/FrameTestHelpers.cpp
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/web/tests/FrameTestHelpers.h
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/Source/web/tests/WebFrameTest.cpp
[modify] https://crrev.com/0401e95afc66f0b87733d04fad5d233c04d35aa3/third_party/WebKit/public/web/WebLocalFrame.h

Cc: lukasza@chromium.org creis@chromium.org
Following up on #8: I looked into the browser-side issues for this with --site-per-process but without --enable-browser-side-navigation a bit more, and found a couple of interesting things, which help explain why this still doesn't work after Daniel's fix.

The setup is: we've got a a current RFH for baz.com in the subframe, let's say RFH1.  We navigate that to a.com/cross-site/baz.com/title2.html.  This creates a pending RFH, say RFH2, for a.com, which transfers back to baz.com.

1) During the transfer, we end up picking the RFH1 as the new navigation target.  That happens as the first thing in UpdateStateForNavigate:

  if (!frame_tree_node_->IsMainFrame() &&
      !CanSubframeSwapProcess(dest_url, source_instance, dest_instance)) {
    // Note: Do not add code here to determine whether the subframe should swap
    // or not. Add it to CanSubframeSwapProcess instead.
    return render_frame_host_.get();
  }

This (correctly) returns RFH1, but it seems to skips some crucial steps just below, namely, calling Transfer() on the transfer_navigation_handle_ (we're in the middle of transfer here after all), and canceling the pending RFH.  Not canceling the pending RFH possibly leads to leaking RFH2, AFAICT.  Skipping the Transfer() call is more interesting, as it later leads to RFH1's transferred baz.com request to be dropped in ResourceDispatcherHostImpl::CompleteTransfer, here:

  if (it == pending_loaders_.end()) {
    // Renderer sent transferred_request_request_id and/or
    // transferred_request_child_id that doesn't have a corresponding entry on
    // the browser side.
    // TODO(lukasza): https://crbug.com/659613: Need to understand the scenario
    // that can lead here (and then attempt to reintroduce a renderer kill
    // below).
    return;
  }

So this could be one explanation for why we saw the RDH_TRANSFERRING_REQUEST_NOT_FOUND kill in issues 659613 and 660407.  CC-ing Charlie and Lukasz for that.

I've verified that doing the Transfer() work and canceling the pending RFH for the !CanSubframeSwapProcess case in UpdateStateForNavigate seems to fix all this, and makes Balasz's test pass with --site-per-process.  I'll work on a CL for this.
This also explains why the `contentWindow` issue did not manifest without PlzNavigate. Based on my observations, RFH2 (which was a provisional frame in the parent's SiteInstance) was so much not canceled that it ended up loading the document, and it was RFH1 that got swapped out. So we essentially transferred back to the parent frame, meaning that RFH2 was never deleted, thus the issue with deleting a provisional frame was masked.

After patching in https://codereview.chromium.org/2636193003/, I can confirm that now we correctly discard RFH2, and complete the navigation in RFH1. Blink side frame structure looks good too.

Something still seems to be off with compositing/rendering, though. The frame is not interactive after the navigation, and keeps displaying the old content. Who would be the best person to look into that aspect?
Status: Fixed (was: Started)
I've just filed  issue 686184  for the remaining work here to fix the transfers and look at any compositing/rendering issues.  With that, I think we can close this one as fixed.
Project Member

Comment 13 by bugdroid1@chromium.org, Mar 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0431b6bc6c9bf5624cccf1121285392b07b74ec6

commit 0431b6bc6c9bf5624cccf1121285392b07b74ec6
Author: alexmos <alexmos@chromium.org>
Date: Wed Mar 01 02:52:43 2017

Fix cross-site subframe navigations that transfer back to original RFH.

This fixes a bug where if an OOPIF at site A navigates to site B which
redirects back to A, the navigation timed out because Transfer() was
never called on transfer_navigation_handle_, and the pending RFH for B
was never canceled.  This is because UpdateStateForNavigate returned
the current RFH early, due to checking CanSubframeSwapProcess before
reaching those steps.  Not calling Transfer() later led to the
transferred request being dropped in
ResourceDispatcherHostImpl::CompleteTransfer, due to not finding a
matching entry in pending_loaders_.  This CL fixes the
CanSubframeSwapProcess check to be performed after calling Transfer()
and canceling the pending RFH (if necessary).

This uncovered another complication, which is that while redirects
to data: and about: URLs are explicitly disallowed as unsafe (e.g.,
see DataProtocolHandler::IsSafeRedirectTarget), they are allowed when
an extension uses the webRequest API to redirect certain URLs.
One of the tests, ExtensionWebRequestApiTest.WebRequestDeclarative1,
was exercising this: it had a subframe which made a same-site request
that was redirected to data:,.  Previously, in --site-per-process, it
worked due to the ordering problem that this CL fixes: we first
decided to transfer the request (IsRendererTransferNeededForNavigation
returned true), then we returned the current RFH early due to
CanSubframeSwapProcess returning |false| before calling Transfer().
With the fixed ordering, we were now also calling Transfer(), as we
were expecting the data: request to be put into a new SiteInstance
which didn't match the current one, yet the request was never
transferred (we weren't spinning up a new pending RFH, and we weren't
sending a Navigate IPC to the current RFH due to the
is_transfer_to_same case in NavigateToEntry, so no one actually made a
request to the data: URL).

To address this, CanSubframeSwapProcess is changed to allow a process
swap for such transfers to data: and about:.  This changes behavior
where these redirects were previously kept in the original
SiteInstance, but this should be safer, and it should not break any
scripting relationships (the redirect URLs are provided by an
extension; the site that made the request that's being redirected
can't expect to script the redirect target).

BUG= 681077 ,660407,659613
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/2636193003
Cr-Commit-Position: refs/heads/master@{#453800}

[modify] https://crrev.com/0431b6bc6c9bf5624cccf1121285392b07b74ec6/content/browser/frame_host/render_frame_host_manager.cc
[modify] https://crrev.com/0431b6bc6c9bf5624cccf1121285392b07b74ec6/content/browser/frame_host/render_frame_host_manager.h
[modify] https://crrev.com/0431b6bc6c9bf5624cccf1121285392b07b74ec6/content/browser/site_per_process_browsertest.cc

Sign in to add a comment