New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 738634 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocking:
issue 773140
issue 773809



Sign in to add a comment

Race condition allows a regular site to reuse the process of a site requiring a dedicated process.

Project Member Reported by alex...@chromium.org, Jul 1 2017

Issue description

What steps will reproduce the problem?
(1) Start chrome with:
--renderer-process-limit=1 --isolate-origins=https://csreis.github.io/ about:blank about:blank
(2) In first tab, go to http://tests.netsekure.org/slow.php?seconds=20
(3) In the next 20 seconds, switch to second tab and navigate to https://csreis.github.io.
(4) Switch back to first tab and wait for it to commit.
(5) Check task manager and see that the two sites ended up committing in the same process.

The two sites shouldn't share a process, even when over process limit, because https://csreis.github.io requires a dedicated process.

Here's my guess as to what's happening.  Both tabs start out with blank pages that have site-less SiteInstances, sharing the same process that has IsUnused() still set to true.  In (2), we pick that process for the navigation, but we'll call SetIsUsed() only on commit, which in this case will only happen after (3) also picks the same process for the dedicated site.  To fix this, we'll probably have to consider pending navigations as part of the IsUnused checks.

 
Labels: -Pri-3 Pri-2
I found that this bug only occurs without PlzNavigate.  With PlzNavigate, when the first tab commits, we notice that it's committing in a wrong process and kick it out to a different SiteInstance and process.  This happens when RPHI::IsSuitableHost notices that the process where we are committing the unisolated URL in step 4 has a wrong origin lock (corresponding to the isolated origin from step 3).  Here's how we get to it: 

#0  content::RenderProcessHostImpl::IsSuitableHost
#1  content::SiteInstanceImpl::HasWrongProcessForURL
#2  content::RenderFrameHostManager::DetermineSiteInstanceForURL
#3  content::RenderFrameHostManager::GetSiteInstanceForNavigation
#4  content::RenderFrameHostManager::GetSiteInstanceForNavigationRequest
#5  content::RenderFrameHostManager::GetFrameHostForNavigation
#6  content::NavigationRequest::OnResponseStarted
#7  content::NavigationURLLoaderImpl::NotifyResponseStarted

Status: Started (was: Assigned)
OK, I found that this can still be a problem even with PlzNavigate.  The problem is that there could be a change in process->IsUnused() status in between the final IsSuitableHost() check performed as part of GetFrameHostForNavigation() from OnResponseStarted(), and the subsequent DidCommitProvisionalLoad.  I.e., with process reuse, another site could still commit in the process we've picked for the isolated origin, after we've decided to commit in it, but before the DidCommit message arrives.  I've got a browser test that hits the DCHECK(was_unused) that I introduced in SiteInstanceImpl::LockToOriginIfNeeded() [1], even with PlzNavigate.  

I think I can use a similar technique to also hit the CHECK_EQ that ensures that a non-isolated site doesn't commit in an isolated site's process, which I've just introduced in r507700.  I currently have a test that does this flakily.

These would be ways to shove a non-isolated origin into an isolated origin's process, so they might be a plausible explanation for issue 773809.  

[1] https://cs.chromium.org/chromium/src/content/browser/site_instance_impl.cc?l=522&rcl=6874b50208d2fa24f74170b67714bed68c63bcf1
Blocking: 773809
Blocking: 773140

Comment 6 by creis@chromium.org, Oct 12 2017

Cc: clamy@chromium.org nasko@chromium.org arthurso...@chromium.org
Components: UI>Browser>Navigation
Nice find!  Should we mark the process as used during the OnResponseStarted() phase, so that no one else picks it once we've decided to use it for an isolated origin?
Yes, I've got a CL in progress to do that.  I'm trying to mark the process as used a bit later than OnResponseStarted - in RFHI::CommitNavigation, once we pass all the throttles and know that the navigation won't be canceled.

Comment 8 by nasko@chromium.org, Oct 12 2017

Should we also cover OnRequestFailed? We pick a RenderFrameHost in which to render the error page within that handler, so for completeness we probably want to ensure that one is covered too.
Project Member

Comment 9 by bugdroid1@chromium.org, Oct 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/54bf1304adeb7761b0d02e10f3d902f38984f02e

commit 54bf1304adeb7761b0d02e10f3d902f38984f02e
Author: Alex Moshchuk <alexmos@chromium.org>
Date: Fri Oct 13 04:25:29 2017

Fix races leading to incorrect process reuse decisions.

This CL fixes a race that made it possible for an isolated origin to
end up incorrectly sharing a process with a non-isolated origin.  The
race involved one of the origins reaching
NavigationRequest::OnResponseStarted, which in PlzNavigate would
make the final determination on which process the response should
commit in, but before the DidCommit message arrived, another
navigation ended up incorrectly reusing the same process.  This is
because the process was marked as "used" in DidCommit.

There are actually two separate races here, depending on which origin
reaches OnResponseStarted first, both fixed in this CL:

1. The non-isolated origin reaches OnResponseStarted in one tab, and
then before it receives its DidCommit, we navigate another tab to an
isolated origin with process reuse in effect.  This should *not* reuse
the first tab's process.  To do this, we mark the process as "used" in
OnResponseStarted.  Then, when the isolated origin picks its process,
IsSuitableHost() will return false for that process because IsUnused()
will be false.

2. The isolated origin reaches OnResponseStarted in one tab, and then
before it receives its DidCommit, we navigate another tab to a
non-isolated origin with process reuse in effect.  This is problematic
because we were allowing lazily-initialized SiteInstances for isolated
origins, which would only execute LockToOrigin at DidCommit time.
Marking the process as used in OnResponseStarted isn't sufficient
here: because the first tab process's origin lock isn't set when the
second tab navigates, IsSuitableHost() doesn't see a problem because
the non-isolated origin doesn't require a dedicated process, even
though the process is already marked as used.  This race shows that
delaying LockToOrigin until DidCommit is not safe, so this CL also
ensures that SiteInstances for sites requiring a dedicated process
initialize their site URLs (and hence perform LockToOrigin()) earlier,
in OnResponseStarted, once the process is "committed" to the site.

Bug:  738634 
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation
Change-Id: Iaf04204b6c21f411924be5eaf5372909230e6545
Reviewed-on: https://chromium-review.googlesource.com/636787
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Commit-Queue: Charlie Reis <creis@chromium.org>
Reviewed-by: Charlie Reis <creis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#508590}
[modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/frame_host/navigation_request.cc
[modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/frame_host/navigator_impl.cc
[modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/frame_host/navigator_impl.h
[modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/isolated_origin_browsertest.cc
[modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/site_instance_impl.cc
[modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/site_instance_impl.h

For #8: I actually found that doing this in CommitNavigation was too late, since a throttle could still defer the navigation after performing the final process assignment at the beginning of OnResponseStarted, opening up another window for the race.  So I ended up doing this in OnResponseStarted, which should handle the error page cases as well.
Labels: Merge-Request-63
Looks like r508590 in #9 missed the branch cut by a few commits; it didn't help that the CQ took four hours to process the final PS.  I'd like to request to merge it to M63: 
- this should fix various crashes in issues 773140, 773809, and 774027.
- this is a well-contained, low-risk change that only affects site isolation 
 scenarios.  Should be safe especially given it's so early in M63 release cycle.
I'll also verify it on the next canary once it's out and report back here.
Project Member

Comment 12 by sheriffbot@chromium.org, Oct 14 2017

Labels: -Merge-Request-63 Hotlist-Merge-Approved Merge-Approved-63
Your change meets the bar and is auto-approved for M63. Please go ahead and merge the CL to branch 3239 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
** Bulk Edit **

Please merge your change to M63 branch 3239 before 5:00 PM PT Monday (10/16) so we can take it in for next dev release. Thank you.
I've verified that r508590 behaves as expected on Mac canary 64.0.3241.0, so will proceed with the merge shortly.  It was initially in 64.0.3240.0, and although it's still early to tell for sure, so far there have been no CHECKs in LockToOriginIfNeeded from issue 773809 or the renderer kills from issue 773140 since that version.
Project Member

Comment 15 by bugdroid1@chromium.org, Oct 16 2017

Labels: -merge-approved-63 merge-merged-3239
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/22dd92d49333f240b1fd905ee7ad45fb17893ac1

commit 22dd92d49333f240b1fd905ee7ad45fb17893ac1
Author: Alex Moshchuk <alexmos@chromium.org>
Date: Mon Oct 16 16:57:07 2017

Fix races leading to incorrect process reuse decisions (Merge to M63)

This CL fixes a race that made it possible for an isolated origin to
end up incorrectly sharing a process with a non-isolated origin.  The
race involved one of the origins reaching
NavigationRequest::OnResponseStarted, which in PlzNavigate would
make the final determination on which process the response should
commit in, but before the DidCommit message arrived, another
navigation ended up incorrectly reusing the same process.  This is
because the process was marked as "used" in DidCommit.

There are actually two separate races here, depending on which origin
reaches OnResponseStarted first, both fixed in this CL:

1. The non-isolated origin reaches OnResponseStarted in one tab, and
then before it receives its DidCommit, we navigate another tab to an
isolated origin with process reuse in effect.  This should *not* reuse
the first tab's process.  To do this, we mark the process as "used" in
OnResponseStarted.  Then, when the isolated origin picks its process,
IsSuitableHost() will return false for that process because IsUnused()
will be false.

2. The isolated origin reaches OnResponseStarted in one tab, and then
before it receives its DidCommit, we navigate another tab to a
non-isolated origin with process reuse in effect.  This is problematic
because we were allowing lazily-initialized SiteInstances for isolated
origins, which would only execute LockToOrigin at DidCommit time.
Marking the process as used in OnResponseStarted isn't sufficient
here: because the first tab process's origin lock isn't set when the
second tab navigates, IsSuitableHost() doesn't see a problem because
the non-isolated origin doesn't require a dedicated process, even
though the process is already marked as used.  This race shows that
delaying LockToOrigin until DidCommit is not safe, so this CL also
ensures that SiteInstances for sites requiring a dedicated process
initialize their site URLs (and hence perform LockToOrigin()) earlier,
in OnResponseStarted, once the process is "committed" to the site.

TBR=alexmos@chromium.org

(cherry picked from commit 54bf1304adeb7761b0d02e10f3d902f38984f02e)

Bug:  738634 
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation
Change-Id: Iaf04204b6c21f411924be5eaf5372909230e6545
Reviewed-on: https://chromium-review.googlesource.com/636787
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Commit-Queue: Charlie Reis <creis@chromium.org>
Reviewed-by: Charlie Reis <creis@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#508590}
Reviewed-on: https://chromium-review.googlesource.com/721763
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/branch-heads/3239@{#12}
Cr-Branched-From: adb61db19020ed8ecee5e91b1a0ea4c924ae2988-refs/heads/master@{#508578}
[modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/frame_host/navigation_request.cc
[modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/frame_host/navigator_impl.cc
[modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/frame_host/navigator_impl.h
[modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/isolated_origin_browsertest.cc
[modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/site_instance_impl.cc
[modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/site_instance_impl.h

Status: Fixed (was: Started)
Project Member

Comment 17 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bce609e812bce6b02e1326773f06fb639020a847

commit bce609e812bce6b02e1326773f06fb639020a847
Author: Alex Moshchuk <alexmos@chromium.org>
Date: Fri Oct 20 07:42:17 2017

Clean up ShouldAssignSiteURL usage in DetermineSiteInstanceForURL.

This is following up on https://chromium-review.googlesource.com/c/chromium/src/+/636787/9/content/browser/frame_host/render_frame_host_manager.cc#1370.

This seems better for consistency, and creis@ mentioned it shouldn't
be a problem to skip about:blank site assignment on restore.

Bug:  738634 
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation
Change-Id: I7e9006034b70de4a10a488234abe6cd168cc8414
Reviewed-on: https://chromium-review.googlesource.com/726365
Reviewed-by: Charlie Reis <creis@chromium.org>
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#510370}
[modify] https://crrev.com/bce609e812bce6b02e1326773f06fb639020a847/content/browser/frame_host/render_frame_host_manager.cc

Labels: TE-Verified-M64 TE-Verified-64.0.3247.0
Verified the fix on Mac 10.12.6, Win-10 and Ubuntu 14.04 using latest chrome version #64.0.3247.0 as per the comment #0.
Attaching screen cast and screen shot for reference.
Observed that two sites have different process id.
Hence, the fix is working as expected. 
Adding the verified labels.

Thanks...!!
738634.webm
8.7 MB View Download
738634_img.JPG
41.4 KB View Download

Sign in to add a comment