Race condition allows a regular site to reuse the process of a site requiring a dedicated process. |
||||||||||
Issue descriptionWhat steps will reproduce the problem? (1) Start chrome with: --renderer-process-limit=1 --isolate-origins=https://csreis.github.io/ about:blank about:blank (2) In first tab, go to http://tests.netsekure.org/slow.php?seconds=20 (3) In the next 20 seconds, switch to second tab and navigate to https://csreis.github.io. (4) Switch back to first tab and wait for it to commit. (5) Check task manager and see that the two sites ended up committing in the same process. The two sites shouldn't share a process, even when over process limit, because https://csreis.github.io requires a dedicated process. Here's my guess as to what's happening. Both tabs start out with blank pages that have site-less SiteInstances, sharing the same process that has IsUnused() still set to true. In (2), we pick that process for the navigation, but we'll call SetIsUsed() only on commit, which in this case will only happen after (3) also picks the same process for the dedicated site. To fix this, we'll probably have to consider pending navigations as part of the IsUnused checks.
,
Aug 15 2017
I found that this bug only occurs without PlzNavigate. With PlzNavigate, when the first tab commits, we notice that it's committing in a wrong process and kick it out to a different SiteInstance and process. This happens when RPHI::IsSuitableHost notices that the process where we are committing the unisolated URL in step 4 has a wrong origin lock (corresponding to the isolated origin from step 3). Here's how we get to it: #0 content::RenderProcessHostImpl::IsSuitableHost #1 content::SiteInstanceImpl::HasWrongProcessForURL #2 content::RenderFrameHostManager::DetermineSiteInstanceForURL #3 content::RenderFrameHostManager::GetSiteInstanceForNavigation #4 content::RenderFrameHostManager::GetSiteInstanceForNavigationRequest #5 content::RenderFrameHostManager::GetFrameHostForNavigation #6 content::NavigationRequest::OnResponseStarted #7 content::NavigationURLLoaderImpl::NotifyResponseStarted
,
Oct 12 2017
OK, I found that this can still be a problem even with PlzNavigate. The problem is that there could be a change in process->IsUnused() status in between the final IsSuitableHost() check performed as part of GetFrameHostForNavigation() from OnResponseStarted(), and the subsequent DidCommitProvisionalLoad. I.e., with process reuse, another site could still commit in the process we've picked for the isolated origin, after we've decided to commit in it, but before the DidCommit message arrives. I've got a browser test that hits the DCHECK(was_unused) that I introduced in SiteInstanceImpl::LockToOriginIfNeeded() [1], even with PlzNavigate. I think I can use a similar technique to also hit the CHECK_EQ that ensures that a non-isolated site doesn't commit in an isolated site's process, which I've just introduced in r507700. I currently have a test that does this flakily. These would be ways to shove a non-isolated origin into an isolated origin's process, so they might be a plausible explanation for issue 773809. [1] https://cs.chromium.org/chromium/src/content/browser/site_instance_impl.cc?l=522&rcl=6874b50208d2fa24f74170b67714bed68c63bcf1
,
Oct 12 2017
,
Oct 12 2017
,
Oct 12 2017
Nice find! Should we mark the process as used during the OnResponseStarted() phase, so that no one else picks it once we've decided to use it for an isolated origin?
,
Oct 12 2017
Yes, I've got a CL in progress to do that. I'm trying to mark the process as used a bit later than OnResponseStarted - in RFHI::CommitNavigation, once we pass all the throttles and know that the navigation won't be canceled.
,
Oct 12 2017
Should we also cover OnRequestFailed? We pick a RenderFrameHost in which to render the error page within that handler, so for completeness we probably want to ensure that one is covered too.
,
Oct 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/54bf1304adeb7761b0d02e10f3d902f38984f02e commit 54bf1304adeb7761b0d02e10f3d902f38984f02e Author: Alex Moshchuk <alexmos@chromium.org> Date: Fri Oct 13 04:25:29 2017 Fix races leading to incorrect process reuse decisions. This CL fixes a race that made it possible for an isolated origin to end up incorrectly sharing a process with a non-isolated origin. The race involved one of the origins reaching NavigationRequest::OnResponseStarted, which in PlzNavigate would make the final determination on which process the response should commit in, but before the DidCommit message arrived, another navigation ended up incorrectly reusing the same process. This is because the process was marked as "used" in DidCommit. There are actually two separate races here, depending on which origin reaches OnResponseStarted first, both fixed in this CL: 1. The non-isolated origin reaches OnResponseStarted in one tab, and then before it receives its DidCommit, we navigate another tab to an isolated origin with process reuse in effect. This should *not* reuse the first tab's process. To do this, we mark the process as "used" in OnResponseStarted. Then, when the isolated origin picks its process, IsSuitableHost() will return false for that process because IsUnused() will be false. 2. The isolated origin reaches OnResponseStarted in one tab, and then before it receives its DidCommit, we navigate another tab to a non-isolated origin with process reuse in effect. This is problematic because we were allowing lazily-initialized SiteInstances for isolated origins, which would only execute LockToOrigin at DidCommit time. Marking the process as used in OnResponseStarted isn't sufficient here: because the first tab process's origin lock isn't set when the second tab navigates, IsSuitableHost() doesn't see a problem because the non-isolated origin doesn't require a dedicated process, even though the process is already marked as used. This race shows that delaying LockToOrigin until DidCommit is not safe, so this CL also ensures that SiteInstances for sites requiring a dedicated process initialize their site URLs (and hence perform LockToOrigin()) earlier, in OnResponseStarted, once the process is "committed" to the site. Bug: 738634 Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation Change-Id: Iaf04204b6c21f411924be5eaf5372909230e6545 Reviewed-on: https://chromium-review.googlesource.com/636787 Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Commit-Queue: Charlie Reis <creis@chromium.org> Reviewed-by: Charlie Reis <creis@chromium.org> Cr-Commit-Position: refs/heads/master@{#508590} [modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/frame_host/navigation_request.cc [modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/frame_host/navigator_impl.cc [modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/frame_host/navigator_impl.h [modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/isolated_origin_browsertest.cc [modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/site_instance_impl.cc [modify] https://crrev.com/54bf1304adeb7761b0d02e10f3d902f38984f02e/content/browser/site_instance_impl.h
,
Oct 13 2017
For #8: I actually found that doing this in CommitNavigation was too late, since a throttle could still defer the navigation after performing the final process assignment at the beginning of OnResponseStarted, opening up another window for the race. So I ended up doing this in OnResponseStarted, which should handle the error page cases as well.
,
Oct 13 2017
Looks like r508590 in #9 missed the branch cut by a few commits; it didn't help that the CQ took four hours to process the final PS. I'd like to request to merge it to M63: - this should fix various crashes in issues 773140, 773809, and 774027. - this is a well-contained, low-risk change that only affects site isolation scenarios. Should be safe especially given it's so early in M63 release cycle. I'll also verify it on the next canary once it's out and report back here.
,
Oct 14 2017
Your change meets the bar and is auto-approved for M63. Please go ahead and merge the CL to branch 3239 manually. Please contact milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 16 2017
** Bulk Edit ** Please merge your change to M63 branch 3239 before 5:00 PM PT Monday (10/16) so we can take it in for next dev release. Thank you.
,
Oct 16 2017
I've verified that r508590 behaves as expected on Mac canary 64.0.3241.0, so will proceed with the merge shortly. It was initially in 64.0.3240.0, and although it's still early to tell for sure, so far there have been no CHECKs in LockToOriginIfNeeded from issue 773809 or the renderer kills from issue 773140 since that version.
,
Oct 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/22dd92d49333f240b1fd905ee7ad45fb17893ac1 commit 22dd92d49333f240b1fd905ee7ad45fb17893ac1 Author: Alex Moshchuk <alexmos@chromium.org> Date: Mon Oct 16 16:57:07 2017 Fix races leading to incorrect process reuse decisions (Merge to M63) This CL fixes a race that made it possible for an isolated origin to end up incorrectly sharing a process with a non-isolated origin. The race involved one of the origins reaching NavigationRequest::OnResponseStarted, which in PlzNavigate would make the final determination on which process the response should commit in, but before the DidCommit message arrived, another navigation ended up incorrectly reusing the same process. This is because the process was marked as "used" in DidCommit. There are actually two separate races here, depending on which origin reaches OnResponseStarted first, both fixed in this CL: 1. The non-isolated origin reaches OnResponseStarted in one tab, and then before it receives its DidCommit, we navigate another tab to an isolated origin with process reuse in effect. This should *not* reuse the first tab's process. To do this, we mark the process as "used" in OnResponseStarted. Then, when the isolated origin picks its process, IsSuitableHost() will return false for that process because IsUnused() will be false. 2. The isolated origin reaches OnResponseStarted in one tab, and then before it receives its DidCommit, we navigate another tab to a non-isolated origin with process reuse in effect. This is problematic because we were allowing lazily-initialized SiteInstances for isolated origins, which would only execute LockToOrigin at DidCommit time. Marking the process as used in OnResponseStarted isn't sufficient here: because the first tab process's origin lock isn't set when the second tab navigates, IsSuitableHost() doesn't see a problem because the non-isolated origin doesn't require a dedicated process, even though the process is already marked as used. This race shows that delaying LockToOrigin until DidCommit is not safe, so this CL also ensures that SiteInstances for sites requiring a dedicated process initialize their site URLs (and hence perform LockToOrigin()) earlier, in OnResponseStarted, once the process is "committed" to the site. TBR=alexmos@chromium.org (cherry picked from commit 54bf1304adeb7761b0d02e10f3d902f38984f02e) Bug: 738634 Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation Change-Id: Iaf04204b6c21f411924be5eaf5372909230e6545 Reviewed-on: https://chromium-review.googlesource.com/636787 Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Commit-Queue: Charlie Reis <creis@chromium.org> Reviewed-by: Charlie Reis <creis@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#508590} Reviewed-on: https://chromium-review.googlesource.com/721763 Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/branch-heads/3239@{#12} Cr-Branched-From: adb61db19020ed8ecee5e91b1a0ea4c924ae2988-refs/heads/master@{#508578} [modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/frame_host/navigation_request.cc [modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/frame_host/navigator_impl.cc [modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/frame_host/navigator_impl.h [modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/isolated_origin_browsertest.cc [modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/site_instance_impl.cc [modify] https://crrev.com/22dd92d49333f240b1fd905ee7ad45fb17893ac1/content/browser/site_instance_impl.h
,
Oct 16 2017
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bce609e812bce6b02e1326773f06fb639020a847 commit bce609e812bce6b02e1326773f06fb639020a847 Author: Alex Moshchuk <alexmos@chromium.org> Date: Fri Oct 20 07:42:17 2017 Clean up ShouldAssignSiteURL usage in DetermineSiteInstanceForURL. This is following up on https://chromium-review.googlesource.com/c/chromium/src/+/636787/9/content/browser/frame_host/render_frame_host_manager.cc#1370. This seems better for consistency, and creis@ mentioned it shouldn't be a problem to skip about:blank site assignment on restore. Bug: 738634 Cq-Include-Trybots: master.tryserver.chromium.linux:linux_site_isolation Change-Id: I7e9006034b70de4a10a488234abe6cd168cc8414 Reviewed-on: https://chromium-review.googlesource.com/726365 Reviewed-by: Charlie Reis <creis@chromium.org> Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#510370} [modify] https://crrev.com/bce609e812bce6b02e1326773f06fb639020a847/content/browser/frame_host/render_frame_host_manager.cc
,
Oct 23 2017
Verified the fix on Mac 10.12.6, Win-10 and Ubuntu 14.04 using latest chrome version #64.0.3247.0 as per the comment #0. Attaching screen cast and screen shot for reference. Observed that two sites have different process id. Hence, the fix is working as expected. Adding the verified labels. Thanks...!! |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by alex...@chromium.org
, Aug 11 2017