New issue
Advanced search Search tips

Issue 765711 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

IsolatedOriginTest.ProcessLimit intermittently fails to reuse a process (Linux Tests Builder)

Project Member Reported by kelv...@chromium.org, Sep 15 2017

Issue description

Labels: Sheriff-Chromium
Summary: IsolatedOriginTest.ProcessLimit intermittently fails to reuse a process (Linux Tests Builder) (was: IsolatedOriginTest.ProcessLimit fails consistently on Linux Tests Builder)
So far I am not able to repro locally, but the failing test assertion seems to indicate that a process is not getting reused:

  // Navigate iframe back to isolated origin.  See that it reuses the
  // |new_shell| process.
  NavigateIframeToURL(web_contents(), "test_iframe", isolated_foo_url);
  EXPECT_NE(foo_process, child->current_frame_host()->GetProcess());
  EXPECT_EQ(isolated_foo_process, child->current_frame_host()->GetProcess());  <- THIS FAILS

Cc: clamy@chromium.org engedy@chromium.org
Components: UI>Browser>Navigation
+engedy@ in case this is related to r502254 (Mojofy DidCommitProvisionalLoad)

+clamy@ in case this is related to r502251 (enabling of PlzNavigate on trunk), although  as I said above I cannot repro at ToT...
Cc: kenrb@chromium.org
+kenrb@ who had better luck than me with reproing recent OOPIF-related regressions in layout tests - maybe the luck will also apply to this failure

I've talked with alexmos@ and I'll put together a CL that disables the test on Linux - this should get this failure off of sheriff's dashboard and at the same time keep the coverage via Windows/Mac/ChromeOS.

Comment 5 by kenrb@chromium.org, Sep 15 2017

I have locally verified that reverting r502251 fixes the layout test failures. Apparently this is an issue with HTTP layout tests with both PlzNavigate and --site-per-process turned on.

Comment 6 by kenrb@chromium.org, Sep 15 2017

Oh, to clarify, I was looking specifically at the failures on the Site Isolation bot, which I presume are related: https://build.chromium.org/p/chromium.fyi/builders/Site%20Isolation%20Linux/builds/20859
This flakiness also appears to be specific to just site_per_process_content_browsertests.  I also see the test being flaky on linux_site_isolation.  So let's disable it for --site-per-process mode and investigate?

https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=site_per_process_content_browsertests&tests=IsolatedOriginTest.ProcessLimit

vs

https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&tests=IsolatedOriginTest.ProcessLimit

Thanks for catching this Alex (and helping narrow down the scenarios in which the test will be disabled).  I note that the flakiness dashboard only shows failures on Linux bots, but this is most likely an artifact - CQ simply doesn't run tests with --site-per-process outside of the linux bots.

CL for disabling the test when --site-per-process is on: https://chromium-review.googlesource.com/c/chromium/src/+/669037
Project Member

Comment 9 by bugdroid1@chromium.org, Sep 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5f9683f8168038b8b6894169cc0bda3e59461e97

commit 5f9683f8168038b8b6894169cc0bda3e59461e97
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Fri Sep 15 19:55:45 2017

Disable IsolatedOriginTest.ProcessLimit test if --site-per-process is on

Bug:  765711 
Change-Id: Ia9068647f95847b2884c3488967cd3da5c96c58f
Reviewed-on: https://chromium-review.googlesource.com/669037
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502335}
[modify] https://crrev.com/5f9683f8168038b8b6894169cc0bda3e59461e97/content/browser/isolated_origin_browsertest.cc

Labels: -Sheriff-Chromium
Removing Sheriff-Chromium as the test is disabled.
Cc: -alex...@chromium.org lukasza@chromium.org
Owner: alex...@chromium.org
I also couldn't repro this locally, but here's one theory on why this could be happening.  There's a site-per-process-specific behavior quirk that's described a little higher up in the test, which causes the isolated.foo.com navigation in the second tab to create a new process when run with --site-per-process:

  // TODO(alexmos): with --site-per-process, this won't currently reuse the
  // subframe process, because the new SiteInstance will initialize its
  // process while it still has no site (during CreateBrowser()), and since
  // dedicated processes can't currently be reused for a SiteInstance with no
  // site, this creates a new process.  The subsequent navigation to
  // |isolated_foo_url| stays in that new process without consulting whether it
  // can now reuse a different process.  This should be fixed; see
  //  https://crbug.com/513036 .   Without --site-per-process, this works because
  // the site-less SiteInstance is allowed to reuse the first tab's foo.com
  // process (which isn't dedicated), and then it swaps to the isolated.foo.com
  // process during navigation.
  if (!AreAllSitesIsolatedForTesting())
    EXPECT_EQ(child->current_frame_host()->GetProcess(), isolated_foo_process);

This basically means that at this point, with --site-per-process, we might have two isolated origin processes.  When we navigate the subframe in the first tab away from isolated.foo.com [1], the first of those processes should be destroyed, but maybe it's not destroyed in time (if the pending delete RFH is still around) before we navigate that subframe back to isolated.foo.com in [2].  Then, we will have two isolated.foo.com processes around, and might pick the unexpected one for process reuse.  If that's the case, this is a race in the test and not a new bug in the process reuse logic.


[1] https://cs.chromium.org/chromium/src/content/browser/isolated_origin_browsertest.cc?l=429&rcl=40ab7f76c55635190680c68a93e727b3684e1d37

[2] https://cs.chromium.org/chromium/src/content/browser/isolated_origin_browsertest.cc?l=437&rcl=40ab7f76c55635190680c68a93e727b3684e1d37
Status: Started (was: Assigned)
Oh, I now have a local repro (I think I might've forgotten to reenable the test for --site-per-process before running, duh) and confirmed that the subframe's old isolated.foo.com process is getting reused, instead of the new_shell isolated.foo.com process.  So this is indeed a race in the test.  Fix on the way.
Project Member

Comment 14 by bugdroid1@chromium.org, Oct 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fadadd12dec0f327ca792cc996c248eb0222c676

commit fadadd12dec0f327ca792cc996c248eb0222c676
Author: Alex Moshchuk <alexmos@chromium.org>
Date: Thu Oct 05 16:00:55 2017

Fix race with --site-per-process in IsolatedOriginTest.ProcessLimit

See  https://crbug.com/765711#c12  for the description of the race.
tldr is that with --site-per-process, we might have two
isolated.foo.com processes eligible for reuse if one of them doesn't
die in time, and we pick the one that the test doesn't expect for
navigating the subframe back to isolated.foo.com.  The fix is to wait
for the subframe's old isolated.foo.com RFH to be destroyed before
navigating it back to isolated.foo.com.  This ensures there's only
one process that can be reused.

Bug:  765711 
Change-Id: I23740e1f4e1aa7882f977300a5000630d49d9efe
Reviewed-on: https://chromium-review.googlesource.com/701289
Reviewed-by: Łukasz Anforowicz <lukasza@chromium.org>
Commit-Queue: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#506743}
[modify] https://crrev.com/fadadd12dec0f327ca792cc996c248eb0222c676/content/browser/isolated_origin_browsertest.cc

Status: Fixed (was: Started)
The test has been all green since being fixed and re-enabled for --site-per-process in r506743: 

https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=site_per_process_content_browsertests&tests=IsolatedOriginTest.ProcessLimit

Sign in to add a comment