Address sad tab regression with site isolation |
||||||||||
Issue descriptionThe site isolation experiment showed regression in frequency of sad tabs, and frequency of GPU OOM. In theory, frequency of sad tabs should be reduced by site isolation, since android as the opportunity to kill frames first. This suggests android is killing services out of order. Gather data on this, and then see if it's possible provide a strong hint to android. Current idea is maybe always keep the lowest ranked process at waived binding, even if it's visible. ⛆ |
|
|
,
Jun 28 2018
two days of data form canary. This is the entire population, not just those that has site isolation enabled. It's recorded when a child process with strong binding is oom killed when the app is in the foreground. It records the binding state of *remaining* processes. No waived no moderate no strong 43 01.81% No waived no moderate has strong 765 32.17% No waived has moderate no strong 38 01.60% No waived has moderate has strong 351 14.76% Has waived no moderate no strong 23 00.97% Has waived no moderate has strong 448 18.84% Has waived has moderate no strong 59 02.48% Has waived has moderate has strong 651 27.38% Assuming we trust the count, that's pretty conclusive evidence that android oom killer doesn't actually respect the order we hint. 60+% of the time, the process with strong binding is killed when there are processes with lower priority bindings. Not really enough data to separate out different android versions, but should do that once there is more data. But other than that, I'm not sure if there is anything useful to draw from the data (so far), because we don't have a measure of the prior. Eg one interpretation is that there isn't much of difference between moderate and waived: it's about 50-50 whether there is waived binding left, and 46-54 whether there is a moderate binding left, and going from one to the other doesn't seem like a big win. However I think that interpretation is wrong, since assuming moderate binding management is working as expected, all the moderate binding processes should be gone (if site isolation is disabled), so waived has a much larger prior. I'm not sure how to measure the prior though. Maybe count oom kills for waived and moderate and check if there are any strongly bound process left? Seems overkill, maybe I'll just do something simple like disable moderate binding management altogether for site isolation and see if there's any impact on sad tabs.
,
Jun 28 2018
Another idea for tweaking bindings: turns out we know whether a frame is in the viewport in the browser. Maybe we can use that treat frames that are not in the viewport as waived.
,
Jun 29 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f4924c30f7d0b6dc8ab57018c534e499052d453e commit f4924c30f7d0b6dc8ab57018c534e499052d453e Author: Bo Liu <boliu@chromium.org> Date: Fri Jun 29 21:58:18 2018 android: Disable binding mangement for site isolation This is a speculative experiment (since it's a simple change) to see if it has any effect on the rate of sad tabs when site isolation is enabled. This should be reverted before M69 branch point, and definitely before site isolation actually ships to beta. Bug: 855037 Bug: 859208 Change-Id: I90c7cce4a03603d8bd9772ead804eeabb4114bad Reviewed-on: https://chromium-review.googlesource.com/1120864 Commit-Queue: Bo <boliu@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#571660} [modify] https://crrev.com/f4924c30f7d0b6dc8ab57018c534e499052d453e/content/browser/child_process_launcher_helper_android.cc [modify] https://crrev.com/f4924c30f7d0b6dc8ab57018c534e499052d453e/content/public/android/java/src/org/chromium/content/browser/ChildProcessLauncherHelperImpl.java
,
Jun 29 2018
Thanks-- I'm interested to see the results, to see whether that prioritization really isn't having any effect. Setting a reminder to make sure we revert before branch (which is July 19).
,
Jun 29 2018
Oh, ha, did that too in crbug.com/859208 . We even picked the same date :)
,
Jul 10
For that remaining process UMA in #2, the data on dev looks quite different from canary, even if I only look at the control population on canary which do not have site isolation enabled. From dev: No waived no moderate no strong 1,446 03.62% No waived no moderate has strong 19,097 47.76% No waived has moderate no strong 929 02.32% No waived has moderate has strong 8,040 20.11% Has waived no moderate no strong 179 00.45% Has waived no moderate has strong 5,281 13.21% Has waived has moderate no strong 241 00.60% Has waived has moderate has strong 4,772 11.93% That's in a better direction, as the "No waived no moderate" category is higher, indicating that the strong binding is more likely to be killed after everything else. So that makes me more hopeful. Not sure about the discrepancy though, just different population I guess?
,
Jul 10
wrote up the idea in #3: https://chromium-review.googlesource.com/c/chromium/src/+/1132468 It reuses (I hope correctly) logic for intersection observer. Intersection observer is using the outer viewport, not the inner viewport though, which came as somewhat of a surprise when I was testing it manually. But shouldn't matter much in practice, hopefully.
,
Jul 11
,
Jul 13
Result from disabling binding management: With binding management: GPU OOM: 215 CPM Renderer OOM: 234 CPM Over 9.7M page loads Without binding management: GPU OOM: 147 CPM Renderer OOM: 425 CPM Over 8.4M page loads GPM OOM going down is nice. But renderer OOM doubled. Moderate binding only affects renderers, and my guess is that there are race conditions when android killed a renderer process with waived binding right when we are adding strong binding to it. And looking at "remaining binding state", the "has waived no moderate" category jumped from 21% to 38%. That's probably expected. It's "changing the prior" I mentioned in #2. I think this is all good news. Even though android doesn't absolutely respect the hint with provide with bindings, it does have some effect. Gonna revert the CL now.
,
Jul 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c1b78a91a591d35bf5da3a76158e7868ca4b7522 commit c1b78a91a591d35bf5da3a76158e7868ca4b7522 Author: Bo <boliu@chromium.org> Date: Fri Jul 13 16:02:58 2018 Revert "android: Disable binding mangement for site isolation" This reverts commit f4924c30f7d0b6dc8ab57018c534e499052d453e. Reason for revert: Done with experiment. Original change's description: > android: Disable binding mangement for site isolation > > This is a speculative experiment (since it's a simple change) to see if > it has any effect on the rate of sad tabs when site isolation is > enabled. > > This should be reverted before M69 branch point, and definitely before > site isolation actually ships to beta. > > Bug: 855037 > Bug: 859208 > Change-Id: I90c7cce4a03603d8bd9772ead804eeabb4114bad > Reviewed-on: https://chromium-review.googlesource.com/1120864 > Commit-Queue: Bo <boliu@chromium.org> > Reviewed-by: Alex Moshchuk <alexmos@chromium.org> > Cr-Commit-Position: refs/heads/master@{#571660} TBR=boliu@chromium.org,alexmos@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 855037, 859208 Change-Id: I70917a54a7f3a4c02baa95ddba5016ce7c8948ad Reviewed-on: https://chromium-review.googlesource.com/1136611 Reviewed-by: Bo <boliu@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#574936} [modify] https://crrev.com/c1b78a91a591d35bf5da3a76158e7868ca4b7522/content/browser/child_process_launcher_helper_android.cc [modify] https://crrev.com/c1b78a91a591d35bf5da3a76158e7868ca4b7522/content/public/android/java/src/org/chromium/content/browser/ChildProcessLauncherHelperImpl.java
,
Jul 13
,
Jul 27
,
Jul 28
,
Jul 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f3a5a641fa5b8cf069122bfcc5951c423711ddfc commit f3a5a641fa5b8cf069122bfcc5951c423711ddfc Author: Bo Liu <boliu@chromium.org> Date: Mon Jul 30 17:28:44 2018 Add intersects_viewport to renderer process priority "intersects_viewport" is implemented for sub frames as whether it intersects with the root document viewport. Note this is independent of page/tab visibility, a frame can be in a background tab can still have intersects_viewport be true. It's always true for the root frame. On Android, drop processes that are visible but does not intersect with viewport down to waived binding (ignoring binding management). Bug: 855037 Change-Id: Ib912418bb0cf82d001b6fd50077fb81425093436 Reviewed-on: https://chromium-review.googlesource.com/1132468 Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#579078} [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/child_process_launcher.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/child_process_launcher.h [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/child_process_launcher_helper_android.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/frame_host/render_frame_host_impl.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/renderer_host/render_process_host_impl.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/renderer_host/render_process_host_impl.h [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/renderer_host/render_widget_host_impl.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/renderer_host/render_widget_host_impl.h [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/renderer_host/render_widget_host_view_child_frame.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/browser/site_per_process_browsertest.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/public/android/java/src/org/chromium/content/browser/ChildProcessLauncherHelperImpl.java [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/public/android/java/src/org/chromium/content/browser/ChildProcessRanking.java [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/public/android/junit/src/org/chromium/content/browser/ChildProcessRankingTest.java [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/public/browser/render_process_host.h [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/public/test/mock_render_process_host.cc [modify] https://crrev.com/f3a5a641fa5b8cf069122bfcc5951c423711ddfc/content/public/test/mock_render_process_host.h
,
Aug 6
unscientific experiment for the CL above, by comparing stats before and after it landed. Just quoting some numbers rather than linking to uma, since this is a public bug. Using 8.4M page loads for before CL, and 3.4M page loads for after CL to normalize stats. All numbers in CPM unless otherwise noted GPU OOM: 230 vs 209 Foreground visible main frame renderer OOM: 140 vs 125 Foreground visible sub frame renderer OOM: 331 vs 10017 That sub frame OOM might look startling, but that's actually expected. That metric only accounts for page visibility, but does not take into account whether a sub frame intersects with viewport. And subframe that does not intersect with viewport only has waived bindings. We have Stability.ChildFrameCrash.Visibility that break down subframe depth: Crash while visible (ie intersect with viewport): 143 vs 91 Shown after crashing: 1647 vs 2464 Never visible after crash: 24059 vs 31988 So most of the time when a frame that isn't in the viewport is killed, the user never sees that frame again. Also comparing % increase of some metrics compared to when site isolation is not enabled, piggy backing on the v8 experiment (these are not adjusted by page loads): Gpu OOM: +170% vs +80% Foreground visible main frame OOM: +84% vs +34% IMO this is the right trade off, and a step in the right direction :)
,
Aug 6
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5f1ae0ce131b847b7851e0ccf22f810fd6d92df8 commit 5f1ae0ce131b847b7851e0ccf22f810fd6d92df8 Author: Siddhartha <ssid@chromium.org> Date: Mon Aug 06 22:42:19 2018 Android: Record the number of strong bindings at OOMs BUG=855037 Change-Id: Ifb2d3298db6cc368ceb5f7ed4de7cf01f412be81 Reviewed-on: https://chromium-review.googlesource.com/1161381 Reviewed-by: Bo <boliu@chromium.org> Reviewed-by: Steven Holte <holte@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org> Commit-Queue: Siddhartha S <ssid@chromium.org> Cr-Commit-Position: refs/heads/master@{#581019} [modify] https://crrev.com/5f1ae0ce131b847b7851e0ccf22f810fd6d92df8/components/crash/content/browser/crash_metrics_reporter_android.cc [modify] https://crrev.com/5f1ae0ce131b847b7851e0ccf22f810fd6d92df8/tools/metrics/histograms/histograms.xml
,
Sep 6
There are two more things I want to try here. Rank invisible tab main frame higher than visible tab iframe that's outside the viewport. Current order is 1) visible main frame 2) visible subframe in viewport 3) visible subframe outside of viewport 4) invisible main frame 5) invisible subframe in viewport 6) invisible subframe outside of viewport So I'm thinking of swapping 3) and 4) in that list. Now that binding manager uses the same order (more or less), I expect this should help with sad tab, but regress sad frame. Make sure there is at least one running process not in the binding manager (ie have moderate binding added). We learned that "important" and "default" bindings are actually the same wrt OOM kill order, so the only way to ensure the important processes remain running is to drop the lowest one to waived.
,
Sep 8
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ef28294169f3e729b1e103f99d0eeecbc4447970 commit ef28294169f3e729b1e103f99d0eeecbc4447970 Author: Bo Liu <boliu@chromium.org> Date: Sat Sep 08 00:34:23 2018 android: Raise invisible main frame rank Before this CL, the ranking order is: 1) visible main frame 2) visible subframe in viewport 3) visible subframe outside of viewport 4) invisible main frame 5) invisible subframe in viewport 6) invisible subframe outside of viewport This CL moves 4) above 3). This complicates the comparison algorithm quite a bit, since it's no longer ok to compare fields independently. Expand testRanking to cover all cases in the list above. Also make failures in unit test easier to debug by using assertArrayEquals. Bug: 855037 Change-Id: I4aa1c785fcdb6978cfeca9bf8e6989dd6d3153ea Reviewed-on: https://chromium-review.googlesource.com/1213924 Reviewed-by: Siddhartha S <ssid@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#589736} [modify] https://crrev.com/ef28294169f3e729b1e103f99d0eeecbc4447970/content/public/android/java/src/org/chromium/content/browser/ChildProcessRanking.java [modify] https://crrev.com/ef28294169f3e729b1e103f99d0eeecbc4447970/content/public/android/junit/src/org/chromium/content/browser/ChildProcessRankingTest.java
,
Sep 15
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/686c9d425ef74d1021d31ca386f4f412d867ff32 commit 686c9d425ef74d1021d31ca386f4f412d867ff32 Author: Bo Liu <boliu@chromium.org> Date: Sat Sep 15 01:42:23 2018 android: Lower out of viewport iframe priority Before this change, processes that gains visibility are added to the binding manager. This means when switching tabs, iframes that are out of the viewport are added to binding manager as well. This changes the condition for adding to binding manager to be gaining both visibility and intersecting viewport. This should reduce number of processes being added to binding manager if user is simply switching between tabs without scrolling. Bug: 855037 Change-Id: Iac971ec2ce8e6e36c081bbf26c5cf3f3626ff9dc Reviewed-on: https://chromium-review.googlesource.com/1227360 Reviewed-by: Siddhartha S <ssid@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#591555} [modify] https://crrev.com/686c9d425ef74d1021d31ca386f4f412d867ff32/content/public/android/java/src/org/chromium/content/browser/ChildProcessLauncherHelperImpl.java
,
Sep 18
Looking at the 1-week of data on canary for after #19 "android: Raise invisible main frame rank" landed. There doesn't appear to be much of an effect on ooms or sad frames. Although there's a lot of noise, and the experiment groups switched mid way through the week.
,
Oct 5
When looking at OOM metrics in ProcessedCrashCounts, we should exclude kitkat due to crbug.com/879259. Looking at dev experiment over the last 28 days and excluding kitkat, for full site isolation, GPU OOM CPM is up +60%, and main renderer OOM is up 27%. That's a lot better than the original canary experiment where main renderer OOM was up 180%. So making progress here :p I'm gonna revert #20. klobag reported seeing sad frames after an initial load and scrolling to an iframe, and it's probably caused by that. That is a bit too aggressive I guess. Instead, I'm going to try implement the other idea: always make sure there is one offscreen process that's in waived binding for android to kill. Hopefully that's less aggressive
,
Oct 5
Actually, Stability.Android.RendererCrash (ie sad tabs) which can be caused by either oom or crash are down compared to control. If we could reload iframes so user don't see sad frames, then we'd be golden :p
,
Oct 8
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/33038b5100d299d348819478d8469e653bf4583b commit 33038b5100d299d348819478d8469e653bf4583b Author: Bo <boliu@chromium.org> Date: Mon Oct 08 23:11:49 2018 Revert "android: Lower out of viewport iframe priority" This reverts commit 686c9d425ef74d1021d31ca386f4f412d867ff32. Reason for revert: Policy is too aggressive and causes too many sad frames on newly loaded pages. This affects user experience too much. Original change's description: > android: Lower out of viewport iframe priority > > Before this change, processes that gains visibility are added to the > binding manager. This means when switching tabs, iframes that are out of > the viewport are added to binding manager as well. > > This changes the condition for adding to binding manager to be gaining > both visibility and intersecting viewport. This should reduce number of > processes being added to binding manager if user is simply switching > between tabs without scrolling. > > Bug: 855037 > Change-Id: Iac971ec2ce8e6e36c081bbf26c5cf3f3626ff9dc > Reviewed-on: https://chromium-review.googlesource.com/1227360 > Reviewed-by: Siddhartha S <ssid@chromium.org> > Commit-Queue: Bo <boliu@chromium.org> > Cr-Commit-Position: refs/heads/master@{#591555} TBR=boliu@chromium.org,ssid@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 855037 Change-Id: I0f0471e9e071d52981098d7df1dc63d55e4c9152 Reviewed-on: https://chromium-review.googlesource.com/c/1269626 Reviewed-by: Bo <boliu@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#597722} [modify] https://crrev.com/33038b5100d299d348819478d8469e653bf4583b/content/public/android/java/src/org/chromium/content/browser/ChildProcessLauncherHelperImpl.java
,
Oct 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0def1f891be90fb401b11db8ed77577981564e46 commit 0def1f891be90fb401b11db8ed77577981564e46 Author: Bo Liu <boliu@chromium.org> Date: Wed Oct 10 01:30:18 2018 android: Ensure one waived process Idea is that if the lowest ranked connection is in the BindingManager, then BindingManager will remove moderate binding from the connection without removing the connection from the BindingManager. Client is responsible for calling rankingChanged when ranking changes. Bug: 855037 Change-Id: I677c6ee5bdd6a5094f087f0469fa6309bf6ee31b Reviewed-on: https://chromium-review.googlesource.com/c/1266197 Commit-Queue: Bo <boliu@chromium.org> Reviewed-by: Siddhartha S <ssid@chromium.org> Cr-Commit-Position: refs/heads/master@{#598168} [modify] https://crrev.com/0def1f891be90fb401b11db8ed77577981564e46/content/public/android/java/src/org/chromium/content/browser/BindingManager.java [modify] https://crrev.com/0def1f891be90fb401b11db8ed77577981564e46/content/public/android/java/src/org/chromium/content/browser/ChildProcessLauncherHelperImpl.java [modify] https://crrev.com/0def1f891be90fb401b11db8ed77577981564e46/content/public/android/junit/src/org/chromium/content/browser/BindingManagerTest.java
,
Jan 9
Adding label to indicate that this is important for shipping site isolation on Android. As #c23 mentions, in the latest trials, sad tabs were actually down with site isolation compared to control, so Bo's work here may already be all that we need, and we'll just need to finish figuring out what to do with the increased number of sad frames in issue 841572. We should also revisit the sad tab/frame data once we finish implementing dynamic isolation of important sites (e.g., after a entering a password) in issue 905513, and once we improve process consolidation in issue 787576.
,
Jan 9
Same tabs are down, but that's because sad tabs due to crash is down. If we only look at sad tab due to OOM, it's still up. But I think at this point, I don't think we should pursue more things, since I did actually go to far and caused too many side frames that caused a general page load dip. I guess can leave this open to check things after dynamic isolation is done. |
|||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by bugdroid1@chromium.org
, Jun 22 2018