Destroying/recreating processes for i-frames can significantly slow navigation within a site
Issue description

The performance of navigating within a site that has many ad domains is avoidably reduced because the processes associated with the ad domains on the page are destroyed and then must be recreated after navigation. Since process creation is particularly expensive on Windows I have tagged this as a Windows bug, but it is assumed to affect all operating systems where site isolation is enabled.

I noticed this when testing on smbc-comics.com, specifically https://www.smbc-comics.com/comic/2013-09-06. I chose this site because I noticed that scrolling stuttered a lot after navigating to a new comic, and it turns out it uses a lot of ad domains. Each page has i-frames from about ten different ad domains, so a single tab of smbc-comics.com may have eleven renderer processes. When I navigate to a different comic the main renderer process is retained, but the i-frame processes are all destroyed and then recreated. Perversely enough, this means that if I keep another tab open to smbc-comics.com then navigation performance is improved, because far fewer processes need re-creating.

In one pathological case each process was taking 5 seconds of CPU time at startup. A reboot resolved that odd case, but process creation appears to take at least 100 ms of CPU time, probably more. On the Surface Go I was testing on, the navigation scenario was CPU bound (all four hardware threads running at 100%) for about four seconds, so any reduction in CPU usage has the potential to speed navigation. If we briefly retained those i-frame processes (I know, CPU/memory tradeoffs) while we are retaining the main process then we might make these navigations noticeably faster, especially on slow computers.

The main CPU savings I saw in my three-navigation test were in chrome.exe (13.6 s), but System (1.4 s) and MsMpEng.exe (1.0 s) also consumed less CPU time. Note that the CPU savings were probably reduced by having a background tab that was consuming cycles, so an actual fix would be even better, perhaps significantly better. In my three-navigation test I saw 24 process creations normally, but with a background tab containing another smbc-comics.com page (to keep the i-frame processes around) I saw just seven process creations.

A first step towards investigating/addressing this would be to avoid shutting down the subframe processes (while continuing to delete the RenderFrameHosts) for a short delay after same-site navigations. We could make this delay configurable with a command-line option during the investigatory phase. We already delay subframe process shutdown for 1 second (kUnloadTimeoutMS) to let subframes run unload handlers, and this delay could be conditionally increased, either for same-site navigations or (initially) for all navigations. We would probably want to skip this when under memory pressure (similar to https://crrev.com/590360), and we would want to find a good timeout for when the process can go away when unused. In my Surface Go tests it took a maximum of 5.9 seconds from when the old page's processes were destroyed to when all of the new processes were created, so increasing the timeout by 6 seconds would be enough, just barely, in this case.

Tagged as blocking bug 890032 because this issue was found while investigating why scrolling was choppy shortly after navigation.
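To make the "configurable delay" idea concrete, here is a minimal standalone sketch (plain C++, not Chromium code; the --subframe-shutdown-delay-ms switch name and the helper are made up for illustration):

```cpp
// Rough sketch only: a hypothetical --subframe-shutdown-delay-ms switch that
// extends the grace period before an unused subframe process is shut down,
// and that is skipped entirely under memory pressure.
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <string>

// Today's default: kUnloadTimeoutMS is 1 second.
constexpr std::chrono::milliseconds kDefaultSubframeShutdownDelay{1000};

std::chrono::milliseconds SubframeShutdownDelay(int argc, char** argv,
                                                bool under_memory_pressure) {
  if (under_memory_pressure)
    return std::chrono::milliseconds{0};  // don't hold on to idle processes
  const std::string prefix = "--subframe-shutdown-delay-ms=";  // hypothetical
  for (int i = 1; i < argc; ++i) {
    const std::string arg = argv[i];
    if (arg.rfind(prefix, 0) == 0)
      return std::chrono::milliseconds{std::atoi(arg.c_str() + prefix.size())};
  }
  return kDefaultSubframeShutdownDelay;
}

int main(int argc, char** argv) {
  std::cout << SubframeShutdownDelay(argc, argv, /*under_memory_pressure=*/false)
                   .count()
            << " ms before shutting down an unused subframe process\n";
}
```

In a real change this would presumably just feed into the existing kUnloadTimeoutMS path rather than being a separate helper.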
Comment 1 by brucedaw...@chromium.org, Oct 10

Aside: to slightly confuse things, typing Shift+ESC on that site (to bring up Chrome's task manager) also takes you to the site's main page - this confused me for a while.

Adjusting kUnloadTimeoutMS isn't sufficient because it only triggers when GetSuddenTerminationDisablerState(blink::kUnloadHandler) is set, but even with that check commented out and DelayProcessShutdownForUnload being called with twenty seconds I still see the i-frame process disappearing instantly (in less than a second after clicking the link). Any thoughts on how to make them stick around, at least for test purposes? keep_alive_ref_count_ seems to have some restrictions that make it not work well for these purposes.

Another aside: as an experiment I recorded an ETW trace of twelve navigations on this site on my X840 and then UIforETW generated the standard Chrome process summary. This is with one tab open. The renderer process count is impressive:

Chrome PIDs by process type: C:\Users\brucedawson\AppData\Local\Google\Chrome SxS\Application\chrome.exe (42772)
browser : 42772
crashpad : 15908 43348
extension : 480 20524 24120 28088 36544 44352
gpu-process : 50392
renderer : 820 1260 1484 1812 3164 3300 3340 5196 7888 8008 8392 10328 10456 10532 10716 11456 13220 13336 14044 15212 16148 16352 17436 18896 19112 20800 21228 21368 21996 22208 22236 22376 22708 22716 23996 24008 24388 28080 28476 29000 29524 29588 29828 30664 31288 31360 31592 31832 32192 32356 32976 33180 33616 34484 34760 36208 37532 37660 37684 38148 38212 38524 39000 40160 40232 40660 40800 41372 41528 42824 42888 43972 44004 44164 44324 44748 44752 44892 44912 45012 45424 45868 45880 46092 46728 47928 47964 49096 50324 50520 50584 51156 51552 51620 52220 52260 52864 53476 53604 54088 55460 55760 56440 57480 57652 57700
utility : 10012 19168 22908 27172 32032 37696 40204 53540 54036 57440
watcher : 6572

Comment 2, Oct 11

Comment 1: I imagine that the renderer process shuts down when either the timer expires or when all the frames have been cleaned up, whichever comes first. alexmos@: Is that correct, or do we always wait for the timer? I wonder if we can experiment with always waiting for the timer to evaluate how to improve performance here.

Comment 3, Oct 11

#2: I'd have expected us to always wait for the timer, since we don't wait for subframe unload ACKs yet (that'll only happen once arthursonzogni@ fixes issue 609963 and when we actually keep the subframe RFHs around), and we have also removed the renderer-initiated shutdown paths in r584502 (though that is only in M70+). If that's not happening, I'm curious to know why - I'll try to find some time to poke around.

Comment 4, Oct 11

I can't repro the subframe process disappearing instantly. Here's what I tried (on Linux ToT):
- Modify kUnloadTimeoutMS to 10000 in render_view_host_impl.cc.
- Comment out "&& GetSuddenTerminationDisablerState(blink::kUnloadHandler)" in ~RFHI.

Going to http://csreis.github.io/tests/cross-site-iframe-simple.html, and then going to http://csreis.github.io/tests/cross-site-iframe.html, I see the subframe process shutdown getting delayed properly, and if you click "Go cross-site (simple page)" within 10 seconds, it'll reuse the old process with the same PID. Bruce, is there a difference between this and what you tried?

Also tried going to https://www.smbc-comics.com/comic/2013-09-06 and then navigating to another same-site comic. I saw that some subframe processes were now shared across the two pages (e.g., kickstarter.com, googlesyndication.com - yay), but others weren't, since some of the ad subframe sites appear to dynamically rotate fairly frequently and the new page didn't load the same set of sites. So in practice, this might limit the process reuse benefit we get.
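In other words, the net effect of those two edits is roughly the following (a toy restatement in plain C++, not the actual code in render_view_host_impl.cc or the RenderFrameHostImpl destructor): the shutdown delay is normally applied only to frames with unload handlers, and the experiment both lengthens it and applies it unconditionally.

```cpp
// Toy model of the local experiment above; the constants mirror the values
// discussed in this bug (1 s default, 10 s for the experiment).
#include <chrono>
#include <iostream>

constexpr std::chrono::milliseconds kDefaultUnloadTimeout{1000};
constexpr std::chrono::milliseconds kExperimentUnloadTimeout{10000};

std::chrono::milliseconds ShutdownDelayForSubframeProcess(
    bool has_unload_handler, bool experiment_enabled) {
  if (experiment_enabled)
    return kExperimentUnloadTimeout;  // delay every subframe process
  // Default behavior: only frames with unload handlers delay process shutdown.
  return has_unload_handler ? kDefaultUnloadTimeout
                            : std::chrono::milliseconds{0};
}

int main() {
  std::cout << ShutdownDelayForSubframeProcess(false, false).count()
            << " ms today (no unload handler), "
            << ShutdownDelayForSubframeProcess(false, true).count()
            << " ms with the experiment\n";
}
```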

Comment 5, Oct 12

I think I misinterpreted the data. When I looked at task manager I saw the processes disappear, but when the navigation completed the processes came back with the same PIDs, so all I was seeing was that task manager was temporarily not showing them. When I looked at an ETW trace I saw a high process count, but that appears to be a separate issue, caused by doing my testing on my corporate machine. I'll do some testing with a custom binary on my Surface Go.

The rotating set of domains will reduce the effectiveness of this somewhat, but I think there's still some good potential. And yes, googlesyndication.com keeps showing up as one of the expensive processes.

Comment 6, Oct 12

Comment 5: As a side note, Chrome's task manager indeed only shows processes that it makes the effort to track. We had an intern helping us fix that in issue 739782, but the work hasn't been finished. You might be able to see them stick around with the --task-manager-show-extra-renderers flag from issue 716609.

Comment 7, Oct 12

The --task-manager-show-extra-renderers flag is quite useful and shows me the renderers hanging around for twenty seconds, as I requested. However, I can no longer get process reuse. That is, the renderer processes hang around while new processes are created for the ad domains. I've monitored googlesyndication.com and it seems to get a new PID each time. A screenshot of task manager with the previous ad domains hanging around as unlabeled renderers is attached. I'm not sure why this was working (or why I thought it was working) yesterday when I can't get it to work today. Very odd. My test change is uploaded at http://crrev.com/c/1279198.

Comment 8, Oct 12

c#7: I did a bit more digging to understand when the reuse will and will not happen. It turns out that currently, process reuse of these "pending-delete" processes works only when we stay within the same BrowsingInstance. I.e., sharing seems to happen for me when you follow a link to a same-site URL in the same tab, or type a same-site URL into the omnibox. However, sharing doesn't seem to happen if you open a same-site comic in a separate tab, or if you type a cross-site URL into the omnibox and then navigate to the comic site again. Does that match what you're seeing?

A few more details about why this happens. Cross-process subframes use a REUSE_PENDING_OR_COMMITTED_SITE process reuse policy, which looks for an existing process that has pending or committed navigations with a matching URL. I found that the subframe processes that are in delayed shutdown mode are no longer associated with any navigations (a consequence of destroying the subframe RenderFrameHosts), so they won't even be considered for reuse by that mechanism. However, when we're staying in the same BrowsingInstance, I'm guessing that the SiteInstances of the subframes that we've just destroyed are still around, kept alive by session history entries, and they're still associated with the pending-shutdown processes. If we try to recreate a SiteInstance for a subframe on a new page, we'll find and reuse the old SiteInstance, which still holds on to the old process, so we get to reuse that process.

I think it may be worthwhile for us to fix REUSE_PENDING_OR_COMMITTED_SITE to also look for processes that have a matching process lock, which would allow more process reuse in cases like this (regardless of this issue).
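In toy form (made-up types and fields, not the real RenderProcessHostImpl code), the suggested extension would look something like this: the lookup also accepts a pending-shutdown process whose process lock matches the site.

```cpp
// Toy sketch of extending the reuse lookup to match by process lock.
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct ProcessInfo {
  int pid;
  std::string lock;           // site the process is locked to
  bool has_live_navigations;  // pending or committed navigations for its site
  bool pending_shutdown;      // in delayed-shutdown ("pending-delete") state
};

std::optional<int> FindReusableProcess(const std::vector<ProcessInfo>& processes,
                                       const std::string& site,
                                       bool also_match_by_lock) {
  for (const ProcessInfo& p : processes) {
    if (p.lock != site)
      continue;
    if (p.has_live_navigations)
      return p.pid;  // roughly what REUSE_PENDING_OR_COMMITTED_SITE finds today
    if (also_match_by_lock && p.pending_shutdown)
      return p.pid;  // proposed: revive a pending-shutdown process by its lock
  }
  return std::nullopt;
}

int main() {
  std::vector<ProcessInfo> processes = {
      {820, "googlesyndication.com", false, true}};
  std::cout << "current policy: "
            << (FindReusableProcess(processes, "googlesyndication.com", false)
                    ? "reuse" : "new process")
            << ", with lock matching: "
            << (FindReusableProcess(processes, "googlesyndication.com", true)
                    ? "reuse" : "new process")
            << "\n";
}
```

Matching on the process lock would keep the reuse restricted to same-site processes, rather than handing the frame an arbitrary idle process.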

Comment 9, Oct 13

Thanks for the pointers. I took a look at RenderProcessHostImpl::FindReusableProcessHostForSiteInstance. As you say, subframe processes that are in delayed shutdown mode are no longer associated with any navigations, so they won't even be considered for reuse by that mechanism. I think there is a (benign) race condition where, if the navigation and reload happen quickly enough, the old processes will still be associated with the old site_instance and will get reused (which is why it seemed to work yesterday), but in most/many cases they will not be.

Once a process is not associated with any site_instance/site_url, is there currently any mechanism whereby it can be reused? If not then it seems like we either need to allow those processes to be reused by any site (possibly security risks) or we need to continue to record which site_url they were associated with so that they can be brought back from the dead. My crude attempt to do this failed because the RenderProcessHost was destroyed, so leaving it in map_ isn't very helpful.

Comment 10, Oct 13

Currently, the only mechanism that would reuse such processes would be when we're over the process limit (see RenderProcessHostImpl::GetExistingProcessHost). You should be able to test that by running with --renderer-process-limit=1. The limit is a soft limit anyway when running with site isolation. This seems to allow reuse in the cases that didn't work for me in c#8, and seems to also work well on the comics site (almost no extra "Renderer:" processes remaining when running with --task-manager-show-extra-renderers). One caveat is that this is checked after reusing the spare process, if available, so one of the frames might reuse the spare instead. If you want to continue playing with this, the main entrypoint into picking the target process is RenderProcessHostImpl::GetProcessHostForSiteInstance(). I think we could potentially get the REUSE_PENDING_OR_COMMITTED_SITE cases to also use GetExistingProcessHost().
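To spell out that ordering, here is a toy model (plain C++, a big simplification of the real GetProcessHostForSiteInstance(), which also consults the reuse policies discussed above): the spare process is taken first if available, then the over-the-soft-limit reuse path, then a fresh process.

```cpp
// Toy model of the process-selection order described in this comment.
#include <iostream>

enum class ProcessChoice { kSpareProcess, kReuseExisting, kCreateNew };

ProcessChoice ChooseProcessForSubframe(bool spare_available, int process_count,
                                       int soft_process_limit) {
  if (spare_available)
    return ProcessChoice::kSpareProcess;  // the spare is consulted first
  if (process_count >= soft_process_limit)
    return ProcessChoice::kReuseExisting;  // the GetExistingProcessHost() path
  return ProcessChoice::kCreateNew;
}

int main() {
  // With --renderer-process-limit=1 we are always over the soft limit, so
  // everything that didn't grab the spare falls into the reuse path.
  ProcessChoice c = ChooseProcessForSubframe(/*spare_available=*/false,
                                             /*process_count=*/5,
                                             /*soft_process_limit=*/1);
  std::cout << (c == ProcessChoice::kReuseExisting ? "reuse an existing process"
                                                   : "something else")
            << "\n";
}
```

That ordering is why one of the subframes can grab the spare before the over-limit reuse path is even consulted.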

Comment 11, Oct 15

Setting --renderer-process-limit=1 together with the other changes (kUnloadTimeoutMS now set to 8,000) works well. Doing a new navigation every ten seconds I saw just ten new processes created over the course of five navigations - that is about 80% lower than before. I'll do some more experiments to try to measure the CPU-time savings to help decide whether/when/how to ship this change for real.

Comment 12, Oct 15

I did a couple of runs on the Surface Go with and without the hack fixes, doing five navigations from https://www.smbc-comics.com/comic/2013-09-14 over a ~55 second time period:

Dev-build without fixes (two runs):
- MsMpEng.exe - 74834 context switches, 7552.92 ms CPU
- System - 76275 context switches, 4669.26 ms CPU
- chrome.exe - 558124 context switches, 121445.35 ms CPU, 55 processes
- MsMpEng.exe - 79379 context switches, 7912.87 ms CPU
- System - 84476 context switches, 5289.91 ms CPU
- chrome.exe - 567404 context switches, 119957.37 ms CPU, 56 processes

Dev-build with fixes (two runs):
- MsMpEng.exe - 32249 context switches, 4075.00 ms CPU
- System - 60975 context switches, 3475.10 ms CPU
- chrome.exe - 467083 context switches, 96482.00 ms CPU, 17 processes
- MsMpEng.exe - 36929 context switches, 4377.76 ms CPU
- System - 71323 context switches, 4231.64 ms CPU
- chrome.exe - 506767 context switches, 106959.43 ms CPU, 20 processes

The results are a bit noisy, but the savings over five navigations are typically:
- ~3.5 s less MsMpEng.exe CPU time
- ~1.0 s less System CPU time
- ~19 s less chrome.exe CPU time
- ~36 fewer chrome.exe processes
- way fewer context switches (as a consequence of less CPU time and fewer processes)

Overall the data shows a fairly clear savings of 4-5 s of CPU time per navigation, and the CPU usage graphs show a much smaller period of time where the CPU is 100% utilized. As a percentage, the total CPU time reduction is about 20%. I'm now looking at less hacky changes.

Comment 13, Oct 18

Work in progress...