Reload crashed iframes |
|||||||
Issue descriptionOn Android, child processes are killed to free up memory. So child process can die through no fault of chrome. Without OOPIF, chrome on android handled this by reloading a crashed tab when user switches to it again. Probably need to do something similar for crashed frame when OOPIF is enabled. Options: 1) do nothing, show sad frame 2) automatically reload frame when it becomes visible 3) let user click on sad frame to reload it 4) reload the whole page something else? I played with a hack CL that basically does 2): https://bugs.chromium.org/p/chromium/issues/detail?id=813232#c15 And found out that in a lot of cases, simply reloading the original url for the frame isn't good enough, because the parent isn't expecting the reload, so the parent frame isn't posting messages to the newly loaded iframe. So I expect if we want to reload a frame, then it probably requires a web platform change, maybe to inform the parent page that a subframe was reloaded or something like that?
,
May 9 2018
Yes, that's a tough question. We went with not auto-reloading subframes when killing a process via the hung renderer dialog for the same reason-- the iframe often isn't set up properly via postMessages, etc. How bad was it in your experiment? Did real iframes just end up looking blank in a lot of cases? I wish it were something we could classify into successful and unsuccessful. Then we could auto-reload when it becomes visible (option 2) and just show some UI (similar to the sad frame) in the unsuccessful cases, letting the user reload the whole page if they want. I imagine it's hard to tell if the auto-reload was successful in general, though. Not sure about making it a web visible thing. Maybe that's more in the realm of panicker@'s lifecycle work, but it also exposes pretty browser-specific details (like whether subframes can fail independently of the page). If web pages did support that, though, it would be the best outcome. Option 3 (tap to reload frame) is probably useful regardless, but it won't solve the problem that auto-reload faces. (In other words, the user would still see broken frames in many cases when they manually reload the iframe, unless they later reload the whole page.) Option 4 (reloading the whole page) would "work," but it kind of defeats the point of having the main frame's process survive the subframe crash, and it may be unnecessary in some cases where the subframe could have been reloaded independently.
,
May 9 2018
> How bad was it in your experiment? Did real iframes just end up looking blank in a lot of cases? I didn't try that many sites. But enough ads on real web pages ended up being blank after a reload that made me think a simple reload isn't good enough.
,
May 9 2018
> I wish it were something we could classify into successful and unsuccessful. Hmm, can we keep track of whether it received any post messages? Might be worth a UMA, say if 90% of iframes when it crashed haven't received any post messages, then maybe it's ok to just forget the last 10% (making up numbers here)
,
May 9 2018
Auto-reloading sad iframes might also be dangerous in other cases, like if it's loaded from a POST, reloading might cause unwanted side effects on the server side. But maybe we can exclude those cases from being able to auto-reload. #4: agreed that those metrics might be nice to add. Maybe lukasza@ might be able to look into adding those as part of issue 841493 ? We can easily record whether a frame has received cross-process postMessages in RenderFrameProxyHost::OnRouteMessageEvent(). It'll not be a perfect predictor, but would be an interesting data point to have for this discussion. In terms of making this web-visible, the details seem hard to get right. What if it's not the parent but a sibling or some other random frame in the FrameTree (or opener hierarchy) that needs to communicate with the reloaded subframe? I'm also not sure how we feel about exposing a primitive to detect whether a given page is going to crash the renderer or not.
,
May 9 2018
,
May 9 2018
Also, for auto-reloading, we'd need to handle the case where an iframe just keeps triggering a crash when it reloads.
,
May 9 2018
> Also, for auto-reloading, we'd need to handle the case where an iframe just keeps triggering a crash when it reloads. Oh I didn't describe chrome on android's tab reload behavior clearly. It only reloads a tab if it died when hidden, and only when user switches to that tab again. Sad tab is still shown if the visible tab dies. So if we do roughly the same thing, then there is no loop.
,
May 10 2018
To chime in on #2 comment about web-exposed Lifecycle -- currently this is spec'd to have the entire frame tree of the page in the same Lifecycle state (Discarded is the relevant state here). So this breaks today's expectation. The spec can be evolved to make frames have different states if needed - we've considered it, and deliberately haven't precluded this possibility for the future.
,
Jun 11 2018
Some update for anyone here who's not aware. lukasza@ added metrics for sad frames in crbug.com/841493 . And site isolation experiment on android canary showed sad frame rate is very high, as expected: (google-only doc) https://docs.google.com/document/d/1YolLnahAxozzLgWLhTsO2R2Iq98X-VAj2xZ2rvCEVJE/edit?usp=sharing I'm looking at adding a metric to see what ratio of crashed frames has never received a postMessage from any ancestor frame, so it's safe to just reload. It's easy enough to keep track of whether a frame has received any post message from ancestor frames. That exact value is computed right here: https://cs.chromium.org/chromium/src/content/browser/frame_host/render_frame_proxy_host.cc?rcl=5b1538f9453b66fcb5d5c039aebd18986f6bdee6&l=373 It's easy enough to just save that state in RenderFrameHostImpl. Then I was thinking of getting to the root RFHI from CrossProcessFrameConnector::MaybeLogCrash when kShownAfterCrashing is logged. However, that seems to be kind of against the spirit of things, going from widget to view to root frame? A simpler alternative would be to log the post message state in RenderFrameHostImpl::RenderProcessGone. That's quite different from kShownAfterCrashing, but probably good enough for a rough estimate. Thoughts?
,
Jun 12 2018
Here's the simpler approach: https://chromium-review.googlesource.com/c/chromium/src/+/1097695
,
Jun 12 2018
#10: I agree keeping track of postMessages is a good start for determining that the frame might be safe to reload. There might be other ways for frames to communicate (message ports and MessageChannel, window.name, setting URL fragments, maybe something else?), but hopefully postMessage is much more common than those. RenderFrameProxyHost::OnRouteMessageEvent handles all cross-process postMessaging, so updating the flag there sounds right. However, lukasza@ and I chatted, and we don't think this needs to be restricted to just ancestor frames. One example is sibling frames: if we have A(B,C), and B postMessages C, and then C dies, it seems there's still a danger of breaking communication between B and C if we reload C. Re: getting to the RFHI from CrossProcessFrameConnector -- seems we already do that there (e.g., https://cs.chromium.org/chromium/src/content/browser/frame_host/cross_process_frame_connector.cc?l=413&rcl=1e6813f718e47ca5352e14a098d740ce842ac44e, https://cs.chromium.org/chromium/src/content/browser/frame_host/cross_process_frame_connector.cc?l=497&rcl=2fa8c2c72da47e87f4e9a9221b3ddab97ee66fd8), and both are in content/browser/frame_host/, so this seems ok layering-wise. Lukasz and/or Ken, do you agree this is preferable over logging from RenderFrameHostImpl::RenderProcessGone?
,
Jun 20 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b175bca0f37931ea3020290abd9db8322f68d197 commit b175bca0f37931ea3020290abd9db8322f68d197 Author: Bo Liu <boliu@chromium.org> Date: Wed Jun 20 18:32:49 2018 Add UMA to track cross-process postMessage to iframes Add a UMA when a crashed subframe becomes visible whether it has received any post messages from non-descendant frames. For frames that did, they cannot be silently reloaded. Bug: 841572 Change-Id: Ib2d0b4f5fc30b0f7596a00253afa9a0cfbe24faa Reviewed-on: https://chromium-review.googlesource.com/1097695 Reviewed-by: Ilya Sherman <isherman@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#568941} [modify] https://crrev.com/b175bca0f37931ea3020290abd9db8322f68d197/content/browser/frame_host/cross_process_frame_connector.cc [modify] https://crrev.com/b175bca0f37931ea3020290abd9db8322f68d197/content/browser/frame_host/render_frame_host_impl.h [modify] https://crrev.com/b175bca0f37931ea3020290abd9db8322f68d197/content/browser/frame_host/render_frame_proxy_host.cc [modify] https://crrev.com/b175bca0f37931ea3020290abd9db8322f68d197/tools/metrics/histograms/histograms.xml
,
Jul 9
The metric added in #13 is RenderFrameHostImpl.ReceivedPostMessageFromNonDescendant Not posting the direct link on public bug, but it shows on android canary (where there are site isolation experiments running), of the frames that we could potentially restore (processed died, and then frame became visible), ~60% has received a postMessage from non-descendant frame. So at least 60% of the time, it's not safe to reload a crashed iframe without informing the page first.
,
Sep 17
One thing I've noticed a couple of times recently on canary is that I got a bunch of sad frames after switching back to an inactive tab. I'm not sure that's actually better user experience than just reloading the whole tab, which we do automatically for sad tabs, if the main frame renderer was killed. The page will need to reload if course, but users are already kind of accustomed that this may happen. Maybe we can trigger reload of the background page when we're switching to it and there are visible sad frames on it?
,
Sep 17
> One thing I've noticed a couple of times recently on canary is that I got a bunch of sad frames after switching back to an inactive tab. I claim responsibility :) I've been largely trying to lower sad tabs (and tab reloads) at the cost of more sad frames in crbug.com/855037 Reloading a page when is pretty costly as well. Also then do we reload a page when user scrolls and exposes a sad frame? Stability.ChildFrameCrash.Visibility ShownAfterCrashing currently doesn't distinguish between switching tabs vs scrolling; maybe it's useful to have that distinction? I guess is if we do this, reloads will be so frequent that it will be just as annoying to the user. Although do note that lowering sad tabs at the cost of sad frames is the wrong optimization in this case.
,
Sep 28
,
Oct 2
BTW, I peeked at the data for RenderFrameHostImpl.ReceivedPostMessageFromNonDescendant, since that metric was about to expire (see issue 891053). We now have data for it from all channels up to M69 stable, though for Android the beta/stable data isn't useful as we don't have any public site isolation trials there yet. The current numbers on Android canary/dev are a bit better than before: on Dev, ~42% of frames shown after crashing had received a postMessage from a non-descendant frame. This is for two weeks ending on Sep 30. Canary is ~46%. Comparing the data for full site isolation vs subset of sites on Dev, I see ~30% for subset of sites vs. 46% for full site isolation. So apparently with subset of sites sad frames are eligible for a reload more often.
,
Oct 3
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c0bed8365b82df6160f2f3f0d85079f9738b8d0f commit c0bed8365b82df6160f2f3f0d85079f9738b8d0f Author: Bo Liu <boliu@chromium.org> Date: Wed Oct 03 03:44:19 2018 Remove ReceivedPostMessageFromNonDescendant UMA Obtained desired data. Bug: 841572, 891053 Change-Id: Ib81d8fb1ad0dc6df8437fe16264c249b60a5ae30 Reviewed-on: https://chromium-review.googlesource.com/c/1257856 Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Reviewed-by: Ilya Sherman <isherman@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#596116} [modify] https://crrev.com/c0bed8365b82df6160f2f3f0d85079f9738b8d0f/content/browser/frame_host/cross_process_frame_connector.cc [modify] https://crrev.com/c0bed8365b82df6160f2f3f0d85079f9738b8d0f/content/browser/frame_host/render_frame_host_impl.h [modify] https://crrev.com/c0bed8365b82df6160f2f3f0d85079f9738b8d0f/content/browser/frame_host/render_frame_proxy_host.cc [modify] https://crrev.com/c0bed8365b82df6160f2f3f0d85079f9738b8d0f/tools/metrics/histograms/histograms.xml
,
Oct 9
> Reloading a page when is pretty costly as well. Also then do we reload a page when user scrolls and exposes a sad frame? Stability.ChildFrameCrash.Visibility ShownAfterCrashing currently doesn't distinguish between switching tabs vs scrolling; maybe it's useful to have that distinction? Yes, I agree we should add finer-grained metrics for ShownAfterCrashing as the first step; I'm putting together a CL to do that. Once we have that data, we can evaluate different policies; e.g. reloading when just switching to a tab exposes sad frames vs also reloading when switching to a tab that contains out-of-view sad frames that the user might scroll to later.
,
Oct 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025 commit 2b0c91df42b8e4e4df7ba3f05c8713b603bf9025 Author: Alex Moshchuk <alexmos@chromium.org> Date: Wed Oct 10 18:41:16 2018 Add metrics on how sad frames become visible after crashing. When a frame crashes while it is not visible to the user, but later becomes visible, we currently log the Stability.ChildFrameCrash.Visibility.ShownAfterCrashing metric. This CL adds more metrics to understand how exactly the sad frame was shown later, to distinguish between the following cases: 1. User switch to a tab, revealing the sad frame. 2. The viewport intersection became non-empty (e.g., user scrolled to the sad frame) 3. The frame visibility changed (e.g., parent modified the iframe element's CSS to make it visible) 4. Case 1 followed by case 2. E.g., user switched tabs and then scrolled to the sad frame. 5. Case 1 followed by case 3. E.g., user switched tabs and then interacted with the page, causing the parent to modify CSS and make the sad frame visible. These metrics will be useful to understand whether reloading a tab with sad frames when the user switches to it might help reduce the number of sad frames that users see. Bug: 841572 Change-Id: I6dff352f87d22d01cdfcb163dd52c9b5506bae38 Reviewed-on: https://chromium-review.googlesource.com/c/1270011 Reviewed-by: Steven Holte <holte@chromium.org> Reviewed-by: Ćukasz Anforowicz <lukasza@chromium.org> Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#598422} [modify] https://crrev.com/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025/content/browser/frame_host/cross_process_frame_connector.cc [modify] https://crrev.com/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025/content/browser/frame_host/cross_process_frame_connector.h [modify] https://crrev.com/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025/content/browser/site_per_process_browsertest.cc [modify] https://crrev.com/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025/content/browser/web_contents/web_contents_impl.cc [modify] https://crrev.com/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025/tools/metrics/histograms/enums.xml [modify] https://crrev.com/2b0c91df42b8e4e4df7ba3f05c8713b603bf9025/tools/metrics/histograms/histograms.xml
,
Oct 23
Quick update with metrics from #c21 on how sad frames become visible. So far with ~10 days of canary data ending Oct 21, on the subset-of-sites isolation trial, the majority of sad frames (69%) become visible because of scrolling. 15% become visible immediately after a tab is switched, and another 16% become visible after a tab is switched followed by scrolling to the sad frame. So, all in all, if we were to mark background tabs for reload if they contain a sad frame, we could avoid at most 31% of sad frames. That's lower than I hoped, but masking a third of sad frames might still be worthwhile. We only have a couple of days of data from dev so far, but so far it shows a very similar picture - 75% sad frames becoming visible after scrolling vs 25% after tab switch (possibly followed by scrolling). One thing that both Bo and Nasko brought up is avoiding too many reloads. Nasko mentioned being really annoyed when reloads happen on slow networks, so if we were to explore a reload policy for tabs with sad frames, one idea would be to limit it only apply on fast networks. We were also hoping that more sad frames would become visible on tab switch in subset-of-sites isolation compared to strict --site-per-process, but the opposite is actually true. In --site-per-process, 24% of sad frames become visible after tab switches, and 20% more on scrolling after tab switch. So a reload policy would be more effective with strict site isolation. I'll note that the work on issue 787576 might change the sad frame metrics on the subset-of-sites experiment, as it will start grouping cross-site non-isolated subframes on an isolated site into one process. So we should take another look at this data after that issue is fixed.
,
Nov 6
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cdfa4b1e6d54a365aa51baef4e8975d56b5731a6 commit cdfa4b1e6d54a365aa51baef4e8975d56b5731a6 Author: Bo Liu <boliu@chromium.org> Date: Tue Nov 06 00:21:44 2018 Add metric to track tab restore On Android, Navigation.LoadIfNecessaryType.kRequestedByClient counts number of tab restore loads after switches back to a tab with a dead renderer process. Bug: 841572 Change-Id: I753bedf1f0acfbe5bafe286b7cabfa2cf5c31fcf Reviewed-on: https://chromium-review.googlesource.com/c/1316361 Commit-Queue: Bo <boliu@chromium.org> Reviewed-by: Ilya Sherman <isherman@chromium.org> Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#605527} [modify] https://crrev.com/cdfa4b1e6d54a365aa51baef4e8975d56b5731a6/content/browser/frame_host/navigation_controller_impl.cc [modify] https://crrev.com/cdfa4b1e6d54a365aa51baef4e8975d56b5731a6/content/browser/frame_host/navigation_controller_impl.h [modify] https://crrev.com/cdfa4b1e6d54a365aa51baef4e8975d56b5731a6/tools/metrics/histograms/enums.xml [modify] https://crrev.com/cdfa4b1e6d54a365aa51baef4e8975d56b5731a6/tools/metrics/histograms/histograms.xml
,
Dec 7
Quick update now that there's more than a month of data for how sad frames become visible. The numbers are now a bit better than in #c22. For Dev site-per-process trial, we could avoid 45% of sad frames by marking a tab for reload when it has visible sad frames. (25% become visible immediately after tab switch, and another 20% become visible after tab switch + scroll.) For IsolateTopPasswordSites, the numbers are lower and would only avoid 22% of sad frames. I don't have a good explanation as to why, though that trial has substantially lower sad frame count to start with, and we may soon change the list of whitelisted sites which may affect the numbers here (the current list is desktop-specific, and acolwell@ is looking into getting it for mobile). We haven't yet fixed issue 787576, which would reduce the # of OOPIFs and could also affect our strategy here. It's turning out more complicated and requires fixing issue 898281 as well, but lukasza@ and acolwell@ have been making progress there. That said, perhaps we should start experimenting with the tab reload policy, given that the payoff might actually be quite nice? That way we'll have enough time to gather data and check how much this affects the tab restore metric that boliu@ added in #c23, to see if this idea is actually reasonable or not. For implementation, boliu@ mentioned this is where sad tabs get marked for reload: https://cs.chromium.org/chromium/src/chrome/android/java/src/org/chromium/chrome/browser/tab/TabWebContentsObserver.java?rcl=c13a10b85120f967920ffa056b4ed967b049fa4e&l=157 which eventually uses NavigationControllerImpl::SetNeedsReload. Then when tab becomes visible, NavigationControllerImpl::LoadIfNecessary reloads the tab. For the policy I'm proposing, ideally we'd just call SetNeedsReload from something like CrossProcessFrameConnector::RenderProcessGone(), if IsVisible() is true and the WebContents is not currently shown. Not sure how to best make that Android-specific though - looks like the existing logic in TabWebContentsObserver.java relies on WebContentsObserver::RenderProcessGone, which only fires for main frame process crashes, plus we'd also want the IsVisible() info. We could consider adding something for sad frames to WebContentsObserver. Bo, any opinions on what the plumbing should look like for this?
,
Dec 7
Trying to estimate the downside side, how many more tab reloads would this cause. Stability.ChildFrameCrash.ShownAfterCrashingReason counts frames, not tabs, so it's only an upper bound. Going by last 14 days of data and comparing to Navigation.LoadIfNecessaryType.kRequestedByClient, this would the increase reload by ~10% for IsolateTopPasswordSites, and ~35% for SitePerProcess. And these are upper bounds. Although there might be second order effects, eg reloading all the iframes causes more background tabs/frames to die. Seems ok to at least try this. We'll get real numbers when we try this. And should also keep an eye on page loads. > Bo, any opinions on what the plumbing should look like for this? WebContentsObserverProxy is the class that plumbs calls from native WebContentsObserver to java ones. Everything else should be pretty straightforward?
,
Jan 8
We'll want to finalize a strategy here before shipping SI on Android, so adding appropriate label to track this. I've been busy with other work and haven't had a chance to follow up on #c24-25 yet. boliu@ - if you have some spare cycles and are interested in implementing that policy, please feel free to do it, that would be much appreciated! Otherwise I'll try to get to it some time soon. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by boliu@chromium.org
, May 9 2018