Issue metadata
Sign in to add a comment
|
Need UMA for the number of sad frames |
||||||||||||||||||||||
Issue descriptionIt would be useful if the Android field trials of Site Isolation could answer how many sad frames the user sees (because of OS killing the renderers or for other reasons). lfg@ points out a UMA could be potentially added either in CrossProcessFrameConnector::RenderProcessGone or RenderWidgetHostViewChildFrame::RenderProcessGone.
,
May 9 2018
I am not sure how to compute whether the RWHVCF is currently visible inside RWHVCF::RenderProcessGone. Would something like this work:
bool is_visible = IsShowing() && WasUnOccluded()
? Or is just checking IsShowing() sufficient?
I guess having |is_visible| could log the following metrics:
- Stability.ChildFrameProcess.CrashedWhileVisible.ExitCodes
(similar to CrashExitCodes.Renderer)
- Stability.ChildFrameProcess.CrashedWhileInvisible.CrashedProcessAge
(similar to Stability.CrashedProcessAge.Renderer)
lfg@ points out that these metrics would log multiple events if multiple frames are hosted in the crashed renderers, but I think that should be fine as long as the semantics is clear to the readers/users of the metric.
I wonder if it might also be worth to tweak RWHVCF::Show() override to check RPH::HasConnection() and report something like Stability.ChildFrameProcess.DeadWhenShown.CrashedProcessAge.
WDYT?
,
May 9 2018
> I am not sure how to compute whether the RWHVCF is currently visible inside RWHVCF::RenderProcessGone. Would something like this work: > bool is_visible = IsShowing() && WasUnOccluded() > ? Or is just checking IsShowing() sufficient? I was trying to say WasUnOccluded is probably too hard to compute. So IsShowing is sufficient for now. We should also count the case where a hidden subframe dies, and then user switches to that tab after. And probably should count each sad frame only once, even if user switches back to it multiple times. Exit codes and termination status on android is largely meaningless on Android. But I think sad frame doesn't depend on how the process died?
,
May 9 2018
One more thing to point out - a sad frame might be offscreen and not visible at time of crash, but can later be scrolled into view and be visible. It is not easy to account for these cases I assume, is it?
,
May 9 2018
I meant to cover that case in #1 and with the WasUnOccluded comment, ie it's too hard and too much work for a UMA and let's just not bother.
,
May 9 2018
If we really want to report only the when user actually sees the sad frame image, perhaps it would be possible to use IntersectionObserver to do this? I think we would have to do this from the renderer though, and wouldn't be possible to get the termination status for the child process. Ken, WDYT?
,
May 9 2018
Even a hidden (e.g. 0-sized, or below-the-fold) frame can interact (postMessage etc.) with the rest of a page (and therefore negatively impact the user experience when crashed/killed). The true visibility might still be interesting but maybe as a separate metric (I assume that RWHVCF's "shown" state approximates whether the whole page/tab is visible; not sure if this assumption is true).
Okay, so I think so far I hear:
- let's not do ExitCodes (not useful on Android per #c3)
- CrashedProcessAge didn't hear any support so far
- no explicit request for logging the *duration* of time a sad RWHVCF is in a "shown" state
(is this what #c1 was alluding to?)
Which I think leaves only the following 2 relatively simple metrics to implement:
- Stability.ChildFrameProcess.VisibilityState:
- value is an enum: showing / not showing
(metric description can try to explain how this is different from "visible")
- logged from RWHVCF::RenderProcessGone(...)
- visibility calculated just based on RWHVCF::IsShowing()
- I can try to take a stab at implementing this
(unless anybody else wants to volunteer)
- Stability.ChildFrameProcess.ShownWhenDead:
- logged from RWHVCF::Show() (only once per RWHVCF/crash as suggested in #c3)
- I don't know how to hold UMA to log an *occurence* of an event
(so not an enum/bool/duration/accumulatedcount/etc). I'll try
to ask around...
,
May 9 2018
If those two metrics are independent, then I think still not possible to work out number of sad frames shown after the fact, since RWHVCF::RenderProcessGone while visible can still trigger a ShowWhenDead later if user switches away then back to that tab. I think we only need ShownWhenDead, and in addition to RWHVCF::Show, it should also be logged in RenderProcessGone if IsShowing is true. > - I don't know how to hold UMA to log an *occurence* of an event Umm... enum with a single entry? I guess it's rare to have metrics that are not meant to be useful by itself, but should be normalized with page loads later.
,
May 10 2018
I have a WIP CL @ https://crrev.com/c/1054099: - lfg@ - let's discuss on the CL why it only works for CrashedWhileShowing and not for ShownAfterCrashing (let me send a question via Gerrit in a minute or two) - boliu@ - does the shape of the histograms in this CL match your expectations (ignoring detailed review and whether the CL actually works)
,
May 10 2018
yep, shape looks good :p
,
May 10 2018
There is one thing that I'd like to change. As lfg@ points out, we care about subframe crashes even if the iframe is not visible (0-sized? sitting below the fold? occluded?) but still possibly important for UX (e.g. processing postMessages). So: 1. do we really care about crashes per widget/localroot? 2. or maybe we truly only care about crashes per subframe? 3. just about crashes affecting subframes? One interesting scenario to consider if what should happen in A(B(B2,C)) scenario when the B process crashes. - In CrashedWhileVisible case there are 2 affected widgets/localroots and 3 affected frames and only 1 subframe process crash - In ShownAfterCrash case we only have dead B when WebContents is shown, so we would only log 1 subframe crash anyway. This is kind of weird (and this is how my current revision of the CL works). So maybe, we should instead log a crash once per RenderFrameHostImpl (rather than per FrameTreeNode and/or RenderWidgetHostSomethingSomething)? I think I like this direction - let me try to implement this.
,
May 10 2018
I was thinking of counting the *number* of sad frames user "sees". So in the A(B(B2,C)) case, it would only be counted once, because there is only a single sad frame in A. But if B crashes in A(B,B,B), it would be counted 3 times, because user is looking at 3 sad frames. I think number of sad frames is probably a more sensible measure of impact of the crash than number of frames brought down, because users can't really access number of nested frames. So I think that would be 1. crash per widget/localroot?
,
May 11 2018
Another thing to consider logging here is some notion of the size of the crashed subframe. One motivation is that one option in issue 841572 is to let the user click/tap on the sad frame to reload it, so I'm curious in how many cases that would be possible. Presumably, the user won't really be able to see or click a 1-pixel sad frame even when it's visible. Also, we do have UMAs for main frame sad tabs that we'll be able to compare this to, right?
,
May 11 2018
There are lots of related information, like size of iframe or whether it's occluded mentioned above, that could be useful/interesting, but would be non-trivial to log. So I'd say do what's easy for now since we want this uma soon. Sad tab on android is "Stability.Android.RendererCrash", logged immediately after showSadTab call: https://cs.chromium.org/chromium/src/chrome/android/java/src/org/chromium/chrome/browser/tab/TabWebContentsObserver.java?rcl=1129d0d679e25123091792887b463ac9c2606297&l=123
,
May 11 2018
RE: #c12 CrashedWhileVisible and ShownAfterCrashing probably indeed only make sense for widgets/localroots. I also realize now that even if RWVHCF disappear after a crash, the remaining sad RFHs correspond exactly to localroots. So, I guess I can try to go back to a hybrid approach where: - ShownAfterCrashing is logged from WebContentsImpl::WasShown which iterates over all frames and finds ones that crashed (and which have their parent in a different process). - CrashedWhileVisible is logged from RWVHCF::RenderProcessGone Let me try to figure out 1) where to keep the boolean that prevents logging ShownAfterCrashing more than once (after multiple hide/show cycles) 2) how to plumb RWVHCF::RenderProcessGone to the place where the boolean is kept
,
May 11 2018
Sorry for completely missing the question in comment 6... RenderWidgetHostViewChildFrame does know the size of the frame, and to some extent whether it is shown/hidden. CanBecomeVisible() accounts for having an ancestor that has CSS visibility set to hidden, for example. It also knows whether the frame is currently positioned in the window viewport (i.e. it can call frame_connector_->viewport_intersection_rect()). That accounts for clips if a container has scrolled it out of view, but doesn't account for any elements that might be positioned on top of the OOPIF (occlusion). One thing that might be missing in accounting for visibility is display:none, and now I'm wondering if that is a bug. If you want to know that a crashed OOPIF was later scrolled into view, I believe CrossProcessFrameConnector is the right place to do that. The RenderWidgetHostViewChildFrame is destroyed immediately following the process crash, but the CPFC persists and continues tracking the frame's viewport intersection.
,
May 11 2018
RE: #c16 Thanks! I've got something based on CrossProcessFrameConnector that seems to work (at least passes browser tests on my machine) - https://crrev.com/c/1054099
,
May 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/873e60893e31e6d54996ea07d954990133529f79 commit 873e60893e31e6d54996ea07d954990133529f79 Author: Lukasz Anforowicz <lukasza@chromium.org> Date: Thu May 17 00:12:33 2018 Add Stability.ChildFrameCrash.Visibility for tracking sad *sub*frames. This CL adds Stability.ChildFrameCrash.Visibility which is logged after detecting that a sad subframe is shown. The logged enum describes the timing of a renderer crash in relation to whether the subframe is visible or not. Bug: 841493 Change-Id: Ic893764817bd1a9aa31684d53f858a398a31dfc8 Reviewed-on: https://chromium-review.googlesource.com/1054099 Reviewed-by: Alex Moshchuk <alexmos@chromium.org> Reviewed-by: Steven Holte <holte@chromium.org> Reviewed-by: Lucas Gadani <lfg@chromium.org> Commit-Queue: Alex Moshchuk <alexmos@chromium.org> Cr-Commit-Position: refs/heads/master@{#559356} [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/frame_host/cross_process_frame_connector.cc [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/frame_host/cross_process_frame_connector.h [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/frame_host/interstitial_page_impl.cc [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/frame_host/interstitial_page_impl.h [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/frame_host/render_frame_host_delegate.cc [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/frame_host/render_frame_host_delegate.h [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/site_per_process_browsertest.cc [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/content/browser/web_contents/web_contents_impl.cc [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/tools/metrics/histograms/enums.xml [modify] https://crrev.com/873e60893e31e6d54996ea07d954990133529f79/tools/metrics/histograms/histograms.xml
,
May 17 2018
The CL from #c18 landed after the Canary cut yesterday, so let's wait for a day or two and then verify the metrics are looking reasonably well in a UMA dashboard.
,
May 21 2018
The NextAction date has arrived: 2018-05-21
,
May 21 2018
The UMA data gathered over the last few days (from the site-per-process Android/Canary field trial) looks reasonable - let me mark this bug as fixed. I assume that there is no need to merge this metric to M67. Please open a separate bug if other kinds of metrics are needed (e.g. ones tracking the number of *frames* affected by a crash [rather than the number of sad *widgets*]). |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by boliu@chromium.org
, May 9 2018