Investigate CPM changes caused by Site Isolation |
||||||||
Issue descriptionLet's use this bug to track CLs we need to investigate CPM changes caused by Site Isolation.
,
May 4 2018
Commit b1088809... initially landed in 68.0.3419.0. I've looked at crashes in this version and nothing worrisome caught my attention. I've looked at the UMA data and we are getting back Stability.CrashedProcessAge.Renderer and Stability.CrashedProcessAge.Extension data that looks valid. This fix is important to merge back into M67, as it should help explain the observed impact of Site Isolation on stability metrics (especially on the observed regression in renderer CPM metric). The fix is of medium size, but is fairly straightforward/simple - I think the risk of merging should be very low. The fix should only have impact on code paths taken when the browser process detects a crash of a renderer process.
,
May 4 2018
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 4 2018
This looks like a good candidate to merge. The fix makes sense, and the new UMA metrics will be very useful for confirming theories about Site Isolation stability in M67. No concerns from my side.
,
May 4 2018
Approving merge to M67 branch 3396 based on comments #2, #4 and per offline chat with creis@. Please merge ASAP. Thank you.
,
May 4 2018
Hmmm... there is no ChildProcessTerminationInfo struct in M67 - the CL won't merge cleanly. We can either also merge r553367 (which introduced ChildProcessTerminationInfo and is a big, but simple/mechanical change that had 10 days of bake time on Canary/Dev) or I can try to shoehorn the new UMA into RendererClosedDetails (present on M67 / replaced by ChildProcessTerminationInfo in M68). My gut feeling is that the former option will be safer: - safer to deal with if other ChildProcessTerminationInfo-dependent changes need to be merged in the future - safer than re-authoring chunks of the fix in M67 WDYT?
,
May 4 2018
Oh... That might be a problem. At first glance, https://chromium-review.googlesource.com/c/chromium/src/+/1017386 seems too large to merge, even if it's pretty mechanical. Re-authoring your CL to work without it is also not a great option. I'm not sure what to recommend here yet. We might have to get by with just having these metrics on M68, which is unfortunate but might not be a showstopper. boliu@: What's your sense on whether your CL can be safely merged, despite its size? lukasza@: Have you tried merging it and your CL in your local branch to see if they apply cleanly and pass tests, just as a sanity check?
,
May 4 2018
I've checked if I can locally/manually cleanly merge r553367 followed by r555936. It turned out that a local merge failed with conflicts in content/browser/renderer_host/render_process_host_impl.cc and content/browser/browser_child_process_host_impl.cc. I guess my previous smoke test was based on a wrong assumption that the first |git drover| prompt won't appear until the merge is done cleanly. The above probably means that re-authoring / tweaking RendererClosedDetails is the only real option. I've uploaded it here in case we want to go down this route: https://crrev.com/c/1045274
,
May 4 2018
Thanks. I'll try to take a close look at the differences between the committed CL and your edited merge. We can see how risky it is (assuming you've built it on your local checkout of the M67 branch and it passes tests).
,
May 4 2018
RE: assuming you've built it on your local checkout of the M67 branch and it passes tests It builds fine locally and passes components_unittests - AFAIK the changes don't have any test coverage beyond StabilityMetricsHelperTest. Hmmm... I guess I could also try building chrome and seeing if chrome://histograms shows the new UMA after manually killing a renderer - I'll try to remember to do this on Monday.
,
May 5 2018
> boliu@: What's your sense on whether your CL can be safely merged, despite its size? Unfortunately that CL is not purely mechanical, as I called out in the CL description. There is a behavior change in render_process_host_impl.cc. And populating the android-only fields in ChildProcessTerminationInfo requires merging more prior CLs, although nothing uses those fields at that point, so you could leave them out during merge.
,
May 7 2018
We think we'd like to postpone the merge for now: - The merge is slightly risky due to no ChildProcessTerminationInfo on M67 - Early Canary data from the new metric doesn't seem promising for explaining the increased CPM rate for Site Isolation.
,
May 7 2018
Rejecting merge to M67 based on comment #12. Please re-request a merge to M67 if needed when safe change is ready to be merged. Thank you.
,
May 24 2018
I think we've concluded our investigation in http://go/site-isolation-stability (sorry, internal link). The increase in renderer CPMs seems to be due to the increased number of processes in Site Isolation, thanks to third party interference. The effect is much smaller or non-existent on platforms that prevent such interference. Marking fixed. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by bugdroid1@chromium.org
, May 4 2018