New issue
Advanced search Search tips

Issue 838623 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 2
Type: Bug



Sign in to add a comment

Investigate CPM changes caused by Site Isolation

Project Member Reported by lukasza@chromium.org, May 1 2018

Issue description

Let's use this bug to track CLs we need to investigate CPM changes caused by Site Isolation.
 
Project Member

Comment 1 by bugdroid1@chromium.org, May 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb

commit b10888098e6b76fcbe960c9dd3f1acd1cc3344eb
Author: Lukasz Anforowicz <lukasza@chromium.org>
Date: Fri May 04 00:50:08 2018

Adding Stability.CrashedProcessAge.Extension/Renderer metrics.

Bug:  838623 
Cq-Include-Trybots: master.tryserver.chromium.mac:ios-simulator-cronet;master.tryserver.chromium.mac:ios-simulator-full-configs
Change-Id: I6afbc0a870a7818ebec64da5d81875dd72173586
Tbr: olivierrobin@chromium.org
Reviewed-on: https://chromium-review.googlesource.com/1036527
Commit-Queue: Ɓukasz Anforowicz <lukasza@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: Alex Moshchuk <alexmos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#555936}
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/chrome/browser/metrics/chrome_stability_metrics_provider.cc
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/components/metrics/stability_metrics_helper.cc
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/components/metrics/stability_metrics_helper.h
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/components/metrics/stability_metrics_helper_unittest.cc
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/content/browser/child_process_launcher.cc
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/content/browser/child_process_launcher.h
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/content/public/browser/child_process_termination_info.h
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/ios/chrome/browser/metrics/ios_chrome_stability_metrics_provider.mm
[modify] https://crrev.com/b10888098e6b76fcbe960c9dd3f1acd1cc3344eb/tools/metrics/histograms/histograms.xml

Cc: gov...@chromium.org
Labels: Merge-Request-67 OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows
Owner: lukasza@chromium.org
Status: Started (was: Untriaged)
Commit b1088809... initially landed in 68.0.3419.0.  I've looked at crashes in this version and nothing worrisome caught my attention.  I've looked at the UMA data and we are getting back Stability.CrashedProcessAge.Renderer and Stability.CrashedProcessAge.Extension data that looks valid.

This fix is important to merge back into M67, as it should help explain the observed impact of Site Isolation on stability metrics (especially on the observed regression in renderer CPM metric).

The fix is of medium size, but is fairly straightforward/simple - I think the risk of merging should be very low.  The fix should only have impact on code paths taken when the browser process detects a crash of a renderer process.

Project Member

Comment 3 by sheriffbot@chromium.org, May 4 2018

Labels: -Merge-Request-67 Merge-Review-67 Hotlist-Merge-Review
This bug requires manual review: M67 has already been promoted to the beta branch, so this requires manual review
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 4 by creis@chromium.org, May 4 2018

Cc: creis@chromium.org
This looks like a good candidate to merge.  The fix makes sense, and the new UMA metrics will be very useful for confirming theories about Site Isolation stability in M67.  No concerns from my side.
Labels: -Merge-Review-67 Merge-Approved-67
Approving merge to M67 branch 3396 based on comments #2, #4 and per offline chat with creis@. Please merge ASAP. Thank you.
Cc: boliu@chromium.org
Hmmm... there is no ChildProcessTerminationInfo struct in M67 - the CL won't merge cleanly.  We can either also merge r553367 (which introduced ChildProcessTerminationInfo and is a big, but simple/mechanical change that had 10 days of bake time on Canary/Dev) or I can try to shoehorn the new UMA into RendererClosedDetails (present on M67 / replaced by ChildProcessTerminationInfo in M68).  My gut feeling is that the former option will be safer:
- safer to deal with if other ChildProcessTerminationInfo-dependent changes need to be merged in the future
- safer than re-authoring chunks of the fix in M67

WDYT?

Comment 7 by creis@chromium.org, May 4 2018

Oh...  That might be a problem.  At first glance, https://chromium-review.googlesource.com/c/chromium/src/+/1017386 seems too large to merge, even if it's pretty mechanical.  Re-authoring your CL to work without it is also not a great option.

I'm not sure what to recommend here yet.  We might have to get by with just having these metrics on M68, which is unfortunate but might not be a showstopper.

boliu@: What's your sense on whether your CL can be safely merged, despite its size?

lukasza@: Have you tried merging it and your CL in your local branch to see if they apply cleanly and pass tests, just as a sanity check?
I've checked if I can locally/manually cleanly merge r553367 followed by r555936.  It turned out that a local merge failed with conflicts in content/browser/renderer_host/render_process_host_impl.cc and content/browser/browser_child_process_host_impl.cc.  I guess my previous smoke test was based on a wrong assumption that the first |git drover| prompt won't appear until the merge is done cleanly.

The above probably means that re-authoring / tweaking RendererClosedDetails is the only real option.  I've uploaded it here in case we want to go down this route: https://crrev.com/c/1045274


Comment 9 by creis@chromium.org, May 4 2018

Thanks.  I'll try to take a close look at the differences between the committed CL and your edited merge.  We can see how risky it is (assuming you've built it on your local checkout of the M67 branch and it passes tests).
RE: assuming you've built it on your local checkout of the M67 branch and it passes tests

It builds fine locally and passes components_unittests - AFAIK the changes don't have any test coverage beyond StabilityMetricsHelperTest.  Hmmm... I guess I could also try building chrome and seeing if chrome://histograms shows the new UMA after manually killing a renderer - I'll try to remember to do this on Monday.
> boliu@: What's your sense on whether your CL can be safely merged, despite its size?

Unfortunately that CL is not purely mechanical, as I called out in the CL description. There is a behavior change in render_process_host_impl.cc.

And populating the android-only fields in ChildProcessTerminationInfo requires merging more prior CLs, although nothing uses those fields at that point, so you could leave them out during merge.
We think we'd like to postpone the merge for now:
- The merge is slightly risky due to no ChildProcessTerminationInfo on M67
- Early Canary data from the new metric doesn't seem promising for explaining the increased CPM rate for Site Isolation.
Labels: -Merge-Approved-67 Merge-Rejected-67
Rejecting merge to M67 based on comment #12. Please re-request a merge to M67 if needed when safe change is ready to be merged. Thank you.

Comment 14 by creis@chromium.org, May 24 2018

Status: Fixed (was: Started)
I think we've concluded our investigation in http://go/site-isolation-stability (sorry, internal link).  The increase in renderer CPMs seems to be due to the increased number of processes in Site Isolation, thanks to third party interference.  The effect is much smaller or non-existent on platforms that prevent such interference.  Marking fixed.

Sign in to add a comment