New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 788808 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 29
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature



Sign in to add a comment

Enable sampling profiler for renderer processes

Project Member Reported by wittman@chromium.org, Nov 27 2017

Issue description

This will support profiling code in the renderer.

Implementation steps:

Create a StackSamplingProfiler in a similar fashion as in the GPU process: https://cs.chromium.org/chromium/src/chrome/gpu/chrome_content_gpu_client.cc?sq=package:chromium&dr=CSs&l=40-51

Verify that the profiler can walk stacks containing frames in v8 runtime-generated code, on Windows and on Mac.

Decide which renderer instances should be profiled, and flag profiles from renderers started at Chrome startup.

(If necessary) optimize the profile representation to reduce memory usage.
 
Cc: tdres...@chromium.org

Comment 2 by vmp...@chromium.org, Nov 28 2017

Cc: ericrk@chromium.org enne@chromium.org vmi...@chromium.org khushals...@chromium.org
I've broken out the optimization of the profile representation into  issue 804942  FWIW.

Comment 4 by sadrul@chromium.org, Jan 23 2018

Cc: sadrul@chromium.org
If we want to profile only the compositor thread at first, then we wouldn't have to worry about v8 stack etc. Right?

Comment 5 by sadrul@chromium.org, Jan 30 2018

Owner: sadrul@chromium.org
Status: Started (was: Assigned)
Project Member

Comment 6 by bugdroid1@chromium.org, Jan 31 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f9c9c93a13722b535707412b677f2cfb187fd17c

commit f9c9c93a13722b535707412b677f2cfb187fd17c
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Wed Jan 31 21:05:51 2018

sampling profiler: Merge some configs into ENABLED.

Instead of having separate configurations for BROWSER, BROWSER_AND_GPU,
and GPU, merge all of them into a single ENABLED configuration. This
makes it easier to turn on sampling profiler for other process types
(e.g. renderer).

BUG= 788808 

Change-Id: Ic67cd07abeb7c935dc4584f3dfbf72bd82019e99
Reviewed-on: https://chromium-review.googlesource.com/894451
Reviewed-by: Mike Wittman <wittman@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#533409}
[modify] https://crrev.com/f9c9c93a13722b535707412b677f2cfb187fd17c/chrome/common/stack_sampling_configuration.cc
[modify] https://crrev.com/f9c9c93a13722b535707412b677f2cfb187fd17c/chrome/common/stack_sampling_configuration.h

Project Member

Comment 7 by bugdroid1@chromium.org, Feb 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0e60bc47077c56dfadc02805ab21ed416252e2ef

commit 0e60bc47077c56dfadc02805ab21ed416252e2ef
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Thu Feb 01 01:14:34 2018

sampling profiler: Add enum for compositor thread.

The renderer processes can have a compositor thread. The gpu process
can also have a compositor thread (for the display compositor). So,
introduce a COMPOSITOR_THREAD enum so that we can add support for
the sampling profiler for the compositor thread in renderer and gpu
processes.

BUG= 788808 

Change-Id: I6cfe49d441ef636ce4a3c1ca4c60b7d09e7bc201
Reviewed-on: https://chromium-review.googlesource.com/892362
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Mike Wittman <wittman@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#533508}
[modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/call_stack_profile_metrics_provider.cc
[modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/call_stack_profile_params.h
[modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/public/cpp/call_stack_profile_struct_traits.h
[modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/public/cpp/call_stack_profile_struct_traits_unittest.cc
[modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/public/interfaces/call_stack_profile_collector.mojom
[modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/third_party/metrics_proto/execution_context.proto

Project Member

Comment 8 by bugdroid1@chromium.org, Feb 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/16b3a169afbad1e9fab637cae7d00e52b27e24aa

commit 16b3a169afbad1e9fab637cae7d00e52b27e24aa
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Sat Feb 03 01:43:18 2018

sampling profiler: Set up startup profiler for renderer.

Set up a startup sampling-profiler for the compositor thread in the
renderer process. Notable changes:
. Callback to the RendererClient when the compositor-thread is created.
  The client uses this callback to set-up the sampling profiler.
. Allow the renderer processes to connect to the browser to report the
  profiler data over mojom. Add the --start-stack-profiler flag to the
  renderer's cmd-line so that it can initiate sampling profiler. Select
  every fifth renderer for collecting the profile data from, so that we
  can still get enough useful data, but don't overburden bandwidth or
  memory usage.

BUG= 788808 

Change-Id: I478184502a17f4dbb83f03a67a7ef482c92c0a78
Reviewed-on: https://chromium-review.googlesource.com/894848
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: Mike Wittman <wittman@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#534235}
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/browser/chrome_content_browser_client.cc
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/browser/chrome_content_browser_manifest_overlay.json
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/common/stack_sampling_configuration.cc
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/renderer/DEPS
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/renderer/chrome_content_renderer_client.cc
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/renderer/chrome_content_renderer_client.h
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/content/public/renderer/content_renderer_client.cc
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/content/public/renderer/content_renderer_client.h
[modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/content/renderer/render_thread_impl.cc

Startup and periodic collection of the compositor thread are now enabled as of 8d74ea2d5a9b5e896fe542a038ec1ce7cb8bba0c (I forgot to tag this bug on the latter change) and the data is now available in the dashboard: https://uma.googleplex.com/p/chrome/callstacks?sid=0b7891c31c13c1a71829d27a6e8aac55

For the periodic collection 99.6% of samples are idling in the message loop. Of the remaining 0.4% of samples, 0.26% represent non-idle samples and are representative of the work being done on the thread. The remaining 0.13% represent non-recoverable samples due to failures to walk the complete stack (this is roughly normal for Windows). The non-recoverable samples are lumped with the non-idle samples for accounting purposes, which is why the flame graph only shows 67% of non-idle samples in base::`anonymous namespace'::ThreadFunc.
I also tested out running the profiler on the render thread. With the latest changes this now a simple matter of calling ThreadProfiler::CreateAndStartOnMainThread() and saving the result until thread shutdown.

It seems to work fine except that it does stop when it hits v8 runtime-generated code, at the v8::internal::Builtin_HandleApiCall frame. We'll need to work with v8 to be able to unwind through these frames.
Project Member

Comment 11 by bugdroid1@chromium.org, Mar 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/97c6f1f84c9ccf85a328470d437d68b7755655c2

commit 97c6f1f84c9ccf85a328470d437d68b7755655c2
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Fri Mar 02 01:21:16 2018

viz: Start the sampling profiler for oop-d thread.

Start the sampling profiler on the oop-d thread. Plumb the task-runner
of the compositor thread from viz through content to chrome, which
starts the profiler on the thread.

BUG= 788808 

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel
Change-Id: Id9e05bc290205ad2d397028313b6e1a2430a9d92
Reviewed-on: https://chromium-review.googlesource.com/943401
Reviewed-by: Antoine Labour <piman@chromium.org>
Reviewed-by: Mike Wittman <wittman@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540396}
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/chrome/gpu/chrome_content_gpu_client.cc
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/chrome/gpu/chrome_content_gpu_client.h
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/components/viz/service/main/viz_main_impl.cc
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/components/viz/service/main/viz_main_impl.h
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/content/gpu/gpu_child_thread.cc
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/content/gpu/gpu_child_thread.h
[modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/content/public/gpu/content_gpu_client.h

Cc: chrishtr@chromium.org
Any idea when we'll get this enabled on the main thread?

chrishtr@ and I were talking today, and it sounds like he would get a fair bit of value out of this even without V8 stacks.
For v8: wittman@ has a doc [1] that explains what needs to happen. There are currently no owners, as far as I know. Perhaps someone from jbroman@'s team can own that?

In the meantime, I was thinking of turning this on even just for the native code. I think that could also be useful, even without the work for v8.

[1] https://docs.google.com/document/d/1MNl5x_01As_OTOs4PIM6rezQ4LNmEpPNZV3vuY26dcg/edit
Cc: jbroman@chromium.org
+jbroman@
+1 to turning this on for native code only.
If there's value in obtaining native code sampling on the render thread, then turning it on for that thread in advance of the V8 unwinding SGTM. 

Keep in mind that this will only recover stacks with *no* V8 code. I have no idea what fraction that is, but I guess we can find out. :)
Sadrul, are you the right person to own enabling this on the main thread, without V8 stack support?
It seems like the sort of thing that would be more easily tacked by someone on the V8 team than us, given how entangled unwinding JS stacks are with the runtime implementation. As that doc demonstrates, it is a little thorny. :)
The problem space is complex but the solution is not particularly complicated -- the Unwind Algorithm and Unwind Information sections in the doc distill exactly what code needs to be written to support V8 frame unwinding.

Implementing this would be a largely mechanical effort that probably could be done by any engineer in a few weeks of dedicated effort, if someone wants to pick it up.
Also, enabling profiling without V8 stack support on the render thread is trivial, just mimic what's being done for the ChromeContentGpuClient main thread profiler: https://cs.chromium.org/chromium/src/chrome/gpu/chrome_content_gpu_client.h?sq=package:chromium&dr=CSs&l=64

Not sure if Sadrul is intending to do this, but I'd be happy to review a CL from anyone. :)
I wanted to do a little bit of cleanup first ... I will put up CLs.
Project Member

Comment 22 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9aa68366bdb67e4c83aca1fceb4ab762201abba8

commit 9aa68366bdb67e4c83aca1fceb4ab762201abba8
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Tue Apr 10 02:46:26 2018

profiler: Simplify the enum for Thread.

Instead of having separate enums for the main-thread in various
processes, just have a single MAIN_THREAD identifier for the
main thread in all processes. The callstacks are associated with
a Process enum anyway, so no information is lost from this.

BUG= 788808 

Change-Id: I095339ce8f9c6db5f9946f93fb677a8fe298f66a
Reviewed-on: https://chromium-review.googlesource.com/1000292
Reviewed-by: Ken Buchanan <kenrb@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: Mike Wittman <wittman@chromium.org>
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549391}
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/browser/chrome_browser_main.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/common/thread_profiler.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/common/thread_profiler.h
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/gpu/chrome_content_gpu_client.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/call_stack_profile_metrics_provider.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/call_stack_profile_metrics_provider_unittest.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/call_stack_profile_params.h
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/child_call_stack_profile_collector_unittest.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/public/cpp/call_stack_profile_struct_traits.h
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/public/cpp/call_stack_profile_struct_traits_unittest.cc
[modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/public/interfaces/call_stack_profile_collector.mojom

Project Member

Comment 23 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d37b4d4c7be5a962a91e99f8622ec83597b93083

commit d37b4d4c7be5a962a91e99f8622ec83597b93083
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Tue Apr 10 17:32:39 2018

profiler: Sampling profiler for main thread in renderer.

Turn on the sampling profiler on the main thread in renderer
processes. Note that this turns on the profiler only for native
(i.e. c++) code. Additional work needs to happen to turn this
on for javascript code. See details here:
https://docs.google.com/document/d/1MNl5x_01As_OTOs4PIM6rezQ4LNmEpPNZV3vuY26dcg/edit

BUG= 788808 

Change-Id: I1b3f638daf6d24624db4eb9e9226cfd363328284
Reviewed-on: https://chromium-review.googlesource.com/1003377
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Reviewed-by: Mike Wittman <wittman@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549575}
[modify] https://crrev.com/d37b4d4c7be5a962a91e99f8622ec83597b93083/chrome/renderer/chrome_content_renderer_client.cc
[modify] https://crrev.com/d37b4d4c7be5a962a91e99f8622ec83597b93083/chrome/renderer/chrome_content_renderer_client.h
[modify] https://crrev.com/d37b4d4c7be5a962a91e99f8622ec83597b93083/content/public/test/render_view_test.cc

Cool, thanks!

Will this data be available on the Canary channel starting in a few days?
Yes, we should have data by Friday.
Awesome, thanks!
Cc: altimin@chromium.org
We now have profiles from the render thread without V8. Here's the steady-state sampling: https://uma.googleplex.com/p/chrome/callstacks?sid=542892d8c93d1a034c52b7667051abf1

And here's process startup: https://uma.googleplex.com/p/chrome/callstacks?sid=3f959d5312aed0e0f18b766958c88307

On first glance it looks like ~71% of the startup samples are not in V8 code, and ~90% for the steady-state samples.

Note that the unsymbolized functions in the steady-state profiling are a transient issue that should be resolved in tomorrow's run.
Should we consider getting the samples for extension-renderer processes separately?
Thanks - this data is terrifying.

2% of samples in blink::scheduler::RendererMetricsHelper::RecordTaskMetrics?
2.6% in base::Time::ActivateHighResolutionTimer?

Mike - just confirming I'm reading this correctly, before I file some bugs.
Cc: npm@chromium.org
Er, re #27, where are you getting the numbers on fraction of time spent outside of V8?

+npm@, as EQT computation appears extremely expensive.
re #28: separating out extension renderers sounds reasonable to me. We should be able to do this by supplying an indicator of renderer process type with the sample collection.

re #29: for full context, those are the percentages of samples where we're actually doing work (i.e. not idling in the message loop). The continuous collection samples from renderers uniformly over time, so I would expect these profiles to be weighted towards activity occurring in 'mostly inactive' renderers

In the process startup profiles, which I assume would be more reflective of an 'active' workload, those percentage are much smaller

re:#30: the fraction of time spent outside of V8 comes from the wWinMain percentage, which is the percentage of stacks we were able to successfully capture. I expect that nearly all the stacks we weren't able to capture are V8 stacks.

Status: Fixed (was: Started)
I have filed https://bugs.chromium.org/p/chromium/issues/detail?id=909957 to track the work needed for v8. I am closing this one.

Sign in to add a comment