Enable sampling profiler for renderer processes |
|||||||||
Issue descriptionThis will support profiling code in the renderer. Implementation steps: Create a StackSamplingProfiler in a similar fashion as in the GPU process: https://cs.chromium.org/chromium/src/chrome/gpu/chrome_content_gpu_client.cc?sq=package:chromium&dr=CSs&l=40-51 Verify that the profiler can walk stacks containing frames in v8 runtime-generated code, on Windows and on Mac. Decide which renderer instances should be profiled, and flag profiles from renderers started at Chrome startup. (If necessary) optimize the profile representation to reduce memory usage.
,
Nov 28 2017
,
Jan 23 2018
I've broken out the optimization of the profile representation into issue 804942 FWIW.
,
Jan 23 2018
If we want to profile only the compositor thread at first, then we wouldn't have to worry about v8 stack etc. Right?
,
Jan 30 2018
,
Jan 31 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f9c9c93a13722b535707412b677f2cfb187fd17c commit f9c9c93a13722b535707412b677f2cfb187fd17c Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Wed Jan 31 21:05:51 2018 sampling profiler: Merge some configs into ENABLED. Instead of having separate configurations for BROWSER, BROWSER_AND_GPU, and GPU, merge all of them into a single ENABLED configuration. This makes it easier to turn on sampling profiler for other process types (e.g. renderer). BUG= 788808 Change-Id: Ic67cd07abeb7c935dc4584f3dfbf72bd82019e99 Reviewed-on: https://chromium-review.googlesource.com/894451 Reviewed-by: Mike Wittman <wittman@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#533409} [modify] https://crrev.com/f9c9c93a13722b535707412b677f2cfb187fd17c/chrome/common/stack_sampling_configuration.cc [modify] https://crrev.com/f9c9c93a13722b535707412b677f2cfb187fd17c/chrome/common/stack_sampling_configuration.h
,
Feb 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0e60bc47077c56dfadc02805ab21ed416252e2ef commit 0e60bc47077c56dfadc02805ab21ed416252e2ef Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Thu Feb 01 01:14:34 2018 sampling profiler: Add enum for compositor thread. The renderer processes can have a compositor thread. The gpu process can also have a compositor thread (for the display compositor). So, introduce a COMPOSITOR_THREAD enum so that we can add support for the sampling profiler for the compositor thread in renderer and gpu processes. BUG= 788808 Change-Id: I6cfe49d441ef636ce4a3c1ca4c60b7d09e7bc201 Reviewed-on: https://chromium-review.googlesource.com/892362 Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Mike Wittman <wittman@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#533508} [modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/call_stack_profile_metrics_provider.cc [modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/call_stack_profile_params.h [modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/public/cpp/call_stack_profile_struct_traits.h [modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/public/cpp/call_stack_profile_struct_traits_unittest.cc [modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/components/metrics/public/interfaces/call_stack_profile_collector.mojom [modify] https://crrev.com/0e60bc47077c56dfadc02805ab21ed416252e2ef/third_party/metrics_proto/execution_context.proto
,
Feb 3 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16b3a169afbad1e9fab637cae7d00e52b27e24aa commit 16b3a169afbad1e9fab637cae7d00e52b27e24aa Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Sat Feb 03 01:43:18 2018 sampling profiler: Set up startup profiler for renderer. Set up a startup sampling-profiler for the compositor thread in the renderer process. Notable changes: . Callback to the RendererClient when the compositor-thread is created. The client uses this callback to set-up the sampling profiler. . Allow the renderer processes to connect to the browser to report the profiler data over mojom. Add the --start-stack-profiler flag to the renderer's cmd-line so that it can initiate sampling profiler. Select every fifth renderer for collecting the profile data from, so that we can still get enough useful data, but don't overburden bandwidth or memory usage. BUG= 788808 Change-Id: I478184502a17f4dbb83f03a67a7ef482c92c0a78 Reviewed-on: https://chromium-review.googlesource.com/894848 Reviewed-by: Scott Violet <sky@chromium.org> Reviewed-by: Mike Wittman <wittman@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Reviewed-by: Alexei Svitkine <asvitkine@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#534235} [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/browser/chrome_content_browser_client.cc [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/browser/chrome_content_browser_manifest_overlay.json [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/common/stack_sampling_configuration.cc [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/renderer/DEPS [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/renderer/chrome_content_renderer_client.cc [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/chrome/renderer/chrome_content_renderer_client.h [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/content/public/renderer/content_renderer_client.cc [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/content/public/renderer/content_renderer_client.h [modify] https://crrev.com/16b3a169afbad1e9fab637cae7d00e52b27e24aa/content/renderer/render_thread_impl.cc
,
Feb 28 2018
Startup and periodic collection of the compositor thread are now enabled as of 8d74ea2d5a9b5e896fe542a038ec1ce7cb8bba0c (I forgot to tag this bug on the latter change) and the data is now available in the dashboard: https://uma.googleplex.com/p/chrome/callstacks?sid=0b7891c31c13c1a71829d27a6e8aac55 For the periodic collection 99.6% of samples are idling in the message loop. Of the remaining 0.4% of samples, 0.26% represent non-idle samples and are representative of the work being done on the thread. The remaining 0.13% represent non-recoverable samples due to failures to walk the complete stack (this is roughly normal for Windows). The non-recoverable samples are lumped with the non-idle samples for accounting purposes, which is why the flame graph only shows 67% of non-idle samples in base::`anonymous namespace'::ThreadFunc.
,
Feb 28 2018
I also tested out running the profiler on the render thread. With the latest changes this now a simple matter of calling ThreadProfiler::CreateAndStartOnMainThread() and saving the result until thread shutdown. It seems to work fine except that it does stop when it hits v8 runtime-generated code, at the v8::internal::Builtin_HandleApiCall frame. We'll need to work with v8 to be able to unwind through these frames.
,
Mar 2 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/97c6f1f84c9ccf85a328470d437d68b7755655c2 commit 97c6f1f84c9ccf85a328470d437d68b7755655c2 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Fri Mar 02 01:21:16 2018 viz: Start the sampling profiler for oop-d thread. Start the sampling profiler on the oop-d thread. Plumb the task-runner of the compositor thread from viz through content to chrome, which starts the profiler on the thread. BUG= 788808 Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel Change-Id: Id9e05bc290205ad2d397028313b6e1a2430a9d92 Reviewed-on: https://chromium-review.googlesource.com/943401 Reviewed-by: Antoine Labour <piman@chromium.org> Reviewed-by: Mike Wittman <wittman@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#540396} [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/chrome/gpu/chrome_content_gpu_client.cc [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/chrome/gpu/chrome_content_gpu_client.h [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/components/viz/service/main/viz_main_impl.cc [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/components/viz/service/main/viz_main_impl.h [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/content/gpu/gpu_child_thread.cc [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/content/gpu/gpu_child_thread.h [modify] https://crrev.com/97c6f1f84c9ccf85a328470d437d68b7755655c2/content/public/gpu/content_gpu_client.h
,
Apr 5 2018
Any idea when we'll get this enabled on the main thread? chrishtr@ and I were talking today, and it sounds like he would get a fair bit of value out of this even without V8 stacks.
,
Apr 5 2018
For v8: wittman@ has a doc [1] that explains what needs to happen. There are currently no owners, as far as I know. Perhaps someone from jbroman@'s team can own that? In the meantime, I was thinking of turning this on even just for the native code. I think that could also be useful, even without the work for v8. [1] https://docs.google.com/document/d/1MNl5x_01As_OTOs4PIM6rezQ4LNmEpPNZV3vuY26dcg/edit
,
Apr 5 2018
+jbroman@
,
Apr 5 2018
+1 to turning this on for native code only.
,
Apr 5 2018
If there's value in obtaining native code sampling on the render thread, then turning it on for that thread in advance of the V8 unwinding SGTM. Keep in mind that this will only recover stacks with *no* V8 code. I have no idea what fraction that is, but I guess we can find out. :)
,
Apr 6 2018
Sadrul, are you the right person to own enabling this on the main thread, without V8 stack support?
,
Apr 6 2018
It seems like the sort of thing that would be more easily tacked by someone on the V8 team than us, given how entangled unwinding JS stacks are with the runtime implementation. As that doc demonstrates, it is a little thorny. :)
,
Apr 6 2018
The problem space is complex but the solution is not particularly complicated -- the Unwind Algorithm and Unwind Information sections in the doc distill exactly what code needs to be written to support V8 frame unwinding. Implementing this would be a largely mechanical effort that probably could be done by any engineer in a few weeks of dedicated effort, if someone wants to pick it up.
,
Apr 6 2018
Also, enabling profiling without V8 stack support on the render thread is trivial, just mimic what's being done for the ChromeContentGpuClient main thread profiler: https://cs.chromium.org/chromium/src/chrome/gpu/chrome_content_gpu_client.h?sq=package:chromium&dr=CSs&l=64 Not sure if Sadrul is intending to do this, but I'd be happy to review a CL from anyone. :)
,
Apr 6 2018
I wanted to do a little bit of cleanup first ... I will put up CLs.
,
Apr 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9aa68366bdb67e4c83aca1fceb4ab762201abba8 commit 9aa68366bdb67e4c83aca1fceb4ab762201abba8 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Tue Apr 10 02:46:26 2018 profiler: Simplify the enum for Thread. Instead of having separate enums for the main-thread in various processes, just have a single MAIN_THREAD identifier for the main thread in all processes. The callstacks are associated with a Process enum anyway, so no information is lost from this. BUG= 788808 Change-Id: I095339ce8f9c6db5f9946f93fb677a8fe298f66a Reviewed-on: https://chromium-review.googlesource.com/1000292 Reviewed-by: Ken Buchanan <kenrb@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Reviewed-by: Mike Wittman <wittman@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#549391} [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/browser/chrome_browser_main.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/common/thread_profiler.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/common/thread_profiler.h [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/chrome/gpu/chrome_content_gpu_client.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/call_stack_profile_metrics_provider.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/call_stack_profile_metrics_provider_unittest.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/call_stack_profile_params.h [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/child_call_stack_profile_collector_unittest.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/public/cpp/call_stack_profile_struct_traits.h [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/public/cpp/call_stack_profile_struct_traits_unittest.cc [modify] https://crrev.com/9aa68366bdb67e4c83aca1fceb4ab762201abba8/components/metrics/public/interfaces/call_stack_profile_collector.mojom
,
Apr 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d37b4d4c7be5a962a91e99f8622ec83597b93083 commit d37b4d4c7be5a962a91e99f8622ec83597b93083 Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Tue Apr 10 17:32:39 2018 profiler: Sampling profiler for main thread in renderer. Turn on the sampling profiler on the main thread in renderer processes. Note that this turns on the profiler only for native (i.e. c++) code. Additional work needs to happen to turn this on for javascript code. See details here: https://docs.google.com/document/d/1MNl5x_01As_OTOs4PIM6rezQ4LNmEpPNZV3vuY26dcg/edit BUG= 788808 Change-Id: I1b3f638daf6d24624db4eb9e9226cfd363328284 Reviewed-on: https://chromium-review.googlesource.com/1003377 Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Reviewed-by: Mike Wittman <wittman@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Cr-Commit-Position: refs/heads/master@{#549575} [modify] https://crrev.com/d37b4d4c7be5a962a91e99f8622ec83597b93083/chrome/renderer/chrome_content_renderer_client.cc [modify] https://crrev.com/d37b4d4c7be5a962a91e99f8622ec83597b93083/chrome/renderer/chrome_content_renderer_client.h [modify] https://crrev.com/d37b4d4c7be5a962a91e99f8622ec83597b93083/content/public/test/render_view_test.cc
,
Apr 10 2018
Cool, thanks! Will this data be available on the Canary channel starting in a few days?
,
Apr 10 2018
Yes, we should have data by Friday.
,
Apr 10 2018
Awesome, thanks!
,
Apr 12 2018
We now have profiles from the render thread without V8. Here's the steady-state sampling: https://uma.googleplex.com/p/chrome/callstacks?sid=542892d8c93d1a034c52b7667051abf1 And here's process startup: https://uma.googleplex.com/p/chrome/callstacks?sid=3f959d5312aed0e0f18b766958c88307 On first glance it looks like ~71% of the startup samples are not in V8 code, and ~90% for the steady-state samples. Note that the unsymbolized functions in the steady-state profiling are a transient issue that should be resolved in tomorrow's run.
,
Apr 13 2018
Should we consider getting the samples for extension-renderer processes separately?
,
Apr 13 2018
Thanks - this data is terrifying. 2% of samples in blink::scheduler::RendererMetricsHelper::RecordTaskMetrics? 2.6% in base::Time::ActivateHighResolutionTimer? Mike - just confirming I'm reading this correctly, before I file some bugs.
,
Apr 13 2018
Er, re #27, where are you getting the numbers on fraction of time spent outside of V8? +npm@, as EQT computation appears extremely expensive.
,
Apr 13 2018
re #28: separating out extension renderers sounds reasonable to me. We should be able to do this by supplying an indicator of renderer process type with the sample collection. re #29: for full context, those are the percentages of samples where we're actually doing work (i.e. not idling in the message loop). The continuous collection samples from renderers uniformly over time, so I would expect these profiles to be weighted towards activity occurring in 'mostly inactive' renderers In the process startup profiles, which I assume would be more reflective of an 'active' workload, those percentage are much smaller re:#30: the fraction of time spent outside of V8 comes from the wWinMain percentage, which is the percentage of stacks we were able to successfully capture. I expect that nearly all the stacks we weren't able to capture are V8 stacks.
,
Nov 29
I have filed https://bugs.chromium.org/p/chromium/issues/detail?id=909957 to track the work needed for v8. I am closing this one. |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by tdres...@chromium.org
, Nov 28 2017