New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 657012 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Feature



Sign in to add a comment

Annotate Samples from StackSamplingProfiler as Code Progresses

Project Member Reported by bcwh...@chromium.org, Oct 18 2016

Issue description

The StackSamplingProfiler gathers a fixed number of samples and sends the collected information via Metrics.  The current use is "startup" but there is no way of associating which "phase" of startup is being sampled at any given moment.  Thus, while UMA metrics might indicate that some phase is taking excess time, there is no way of breaking down the samples to know which belong to that phase.

Add "annotation" support so that blocks of samples can be associated with particular phases of operation.

 
One thing to note is we actually already do something similar for the Tasks Profiler (i.e. Jank profiler) data.

Here's some relevant client code:

https://cs.chromium.org/chromium/src/components/metrics/profiler/tracking_synchronizer.cc?rcl=1476789447&l=313

The way that's done is it actually has a repeated field of "past events":

https://cs.corp.google.com/piper///depot/google3/logs/proto/googleclient/chrome/uma/profiler_event.proto

One of the advantages of using a repeated field is that this can support events arriving at different order. For example, we may finish loading something from disk before we paint the first tab or after.

Having it represented by repeated fields allows the data format to express both of those cases without having to enumerate all the orderings ahead of time.
I was thinking of a new "annotation" repeated field in the Sample structure of CallStackProfile:

  // A sample consisting of one or more callstacks with the same stack frames
  // and instruction pointers.
  message Sample {
    // The callstack. Sample.entries[0] represents the call on the top of the
    // stack.
    repeated Entry entry = 1;

    // Number of times this stack signature occurs.
    optional int64 count = 2;

    // New annotations that were set between the last sample and this one. If
    // the number is negative, it applies backwards, such as an action that
    // just completed.
    repeated int32 annotation = 3;
  }

This should be fairly easy when order is preserved but will complicate the un-ordered collection because it'll be able to merge only while the set of annotations remains constant.  Still, shouldn't be too difficult.
The jank profiler method seems like a reasonable way to go. I think we'll want to explicitly call out the phases in the proto in a similar fashion as well, to avoid confusion keeping things in sync between the Chrome and server side code.
It's pretty much the same idea except it marks start-times and we'll generally know the finish times because that's when the UMA metrics are recorded.
There is a "phase" of the system: startup, general operation, shutdown.  Each of these may be split into many sub-phases.  It's fine to just set bits as each completes and report the entire thing as a bit-field.  I say bit-field instead of a counter because it makes it easy to further subdivide the phases in the future.

But within any of these phases there may be things that start and end: rendering a page, writing profiles to disk, doing a metrics upload, etc.

While the former flags would get set and remain set the latter would be set and cleared as those tasks begin and end.  The former will be small and dense but the latter could be very large and sparse.  Think that for now they can both be handled as simple bit-fields but it may be necessary in the future to use a different mechanism for the latter if the number of flags becomes very large.

Kicking off discussion of what state to record:

I suggest recording phases associated with the relevant timing heartbeat histograms:

NewTabPage.LoadTime
Startup.FirstWebContents.MainNavigationStart
PageLoad.PaintTiming.NavigationToFirstContentfulPaint
SessionRestore.ForegroundTabFirstLoaded
Startup.FirstWebContents.NonEmptyPaint2

Plus the non-duplicative timing histograms listed under the StartupTemperature section of histograms.xml: https://chromium.googlesource.com/chromium/src/+/fc917a6b8f127fa85e7864d2b7a0c7088bfaa86c/tools/metrics/histograms/histograms.xml#111549

Plus Startup.Temperature itself (this is a per-profile state, rather than a phase).

Anything else you can think of Alexei?
Nothing per-profile will work because only the first setting of a "phase" state will be remembered.

Other things for the startup before contents?

MainLoop started?

Though that could be tricky because the flags will have to be defined in components (or something accessible by components).  Hmmm...


Also...  What other activities happen that could affect performance and should be monitored?

Metrics Collection?
Writing Profile State?
Opening New Profile?
Cc: gab@chromium.org
+gab for his thoughts on what stages we want to breakdown sampling profiler data for (initially)

I'm ok with us not adding too many to start with, just some important ones:
  Startup.FirstWebContents.MainNavigationStart
  Startup.FirstWebContents.NonEmptyPaint2

For other ones, I'm not 100% sure, here are my thoughts:
  - NewTabPage.LoadTime (this will be logged multiple times, so not sure we should have it)
  - SessionRestore.ForegroundTabFirstLoaded (maybe interesting - I guess we'd only hit it on session restore start ups which will be a subset - but I guess useful in specifically diagnosing regressions in that metric, so probably fine)
  - PageLoad.PaintTiming.NavigationToFirstContentfulPaint (again, loaded multiple times - so not sure if we want it)


Comment 10 by gab@chromium.org, Nov 7 2016

Phases that allow us to reconstruct a mental image of a startup trace as in https://docs.google.com/document/d/1MN7I9f_7lUthZkmDaTBHsCtOUB8ApiJ8y-AIbs6ubxI/edit#heading=h.dfg4l0unq4cd
would be useful.

e.g.:
 Startup.BrowserWindowDisplay (albeit being aware that this metric is potentially broken:  issue 589118 ).
 The initial ProfileManager::GetProfile (need to look closer but looks like Profile.CreateProfileHelperTime might be that)
 StartupBrowserCreator::Start (Startup.StartupBrowserCreator_Start)
 Startup.MessageLoopStartTime is probably also interesting.

I like bcwhite's idea of using a bit field so that overlapping phases can be tracked.
Project Member

Comment 11 by bugdroid1@chromium.org, Nov 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/30c14f27cf63a48b082efe67ae29bb6e4282cbd7

commit 30c14f27cf63a48b082efe67ae29bb6e4282cbd7
Author: bcwhite <bcwhite@chromium.org>
Date: Wed Nov 23 22:56:14 2016

Add process lifetime annotations to stack samples.

Annotations are passed in the proto along
with the stack frames so analysis can correlate with other things
happening on the system.

BUG= 657012 

Review-Url: https://codereview.chromium.org/2444143002
Cr-Commit-Position: refs/heads/master@{#434273}

[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/base/profiler/native_stack_sampler.h
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/base/profiler/native_stack_sampler_posix.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/base/profiler/native_stack_sampler_win.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/base/profiler/stack_sampling_profiler.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/base/profiler/stack_sampling_profiler.h
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/base/profiler/stack_sampling_profiler_unittest.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/call_stack_profile_metrics_provider.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/call_stack_profile_metrics_provider.h
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/call_stack_profile_metrics_provider_unittest.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/proto/call_stack_profile.proto
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/proto/execution_context.proto
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/public/cpp/call_stack_profile.typemap
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/public/cpp/call_stack_profile_struct_traits.h
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/public/cpp/call_stack_profile_struct_traits_unittest.cc
[modify] https://crrev.com/30c14f27cf63a48b082efe67ae29bb6e4282cbd7/components/metrics/public/interfaces/call_stack_profile_collector.mojom

Project Member

Comment 12 by bugdroid1@chromium.org, Jan 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/da097d30d0e99243c59c75d336a761b0ec5e8def

commit da097d30d0e99243c59c75d336a761b0ec5e8def
Author: bcwhite <bcwhite@chromium.org>
Date: Thu Jan 12 23:55:03 2017

Set process phases in the StackSamplingProfiler.

BUG= 657012 

Review-Url: https://codereview.chromium.org/2530043002
Cr-Commit-Position: refs/heads/master@{#443394}

[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/base/profiler/stack_sampling_profiler.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/base/profiler/stack_sampling_profiler.h
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/base/profiler/stack_sampling_profiler_unittest.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/chrome/browser/chrome_browser_main.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/chrome/browser/chrome_browser_main.h
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/metrics/call_stack_profile_metrics_provider.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/metrics/call_stack_profile_metrics_provider.h
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/metrics/call_stack_profile_metrics_provider_unittest.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/metrics/proto/execution_context.proto
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/metrics/public/cpp/call_stack_profile_struct_traits.h
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/metrics/public/interfaces/call_stack_profile_collector.mojom
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/startup_metric_utils/browser/BUILD.gn
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/startup_metric_utils/browser/DEPS
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/components/startup_metric_utils/browser/startup_metric_utils.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/content/browser/browser_main_loop.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/content/browser/browser_main_loop.h
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/content/browser/browser_main_runner.cc
[modify] https://crrev.com/da097d30d0e99243c59c75d336a761b0ec5e8def/content/public/browser/browser_main_parts.h

Status: Fixed (was: Started)
Can you create a separate bug to cover propagation of annotations to non-browser processes, if that won't be implemented under this bug?

Sign in to add a comment