New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 881967 link

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Potential jank caused by histogram synchronizer

Project Member Reported by etienneb@chromium.org, Sep 7

Issue description

We are observing long running tasks that are causing jank on the browser process main thread through slow-reports.

The current scenario is happening when there is an ImportantFileWriter::WriteNow at the same time than an histograms synchronisation. (see attachment)

This can be observed in these slow-reports (on crash):
  be7f01923eeccd7b  ~56 seconds.

I spent some time looking to this issue and I think the problem is related to the lock in FindHistogram (called by WriteNow) which is also used by the histogram.

  WriteNow:
  https://cs.chromium.org/chromium/src/base/files/important_file_writer.cc?l=115
  https://cs.chromium.org/chromium/src/base/metrics/statistics_recorder.cc?l=194

  GetHistograms:
  https://cs.chromium.org/chromium/src/base/metrics/statistics_recorder.cc?l=331

But this lock is not enough to cause the jank. I'm still missing something.

Strangely, there is a watchdog of minutes, which is aligned with the trace observed:

https://cs.chromium.org/chromium/src/content/browser/histogram_synchronizer.cc?sq=package:chromium&dr&g=0&l=206

https://cs.chromium.org/chromium/src/content/browser/histogram_synchronizer.cc?sq=package:chromium&dr&g=0&l=250
 
histograms.png
30.7 KB View Download
Components: Internals>Metrics>UMA
Cc: wittman@google.com
So looking at this more, the task is the one that's posted by InternalPostTask - so actually it's not the histogram synchronizer code itself but whatever callback is given to it.

So let's trace back where it's coming from:

InternalPostTask
SetTaskRunnerAndCallback
HistogramSynchronizer::FetchHistogramsAsynchronously
content::FetchHistogramsAsynchronously
ChromeMetricsServiceClient::OnMemoryDetailCollectionDone
ChromeMetricsServiceClient::OnHistogramSynchronizationDone
ChromeMetricsServiceClient::CollectFinalMetricsForLog

->
MetricsService::OnFinalLogInfoCollectionDone

So that's the culprit I think.

So interestingly, this *can* call OpenNewLog(). So I think this is likely the same issue that Etienne was seeing in another trace. (Not 100% sure because it's not clear that OpenNewLog() inside MetricsService::OnFinalLogInfoCollectionDone is to blame or something else.)

.... I think stack sampled metrics would be a good use here rather than trying to reason about all of this manually. Etienne, is that something you're already collaborating on Mike with? Or when you mentioned stacks was that something else?
FYI This is what OnFinalLogInfoCollectionDone looks like in sampling profiler data:

https://uma.googleplex.com/callstacks?sid=891c3560dbe29e34891ad81bf240442c

Nothing stands out very obviously.

(Also I appreciate if someone can double check my callback following logic in comment 2 to make sure I didn't miss something.)
Cc: ssid@chromium.org
I personally use the UMA sampler to understand the calltree. It ease debugging a lot.

We are rolling out a version of slow-reports that is running the sampler profiler. It will ease the correlation of both. Some results should come in a few days.

Keep in mind that the UMA sampler profile is the "average" user, but the slow-reports is the 99th percentile user. We won't see the same profile and often some rare codepaths may be executed.
Labels: Hotlist-SamplingProfilerInField Performance-Browser

Sign in to add a comment