New issue
Advanced search Search tips

Issue 810748 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

OOP HP: Add a parameter to enabling sampling allocations.

Project Member Reported by erikc...@chromium.org, Feb 9 2018

Issue description

Add a mode to OOP HP to support alph's poisson sampling algorithm.

Simply, probability of taking a sample of size N is (N / sample_size). This has the following effects:

1) Allocations >> sample_size are always sampled.

2) Allocations << sample size with low frequency [e.g. frequency * N << sample_size] will be lost in the noise.

3) Allocations << sample size with high frequency [e.g. frequency * N >> sample_size] will have std dev = frequency * sqrt(N). 

Basically,
 * small, infrequent allocations will have high noise. These are precisely the allocations we don't find interesting for heap profiling from the wild.
 * Everything else will have fairly high accuracy
 
Project Member

Comment 1 by bugdroid1@chromium.org, Feb 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e

commit 8a2fb2e70857db265b990ff2d41fcc9f09bfae7e
Author: Erik Chen <erikchen@chromium.org>
Date: Fri Feb 09 20:34:29 2018

Refactor StartProfiling to take a ProfilingParams struct.

This CL is a refactor and has no intended behavior change. It wraps all
profiling parameters into a ProfilingParams struct, which will simplify future
changes that require adding new profiling params.

Bug:  810748 
Change-Id: Ia371d4e84027d14031a6c24b57c8955f4ec46eb5
Reviewed-on: https://chromium-review.googlesource.com/911509
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535811}
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/browser/profiling_host/profiling_process_host.cc
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/common/profiling/memlog_allocator_shim.cc
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/common/profiling/memlog_allocator_shim.h
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/common/profiling/profiling_client.cc
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/common/profiling/profiling_client.h
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/common/profiling/profiling_client.mojom
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/common/profiling/profiling_service.mojom
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/profiling/memlog_connection_manager.cc
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/profiling/memlog_connection_manager.h
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/profiling/profiling_service.cc
[modify] https://crrev.com/8a2fb2e70857db265b990ff2d41fcc9f09bfae7e/chrome/profiling/profiling_service.h

silly question: is the sampling done at the client side (where the allocation is intercepted) or at the service side (after the recv()) ?
If the latter, isn't your overhead dominated by the backtrace  + socket anyways?

client side. I already have a CL that does this, with tests, etc.

https://chromium-review.googlesource.com/c/chromium/src/+/911891

Currently debugging one last thing...
I wanted to leave a record of some technical details that don't need to live in the C++ implementation of desampling.

From json_exporter.cc:
"""
  if (alloc_size < params->sampling_rate && alloc_size != 0) {
      // To desample, we need to know the probability P that an allocation will
      // be sampled. Once we know P, we still have to deal with discretization.
      // Let's say that there's 1 allocation with P=0.85. Should we report 1 or
      // 2 allocations? Should we report a fudged size (size / 0.85), or a
      // discreted size, e.g. (1 * size) or (2 * size)? There are tradeoffs.
      //
      // We choose to emit a fudged size, which will return a more accurate
      // total allocation size, but less accurate per-allocation size.
      //
      // The probability that an allocation will be sampled is
      // alloc_size / sampling_rate.
"""

There's a subtlety though, what happens if alloc_size >= sampling_rate?

Let R be sampling rate, X be size of allocation. Let Y=(e^(-X/R))
Then P = (1 - Y) - X/R * integral(1/ln(z), {z, 0, Y})

When X=R, then P = 0.85. When X = 2 * R, P = 0.95. Naively, we'd think that we have to desample here. However...in the implementation, when sample_interval < 0, we do sample_interval += GetNext(). This actually corrects for this desampling factor by allowing us to sometimes record *multiple* allocations per sampling interval.

In turns out that over a large number of allocations with alloc_size > sampling_rate, this ends up being perfectly desampled [which is not surprising, considering the poisson process guarantees that the mean of the sum of a large number of GetNextSampleInterval() should equal the mean of a large number of sampled allocations].

I've verified this with an independent python script that models the sampling/desampling process.
Project Member

Comment 5 by bugdroid1@chromium.org, Feb 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae

commit 8bc20d8233df8daf60ae5fa4f5588c71bdce7fae
Author: erikchen <erikchen@chromium.org>
Date: Wed Feb 14 03:21:51 2018

Add a sampling mode to out of process heap profiling.

alph@ demonstrated that using a Poisson process sampling algorithm provides
highly actionable data with low overhead. This CL adds a flag to do that for OOP
HP.

The sampling algorithm guarantees that large allocations, and frequent, small
allocations are recorded with high fidelity. Infrequent, small allocations will
have noise, but those are precisely the allocations we're not interested in.

The new flags are --memlog-sampling and --memlog-sampling-rate=<XYZ>. The former
flag is available in about://flags with a default sampling rate of 1000.

Bug:  810748 
Change-Id: Id88186bdd0c17287a05b44761f6c4d9d9fd2f3cf
Reviewed-on: https://chromium-review.googlesource.com/911891
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Alexei Filippov <alph@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Ted Choc <tedchoc@chromium.org>
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536635}
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/android/javatests/src/org/chromium/chrome/browser/profiling_host/ProfilingProcessHostAndroidTest.java
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/android/javatests/src/org/chromium/chrome/browser/profiling_host/TestAndroidShim.java
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/about_flags.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/flag_descriptions.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/flag_descriptions.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/chrome_browser_main_extra_parts_profiling.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/memlog_browsertest.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_process_host.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_process_host.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_test_driver.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_test_driver.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/test_android_shim.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/test_android_shim.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/chrome_switches.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/chrome_switches.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/profiling/memlog_allocator_shim.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/profiling/profiling_client.mojom
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/json_exporter.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/json_exporter.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/json_exporter_unittest.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/memlog_connection_manager.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/memlog_connection_manager.h
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/profiling_service.cc
[modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/tools/metrics/histograms/enums.xml

I implemented the sampling algorithm used by both alph's NHP and v8's heap sampler. I've attached 2 chrome traces, with sampling interval 100, and no sampling. Notice that the allocations don't line up. Specifically, some allocations < sample_interval are over sampled. I've determined that the current sampling algorithm has a bug. Luckily, there's an easy fix.

Current algorithm pseudocode:
"""
def GetNextSampleInterval():
  ...

sample_interval = GetNextSampleInterval()
def MaybeRecordAlloc(size):
  sample_interval -= size
  if sample_interval < 0:
    RecordAlloc()
    sample_interval += GetNextSampleInterval()
"""

The interesting line is:
"""
  sample_interval += GetNextSampleInterval()
"""

The += is necessary in order to deal with size > expected_sample_interval. Since P is always less than 1, we need to compensate for large allocations by sometimes taking multiple samples per interval. This however, relies on the assumption that the distribution being sampled is uniform, which it isn't. Imagine the following scenario.

expected_sample_interval = 100

MaybeRecordAlloc(10000)
MaybeRecordAlloc(1)
MaybeRecordAlloc(1)
... <-- 100 total calls to MaybeRecordAlloc
MaybeRecordAlloc(1)

We would expect 1 of the 100 calls to MaybeRecordAlloc(1) to be sampled, but instead, they're all sampled! So if you have a system that sometimes makes really big allocations [e.g. large container resizes], followed by a bunch of small allocations [moving/adding elements into the resized container], this algorithm pathologically overcounts small allocations.

The correct algorithm should be:
"""
sample_interval = GetNextSampleInterval()
expected_sample_interval = <XYZ>

def MaybeRecordAlloc(size):
  if size >= expected_sample_interval:
    RecordAlloc()
    return

  sample_interval -= size
  if sample_interval < 0:
    RecordAlloc()
    sample_interval = GetNextSampleInterval()
"""

I've attached a trace with the correct algorithm, which demonstrates the correct behavior.
trace_nosampling.json.gz
3.7 MB Download
trace_sampling100.json.gz
3.7 MB Download
correct_sampling100.json.gz
2.6 MB Download
I've attached traces using sample intervals from 1 to 10^6, increasing by factors of 10. 

There's virtually no noise up to 10,000. There is ~10% noise starting from 100,000. 
correct_sampling1000000.json.gz
41.8 KB Download
correct_sampling100000.json.gz
89.3 KB Download
correct_sampling10000.json.gz
290 KB Download
correct_sampling1000.json.gz
916 KB Download
correct_sampling100.json.gz
2.6 MB Download
correct_sampling10.json.gz
3.6 MB Download
correct_nosampling.json.gz
3.7 MB Download
Correction*, the correct algorithm should be:

"""
sample_interval = GetNextSampleInterval()
expected_sample_interval = <XYZ>

def MaybeRecordAlloc(size):
  if size >= expected_sample_interval:
    RecordAlloc()
    return

  sample_interval -= size
  if sample_interval < 0:
    RecordAlloc()
    sample_interval += GetNextSampleInterval()
"""

We should still use += on sample_interval, otherwise we will undercount samples slightly less than expected_sample_interval. The actual problem was   sample_interval -= size, which would cause undesirable behavior when size >> expected_sample_interval.
Project Member

Comment 9 by bugdroid1@chromium.org, Feb 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/880847569b9398c91655fc4617c9b3753e5d47e9

commit 880847569b9398c91655fc4617c9b3753e5d47e9
Author: erikchen <erikchen@chromium.org>
Date: Wed Feb 14 19:23:21 2018

Add sampling rate as a parameter to uploaded heap dumps.

Bug:  810748 
Change-Id: Ida3b064c694eccd248b37bf3ade9dcbc6978d890
Reviewed-on: https://chromium-review.googlesource.com/919462
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536773}
[modify] https://crrev.com/880847569b9398c91655fc4617c9b3753e5d47e9/chrome/browser/profiling_host/profiling_process_host.cc

Project Member

Comment 10 by bugdroid1@chromium.org, Feb 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/515f341aec52437dc077b2c71119671fe0a8cc4a

commit 515f341aec52437dc077b2c71119671fe0a8cc4a
Author: erikchen <erikchen@chromium.org>
Date: Wed Feb 14 19:48:42 2018

Fix a bug in sampling algorithm for OOP HP.

The previous algorithm would overcount small allocations that immediately follow
a very large allocation. See
https://bugs.chromium.org/p/chromium/issues/detail?id=810748#c6 for more
details.

The new algorithm samples small allocations as before, but now always records
large allocations.

Bug:  810748 
Change-Id: Ic6f23db49825a07e9e7320addd9ff1d465d0a967
Reviewed-on: https://chromium-review.googlesource.com/919461
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536781}
[modify] https://crrev.com/515f341aec52437dc077b2c71119671fe0a8cc4a/chrome/common/profiling/memlog_allocator_shim.cc

Attaching some samples taken from Android. As on macOS, sampling with threshold 1000 and 10000 both produce very comparable results.
memlog_nosample
2.7 MB View Download
memlog_sample_1000
2.6 MB View Download
memlog_sample_10000
2.6 MB View Download
Project Member

Comment 12 by bugdroid1@chromium.org, Feb 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/99bfce01f0446f15c97781f576b7d71bd737048f

commit 99bfce01f0446f15c97781f576b7d71bd737048f
Author: erikchen <erikchen@chromium.org>
Date: Wed Feb 14 20:29:22 2018

OOP HP: Update default sampling rate to 10,000 from 1,000.

10,000 has been observed to produce traces with comparably low noise, and has
better performance. See
https://bugs.chromium.org/p/chromium/issues/detail?id=810748#c7 for more
details.

Bug:  810748 
Change-Id: I1a552100b175a989a4ee2891b62bc184e8972710
Reviewed-on: https://chromium-review.googlesource.com/919184
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Albert J. Wong <ajwong@chromium.org>
Cr-Commit-Position: refs/heads/master@{#536802}
[modify] https://crrev.com/99bfce01f0446f15c97781f576b7d71bd737048f/chrome/browser/flag_descriptions.cc
[modify] https://crrev.com/99bfce01f0446f15c97781f576b7d71bd737048f/chrome/browser/profiling_host/profiling_process_host.cc

Status: Fixed (was: Assigned)
I ran performance numbers on Android.
[internal link: https://docs.google.com/spreadsheets/d/14sYqiD7D_oFozUIy2DqYr07GcAamre1M7dhCbTXI1a4/edit#gid=0]

There's a fixed overhead of spinning up a profiling process. If we measure against that as a baseline:
pseudo, no sampling: has ~14% overhead
pseudo, sampling rate 1000: has ~4% overhead
pseudo, sampling rate 10000: has ~1% overhead
native, sampling rate 10000: has ~4% overhead

These are running a local experiment, just loading wikipedia.org. 
Cc: ssid@chromium.org dskiba@chromium.org
+ssid / dskiba: I really suggest to read this bug!
Erik your work here is amazing (both analyzing data and implementing this)
Final follow up w.r.t. implementation of sampling.

The perfect implementation is as follows:
"""
def MaybeRecordAlloc(size):
  if size >= expected_sample_interval:
    RecordAlloc()
    return

  if size < rand(0, sample_interval):
    RecordAlloc()
"""

This doesn't rely on a Poisson process, and samples with Probability P = MIN(1, size / sample_interval). This however requires a call to rand() on every small alloc.

By introducing Poisson Process sampling, we reduce the number of calls to rand(), but introduce a small amount of sampling-noise, dependent on the underlying population being sampled. 

"""
def MaybeRecordAlloc(size):
  sample_interval -= size
  if sample_interval < 0:
    RecordAlloc()
    sample_interval += GetNextSampleInterval()
"""

Now let's say that we're trying to sample allocations with the same size as |expected_sample_interval|. The probability that GetNextSampleInterval() >= |expected_sample_interval| is 0.85, but we want this probability to be 1. That's why we have to use "sample_interval += GetNextSampleInterval()". We sample on every underflow, and this allows us to carry over underflow between samples. Over a large number of allocations, the probability that an interval will be sampled is close to ~0.996. 

However, this is still sampling from a uniform distribution. Consider the following case:

"""
expected_sample_interval = 100

def MakeOneSetOfAllocations():
  MaybeRecordAlloc(10000)
  for _ in range(100):
    MaybeRecordAlloc(1)

for _ in range (1000000):
  MakeOneSetOfAllocations()
"""

The probability that each 1-sized allocation is recorded ~0.86, even though we want it to be ~0.01, thus over-sampling by a factor of 86. The root problem is the underlying distribution is non-uniform.

So we arrive at the current implementation, which adds a conditional to get rid of huge outliers:

"""
def MaybeRecordAlloc(size):
  if size > expected_sample_interval:
    RecordAlloc()
    return

  sample_interval -= size
  if sample_interval < 0:
    RecordAlloc()
    sample_interval += GetNextSampleInterval()
"""

This still has sampling noise in non-uniform distributions. Consider:
"""
expected_sample_interval = 100

def MakeOneSetOfAllocations():
  MaybeRecordAlloc(99)
  MaybeRecordAlloc(1)

for _ in range (1000000):
  MakeOneSetOfAllocations()
"""

There is ~4% probability that GetNextSampleInterval() returns a value greater than (2 * 99 + 1), which means that MaybeRecordAlloc(1) will be sampled with probability >4%, even though we only want it to be sampled with probability 1%, resulting in 4x oversampling. 

This doesn't appear to cause large skew in Chrome samples though, so I haven't attempted to correct. In general, large non-uniformity in the population being poisson-process sampled causes sampling skew.  The larger the expected_sample_interval, the larger the potential skew.
Project Member

Comment 16 by bugdroid1@chromium.org, Feb 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4f8aac4cdeee00476e5d6accb35814ff2ef052ef

commit 4f8aac4cdeee00476e5d6accb35814ff2ef052ef
Author: erikchen <erikchen@chromium.org>
Date: Fri Feb 16 18:39:48 2018

Fix OOP HP implementation of poisson process sampling.

To correctly implement poisson process sampling, we occasionally have to take
multiple samples from a single allocation. Since the memlog stream format does
not support a "count" or "weight" parameter, use "sz" as a proxy.

Bug:  810748 
Change-Id: Ie87be75b0915afd5d032943f8a42cb5a190ce6e5
Reviewed-on: https://chromium-review.googlesource.com/922982
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537361}
[modify] https://crrev.com/4f8aac4cdeee00476e5d6accb35814ff2ef052ef/chrome/common/profiling/memlog_allocator_shim.cc

Sign in to add a comment