OOP HP: Add a parameter to enabling sampling allocations. |
|||
Issue descriptionAdd a mode to OOP HP to support alph's poisson sampling algorithm. Simply, probability of taking a sample of size N is (N / sample_size). This has the following effects: 1) Allocations >> sample_size are always sampled. 2) Allocations << sample size with low frequency [e.g. frequency * N << sample_size] will be lost in the noise. 3) Allocations << sample size with high frequency [e.g. frequency * N >> sample_size] will have std dev = frequency * sqrt(N). Basically, * small, infrequent allocations will have high noise. These are precisely the allocations we don't find interesting for heap profiling from the wild. * Everything else will have fairly high accuracy
,
Feb 9 2018
silly question: is the sampling done at the client side (where the allocation is intercepted) or at the service side (after the recv()) ? If the latter, isn't your overhead dominated by the backtrace + socket anyways?
,
Feb 9 2018
client side. I already have a CL that does this, with tests, etc. https://chromium-review.googlesource.com/c/chromium/src/+/911891 Currently debugging one last thing...
,
Feb 12 2018
I wanted to leave a record of some technical details that don't need to live in the C++ implementation of desampling.
From json_exporter.cc:
"""
if (alloc_size < params->sampling_rate && alloc_size != 0) {
// To desample, we need to know the probability P that an allocation will
// be sampled. Once we know P, we still have to deal with discretization.
// Let's say that there's 1 allocation with P=0.85. Should we report 1 or
// 2 allocations? Should we report a fudged size (size / 0.85), or a
// discreted size, e.g. (1 * size) or (2 * size)? There are tradeoffs.
//
// We choose to emit a fudged size, which will return a more accurate
// total allocation size, but less accurate per-allocation size.
//
// The probability that an allocation will be sampled is
// alloc_size / sampling_rate.
"""
There's a subtlety though, what happens if alloc_size >= sampling_rate?
Let R be sampling rate, X be size of allocation. Let Y=(e^(-X/R))
Then P = (1 - Y) - X/R * integral(1/ln(z), {z, 0, Y})
When X=R, then P = 0.85. When X = 2 * R, P = 0.95. Naively, we'd think that we have to desample here. However...in the implementation, when sample_interval < 0, we do sample_interval += GetNext(). This actually corrects for this desampling factor by allowing us to sometimes record *multiple* allocations per sampling interval.
In turns out that over a large number of allocations with alloc_size > sampling_rate, this ends up being perfectly desampled [which is not surprising, considering the poisson process guarantees that the mean of the sum of a large number of GetNextSampleInterval() should equal the mean of a large number of sampled allocations].
I've verified this with an independent python script that models the sampling/desampling process.
,
Feb 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae commit 8bc20d8233df8daf60ae5fa4f5588c71bdce7fae Author: erikchen <erikchen@chromium.org> Date: Wed Feb 14 03:21:51 2018 Add a sampling mode to out of process heap profiling. alph@ demonstrated that using a Poisson process sampling algorithm provides highly actionable data with low overhead. This CL adds a flag to do that for OOP HP. The sampling algorithm guarantees that large allocations, and frequent, small allocations are recorded with high fidelity. Infrequent, small allocations will have noise, but those are precisely the allocations we're not interested in. The new flags are --memlog-sampling and --memlog-sampling-rate=<XYZ>. The former flag is available in about://flags with a default sampling rate of 1000. Bug: 810748 Change-Id: Id88186bdd0c17287a05b44761f6c4d9d9fd2f3cf Reviewed-on: https://chromium-review.googlesource.com/911891 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Alexei Filippov <alph@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Ted Choc <tedchoc@chromium.org> Reviewed-by: Primiano Tucci <primiano@chromium.org> Cr-Commit-Position: refs/heads/master@{#536635} [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/android/javatests/src/org/chromium/chrome/browser/profiling_host/ProfilingProcessHostAndroidTest.java [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/android/javatests/src/org/chromium/chrome/browser/profiling_host/TestAndroidShim.java [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/about_flags.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/flag_descriptions.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/flag_descriptions.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/chrome_browser_main_extra_parts_profiling.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/memlog_browsertest.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_process_host.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_process_host.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_test_driver.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/profiling_test_driver.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/test_android_shim.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/browser/profiling_host/test_android_shim.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/chrome_switches.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/chrome_switches.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/profiling/memlog_allocator_shim.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/common/profiling/profiling_client.mojom [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/json_exporter.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/json_exporter.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/json_exporter_unittest.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/memlog_connection_manager.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/memlog_connection_manager.h [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/chrome/profiling/profiling_service.cc [modify] https://crrev.com/8bc20d8233df8daf60ae5fa4f5588c71bdce7fae/tools/metrics/histograms/enums.xml
,
Feb 14 2018
I implemented the sampling algorithm used by both alph's NHP and v8's heap sampler. I've attached 2 chrome traces, with sampling interval 100, and no sampling. Notice that the allocations don't line up. Specifically, some allocations < sample_interval are over sampled. I've determined that the current sampling algorithm has a bug. Luckily, there's an easy fix.
Current algorithm pseudocode:
"""
def GetNextSampleInterval():
...
sample_interval = GetNextSampleInterval()
def MaybeRecordAlloc(size):
sample_interval -= size
if sample_interval < 0:
RecordAlloc()
sample_interval += GetNextSampleInterval()
"""
The interesting line is:
"""
sample_interval += GetNextSampleInterval()
"""
The += is necessary in order to deal with size > expected_sample_interval. Since P is always less than 1, we need to compensate for large allocations by sometimes taking multiple samples per interval. This however, relies on the assumption that the distribution being sampled is uniform, which it isn't. Imagine the following scenario.
expected_sample_interval = 100
MaybeRecordAlloc(10000)
MaybeRecordAlloc(1)
MaybeRecordAlloc(1)
... <-- 100 total calls to MaybeRecordAlloc
MaybeRecordAlloc(1)
We would expect 1 of the 100 calls to MaybeRecordAlloc(1) to be sampled, but instead, they're all sampled! So if you have a system that sometimes makes really big allocations [e.g. large container resizes], followed by a bunch of small allocations [moving/adding elements into the resized container], this algorithm pathologically overcounts small allocations.
The correct algorithm should be:
"""
sample_interval = GetNextSampleInterval()
expected_sample_interval = <XYZ>
def MaybeRecordAlloc(size):
if size >= expected_sample_interval:
RecordAlloc()
return
sample_interval -= size
if sample_interval < 0:
RecordAlloc()
sample_interval = GetNextSampleInterval()
"""
I've attached a trace with the correct algorithm, which demonstrates the correct behavior.
,
Feb 14 2018
I've attached traces using sample intervals from 1 to 10^6, increasing by factors of 10. There's virtually no noise up to 10,000. There is ~10% noise starting from 100,000.
,
Feb 14 2018
Correction*, the correct algorithm should be:
"""
sample_interval = GetNextSampleInterval()
expected_sample_interval = <XYZ>
def MaybeRecordAlloc(size):
if size >= expected_sample_interval:
RecordAlloc()
return
sample_interval -= size
if sample_interval < 0:
RecordAlloc()
sample_interval += GetNextSampleInterval()
"""
We should still use += on sample_interval, otherwise we will undercount samples slightly less than expected_sample_interval. The actual problem was sample_interval -= size, which would cause undesirable behavior when size >> expected_sample_interval.
,
Feb 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/880847569b9398c91655fc4617c9b3753e5d47e9 commit 880847569b9398c91655fc4617c9b3753e5d47e9 Author: erikchen <erikchen@chromium.org> Date: Wed Feb 14 19:23:21 2018 Add sampling rate as a parameter to uploaded heap dumps. Bug: 810748 Change-Id: Ida3b064c694eccd248b37bf3ade9dcbc6978d890 Reviewed-on: https://chromium-review.googlesource.com/919462 Reviewed-by: Albert J. Wong <ajwong@chromium.org> Commit-Queue: Erik Chen <erikchen@chromium.org> Cr-Commit-Position: refs/heads/master@{#536773} [modify] https://crrev.com/880847569b9398c91655fc4617c9b3753e5d47e9/chrome/browser/profiling_host/profiling_process_host.cc
,
Feb 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/515f341aec52437dc077b2c71119671fe0a8cc4a commit 515f341aec52437dc077b2c71119671fe0a8cc4a Author: erikchen <erikchen@chromium.org> Date: Wed Feb 14 19:48:42 2018 Fix a bug in sampling algorithm for OOP HP. The previous algorithm would overcount small allocations that immediately follow a very large allocation. See https://bugs.chromium.org/p/chromium/issues/detail?id=810748#c6 for more details. The new algorithm samples small allocations as before, but now always records large allocations. Bug: 810748 Change-Id: Ic6f23db49825a07e9e7320addd9ff1d465d0a967 Reviewed-on: https://chromium-review.googlesource.com/919461 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Albert J. Wong <ajwong@chromium.org> Cr-Commit-Position: refs/heads/master@{#536781} [modify] https://crrev.com/515f341aec52437dc077b2c71119671fe0a8cc4a/chrome/common/profiling/memlog_allocator_shim.cc
,
Feb 14 2018
Attaching some samples taken from Android. As on macOS, sampling with threshold 1000 and 10000 both produce very comparable results.
,
Feb 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/99bfce01f0446f15c97781f576b7d71bd737048f commit 99bfce01f0446f15c97781f576b7d71bd737048f Author: erikchen <erikchen@chromium.org> Date: Wed Feb 14 20:29:22 2018 OOP HP: Update default sampling rate to 10,000 from 1,000. 10,000 has been observed to produce traces with comparably low noise, and has better performance. See https://bugs.chromium.org/p/chromium/issues/detail?id=810748#c7 for more details. Bug: 810748 Change-Id: I1a552100b175a989a4ee2891b62bc184e8972710 Reviewed-on: https://chromium-review.googlesource.com/919184 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Albert J. Wong <ajwong@chromium.org> Cr-Commit-Position: refs/heads/master@{#536802} [modify] https://crrev.com/99bfce01f0446f15c97781f576b7d71bd737048f/chrome/browser/flag_descriptions.cc [modify] https://crrev.com/99bfce01f0446f15c97781f576b7d71bd737048f/chrome/browser/profiling_host/profiling_process_host.cc
,
Feb 14 2018
I ran performance numbers on Android. [internal link: https://docs.google.com/spreadsheets/d/14sYqiD7D_oFozUIy2DqYr07GcAamre1M7dhCbTXI1a4/edit#gid=0] There's a fixed overhead of spinning up a profiling process. If we measure against that as a baseline: pseudo, no sampling: has ~14% overhead pseudo, sampling rate 1000: has ~4% overhead pseudo, sampling rate 10000: has ~1% overhead native, sampling rate 10000: has ~4% overhead These are running a local experiment, just loading wikipedia.org.
,
Feb 15 2018
+ssid / dskiba: I really suggest to read this bug! Erik your work here is amazing (both analyzing data and implementing this)
,
Feb 15 2018
Final follow up w.r.t. implementation of sampling.
The perfect implementation is as follows:
"""
def MaybeRecordAlloc(size):
if size >= expected_sample_interval:
RecordAlloc()
return
if size < rand(0, sample_interval):
RecordAlloc()
"""
This doesn't rely on a Poisson process, and samples with Probability P = MIN(1, size / sample_interval). This however requires a call to rand() on every small alloc.
By introducing Poisson Process sampling, we reduce the number of calls to rand(), but introduce a small amount of sampling-noise, dependent on the underlying population being sampled.
"""
def MaybeRecordAlloc(size):
sample_interval -= size
if sample_interval < 0:
RecordAlloc()
sample_interval += GetNextSampleInterval()
"""
Now let's say that we're trying to sample allocations with the same size as |expected_sample_interval|. The probability that GetNextSampleInterval() >= |expected_sample_interval| is 0.85, but we want this probability to be 1. That's why we have to use "sample_interval += GetNextSampleInterval()". We sample on every underflow, and this allows us to carry over underflow between samples. Over a large number of allocations, the probability that an interval will be sampled is close to ~0.996.
However, this is still sampling from a uniform distribution. Consider the following case:
"""
expected_sample_interval = 100
def MakeOneSetOfAllocations():
MaybeRecordAlloc(10000)
for _ in range(100):
MaybeRecordAlloc(1)
for _ in range (1000000):
MakeOneSetOfAllocations()
"""
The probability that each 1-sized allocation is recorded ~0.86, even though we want it to be ~0.01, thus over-sampling by a factor of 86. The root problem is the underlying distribution is non-uniform.
So we arrive at the current implementation, which adds a conditional to get rid of huge outliers:
"""
def MaybeRecordAlloc(size):
if size > expected_sample_interval:
RecordAlloc()
return
sample_interval -= size
if sample_interval < 0:
RecordAlloc()
sample_interval += GetNextSampleInterval()
"""
This still has sampling noise in non-uniform distributions. Consider:
"""
expected_sample_interval = 100
def MakeOneSetOfAllocations():
MaybeRecordAlloc(99)
MaybeRecordAlloc(1)
for _ in range (1000000):
MakeOneSetOfAllocations()
"""
There is ~4% probability that GetNextSampleInterval() returns a value greater than (2 * 99 + 1), which means that MaybeRecordAlloc(1) will be sampled with probability >4%, even though we only want it to be sampled with probability 1%, resulting in 4x oversampling.
This doesn't appear to cause large skew in Chrome samples though, so I haven't attempted to correct. In general, large non-uniformity in the population being poisson-process sampled causes sampling skew. The larger the expected_sample_interval, the larger the potential skew.
,
Feb 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4f8aac4cdeee00476e5d6accb35814ff2ef052ef commit 4f8aac4cdeee00476e5d6accb35814ff2ef052ef Author: erikchen <erikchen@chromium.org> Date: Fri Feb 16 18:39:48 2018 Fix OOP HP implementation of poisson process sampling. To correctly implement poisson process sampling, we occasionally have to take multiple samples from a single allocation. Since the memlog stream format does not support a "count" or "weight" parameter, use "sz" as a proxy. Bug: 810748 Change-Id: Ie87be75b0915afd5d032943f8a42cb5a190ce6e5 Reviewed-on: https://chromium-review.googlesource.com/922982 Reviewed-by: Primiano Tucci <primiano@chromium.org> Commit-Queue: Erik Chen <erikchen@chromium.org> Cr-Commit-Position: refs/heads/master@{#537361} [modify] https://crrev.com/4f8aac4cdeee00476e5d6accb35814ff2ef052ef/chrome/common/profiling/memlog_allocator_shim.cc |
|||
►
Sign in to add a comment |
|||
Comment 1 by bugdroid1@chromium.org
, Feb 9 2018