MemoryInfra: Malloc allocated_objects should report only allocations made by Chrome
Issue description

Background context: go/memory-infra

On Android, the malloc allocated_objects size is taken from mallinfo. But that number includes allocations made by the zygote and caches used by malloc, so it does not give us a good estimate of Chrome's memory usage. Shown in doc: https://docs.google.com/document/d/1oZRA0PEmTOkXYKR3JfL5YkCPumGE57C50DbDp2KcoRE

I am thinking we should implement something here similar to heap profiling, so that we can extend this in the future to get categories of allocations too.
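A minimal sketch (not the actual dump provider code) of where the number described above comes from: mallinfo() reports process-wide allocator statistics, so its in-use byte count includes allocations inherited from the zygote and the allocator's own caches, not just Chrome's own allocations.

#include <malloc.h>
#include <cstdio>

int main() {
  struct mallinfo info = mallinfo();
  // uordblks: total allocated bytes in use, process-wide, which is what
  // makes this an overestimate of Chrome's own allocations on Android.
  std::printf("allocated_objects (process-wide): %d bytes\n", info.uordblks);
  return 0;
}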
Jan 13 2017
So, I am trying to count all allocations made by Chrome using an extra shim layer that just updates a counter instead of registering each allocation. I tried two ways to implement this counter:

1. Increment a single atomic counter using NoBarrier_AtomicIncrement.
2. Keep a thread-local counter (attached to AllocationContextTracker) that counts allocations on each thread. The malloc dump provider would then iterate over all threads and sum the counters at dump point.

The idea is that accessing a TLS slot will be faster than a shared atomic counter. I ran a small benchmark with 50 threads where each thread updates the counter 1,000,000 times and quits. The first approach runs 5-10 times slower than the second on Linux. I haven't tested Android, but I expect thread-local to be faster there too.

Looking at the actual values: it takes <200ms to access the thread-local slot and count 50 * 1,000,000 allocations, i.e. roughly 4ns per increment. We know from heap profiler experiments that Chrome can reach up to 100k allocations per second, so the overhead should ideally be about 0.4ms per second (0.04%). I will run some Chrome benchmarks to verify there is no performance impact. If there is none, I think we can enable this in production. Note: I might be missing something; please correct me if I am wrong.

Re: #3 - I think we should file another bug to analyze and reduce the memory not allocated by Chrome.
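A minimal sketch of option 2, with hypothetical names (this is not the real AllocationContextTracker or allocator shim API): each thread increments its own thread_local counter, and the dump provider sums the per-thread counters at dump point. Per-thread relaxed atomics stand in for plain TLS counters here so the cross-thread read at dump time is data-race-free; they are uncontended, unlike option 1's single shared counter.

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

namespace {

std::mutex g_registry_lock;
std::vector<std::atomic<size_t>*> g_live_counters;  // guarded by the lock.
size_t g_exited_threads_total = 0;                  // guarded by the lock.

struct ThreadAllocCounter {
  std::atomic<size_t> count{0};

  ThreadAllocCounter() {
    std::lock_guard<std::mutex> lock(g_registry_lock);
    g_live_counters.push_back(&count);
  }

  ~ThreadAllocCounter() {
    // Fold this thread's count into a global total on thread exit so the
    // dump provider never reads a destroyed counter.
    std::lock_guard<std::mutex> lock(g_registry_lock);
    g_exited_threads_total += count.load(std::memory_order_relaxed);
    g_live_counters.erase(
        std::find(g_live_counters.begin(), g_live_counters.end(), &count));
  }
};

// One counter per thread; only that thread ever increments it.
thread_local ThreadAllocCounter tls_counter;

// What the allocator shim hook would do on every allocation.
void OnAllocation() {
  tls_counter.count.fetch_add(1, std::memory_order_relaxed);
}

// What the malloc dump provider would do at dump point.
size_t CountAllAllocations() {
  std::lock_guard<std::mutex> lock(g_registry_lock);
  size_t total = g_exited_threads_total;
  for (auto* counter : g_live_counters)
    total += counter->load(std::memory_order_relaxed);
  return total;
}

}  // namespace

int main() {
  // Rough stand-in for the benchmark described above:
  // 50 threads x 1,000,000 counted allocations each.
  std::vector<std::thread> threads;
  for (int i = 0; i < 50; ++i) {
    threads.emplace_back([] {
      for (int j = 0; j < 1000000; ++j)
        OnAllocation();
    });
  }
  for (auto& t : threads)
    t.join();
  std::printf("total counted allocations: %zu\n", CountAllAllocations());
  return 0;
}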
Jan 13 2017
Since we do not know the performance of malloc_usable_size on all devices, we cannot simply keep a counter based on calling malloc_usable_size at each allocation and free. The jemalloc and dlmalloc implementations of usable_size seem fine: dlmalloc_usable_size just reads the chunk metadata at each allocation, so it mangles the address and dereferences memory to get the size, while jemalloc_usable_size does a tree search for the allocation to find the usable size. I think we should try enabling it in the field and turn it off if it affects performance. We are going to enable it only for 1% of Dev and Canary. Would it be acceptable to test it that way to measure the performance impact, after finding no significant impact on local devices? Waiting for the heap profiler in the field might take a while.
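A sketch of the approach being weighed here (hypothetical wrapper names, not the real Chrome allocator shim): track allocated_objects in bytes by calling malloc_usable_size() on every allocation and free. The concern in this comment is exactly the per-call cost of malloc_usable_size(), which depends on the allocator (dlmalloc reads chunk metadata next to the pointer; jemalloc does a lookup).

#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size() on glibc/bionic.

namespace {

std::atomic<size_t> g_allocated_bytes{0};

void* ShimMalloc(size_t size) {
  void* ptr = malloc(size);
  if (ptr) {
    // The usable size may exceed the requested size because the allocator
    // rounds allocations up to its bucket/chunk granularity.
    g_allocated_bytes.fetch_add(malloc_usable_size(ptr),
                                std::memory_order_relaxed);
  }
  return ptr;
}

void ShimFree(void* ptr) {
  if (ptr) {
    g_allocated_bytes.fetch_sub(malloc_usable_size(ptr),
                                std::memory_order_relaxed);
  }
  free(ptr);
}

}  // namespace

int main() {
  void* a = ShimMalloc(100);
  void* b = ShimMalloc(4096);
  std::printf("allocated_objects: %zu bytes\n", g_allocated_bytes.load());
  ShimFree(a);
  ShimFree(b);
  std::printf("after free: %zu bytes\n", g_allocated_bytes.load());
  return 0;
}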