MemoryInfra: suspiciously low malloc memory reported by the heap profiler on Android |
||||||||||
Issue descriptionBackground context:go/memory-infra: memory profiling in chrome://tracing From an offline conversation with Maria: ----- I am playing with startup tracing for Android using the instructions here. I turned on heap profiling and tried to collect an early trace, but I am seeing a lot fewer objects in the heap profile than being reported total (e.g. 4MB of heap objects vs 13MB total malloc stuff). I was wondering why we have this discrepancy. Attaching a trace I was using for you to see. ----- Hmm, this is weird. I have two suspects here: 1) the malloc_dump_provider is registered too late on android, so we miss most of the initial allocations. 2) something is going wrong with the shim. kraynov can you take a look at this and, specifically test: 1. put some big malloc leaks in various pieces of the codebase for the browser process (e.g., net, content/browser) and see if they show up in the heap profiler. 2. Like above, but do some big C++ new() 3. Like above, but use a std::vector<char> and .reserve() a big amount. And tell us what you find :)
,
Apr 27 2016
Hmm I'd be interested in #1. It *should* work because the -wrap,malloc trick is at the linker level. Even if libstc++ is prebuilt, the magic happens when we link it. It should be easy to test. Just create a std::vector, add a TRACE_EVENT0, resize the vector to 42 MB and see if you see in the heap profiler.
,
Apr 27 2016
I'm doing some tests in this area, will report today.
,
Apr 27 2016
Yeah, so libc++ is not a problem, it's .cpp code is fairly small, and mostly uses operator new, which is weak. But, it looks like that Android runtime also uses malloc (MILD SHOCK): Breakpoint 4, 0xf749c25a in malloc () from /tmp/dskiba-adb-gdb-libs/system/lib/libc.so (gdb) bt #0 0xf749c25a in malloc () from /tmp/dskiba-adb-gdb-libs/system/lib/libc.so #1 0xf76ea8b4 in android::SharedBuffer::alloc(unsigned int) () from /tmp/dskiba-adb-gdb-libs/system/lib/libutils.so #2 0xf76ed068 in android::VectorImpl::_shrink(unsigned int, unsigned int) () from /tmp/dskiba-adb-gdb-libs/system/lib/libutils.so #3 0xf76eec3c in android::Looper::pollInner(int) () from /tmp/dskiba-adb-gdb-libs/system/lib/libutils.so #4 0xf76eeed6 in android::Looper::pollOnce(int, int*, int*, void**) () from /tmp/dskiba-adb-gdb-libs/system/lib/libutils.so #5 0xf7636630 in android::NativeMessageQueue::pollOnce(_JNIEnv*, _jobject*, int) () from /tmp/dskiba-adb-gdb-libs/system/lib/libandroid_runtime.so #6 0x71a4e56e in ?? () So when we ask for "number of allocated bytes", we also get Android's stuff.
,
Apr 27 2016
So I added "total size of all allocations" to AllocationRegister, and reported it in malloc dump provider. Then I took ~30 seconds trace (Linux, CNN.com). The results: Browser first: allocated_objects 23.8 MiB shim_registered_size 11.0 MiB DIFF = 12.8 Browser last: allocated_objects 70.2 MiB shim_registered_size 57.0 MiB DIFF = 13.2 Renderer CNN.com first: allocated_objects 17.8 MiB shim_registered_size 16.7 MiB DIFF = 1.1 Renderer CNN.com last: allocated_objects 77.1 MiB shim_registered_size 75.7 MiB DIFF = 1.4 So even on Linux, where we use TCMalloc, we have a difference.
,
Apr 27 2016
Trace: https://drive.google.com/file/d/0B_Hmi138MnbJcXl1Qkc0dTcyaUE/view?usp=sharing Diff is attached, look for shim_registered_size value for malloc.
,
Apr 27 2016
,
Apr 28 2016
On Android situation is similar:
Browser first:
allocated_objects 19.3 MiB
shim_registered_size 3.3 MiB
DIFF: 16
Browser last:
allocated_objects 29.8 MiB
shim_registered_size 13.5 MiB
DIFF: 16.3
Renderer first:
allocated_objects 5.6 MiB
shim_registered_size 1.3 MiB
DIFF: 4.3
Renderer last:
allocated_objects 56.1 MiB
shim_registered_size 49.3 MiB
DIFF: 6.8
So we can see that Android (+ potentially something else) is responsible for ~16 MiB in the browser process. However, not all of those 16 MiB are committed. Here is a meminfo dump from "Hello World" Android app from Maria:
** MEMINFO in pid 30622 [com.example.mariakhomenko.testapp] **
Pss Private Private Swapped Heap Heap Heap
Total Dirty Clean Dirty Size Alloc Free
------ ------ ------ ------ ------ ------ ------
Native Heap 3992 3600 0 0 19968 17886 2081
Dalvik Heap 4286 4088 0 0 24192 20310 3882
Dalvik Other 682 540 0 0
Stack 180 180 0 0
Ashmem 4 4 0 0
Gfx dev 572 572 0 0
Other dev 4 0 4 0
.so mmap 1315 136 64 0
.apk mmap 218 0 36 0
.ttf mmap 85 0 72 0
.dex mmap 2968 76 2892 0
.oat mmap 1075 0 300 0
.art mmap 893 656 24 0
Other mmap 9 8 0 0
EGL mtrack 41280 41280 0 0
Unknown 188 188 0 0
TOTAL 57751 51328 3392 0 44160 38196 5963
Native Heap / Heap Size, Heap Alloc and Heap Free are usmblks, uordblks and fordblks from mallinfo() (see android_os_Debug.cpp). So even simple Android app allocates ~16 MiB, but only commits 3600 of them - Native Heap / Private Dirty is calculated as PrivateDirty of regions from /proc/smaps.
,
Apr 28 2016
I did another experiment to see how much memory is allocated before MallocDumpProvider installs its shim. I calculated total allocated bytes at shim level (ShimXXX functions), and reported that as "shim_allocated_size". Difference between this value and "shim_registered_size", which comes from AllocationRegister, tells us how much we missed. See patch attached. Numbers from CNN.com case (load, scroll bottom/top): === Linux === https://drive.google.com/file/d/0B_Hmi138MnbJc1NlR0NaNHVkUGM/view?usp=sharing Browser / first dump: shim_allocated_size: 24.5 MiB shim_registered_size: 16.3 MiB DIFF: 8.2 Browser / last dump: shim_allocated_size: 54.5 MiB shim_registered_size: 46.7 MiB DIFF: 7.8 Renderer (CNN.com) / first dump: shim_allocated_size: 5.8 MiB shim_registered_size: 5.5 MiB DIFF: 0.3 Renderer (CNN.com) / last dump: shim_allocated_size: 45.1 MiB shim_registered_size: 46.4 MiB DIFF: -0.6 === Android === https://drive.google.com/file/d/0B_Hmi138MnbJUXB5aE5IQkV6NTA/view?usp=sharing Browser / first dump: shim_allocated_size: 4.0 MiB shim_registered_size: 3.8 MiB DIFF: 0.2 Browser / last dump: shim_allocated_size: 16.6 MiB shim_registered_size: 16.6 MiB DIFF: 0 Renderer / first dump: shim_allocated_size: 1.5 MiB shim_registered_size: 1.3 MiB DIFF: 0.2 Renderer / last dump: shim_allocated_size: 63.1 MiB shim_registered_size: 63.9 MiB DIFF: -0.8 So Android is fine, while on Linux we miss ~8 MiB. Also, negative DIFFs are concerning. Either this is a bug in the way I calculate numbers, or something is not removed properly from AllocationRegister.
,
Apr 29 2016
Ok this bug is becoming hard to follow. There are different findings here Let me try to make a point of the situation. 1. mallinfo() inherited by the Android Zygote pre-fork() -------------------------------------------------------- Maria experiments suggest that on Android we start with a non-zero malloc() size. Thinking more, this is perfectly WAI. The Android Zygote does all sort of allocations before it fork()-s to specialize into the various processes. Action item: there is nothing we can do here, also these allocations are not really using memory, as very likely they happened pre-fork(). I think the only think we should do is sample the value of mallinfo() at startup on Android, and expose/subtract that in memory-infra as "pre-fork() Zygote" or similar. 2. Android itself uses malloc ----------------------------- Yes, agree, mild shock. I don't think there is anything we can do here. We'll have to live with that. Also that should be hopefully a minor conributions. 3. Difference on beween mallinfo() and heap profiler hooks ---------------------------------------------------------- Here I am very confused about the various experiments. Can you summarize them so we can analyze them case by case? Some things to keep in mind: - Not sure if tcmalloc mallinfo() takes into account fragmentation. That might explain some of the difference. - Remember that on Linux Desktop the renderers are forked form the chrome zygote. we might have some similar effects like in the android zygote case.
,
Apr 29 2016
So let me summarize the situation on Android: === Android framework allocations inflate 'malloc' values in traces. === Android uses malloc() to allocate, and that affects mallinfo() results. This might be expected, but nevertheless makes it hard to reason about 'malloc' values in traces. Simple 'Hello World' Android app allocates 16 MiB, and shim only registers ~3 MiB, which makes malloc values reported for the browser process almost meaningless, since most of that comes from Android, not Chrome. This is what my first experiment measured - I added shim_registered_size value which shows amount of memory AllocationRegister knows about, i.e. that was allocated by Chrome. The difference between shim_registered_size and allocated_objects is memory that either allocated by Android, or by Chrome before AllocationRegister is plugged in. Results: 16 MiB for the browser process, 4-6 MiB for the renderer. Note the difference - it means that most of those 16 MiB were not inherited from Android's zygote. OK, so we know that simple "Hello World" allocates 16 MiB, how to we estimate size of memory allocated by Chrome before AllocationRegister is turned on? By measuring total allocation size at the shim level. I.e. malloc calls ShimAlloc, which either goes directly to system malloc, or goes through AllocationRegister and then to system malloc. So allocated size measured at ShimXXX functions reflects total size allocated by Chrome from the start, while value reported by AllocationRegister reflects only amount allocated since AllocationRegister was turned on. This is what my second experiment measured - I added shim_allocated_size that shows amount of memory allocated through ShimXXX interface. Difference between shim_allocated_size and shim_registered_size tells us how much Chrome allocates before AllocationRegister plugs in. Results: 0.2 MiB for the browser and 0 for the renderer. Good, this means that on Android AllocationRegister knows about almost all allocations (compare to Linux where diff is 8 MiB). >>> ACTION ITEM: Think of ways to separate 'malloc' value into 'Android' and 'Chrome' parts. Or maybe report only 'Chrome' part of it in the trace. (Note that there is an interesting opportunity to reorganize Java code so that it allocates less native memory from malloc.) === No correlation between 'malloc' value and private dirty. === In memory reduction efforts we mostly care about reducing 'Private Dirty'. This is what everyone is looking at when comparing memory usages. However, we only have approximate idea where 'Private Dirty' comes from. We saw that "Hello World" Android app allocates 16 MiB, but reports only 3.6 MiB as dirty. So when in a Chrome trace we see that 'malloc' is 22.2 MiB and 'Private Dirty' is 36.5 MiB we can't expect that reducing 'malloc' will proportionally affect 'Private Dirty' (or affect at all). And of course it doesn't help that some of 'Private Dirty' comes from Android framework. >>> ACTION ITEM: Think of ways to extract 'malloc' part from 'Private Dirty', so that we know how much malloc contributes. Overall, the idea is to make things that are reported in a trace actionable. Or separate them into 'actionable' and 'FYI' parts. (I think I'll post this to Project TRIM so that other guys can chime in.)
,
Apr 29 2016
> 3. Difference on beween mallinfo() and heap profiler hooks > ---------------------------------------------------------- > Here I am very confused about the various experiments. Can you summarize them so we can analyze them case by case? > Some things to keep in mind: > - Not sure if tcmalloc mallinfo() takes into account fragmentation. That might explain some of the difference. > - Remember that on Linux Desktop the renderers are forked form the chrome zygote. we might have some similar effects like in the android zygote case. So, experiment 1 (shim_registered_size) shows that on Linux the amount AllocationRegister knows about is ~12 MiB less than what TCMalloc reports. Why is that? Two possibilities: 1. Something allocates memory directly (not using shim). 2. Memory is allocated using shim, but before AllocationRegister is plugged in. I verified #1 manually by creating shim paths in TCMalloc and breaking on non-shim paths. I've found that we have UncheckedMalloc() thing, which avoids shime by calling system allocation functions directly (__libc_malloc, tc_malloc_skip_new_handler, etc). But it doesn't seem to be used much. Experiment 2 confirms #2 - I added shim_allocated_size, which measures memory at ShimXXX functions, and there is a difference of 8 MiB between that and shim_registered_size from AllocationRegister. However, something is wrong because difference is changing over time and eventually becomes negative. I didn't investigate it much, but the code (see patch in #9) seems fine. >>> ACTION ITEM: Continue investigation, verify differently - by snapshotting allocated_size from TCMalloc at the time we plug AllocationRegister in - that will tell us how much was allocated before AllocationRegister, but won't track deallocations of that memory.
,
Apr 29 2016
Grace was curious about this.
,
Apr 29 2016
Re #11: Any memory malloc'd and written to after fork+specialize from zygote will show up as private dirty, and reducing mallocs will reduce private dirty. The Native Heap private dirty number is already counting only memory from malloc - malloc in libc applies an "[anon:libc_malloc]" label to memory mapped for the native heap, and then the private dirty numbers are extracted from /proc/pid/smaps mappings with that label.
,
Apr 29 2016
https://source.android.com/devices/tech/debug/native-memory.html has some slightly-out-of-date debugging information for native heap memory use. The ddms native=true part still applies, but to enable malloc debug you now use: adb root adb shell setprop libc.debug.malloc.program app_process64 adb shell 'setprop libc.debug.malloc.options "backtrace=8"' adb shell stop adb shell start ddms should then let you capture a native heap dump, and can subtract out zygote usage and show you stack traces of calls to malloc and how much was allocated, and break it down by library.
,
Apr 29 2016
,
May 3 2016
@primiano: Where is the allocator waste accounted for? For instance, memory held into the various thread-local caches of the allocator, or the free space in the arenas. This would show up as private dirty, unless it's de-allocated (using madvise(), which I *think* jemalloc() does on Android).
,
May 3 2016
,
May 10 2016
Some of my experiments above relied on a idea that all allocations go through ShimXXX functions. Well, as I found base::UncheckedMalloc() doesn't go through shim, and at least on Android it can allocate quite a lot, see attached screenshot. This is just FYI, I don't think we need to do anything about it.
,
May 11 2016
Uh what feels a bug, why is that?
From what I see in the code:
bool UncheckedMalloc(size_t size, void** result) {
#if BUILDFLAG(USE_EXPERIMENTAL_ALLOCATOR_SHIM)
*result = allocator::UncheckedAlloc(size);
And, in allocator_shim.cc
void* UncheckedAlloc(size_t size) {
const allocator::AllocatorDispatch* const chain_head = GetChainHead();
return chain_head->alloc_function(chain_head, size);
}
UncheckedMalloc should go trough the shim :/
,
May 24 2016
,
May 25 2016
kraynov can you take a look at this and see whether UncheckedMalloc is intercepted or not (and if not, why)?
,
Jul 26 2016
To understand which allocations Chrome's heap profiler misses, I used Android's heap profiler to capture all allocations. Chrome's heap profiler was enabled too. Here are all the allocations that originated from libchrome.so when starting to the NTP: https://docs.google.com/a/google.com/spreadsheets/d/1djmF53A5IKVmL0BRejbvu727838YG5NUNib6oMxgNvY/edit?usp=sharing As you can see Chrome's heap profiler missed the following: 1.5 MiB ShimCppNew 222 KiB ShimMalloc 40 KiB ShimRealloc I.e. ~1.7 MiB of allocations. All other allocations originated from HookXXX functions, i.e. were intercepted and recorded in Chrome's heap profiler. It looks like we're enabling heap profiler too late, and missing some startup stuff.
,
Aug 2 2016
It turned out that startup tracing causes heap profiler to start early and catch most allocations: 3.7 MiB HookAlloc 136 KiB HookllocAligned 49 KiB HookRealloc 26 KiB ShimCppNew I.e. previously uncaught 1.7 MiB were captured.
,
Jan 4 2017
,
Jan 4 2017
,
Jul 12 2017
,
Jul 12 2017
This bug is pretty old. What is the plan here?
,
Jul 13 2017
Just to check that UncheckedMalloc is accounted correctly. Probably will be just closed.
,
Jul 19 2017
Obsolete. UncheckedMalloc is accounted correctly. |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by mariakho...@chromium.org
, Apr 25 2016