Make a more comprehensive OOM metric
Issue description

Today we have multiple ways Chrome can OOM:
- Android low memory killer
- malloc/new fails to allocate (ran out of space on the heap)
- V8 can fail to allocate on the V8 heap

These three metrics are recorded in different places, and some of them are not available in UMA at all. It would be better to have a good overview of how we're doing with all of them.
May 22 2018
May 22 2018
We had 367649 "base%OnNoMemory" crashes in M65 (65.0.3325.109): https://crash.corp.google.com/browse?q=product_name%3D%27Chrome_Android%27+AND+EXISTS+%28SELECT+1+FROM+UNNEST%28CrashedStackTrace.StackFrame%29+WHERE+FunctionName+LIKE+%27base%25OnNoMemory%25%27%29+AND+product.Version%3D%2765.0.3325.109%27

However, there are two implementations of OnNoMemory in play here:

1. base::OnNoMemory from memory_linux.cc, which is set as new_handler, i.e. it is called when "new" fails to allocate. Additionally, allocator_shim calls new_handler when any allocation function (e.g. malloc) returns NULL (see SetCallNewHandlerOnMallocFailure). These crashes are aggregated under the "[Assert] base::`anonymous namespace'::OnNoMemorySize" magic signature (21.80%, 80149 crashes).

2. base::OnNoMemory from memory.cc, which is called from base::TerminateBecauseOutOfMemory(). There are several places that call TerminateBecauseOutOfMemory(), but the top one is a call by ClientDiscardableSharedMemoryManager::AllocateLockedDiscardableSharedMemory(). Such crashes are aggregated under "[Out of Memory] discardable_memory::ClientDiscardableSharedMemoryManager::AllocateLockedDiscardableSharedMemory" (76.87%, 282602 crashes).
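For readers unfamiliar with the new_handler mechanism in item 1, here is a minimal standalone sketch of the idea, not the Chromium code. `OnNoMemorySketch`, `g_oom_seen` and `MallocOrCallNewHandler` are hypothetical names made up for illustration; only `std::set_new_handler` / `std::get_new_handler` are real standard-library calls. The real handler terminates the process; the sketch throws `std::bad_alloc` so it can be exercised without crashing.

```cpp
#include <cstddef>
#include <new>

// Hypothetical flag standing in for whatever a crash reporter would read.
bool g_oom_seen = false;

// Hypothetical handler; install it with std::set_new_handler().
// operator new calls the installed handler when it cannot allocate.
// Assumption: we throw instead of crashing so the sketch stays testable.
void OnNoMemorySketch() {
  g_oom_seen = true;
  throw std::bad_alloc();
}

// Hypothetical malloc-style wrapper: if the underlying allocator returns
// NULL, call the currently installed new_handler, which is the behaviour
// the comment attributes to SetCallNewHandlerOnMallocFailure.
void* MallocOrCallNewHandler(void* (*alloc)(std::size_t), std::size_t size) {
  void* p = alloc(size);
  if (p == nullptr) {
    std::new_handler handler = std::get_new_handler();
    if (handler != nullptr) {
      handler();  // Does not return in this sketch (it throws).
    }
  }
  return p;
}
```

After `std::set_new_handler(OnNoMemorySketch)`, both a failing `new` and a failing allocation routed through the wrapper end up in the same handler, which is why the two cases share one crash signature.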
May 22 2018
It might be hard to record UMA for OnNoMemory kills, since the renderer generates a minidump for these crashes and the browser only uploads the dumps. We might have to parse the minidump in the browser to find out whether it is a no-memory crash, which might be heavy due to compression. The other way is to check logcat to see whether the printed assertion was a no-memory one, which could be heavy again. Maybe the solution could be server-side, combining UMA and the crash database? Also, UMA is sampled and crashes are not, so there might be more work to get the right numbers.
May 22 2018
I think we don't need to parse crash reports (as in walk the stack, etc.); we just need to have an "oom_crash" bit in the report and act on that bit. Then change all "crash on out of memory" functions to set that bit.
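A rough sketch of what that could look like, under stated assumptions: the report is modelled as a plain key/value map, and `CrashAnnotations`, `OomSource`, `MarkOomCrash`, `IsOomCrash` and the extra "oom_source" key are all hypothetical names, not existing Chromium or crash-reporter APIs. Only the "oom_crash" key comes from the comment above.

```cpp
#include <map>
#include <string>

// Assumption: crash-report annotations modelled as a string map.
using CrashAnnotations = std::map<std::string, std::string>;

// Hypothetical tags for the in-process OOM paths mentioned in this thread.
// (The Android low memory killer cannot set a bit, so it is not listed.)
enum class OomSource { kMallocOrNew, kV8Heap, kDiscardableSharedMemory };

// Every "crash on out of memory" function would call this before
// terminating, so the report carries the bit regardless of the stack.
void MarkOomCrash(CrashAnnotations& annotations, OomSource source) {
  annotations["oom_crash"] = "1";
  switch (source) {
    case OomSource::kMallocOrNew:
      annotations["oom_source"] = "malloc_or_new";
      break;
    case OomSource::kV8Heap:
      annotations["oom_source"] = "v8_heap";
      break;
    case OomSource::kDiscardableSharedMemory:
      annotations["oom_source"] = "discardable_shared_memory";
      break;
  }
}

// Consumer side: act on the bit without walking the crashed stack.
bool IsOomCrash(const CrashAnnotations& annotations) {
  auto it = annotations.find("oom_crash");
  return it != annotations.end() && it->second == "1";
}
```

With something like this, whoever handles the report only needs `IsOomCrash()` to decide whether to count the crash as an OOM; the source tag is optional and only there to keep the breakdown by failure path.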
Jun 11 2018
Comment 1 by mariakho...@chromium.org, May 11 2018
Status: Assigned (was: Available)