OOM Accountability: Exposing subsystems that cause OOMs, vs. the final allocation.
Issue description

Reported by dmu...@chromium.org, Apr 18 2017

Figure out a way to expose, or otherwise increase the discoverability of, OOM causes, both in the renderer and the browser. Which subsystem is using memory? Which is using the most? Stack traces can often be misleading, because the final allocation isn't always made by the subsystem that is using the most memory.

Action item from the IDB OOM postmortem: https://docs.google.com/document/d/1PphqDNEMLueeTtK5OZDN4HtnQ_hrRHEEQXmyr-C8ST0/edit
May 4 2017
This is non-trivial; solving it in the general case requires attribution of memory usage. We have a subset of this functionality with MemoryDumpProviders - perhaps we can pull stats from those into crash dumps. (As a vague first idea: the memory-infra service pushes updated stats to Crashpad every time it gets them.)
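A minimal sketch of that idea, assuming a hypothetical subsystem ("partition_x"), crash key name, and EstimateUsageBytes() helper: a base::trace_event::MemoryDumpProvider that, besides reporting into memory-infra as usual, mirrors its latest byte count into a crash key so the value rides along in crash dumps. The dump-provider interfaces are Chromium's (base/trace_event/); SetCrashKeyValue is the base/debug/crash_logging.h API of that era.

#include <string>

#include "base/debug/crash_logging.h"
#include "base/trace_event/memory_allocator_dump.h"
#include "base/trace_event/memory_dump_provider.h"
#include "base/trace_event/process_memory_dump.h"

// Sketch: report this subsystem's usage to memory-infra, and mirror the
// same number into a crash key so an OOM crash report carries it.
class PartitionXDumpProvider : public base::trace_event::MemoryDumpProvider {
 public:
  bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                    base::trace_event::ProcessMemoryDump* pmd) override {
    const size_t bytes = EstimateUsageBytes();  // Hypothetical helper.
    auto* dump = pmd->CreateAllocatorDump("partition_x");
    dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameSize,
                    base::trace_event::MemoryAllocatorDump::kUnitsBytes,
                    bytes);
    // The proposed extra step: keep the latest value in a crash key.
    base::debug::SetCrashKeyValue("partition-x-bytes", std::to_string(bytes));
    return true;
  }

 private:
  size_t EstimateUsageBytes() const { return 0; }  // Placeholder for real stats.
};

The provider would be registered with MemoryDumpManager like any other dump provider; the only new part is the crash-key line.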
May 5 2017
This bug is a mix of something we have and something else we should build.

What we have / are building:

We already have many probes in memory-infra [1] that can expose the details of the major Chrome subsystems. We also have a mechanism [2] that collects this data from a subset of the population (canary + dev) when we hit low-memory situations (which is not the same as an OOM, though). We are building a larger-scale mechanism (Issue 680200) to get this information into UMA.

This is good but doesn't cover:
1. Non-instrumented code, which gets generically attributed to "malloc".
2. Cases where the leak is so aggressive that it causes a nearly immediate OOM.

How we could tackle these cases:

On a large scale: what erikchen@ said. Although, as he says, that is not trivial and requires a lot of other work to happen first. Given the priorities, the other problems, and the limited number of eng resources, this is unlikely to happen before ~3 quarters.

On a smaller scale (i.e. local repros): I think what we want here is a combination of --enable-heap-profiling AND --trace-startup-file (both of which already exist); see the sketch below. The missing piece is a way to tell tracing "if you hit an X GB limit, create a memory snapshot and serialize the trace". This sort of feature is indirectly already planned as part of Issue 607533 (peak detection). Once that happens, adding the "and stop tracing" part will be trivial.

[1] https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra
[2] go/slow-memory-reports
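To make that missing piece concrete, here is a rough sketch in plain C++, not actual tracing code: a watcher that polls a process-footprint function and fires a one-shot callback, which would take the memory snapshot and serialize the trace, once a byte limit is crossed. The class name, the 1-second polling interval, and both callbacks are illustrative.

#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Polls |footprint_fn| and invokes |on_limit| once, when usage crosses
// |limit_bytes|. Illustrative only; real peak detection (Issue 607533)
// would live inside the tracing service, not in a polling thread.
class MemoryThresholdWatcher {
 public:
  MemoryThresholdWatcher(std::size_t limit_bytes,
                         std::function<std::size_t()> footprint_fn,
                         std::function<void()> on_limit)
      : limit_bytes_(limit_bytes),
        footprint_fn_(std::move(footprint_fn)),
        on_limit_(std::move(on_limit)) {}

  ~MemoryThresholdWatcher() { Stop(); }

  void Start() {
    running_ = true;
    thread_ = std::thread([this] {
      while (running_) {
        if (footprint_fn_() >= limit_bytes_) {
          on_limit_();  // e.g. dump the heap profile, serialize the trace.
          break;
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
      }
    });
  }

  void Stop() {
    running_ = false;
    if (thread_.joinable()) thread_.join();
  }

 private:
  const std::size_t limit_bytes_;
  const std::function<std::size_t()> footprint_fn_;
  const std::function<void()> on_limit_;
  std::atomic<bool> running_{false};
  std::thread thread_;
};

In the local-repro flow above, one would start Chrome with --enable-heap-profiling and --trace-startup-file=<path>, and the on_limit callback would flush the trace to that file before stopping tracing.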
May 5 2017
Interesting; this makes sense. Can we help you out by trying to expose leveldb's memory allocation better?
Sep 6 2017
Victor, you mentioned there was work on exposing leveldb's memory allocation to memory profiling. Do you remember who the lead was on that project?
Sep 6 2017
You're probably interested in talking to cmumford@ and dskiba@. Useful links:
https://crbug.com/711518
https://crrev.com/2855953002
https://crbug.com/750751
May 21 2018
Bulk edit: This bug has the label Postmortem-Followup but has not been updated in 3+ weeks. We are working on a new workflow to improve postmortem follow-through. Postmortems and postmortem bugs are very important for making sure we don't repeat prior mistakes and for making Chrome better for all. We will be taking a closer look at these bugs in the coming weeks. Please take some time to work on this, reassign it, or close it if the issue has been fixed. Thank you.