crash reports should report device physical memory
Issue description

This emerged during the TRIM meeting. It would be nice to have base::SysInfo::AmountOfPhysicalMemory() plumbed into crash reports. This would let us triage crashes better (especially OOM or OOM-related ones) on all platforms. For instance, haraken@ suspects that some Oilpan hangs are due to thrashing, but has no data to correlate.

I think there are two options here:
1) Setting a crash key on startup.
2) Using the extra MIME fields in components/crash/content/app/breakpad_linux.cc: https://chromium.googlesource.com/chromium/src/+blame/master/components/crash/content/app/breakpad_linux.cc#1580

My question boils down to this: which of the two options will work with both Breakpad and Crashpad? Do crash keys work with both (I'd guess yes)? Does components/crash/breakpad_linux.cc also affect Crashpad?

mark / scott / rsesek: inputs?
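A minimal sketch of option 1 (a crash key written once at startup) on Linux/Android, assuming a hypothetical `SetCrashKeyValue()` setter backed by an in-memory map and a hypothetical key name `physical_memory_mb`. The real crash-key API and key naming are not shown in this thread. Physical memory is read with `sysconf(_SC_PHYS_PAGES)`, a widely supported extension (glibc, Bionic) rather than strict POSIX.

```cpp
#include <unistd.h>

#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for a crash-key registry; a real crash
// reporter exposes its own API for this.
std::map<std::string, std::string>& CrashKeys() {
  static std::map<std::string, std::string> keys;
  return keys;
}

// Hypothetical stand-in for the real crash-key setter.
void SetCrashKeyValue(const std::string& key, const std::string& value) {
  CrashKeys()[key] = value;
}

// Total physical memory in bytes, or -1 if it cannot be determined.
int64_t PhysicalMemoryBytes() {
  const long pages = sysconf(_SC_PHYS_PAGES);
  const long page_size = sysconf(_SC_PAGESIZE);
  if (pages <= 0 || page_size <= 0) return -1;
  return static_cast<int64_t>(pages) * page_size;
}

// Called once at startup: physical memory does not normally change while
// the browser runs, so a single write is enough.
void RecordPhysicalMemoryCrashKey() {
  const int64_t bytes = PhysicalMemoryBytes();
  SetCrashKeyValue("physical_memory_mb",
                   bytes < 0 ? "unknown" : std::to_string(bytes / (1024 * 1024)));
}
```

Because the value is static for the process lifetime, this avoids any periodic work. The trade-off is that it says nothing about how much memory was in use at crash time.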
Apr 29 2016
Apr 29 2016
haraken@: When you say AmountOfVirtualMemory(), I suppose you mean the amount of virtual memory left "free"? That is another interesting metric. +tobiasjs@ was doing that (but only for WebView/Clank microdumps) in crrev.com/1796803003. I think the real question you want to ask here is not "how much virtual memory is left?" but "what is the largest contiguous block of virtual memory left?", to account for virtual address space fragmentation. This is technically more challenging and cannot be done as a base:: API, because you want to query the state of virtual memory at the time of the crash, not when Chrome starts. So neither 1) nor 2) is suitable for this particular case. I think the only option is to generalize tobiasjs@'s idea as part of a crash-time Breakpad analysis and attach the result to minidumps as well.
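To make "largest contiguous block of virtual memory left" concrete, here is a sketch that computes the largest gap between mappings listed in `/proc/<pid>/maps` format. `LargestUnmappedGap` is a hypothetical helper, not tobiasjs@'s code. It only counts gaps between existing mappings (not below the first or above the last), and it allocates, so it suits out-of-process analysis (e.g. a crash handler reading the crashed process's maps), not an in-process signal handler. It requires C++17.

```cpp
#include <algorithm>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns the size in bytes of the largest unmapped gap between consecutive
// mappings in /proc/<pid>/maps-formatted input.
uint64_t LargestUnmappedGap(std::istream& maps) {
  std::vector<std::pair<uint64_t, uint64_t>> ranges;
  std::string line;
  while (std::getline(maps, line)) {
    // Each line starts with "start-end" in hex, e.g. "7f00-7f10 r-xp ...".
    std::istringstream fields(line);
    uint64_t start = 0, end = 0;
    char dash = 0;
    if (fields >> std::hex >> start >> dash >> end && dash == '-' && end > start)
      ranges.emplace_back(start, end);
  }
  std::sort(ranges.begin(), ranges.end());

  uint64_t largest = 0;
  uint64_t prev_end = 0;
  bool first = true;
  for (const auto& [start, end] : ranges) {
    // A gap exists only if this mapping starts after everything seen so far.
    if (!first && start > prev_end) largest = std::max(largest, start - prev_end);
    prev_end = first ? end : std::max(prev_end, end);
    first = false;
  }
  return largest;
}
```

Usage would be along the lines of opening `/proc/<pid>/maps` with a `std::ifstream` and passing it in. On 64-bit processes the result is usually huge, so the metric is mostly interesting for 32-bit processes.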
Apr 29 2016
What I really want to know from a crash report is how much memory the device was using when it hit the crash. In the long term, tracing v2 + crash reports would be the solution, but I'm wondering whether we can just use the base APIs as a short-term solution.

> Technically this is a bit more challenging and cannot be done as a base:: api, for this reason: you want to query the state of virtual memory at the time of the crash, not at the time you start chrome.

Yes, I'm assuming we record the values periodically (e.g., every time Oilpan's GC is triggered).
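A sketch of "record the values periodically" under stated assumptions: a hypothetical hook, `RecordMemorySample()`, is called with some memory metric (e.g. on each Oilpan GC), and crash-reporting code later reads the last and peak samples. The values live in `std::atomic<int64_t>` globals, which are typically lock-free on 64-bit platforms, so a reader does not need locks or allocation. The names and the choice of metric are illustrative only.

```cpp
#include <atomic>
#include <cstdint>

// Most recent and peak samples in kB; -1 means "never sampled".
std::atomic<int64_t> g_last_sample_kb{-1};
std::atomic<int64_t> g_peak_sample_kb{-1};

// Hypothetical hook, e.g. invoked each time a GC is triggered.
void RecordMemorySample(int64_t usage_kb) {
  g_last_sample_kb.store(usage_kb, std::memory_order_relaxed);
  // Raise the peak with a CAS loop; on failure `peak` is reloaded.
  int64_t peak = g_peak_sample_kb.load(std::memory_order_relaxed);
  while (usage_kb > peak &&
         !g_peak_sample_kb.compare_exchange_weak(peak, usage_kb,
                                                 std::memory_order_relaxed)) {
  }
}

int64_t LastMemorySampleKb() {
  return g_last_sample_kb.load(std::memory_order_relaxed);
}

int64_t PeakMemorySampleKb() {
  return g_peak_sample_kb.load(std::memory_order_relaxed);
}
```

Keeping the peak alongside the last value helps when the last sample was taken after memory had already been released.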
Apr 29 2016
OK, so there are three different things here:
1) Report the device memory in crashes. This seems a no-brainer; it's just a matter of finding the right place, as I asked in #0.
2) Report VM size (to detect whether hangs are due to thrashing). Note that this means VM size, not RSS/PSS: if we are swapping (as suspected), RSS/PSS will drop because of page-out activity.
3) Virtual address space fragmentation. This is to validate the theory that we crash after long uptimes because of address space fragmentation, even when we are technically not OOM (i.e. VM size is still reasonable).

Now the question is: where and when should we do 2) and 3)? Both could be done by Breakpad at crash dump time, since Breakpad/Crashpad have full visibility of the mmaps. Updating a crash key periodically might be another option for 2) (3 would take too long), but there are a bunch of things to keep in mind:
- Reading /proc/PID/status (which is how ProcessMetrics::GetPagefileUsage() is implemented on Linux/Android) is slow-ish and should be posted to a blocking worker pool.
- Reading /proc/PID/status for a renderer might not work on Linux desktop because of sandboxing (in memory-infra we had to dump renderer stats from the browser process). No idea what the situation is on Windows/Mac.

mark/rsesek might have a clearer idea of how to proceed on Mac, and scottmg on Windows.
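For option 2), the VM size can be taken from the `VmSize:` line of `/proc/<pid>/status`, which the kernel reports in kB. The sketch below only parses already-read text. `ParseVmSizeKb` is a hypothetical helper, not the ProcessMetrics implementation. Opening and reading the real file is blocking I/O, which is why, as noted above, it belongs on a blocking worker pool.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Returns the "VmSize:" value in kB from /proc/<pid>/status-formatted text,
// or -1 if the field is missing or malformed.
int64_t ParseVmSizeKb(std::istream& status) {
  const std::string kKey = "VmSize:";
  std::string line;
  while (std::getline(status, line)) {
    if (line.compare(0, kKey.size(), kKey) != 0) continue;
    // The rest of the line looks like "\t  123456 kB".
    std::istringstream value(line.substr(kKey.size()));
    int64_t kb = -1;
    return (value >> kb) ? kb : -1;
  }
  return -1;
}
```

Kernel threads have no `VmSize:` line, which is one reason the function reports a sentinel instead of assuming the field exists.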
Apr 29 2016
+Bruce, who filed a related request here: https://bugs.chromium.org/p/crashpad/issues/detail?id=105

For Windows (https://msdn.microsoft.com/en-us/library/windows/desktop/aa366770(v=vs.85).aspx):
- Getting physical memory seems easy.
- Getting commit for the crashing process from the handler seems harder, but pushing it in periodically from Chrome might be enough.

We might also want to include some info from https://msdn.microsoft.com/en-us/library/windows/desktop/ms684824(v=vs.85).aspx. I'm not sure where we want to put it. A crash key would be easy-ish and works for viewing on crash/, but on Windows it's sometimes nice to have it in the dump itself.

The address map is already stored in Windows crash dumps from Crashpad; see attached for an example of what !address reports. Is that useful for understanding "free" virtual memory? Maybe it would help to work on the crash/ UI to show a summary (e.g. the largest free chunk of virtual memory), or otherwise present it somehow.
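A crash/ UI summary like the one suggested above could be computed from the per-region data that !address displays. In the sketch below, `Region` and `RegionState` are simplified hypothetical types (mirroring the free/reserved/committed states), not the DbgHelp structures. It assumes adjacent free pages have already been coalesced into one region, as VirtualQuery-style enumeration normally reports them.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for one entry of a dump's memory-info list.
enum class RegionState { kFree, kReserved, kCommitted };

struct Region {
  uint64_t base;
  uint64_t size;
  RegionState state;
};

struct AddressSpaceSummary {
  uint64_t largest_free = 0;
  uint64_t total_free = 0;
  uint64_t total_reserved = 0;
  uint64_t total_committed = 0;
};

// Totals each state and tracks the largest single free region.
AddressSpaceSummary Summarize(const std::vector<Region>& regions) {
  AddressSpaceSummary s;
  for (const Region& r : regions) {
    switch (r.state) {
      case RegionState::kFree:
        s.total_free += r.size;
        s.largest_free = std::max(s.largest_free, r.size);
        break;
      case RegionState::kReserved:
        s.total_reserved += r.size;
        break;
      case RegionState::kCommitted:
        s.total_committed += r.size;
        break;
    }
  }
  return s;
}
```

Reporting both `largest_free` and `total_free` is what distinguishes fragmentation (plenty free in total, but no large block) from genuine exhaustion.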
Apr 29 2016
!address can definitely be useful, although it's a shame that the debuggers don't make it more obvious and easier to browse. I used it recently to prove that a Chrome OOM crash was not due to running out of virtual address space in the process (the usual reason) and therefore must have been due to actually running out of memory. Hence my request for system-level information about system commit (ideally total memory, total memory + page file, and total committed). This would also hint at possible page-file thrashing, but unfortunately without certainty. Detecting paging rates requires administrator privileges, I believe.
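The reasoning above (rule out address-space exhaustion first, then check system commit) can be written down as a small heuristic. Everything here is a hypothetical sketch: `OomInputs` and `ClassifyOom` are illustrative names, and the inputs are assumed to be the failed allocation size, the process's largest free address-space block, and the system commit limit (roughly physical memory + page file) and commit total.

```cpp
#include <cstdint>

enum class OomCause { kAddressSpace, kSystemCommit, kUndetermined };

struct OomInputs {
  uint64_t allocation_size;  // size of the allocation that failed
  uint64_t largest_free_va;  // largest free block of address space in the process
  uint64_t commit_limit;     // system-wide, roughly physical memory + page file
  uint64_t commit_total;     // system-wide, currently committed
};

// Heuristic only: a free block smaller than the request points at
// address-space exhaustion; otherwise, too little remaining commit points at
// the system actually being out of memory.
OomCause ClassifyOom(const OomInputs& in) {
  if (in.largest_free_va < in.allocation_size) return OomCause::kAddressSpace;
  // Subtract rather than add to avoid unsigned overflow.
  if (in.commit_total > in.commit_limit ||
      in.commit_limit - in.commit_total < in.allocation_size)
    return OomCause::kSystemCommit;
  return OomCause::kUndetermined;
}
```

Commit numbers sampled before the crash may be stale, so `kUndetermined` is a legitimate outcome rather than an error.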
May 2 2016
Bug crashpad:105 is related. I want to have a proper home for this in the minidump. This shouldn't require everyone to write their own code to compute it, and nobody should have to add a custom crash key. We should probably just hang a system information extension off of the CrashpadInfo stream.
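To illustrate what a "proper home" in the minidump might look like, here is a hypothetical versioned, size-prefixed record. It is not Crashpad's actual format, and the field names are assumptions. The `version` and `size` fields let a newer writer append fields while older readers still consume the V1 prefix. `memcpy` writes host byte order, so the sketch assumes a little-endian host, matching minidump conventions.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical system-memory record (all sizes in bytes).
struct SystemMemoryInfoV1 {
  uint32_t version;         // 1 for this layout
  uint32_t size;            // sizeof(SystemMemoryInfoV1) when written
  uint64_t physical_total;  // installed physical memory
  uint64_t commit_limit;    // e.g. physical memory + page file
  uint64_t commit_total;    // currently committed system-wide
};

static_assert(std::is_trivially_copyable<SystemMemoryInfoV1>::value,
              "record must be safe to memcpy");
static_assert(sizeof(SystemMemoryInfoV1) == 32, "layout must not contain padding");

std::vector<uint8_t> Serialize(const SystemMemoryInfoV1& info) {
  std::vector<uint8_t> bytes(sizeof(info));
  std::memcpy(bytes.data(), &info, sizeof(info));
  return bytes;
}

// Reads the V1 prefix; returns false if the buffer is too short or the
// header is inconsistent. Newer (larger) records are accepted.
bool Deserialize(const std::vector<uint8_t>& bytes, SystemMemoryInfoV1* out) {
  if (bytes.size() < sizeof(SystemMemoryInfoV1)) return false;
  std::memcpy(out, bytes.data(), sizeof(*out));
  return out->version >= 1 && out->size >= sizeof(SystemMemoryInfoV1);
}
```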
May 19 2016
Jul 19 2016
Comment 1 by haraken@chromium.org, Apr 29 2016