add exit code for OOM
Issue description: Right now, an OOM causes a __debugbreak(), which produces exit code 0x80000003 (STATUS_BREAKPOINT). It would be nice if a special OOM exit code were returned instead from both renderer and browser processes, so that it could be tracked by the browser and watcher processes respectively. This would give us a better gauge of OOM errors than having to rely on the data from crash/.
,
Jul 7 2016
(Scoping this just to Windows for the time being.) primiano, jochen - I'd like your view on this. I think we'd have to remove the CHECK on OOM and replace it with something else that we can use to signal crashpad. It should be done in a safe way, so we still guarantee that we crash and gather a crash dump. Perhaps we can throw an unrecoverable exception with a specific code, at least on Windows. Also, the code will have to be consistent between v8/blink and chromium... Or is this a lost cause, and should any sort of analysis be done out of process (e.g. by crashpad)?
,
Jul 8 2016
So, I think I'm missing a bit of context here. What is the problem we are trying to solve? Is it the fact that various pieces of the codebase report OOM in different ways and we want to make it uniform?
,
Jul 8 2016
The problem we are trying to solve is making the sad tab page more informative. Right now there is no way to distinguish a CHECK() from an OOM, since both raise a STATUS_BREAKPOINT exception and the return code is the same. I think the way we crash due to OOM is relatively uniform across all the allocators: they all end up in a __debugbreak() eventually (on Windows, anyway).

As a test, I changed the Chromium OOM handler to call ::RaiseException() with a custom exception code (0xE0000008). This seems to work fine, and the return code is now unique for OOM versus a debug CHECK:

Histogram: CrashExitCodes.Renderer recorded 1 samples (flags = 0x1)
536870904 ------------------------------------------------------------------------O (1 = 100.0%)

Crashpad grabs this exception fine. The stacks look like this (testing via a new debug URL, chrome://memory-exhaust).

Before:

0:000> kn
 *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr
00 0118d43c 55c107ff chrome_child!base::debug::BreakDebugger+0x9 [c:\src\gclient\src\base\debug\debugger_win.cc @ 21]
01 0118d984 55c48447 chrome_child!logging::LogMessage::~LogMessage+0x1ef [c:\src\gclient\src\base\logging.cc @ 751]
02 0118da44 55c79f5c chrome_child!base::`anonymous namespace'::OnNoMemory+0x57 [c:\src\gclient\src\base\process\memory_win.cc @ 43]
03 (Inline) -------- chrome_child!?A0xc2e0f132::call_new_handler+0xc [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 77]
04 0118da58 5728a41a chrome_child!malloc+0x3c [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 119]
05 0118da60 5728cf58 chrome_child!content::`anonymous namespace'::ExhaustMemory+0xa [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 402]
06 0118db30 572941c1 chrome_child!content::`anonymous namespace'::MaybeHandleDebugURL+0x588 [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 523]
07 0118dcc4 5728d003 chrome_child!content::RenderFrameImpl::PrepareRenderViewForNavigation+0x71 [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 5766]

After:

0:000> kn
 *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr
00 00f5dcd4 553c6eb1 KERNELBASE!RaiseException+0x48
01 00f5dce8 553f879c chrome_child!base::`anonymous namespace'::OnNoMemory+0x11 [c:\src\gclient\src\base\process\memory_win.cc @ 42]
02 (Inline) -------- chrome_child!?A0xc2e0f132::call_new_handler+0xc [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 77]
03 00f5dcfc 56a0702a chrome_child!malloc+0x3c [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 119]
04 00f5dd04 56a09b68 chrome_child!content::`anonymous namespace'::ExhaustMemory+0xa [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 402]
05 00f5ddd4 56a10da1 chrome_child!content::`anonymous namespace'::MaybeHandleDebugURL+0x588 [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 523]
06 00f5df68 56a09c13 chrome_child!content::RenderFrameImpl::PrepareRenderViewForNavigation+0x71 [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 5766]

Also, the exception code changes from 0x80000003 to 0xE0000008. Both of these changes will have to be checked against minidump_stackwalk and the crash/ backend to make sure the OOMs would still be bucketed correctly.
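For readers wondering how 0xE0000008 relates to the 536870904 bucket in the histogram above, here is a minimal sketch. It assumes the histogram records the absolute value of the exit code interpreted as a signed 32-bit integer; that interpretation matches the numbers in this comment but is an assumption, not something stated in the thread. `OnNoMemorySketch`, `ToSignedExitCode` and `HistogramSample` are hypothetical names, and the `EXCEPTION_NONCONTINUABLE` flag is an illustrative choice rather than a record of what the CL passes.

```cpp
#include <cstdint>
#include <cstdlib>

#if defined(_WIN32)
#include <windows.h>
#endif

// Custom OOM exception code quoted in the thread.
constexpr std::uint32_t kOomExceptionCode = 0xE0000008u;
// STATUS_BREAKPOINT, the code produced by __debugbreak() and CHECK().
constexpr std::uint32_t kStatusBreakpoint = 0x80000003u;

// Hypothetical handler. On Windows it raises the custom exception so the
// parent process sees a distinct exit code; on other platforms (or if the
// raise ever returned) it falls back to abort().
[[noreturn]] inline void OnNoMemorySketch() {
#if defined(_WIN32)
  ::RaiseException(kOomExceptionCode, EXCEPTION_NONCONTINUABLE, 0, nullptr);
#endif
  std::abort();
}

// Reinterprets a raw 32-bit exit code as a signed value without relying on
// implementation-defined narrowing conversions.
constexpr std::int64_t ToSignedExitCode(std::uint32_t raw) {
  return raw <= 0x7FFFFFFFu
             ? static_cast<std::int64_t>(raw)
             : static_cast<std::int64_t>(raw) - (std::int64_t{1} << 32);
}

// Assumption: the histogram sample is the absolute value of the signed code.
constexpr std::int64_t HistogramSample(std::uint32_t raw) {
  return ToSignedExitCode(raw) < 0 ? -ToSignedExitCode(raw)
                                   : ToSignedExitCode(raw);
}

static_assert(HistogramSample(kOomExceptionCode) == 536870904,
              "0xE0000008 maps to the bucket quoted in the histogram");
static_assert(HistogramSample(kStatusBreakpoint) == 2147483645,
              "STATUS_BREAKPOINT lands in a different bucket");
```

Under that assumption the two crash causes fall into different buckets, which is what makes the OOM case countable separately from CHECK() failures.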
,
Jul 8 2016
FWIW, the crashes aren't correctly bucketed on crash/ (see crash/d9aec13600000000), so some minor changes to the crash algorithm would be needed to make this change, probably just to ignore the RaiseException stack frame.
,
Jul 8 2016
I have CL(s) that call RaiseException from v8, partitionalloc and allocator_shim - which I think are all the allocation paths, unless I'm missing any? These all end up in Histogram: CrashExitCodes.Renderer with the right exit code.

V8:

0:000> kn
 *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr
00 00a3cdc0 570ef713 KERNELBASE!RaiseException+0x48
01 00a3cdd8 56409016 chrome_child!blink::reportOOMErrorInMainThread+0x43 [c:\src\gclient\src\third_party\webkit\source\bindings\core\v8\v8initializer.cpp @ 104]
02 00a3cdec 563f8168 chrome_child!v8::Utils::ReportOOMFailure+0x76 [c:\src\gclient\src\v8\src\api.cc @ 347]
03 00a3d910 56443447 chrome_child!v8::internal::V8::FatalProcessOutOfMemory+0x1e8 [c:\src\gclient\src\v8\src\api.cc @ 321]
04 00a3d938 582a7299 chrome_child!v8::internal::Factory::NewUninitializedFixedArray+0x147 [c:\src\gclient\src\v8\src\factory.cc @ 148]

Partition Alloc:

0:000> kn
 *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr
00 012fb53c 57faccc8 KERNELBASE!RaiseException+0x48
01 012fb558 57facaec chrome_child!WTF::partitionsOutOfMemoryUsing2G+0x28 [c:\src\gclient\src\third_party\webkit\source\wtf\allocator\partitions.cpp @ 122]
02 012fb568 57f9cf76 chrome_child!WTF::Partitions::handleOutOfMemory+0x3c [c:\src\gclient\src\third_party\webkit\source\wtf\allocator\partitions.cpp @ 188]
03 012fb570 57f9b20f chrome_child!WTF::partitionOutOfMemory+0x26 [c:\src\gclient\src\third_party\webkit\source\wtf\allocator\partitionalloc.cpp @ 326]
04 012fb59c 56f8642e chrome_child!WTF::partitionAllocSlowPath+0x10bf [c:\src\gclient\src\third_party\webkit\source\wtf\allocator\partitionalloc.cpp @ 838]
05 (Inline) -------- chrome_child!WTF::partitionBucketAlloc+0x2dd [c:\src\gclient\src\third_party\webkit\source\wtf\allocator\partitionalloc.h @ 632]
06 (Inline) -------- chrome_child!WTF::partitionAlloc+0x39e [c:\src\gclient\src\third_party\webkit\source\wtf\allocator\partitionalloc.h @ 671]
07 012fb5cc 56f664bd chrome_child!blink::InlineBox::operator new+0x3ce [c:\src\gclient\src\third_party\webkit\source\core\layout\line\inlinebox.cpp @ 89]

Allocator shim:

0:000> kn
 *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr
00 0118d7b8 55d37641 KERNELBASE!RaiseException+0x48
01 0118d7cc 55d693bc chrome_child!base::`anonymous namespace'::OnNoMemory+0x11 [c:\src\gclient\src\base\process\memory_win.cc @ 42]
02 (Inline) -------- chrome_child!?A0xc2e0f132::call_new_handler+0xc [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 77]
03 0118d7e0 5737841a chrome_child!malloc+0x3c [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 119]
04 0118d7e8 5737af48 chrome_child!content::`anonymous namespace'::ExhaustMemory+0xa [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 402]
05 0118d8b8 57382161 chrome_child!content::`anonymous namespace'::MaybeHandleDebugURL+0x588 [c:\src\gclient\src\content\renderer\render_frame_impl.cc @ 523]
,
Jul 11 2016
Re #4: Ahh, I see. Unfortunately I don't know the Windows APIs well enough to help. That RaiseException seems to me to be something similar to a Unix return code? Seems fine to me. It would just be trickier if you wanted to achieve the same behavior on Linux. There, OOM can manifest in two different ways:
1) We try to malloc/mmap and we get an error / nullptr. In this case we can decide what to do and how to crash, all good.
2) We just get killed by the kernel OOM killer while we are page-faulting. This case (which I think is quite common) is tough, because we can only observe that the child has been killed with SIGKILL. We could probably assume that SIGKILL == OOM, even if that's not true for cases like logoff / OS shutdown (but we probably don't care too much there).
My understanding is that this doesn't happen on Windows, where the OS effectively does not overcommit memory and every allocation is guaranteed to be backed by the pagefile.

> I have CL(s) that call RaiseException from v8, partitionalloc and allocator_shim - which I think are all the allocation paths.

There are two others that come to mind:
- Skia: sk_out_of_memory in SkMemory_new_handler.cpp, which does an abort()
- Oilpan (aka Blink GC): blinkGCOutOfMemory in WebKit/Source/platform/heap/PageMemory.cpp, which does an IMMEDIATE_CRASH() -> __builtin_trap()
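The "SIGKILL == OOM" heuristic from case 2 can be sketched for a POSIX parent process as follows. This is only an illustration of the assumption discussed above, not Chromium code: `ChildExit`, `ClassifySignal` and `ClassifyWaitStatus` are hypothetical names, and the heuristic misclassifies any other source of SIGKILL (logoff, shutdown, a user running kill -9).

```cpp
#include <csignal>
#include <sys/wait.h>

// Hypothetical classification of how a child process ended.
enum class ChildExit { kNotSignaled, kCrashed, kProbablyOomKilled };

// Pure decision logic: a child killed by SIGKILL is assumed to be a victim of
// the kernel OOM killer; any other fatal signal counts as an ordinary crash.
constexpr ChildExit ClassifySignal(bool was_signaled, int signal_number) {
  return !was_signaled ? ChildExit::kNotSignaled
         : signal_number == SIGKILL ? ChildExit::kProbablyOomKilled
                                    : ChildExit::kCrashed;
}

// Adapter for a status value obtained from waitpid().
inline ChildExit ClassifyWaitStatus(int status) {
  return ClassifySignal(WIFSIGNALED(status),
                        WIFSIGNALED(status) ? WTERMSIG(status) : 0);
}
```

Case 1 is not covered by this heuristic: there the process itself picks how to die, for example abort() (SIGABRT) or a trap instruction, so it shows up as kCrashed unless a dedicated exit path is added.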
,
Jul 13 2016
Skia has skia/ext/SkMemory_new_handler.cpp, but I also found third_party/skia/src/ports/SkMemory_malloc.cpp. Both call abort(), and I want to know which one I should be changing.
,
Jul 13 2016
mtklein, I wonder if you could help me with #8, specifically adding the ::RaiseException in the right OOM handler in Skia (or alternatively adding OOM callbacks in Skia that call into the embedder).
,
Jul 13 2016
skia/ext/SkMemory_new_handler.cpp is the one used by Chrome. I think you just want to raise your new exception in sk_out_of_memory(). (sk_abort_no_print() is used for other fatal, non-OOM issues.)
,
Jul 13 2016
Skia seems to go via the allocator shim in release...

 # ChildEBP RetAddr
00 010fd250 032bf051 KERNELBASE!RaiseException+0x48
01 010fd264 032dbfac chrome_2df0000!base::`anonymous namespace'::OnNoMemory+0x11 [c:\src\gclient\src\base\process\memory_win.cc @ 42]
02 (Inline) -------- chrome_2df0000!?A0xc2e0f132::call_new_handler+0xc [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 77]
03 010fd278 038ca9de chrome_2df0000!malloc+0x3c [c:\src\gclient\src\base\allocator\allocator_shim_win.cc @ 119]
04 010fd28c 0368950a chrome_2df0000!sk_malloc_throw+0xe [c:\src\gclient\src\skia\ext\skmemory_new_handler.cpp @ 69]
05 010fd298 05612563 chrome_2df0000!SkCanvas::SkCanvas+0x5a [c:\src\gclient\src\third_party\skia\src\core\skcanvas.cpp @ 776]
06 010fd2dc 04e4bf38 chrome_2df0000!skia::AnalysisCanvas::AnalysisCanvas+0x23 [c:\src\gclient\src\skia\ext\analysis_canvas.cc @ 352]
07 010fe474 04e545c3 chrome_2df0000!cc::RasterSource::PerformSolidColorAnalysis+0x158 [c:\src\gclient\src\cc\playback\raster_source.cc @ 230]
08 010fe854 04e572a4 chrome_2df0000!cc::TileManager::AssignGpuMemoryToTiles+0x223 [c:\src\gclient\src\cc\tiles\tile_manager.cc @ 646]
09 010fe8c4 04e25afb chrome_2df0000!cc::TileManager::PrepareTiles+0x144 [c:\src\gclient\src\cc\tiles\tile_manager.cc @ 485]
,
Jul 14 2016
Re #11: Oh right. Thinking about that more, I think their throw_on_failure code is not hit in most cases. The reason, as you show, is that the shim has its own check, so on failure the shim crashes before we get back to Skia's throw_on_failure. However, mind that some Skia paths should bypass the shim. Specifically sk_calloc, which uses base::UncheckedMalloc, which IIRC bypasses the shim. Summarizing:
sk_malloc -> gets the shim protection; sk_out_of_memory is never hit there.
sk_calloc -> I believe this one can hit sk_out_of_memory and do its own abort().
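The difference between the two paths can be shown with a small simulation. Nothing below is Skia or Chromium code: every name is hypothetical, a C++ exception stands in for "the process crashes here" (the real code crashes instead of throwing), and the backend allocator is a stub that always fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

using AllocFn = void* (*)(std::size_t);

// Where the simulated process dies.
enum class OomSite { kShim, kCaller };
struct SimulatedCrash { OomSite site; };

// Stub backend that always reports allocation failure.
inline void* AlwaysFail(std::size_t) { return nullptr; }

// Checked path: the shim reacts to failure itself, so control never returns
// to the caller with a null pointer.
inline void* ShimMalloc(AllocFn backend, std::size_t size) {
  assert(backend != nullptr);
  void* p = backend(size);
  if (p == nullptr) throw SimulatedCrash{OomSite::kShim};
  return p;
}

// Unchecked path: failure is handed back to the caller as nullptr.
inline void* UncheckedMalloc(AllocFn backend, std::size_t size) {
  assert(backend != nullptr);
  return backend(size);
}

// Caller-side check, playing the role of a library's own OOM handler.
inline void* CallerThrowOnFailure(void* p) {
  if (p == nullptr) throw SimulatedCrash{OomSite::kCaller};
  return p;
}

// Runs one failing allocation and reports which handler fired.
inline OomSite WhereDoesItDie(bool use_checked_path) {
  try {
    void* p = use_checked_path ? ShimMalloc(&AlwaysFail, 64)
                               : UncheckedMalloc(&AlwaysFail, 64);
    CallerThrowOnFailure(p);
  } catch (const SimulatedCrash& crash) {
    return crash.site;
  }
  std::abort();  // Unreachable: AlwaysFail never returns memory.
}
```

With the checked path the caller-side handler is dead code, which is why a distinct OOM exit code has to be raised in both places if every path is to be covered.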
,
Jul 14 2016
I agree: if malloc(), calloc(), and realloc() already throw on failure, throw_on_failure()'s abort() should never be called. sk_calloc()'s naming seems pretty poor. Probably my fault there. It's the logical parallel to sk_malloc_nothrow(). (There is no function sk_malloc().)
,
Jul 14 2016
You may still want to add your new thing into sk_out_of_memory(). It can be called directly too, see SkTemplates.h:204 and https://codereview.chromium.org/831583004 .
,
Jul 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8ca194a234f2ec714a46da50b2e10f2b651b9aba

commit 8ca194a234f2ec714a46da50b2e10f2b651b9aba
Author: wfh <wfh@chromium.org>
Date: Wed Jul 20 02:06:54 2016

Change OOMs to raise custom exception rather than breakpoint on Windows.

This adds the exception reporting for allocators in:
- Chromium
- Blink (both PartitionAlloc and Oilpan)
- Skia
- V8 (via call to new API added in crrev.com/2139873002)

Magic signature update to crash processor is in go/internal_cl_for_2130293003

BUG= 614440
TEST=Visit chrome://memory-exhaust and verify that histogram CrashExitCodes.Renderer has values in bucket 536870904.
TEST=Simulate more crashes in Chromium, Blink, V8 and Skia, and verify they all raise exception 0xe0000008.
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation
Review-Url: https://codereview.chromium.org/2130293003
Cr-Commit-Position: refs/heads/master@{#406458}

[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/base/process/memory.h
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/base/process/memory_unittest.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/base/process/memory_win.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/chrome/browser/extensions/extension_tab_util.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/chrome/common/url_constants.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/content/browser/frame_host/debug_urls.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/content/public/common/url_constants.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/content/public/common/url_constants.h
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/content/renderer/render_frame_impl.cc
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/skia/ext/SkMemory_new_handler.cpp
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/third_party/WebKit/Source/bindings/core/v8/V8Initializer.cpp
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/third_party/WebKit/Source/platform/heap/PageMemory.cpp
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/third_party/WebKit/Source/wtf/Assertions.h
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/third_party/WebKit/Source/wtf/allocator/PartitionAlloc.cpp
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/third_party/WebKit/Source/wtf/allocator/Partitions.cpp
[modify] https://crrev.com/8ca194a234f2ec714a46da50b2e10f2b651b9aba/tools/metrics/histograms/histograms.xml
,
Jul 21 2016
Looks like I missed (at least) one - child_discardable_shared_memory_manager.cc calls base::TerminateBecauseOutOfMemory which calls OnNoMemory in base/process/memory.cc.
,
Jul 21 2016
Ah damn it... I knew about that one! That's the bell that was ringing in the back of my mind.
,
Jul 21 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f7640b9cc0c85d0c650725d3f496facd18488663

commit f7640b9cc0c85d0c650725d3f496facd18488663
Author: wfh <wfh@chromium.org>
Date: Thu Jul 21 21:14:29 2016

Add some missing CrashExitCodes for child processes.

Also, rename 536870904 to "Out of Memory" since this isn't a real Windows exception code.

BUG= 614440
Review-Url: https://codereview.chromium.org/2170923002
Cr-Commit-Position: refs/heads/master@{#406941}

[modify] https://crrev.com/f7640b9cc0c85d0c650725d3f496facd18488663/tools/metrics/histograms/histograms.xml
,
Jul 22 2016
,
Jul 23 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3850a4d9df7a7e6f1c2d48715362de6bac01523a

commit 3850a4d9df7a7e6f1c2d48715362de6bac01523a
Author: wfh <wfh@chromium.org>
Date: Sat Jul 23 00:23:53 2016

Make base::TerminateBecauseOutOfMemory call RaiseException on Windows.

This is a follow-on CL to https://codereview.chromium.org/2130293003 which added the RaiseException call in memory_win.cc but missed memory.cc

BUG= 614440
Review-Url: https://codereview.chromium.org/2173463002
Cr-Commit-Position: refs/heads/master@{#407315}

[modify] https://crrev.com/3850a4d9df7a7e6f1c2d48715362de6bac01523a/base/process/memory.cc
[modify] https://crrev.com/3850a4d9df7a7e6f1c2d48715362de6bac01523a/base/process/memory.h
[modify] https://crrev.com/3850a4d9df7a7e6f1c2d48715362de6bac01523a/base/process/memory_stubs.cc
[modify] https://crrev.com/3850a4d9df7a7e6f1c2d48715362de6bac01523a/base/process/memory_win.cc
[modify] https://crrev.com/3850a4d9df7a7e6f1c2d48715362de6bac01523a/skia/ext/SkMemory_new_handler.cpp
,
Jul 27 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/48c487e61345a4b96040d04e7cdc59b38c49c891

commit 48c487e61345a4b96040d04e7cdc59b38c49c891
Author: wfh <wfh@chromium.org>
Date: Wed Jul 27 22:48:47 2016

Label sandbox job object OOM as OOM in process termination.

BUG= 614440 , 630472
TEST=go to chrome://memory-exhaust on 64-bit Chrome and verify it gives the OOM sad tab.
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.win:win10_chromium_x64_rel_ng
Review-Url: https://codereview.chromium.org/2191643002
Cr-Commit-Position: refs/heads/master@{#408259}

[modify] https://crrev.com/48c487e61345a4b96040d04e7cdc59b38c49c891/base/process/kill.h
[modify] https://crrev.com/48c487e61345a4b96040d04e7cdc59b38c49c891/base/process/kill_win.cc
[modify] https://crrev.com/48c487e61345a4b96040d04e7cdc59b38c49c891/content/app/content_main_runner.cc
[modify] https://crrev.com/48c487e61345a4b96040d04e7cdc59b38c49c891/sandbox/win/src/sandbox_types.h
,
Jul 28 2016
Tested this issue on Windows 7 using the latest Chrome canary, M54 (54.0.2810.2). Navigating to chrome://memory-exhaust shows the OOM sad tab. wfh@ - I can see the Aw, Snap! error as expected, but I can't see any crash IDs under chrome://crashes. Could you please confirm whether this is expected behavior?
,
Jul 28 2016
There is no crash dump for Chrome 64-bit because the Job object terminates the process and crashpad cannot get a dump. However, there should be a crash dump for Chrome 32-bit.
,
Aug 16 2016
,
Jan 10 2018
Any reason why 0xE0000008 was chosen over one of the standard Windows exception codes used to report OOM? Either of the following would make much more sense:

ERROR_NOT_ENOUGH_MEMORY (0x80070008) "Not enough storage is available to process this command"
ERROR_OUTOFMEMORY (0x8007000E) "Not enough storage is available to complete this operation"

AFAICT the former is a more specific case of the latter: it reports that something was never done because not enough memory was available to do it, whereas the latter reports that doing something failed at some point because not enough memory was available to complete it.

The current exception code leaves anyone unaware of this special code completely oblivious as to what caused the exception without further investigation, and it can act as a red herring for people looking for more serious issues. For instance, I work on a tool called BugId that can be used to analyze application crashes. BugId now detects these exceptions and reports them as OOM crashes, but I've had people insist that it must be something else because of the weird exception code. It would make my life easier if I did not have to special-case this exception code in Chrome, or explain to people that this really is just an OOM crash.

Also, I'm finding there are places in the code that still throw a breakpoint on OOM:

https://chromium.googlesource.com/chromium/src/+/master/base/task_scheduler/scheduler_worker_pool_impl.cc#214
workers_.reserve(num_initial_workers);
`workers_` is a `std::vector` and `workers_.reserve(...)` will trigger an int 3 on OOM. The same should be true for other `std::vector`s in the code.

https://chromium.googlesource.com/chromium/src/+/master/services/ui/public/cpp/gpu/client_gpu_memory_buffer_manager.cc#37
CHECK(thread_.Start());
If `thread_.Start()` fails because of OOM, `CHECK` will throw a breakpoint exception.

I hope it's possible to replace the exception code with one of the standard ones. If you're going to address these other OOM-leads-to-breakpoint-exception bugs, I'll continue to report any others I may find.
,
Jan 10 2018
RaiseException should only be used with an "application-defined exception code" (see https://msdn.microsoft.com/en-us/library/windows/desktop/ms680552.aspx), which is why a custom one was used. We are unlikely to change this. Thanks for the report of breakpoint exceptions. It sounds like the second one is under our control and should probably be fixed, but I wonder whether the call to reserve() ends up in the STL and so might not be as easy to change.
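To make the "application-defined" distinction concrete, here is a small decoding sketch. It assumes the usual Windows status layout as I understand it from the SDK headers (bits 31-30 severity, bit 29 customer/application flag, facility in the bits above the low 16-bit code); the helper names are hypothetical.

```cpp
#include <cstdint>

// Bits 31-30: severity (0 success, 1 informational, 2 warning, 3 error).
constexpr std::uint32_t Severity(std::uint32_t code) { return code >> 30; }

// Bit 29: set for customer (application-defined) codes, clear for system ones.
constexpr bool IsApplicationDefined(std::uint32_t code) {
  return ((code >> 29) & 1u) != 0;
}

// Facility field and the low 16-bit code.
constexpr std::uint32_t Facility(std::uint32_t code) {
  return (code >> 16) & 0x0FFFu;
}
constexpr std::uint32_t LowCode(std::uint32_t code) { return code & 0xFFFFu; }

// The custom OOM code has the application bit set and error severity.
static_assert(IsApplicationDefined(0xE0000008u), "custom OOM code");
static_assert(Severity(0xE0000008u) == 3, "error severity");

// STATUS_BREAKPOINT and the 0x8007xxxx values are system-defined.
static_assert(!IsApplicationDefined(0x80000003u), "STATUS_BREAKPOINT");
static_assert(!IsApplicationDefined(0x80070008u), "ERROR_NOT_ENOUGH_MEMORY form");
static_assert(Facility(0x80070008u) == 7 && LowCode(0x80070008u) == 8,
              "facility 7 wrapping Win32 error 8");
```

The low 16 bits of 0xE0000008 and 0x80070008 are both 8; whether that match is deliberate is not stated in this thread.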
Comment 1 by wfh@chromium.org
, Jun 29 2016