browser crash not caught by crashpad |
||||||
Issue descriptionChrome Version: 69.0.3492.0 OS: Win10 Came back to PC and chrome had crashed, but loading chrome there is nothing in chrome://crashes. There is an event log, however: Faulting application name: chrome.exe, version: 69.0.3492.0, time stamp: 0x5b4ae878 Faulting module name: chrome.dll, version: 69.0.3492.0, time stamp: 0x5b4ae5dc Exception code: 0xc0000005 Fault offset: 0x000000000001c640 Faulting process id: 0x26f8 Faulting application start time: 0x01d41d2ddda775eb Faulting application path: C:\Users\wfh\AppData\Local\Google\Chrome SxS\Application\chrome.exe Faulting module path: C:\Users\wfh\AppData\Local\Google\Chrome SxS\Application\69.0.3492.0\chrome.dll Report Id: 8f479fde-1dff-48ec-ba51-80cd2423eabd Faulting package full name: Faulting package-relative application ID:
,
Jul 17
crashes with this on the stack are here: http://shortn/_ntb2iWaQvm maybe this is to do with call to Env::GetInstance() which does access TLS. Probably not actionable without a stack.
,
Jul 17
H/T brucedawson I had actually turned on local dumps, but forgotten where I'd set them to output to. I do have a dump for this (but no crashpad report). 0:043> k *** Stack trace for last set context - .thread/.cxr resets it # Child-SP RetAddr Call Site 00 000000c4`264010a0 00007ffc`d57bb99b chrome_7ffcd5700000!base::ThreadLocalStorage::Slot::Get+0x20 [C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc @ 371] 01 000000c4`264010d0 00007ffc`d6ad5fad chrome_7ffcd5700000!base::debug::GlobalActivityTracker::RecordException+0x2b [C:\b\c\b\win64_clang\src\base\debug\activity_tracker.cc @ 1643] 02 000000c4`26401130 00007ffd`197c5678 chrome_7ffcd5700000!browser_watcher::`anonymous namespace'::VectoredExceptionHandler+0x2d [C:\b\c\b\win64_clang\src\components\browser_watcher\stability_debugging.cc @ 44] 03 000000c4`26401160 00007ffd`1976692a ntdll!RtlpCallVectoredHandlers+0x128 04 000000c4`26401200 00007ffd`197fdc1e ntdll!RtlDispatchException+0x6a 05 000000c4`26401900 00007ffc`d571c640 ntdll!KiUserExceptionDispatch+0x2e 06 000000c4`26402020 00007ffc`d57bb99b chrome_7ffcd5700000!base::ThreadLocalStorage::Slot::Get+0x20 [C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc @ 371] 07 000000c4`26402050 00007ffc`d6ad5fad chrome_7ffcd5700000!base::debug::GlobalActivityTracker::RecordException+0x2b [C:\b\c\b\win64_clang\src\base\debug\activity_tracker.cc @ 1643] 08 000000c4`264020b0 00007ffd`197c5678 chrome_7ffcd5700000!browser_watcher::`anonymous namespace'::VectoredExceptionHandler+0x2d [C:\b\c\b\win64_clang\src\components\browser_watcher\stability_debugging.cc @ 44] 09 000000c4`264020e0 00007ffd`1976692a ntdll!RtlpCallVectoredHandlers+0x128 0a 000000c4`26402180 00007ffd`197fdc1e ntdll!RtlDispatchException+0x6a 0b 000000c4`26402880 00007ffc`d571c640 ntdll!KiUserExceptionDispatch+0x2e 0c 000000c4`26402fa0 00007ffc`d57bb99b chrome_7ffcd5700000!base::ThreadLocalStorage::Slot::Get+0x20 [C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc @ 371] 0d 000000c4`26402fd0 00007ffc`d6ad5fad chrome_7ffcd5700000!base::debug::GlobalActivityTracker::RecordException+0x2b [C:\b\c\b\win64_clang\src\base\debug\activity_tracker.cc @ 1643] 0e 000000c4`26403030 00007ffd`197c5678 chrome_7ffcd5700000!browser_watcher::`anonymous namespace'::VectoredExceptionHandler+0x2d [C:\b\c\b\win64_clang\src\components\browser_watcher\stability_debugging.cc @ 44] 0f 000000c4`26403060 00007ffd`1976692a ntdll!RtlpCallVectoredHandlers+0x128 [this just keeps repeating over and over] This appears to be in GlobalActivityTracker::RecordExceptionImpl trying to get the tracker.
,
Jul 17
Cool! I'm glad the local WER crash dumps helped. Pasting in the link to instructions 'cause I never miss a chance: https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps Also note that you need to redo this after every major (~6 months) Windows upgrade. Can you share a crash dump?
,
Jul 17
Also, relevant to the chromium-dev@ thread, any ideas on why breakpad didn't catch this? Understanding that seems important. Adding mosescu@ because I know he has looked at this issue.
,
Jul 17
http://shortn/_KPLuKfPssc is the dump. I don't think it's caught because it registers a VEH in RegisterStabilityVEH components/browser_watcher/stability_debugging.cc that sits before crashpad, but because of the exception loop, the crashpad one never gets called. We should probably make sure any additional vectored exception handlers try really hard to not crash.
,
Jul 17
Huh. TlsGetValue returned a value of 1. That's enough to get past the NULL check but not enough to work well as a pointer. Apparently the TLS array was marked as destroyed: // A sentinel value to indicate that the TLS system has been destroyed. void* const kDestroyed = reinterpret_cast<void*>(1); https://cs.chromium.org/chromium/src/base/threading/thread_local_storage.cc?q=f:thread_local_storage.cc&sq=package:chromium&g=0&l=273 So this is basically a use-after-free crash. We either need to fix the root cause of the use-after-free or update ThreadLocalStorage::Slot::Get to check for kDestroyed as well as null.
,
Jul 17
The stack managed 0xc63 (3,171) stack frames before overflowing, but here is where it all started:
c50 chrome!base::ThreadLocalStorage::Slot::Get
c51 chrome!base::debug::GlobalActivityTracker::RecordException
c52 chrome!browser_watcher::`anonymous namespace'::VectoredExceptionHandler
c53 ntdll!RtlpCallVectoredHandlers
c54 ntdll!RtlDispatchException
c55 ntdll!KiUserExceptionDispatch
c56 chrome!base::ThreadLocalStorage::Slot::Get
c57 chrome!base::debug::ScopedLockAcquireActivity::ScopedLockAcquireActivity
c58 chrome!base::internal::LockImpl::Lock
c59 chrome!heap_profiling::`anonymous namespace'::DoSend
c5a chrome!heap_profiling::AllocatorShimLogFree
c5b chrome!heap_profiling::`anonymous namespace'::HookFree
c5c chrome!thread_local_destructor
c5d ntdll!LdrpCallInitRoutine
c5e ntdll!LdrpCallTlsInitializers
c5f ntdll!LdrShutdownThread
c60 ntdll!RtlExitUserThread
c61 kernel32!BaseThreadInitThunk
c62 ntdll!RtlUserThreadStart
Looking at HookFree we see:
void HookFree(const AllocatorDispatch* self, void* address, void* context) {
ScopedAllowFree allow_logging;
const AllocatorDispatch* const next = self->next;
next->free_function(next, address, context);
if (LIKELY(allow_logging)) {
AllocatorShimLogFree(address);
}
}
and looking at ScopedAllowFree I see this comment:
// A ScopedAllow{Free,Alloc} instance must be instantiated in the scope of all
// hooks.
// AllocatorShimLogAlloc/AllocatorShimLogFree must only be called if it
// evaluates to true.
//
// There are two reasons why logging may be disabled.
// 1) To prevent reentrancy from logging code.
// 2) During thread destruction, Chrome TLS has been destroyed and it can no
// longer be used to determine if reentrancy is occurring. Attempting to
// access Chrome TLS after it has been destroyed is disallowed.
So it looks like the bug is that ScopedAllowFree was not notified that we were in the destruction phase. According to CanEnterAllocatorShim() this is supposed to be prevented by the TLS value (really, TLS?) owned by g_prevent_reentrancy_key which is set to 0x27, but I can't figure out where in @tib the actual value of that key for that thread is.
Maybe we need a global g_prevent_reentrancy flag? Anyway this looks like an allocator_shim bug, should probably be owned by erikchen@, CCing him.
,
Jul 17
,
Jul 17
Thanks for the diagnosis Bruce. I agreed with the assessment. It's not safe to call ThreadLocalStorage::Slot::Get post-thread TLS shut-down. In this case, the relevant control flow is: 1) boring_ssl's thread_local_destructor calls into 2) allocator shim's hook_free which calls 3) base::Lock 4) Which calls base::debug::ScopedLockAcquireActivity::ScopedLockAcquireActivity 5) which calls base::ThreadLocalStorage::Slot::Get So either: a) We need to not use base::Lock in allocator_shim [by rolling our own implementation], or else b) base::Lock should not call base::ThreadLocalStorage::Slot::Get, or needs to guard with base::ThreadLocalStorage::HasBeenDestroyed beh. We're reaching the stage of Chromium where I want a simple base, which has a lock implementation which just locks and doesn't do fancy things. I think I slightly prefer (a) over (b). + gab, do you have thoughts/preferences here?
,
Jul 17
I'm so pleased I work with awesome colleagues who can just dive into bugs like this and find root causes! eom.
,
Jul 18
If guarding Lock with base::ThreadLocalStorage::HasBeenDestroyed is doable that seems preferable (i.e. scales to other potential such callers who use Lock like it's just-a-lock(TM)..)
,
Jul 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472 commit fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472 Author: erikchen <erikchen@chromium.org> Date: Wed Jul 18 21:38:37 2018 Guard use of Chrome TLS in GlobalActivityTracker::GetOrCreateTracker. The implementation of heap_profiling uses base::Lock. That uses GlobalActivityTracker::GetOrCreateTracker, which uses Chrome TLS. Since heap_profiling may be used post TLS destruction, all called code must also guard against use of Chrome TLS by checking base::ThreadLocalStorage::HasBeenDestroyed. Bug: 864589 Change-Id: I9b7b61d702a79062f847f17d18b6d30f3681b837 Reviewed-on: https://chromium-review.googlesource.com/1142347 Reviewed-by: Gabriel Charette <gab@chromium.org> Commit-Queue: Erik Chen <erikchen@chromium.org> Cr-Commit-Position: refs/heads/master@{#576224} [modify] https://crrev.com/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472/base/debug/activity_tracker.h [modify] https://crrev.com/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472/base/threading/thread_local_storage.h [modify] https://crrev.com/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472/components/services/heap_profiling/public/cpp/allocator_shim.cc
,
Jul 18
Bug should be fixed. Note that crashpad not catching infinite stack recursion/overflow is probably something we want to fix separately. +scottmg, mark
,
Jul 19
Stack overflow is difficult to handle with 100% coverage on Windows because there’s no sigaltstack() equivalent. We could make our UEF something “frameless” that trampolines out to a custom preallocated stack, but it’s possible that a stack overflow would have made it impossible even for the system to allocate the frame for our trampoline UEF. As it stands now, our UEF allocates a small frame on the current stack. There are a number of crash coverage improvements we’d like, and this is on that list. The best way to do it requires Microsoft’s help, and we’re talking to them.
,
Jul 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/003b1896555d6fd72d72307ea36cb201fa7a6c6c commit 003b1896555d6fd72d72307ea36cb201fa7a6c6c Author: Brian White <bcwhite@chromium.org> Date: Fri Jul 20 17:43:21 2018 Move TLS-destroyed test to base location. Bug: 864589 Change-Id: Idbb08e6c1da5897be06d5ab357dc5a32cfd03bb3 Reviewed-on: https://chromium-review.googlesource.com/1145132 Reviewed-by: Erik Chen <erikchen@chromium.org> Commit-Queue: Brian White <bcwhite@chromium.org> Cr-Commit-Position: refs/heads/master@{#576915} [modify] https://crrev.com/003b1896555d6fd72d72307ea36cb201fa7a6c6c/base/debug/activity_tracker.cc [modify] https://crrev.com/003b1896555d6fd72d72307ea36cb201fa7a6c6c/base/debug/activity_tracker.h |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by wfh@chromium.org
, Jul 17