New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 864589 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 18
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

browser crash not caught by crashpad

Project Member Reported by wfh@chromium.org, Jul 17

Issue description

Chrome Version: 69.0.3492.0
OS: Win10

Came back to PC and chrome had crashed, but loading chrome there is nothing in chrome://crashes.

There is an event log, however:

Faulting application name: chrome.exe, version: 69.0.3492.0, time stamp: 0x5b4ae878
Faulting module name: chrome.dll, version: 69.0.3492.0, time stamp: 0x5b4ae5dc
Exception code: 0xc0000005
Fault offset: 0x000000000001c640
Faulting process id: 0x26f8
Faulting application start time: 0x01d41d2ddda775eb
Faulting application path: C:\Users\wfh\AppData\Local\Google\Chrome SxS\Application\chrome.exe
Faulting module path: C:\Users\wfh\AppData\Local\Google\Chrome SxS\Application\69.0.3492.0\chrome.dll
Report Id: 8f479fde-1dff-48ec-ba51-80cd2423eabd
Faulting package full name: 
Faulting package-relative application ID: 
 
Cc: brucedaw...@chromium.org
This is:

chrome_7ffcaf7f0000!base::ThreadLocalStorage::Slot::Get+0x20

C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc(371)+0x7

https://chromium.googlesource.com/chromium/src.git/+/69.0.3492.0/base/threading/thread_local_storage.cc#371

without stack, hard to diagnose.
crashes with this on the stack are here: http://shortn/_ntb2iWaQvm maybe this is to do with call to Env::GetInstance() which does access TLS. Probably not actionable without a stack.
Owner: bcwh...@chromium.org
Status: Assigned (was: Untriaged)
H/T brucedawson I had actually turned on local dumps, but forgotten where I'd set them to output to. I do have a dump for this (but no crashpad report).

0:043> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr           Call Site
00 000000c4`264010a0 00007ffc`d57bb99b chrome_7ffcd5700000!base::ThreadLocalStorage::Slot::Get+0x20 [C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc @ 371]
01 000000c4`264010d0 00007ffc`d6ad5fad chrome_7ffcd5700000!base::debug::GlobalActivityTracker::RecordException+0x2b [C:\b\c\b\win64_clang\src\base\debug\activity_tracker.cc @ 1643]
02 000000c4`26401130 00007ffd`197c5678 chrome_7ffcd5700000!browser_watcher::`anonymous namespace'::VectoredExceptionHandler+0x2d [C:\b\c\b\win64_clang\src\components\browser_watcher\stability_debugging.cc @ 44]
03 000000c4`26401160 00007ffd`1976692a ntdll!RtlpCallVectoredHandlers+0x128
04 000000c4`26401200 00007ffd`197fdc1e ntdll!RtlDispatchException+0x6a
05 000000c4`26401900 00007ffc`d571c640 ntdll!KiUserExceptionDispatch+0x2e
06 000000c4`26402020 00007ffc`d57bb99b chrome_7ffcd5700000!base::ThreadLocalStorage::Slot::Get+0x20 [C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc @ 371]
07 000000c4`26402050 00007ffc`d6ad5fad chrome_7ffcd5700000!base::debug::GlobalActivityTracker::RecordException+0x2b [C:\b\c\b\win64_clang\src\base\debug\activity_tracker.cc @ 1643]
08 000000c4`264020b0 00007ffd`197c5678 chrome_7ffcd5700000!browser_watcher::`anonymous namespace'::VectoredExceptionHandler+0x2d [C:\b\c\b\win64_clang\src\components\browser_watcher\stability_debugging.cc @ 44]
09 000000c4`264020e0 00007ffd`1976692a ntdll!RtlpCallVectoredHandlers+0x128
0a 000000c4`26402180 00007ffd`197fdc1e ntdll!RtlDispatchException+0x6a
0b 000000c4`26402880 00007ffc`d571c640 ntdll!KiUserExceptionDispatch+0x2e
0c 000000c4`26402fa0 00007ffc`d57bb99b chrome_7ffcd5700000!base::ThreadLocalStorage::Slot::Get+0x20 [C:\b\c\b\win64_clang\src\base\threading\thread_local_storage.cc @ 371]
0d 000000c4`26402fd0 00007ffc`d6ad5fad chrome_7ffcd5700000!base::debug::GlobalActivityTracker::RecordException+0x2b [C:\b\c\b\win64_clang\src\base\debug\activity_tracker.cc @ 1643]
0e 000000c4`26403030 00007ffd`197c5678 chrome_7ffcd5700000!browser_watcher::`anonymous namespace'::VectoredExceptionHandler+0x2d [C:\b\c\b\win64_clang\src\components\browser_watcher\stability_debugging.cc @ 44]
0f 000000c4`26403060 00007ffd`1976692a ntdll!RtlpCallVectoredHandlers+0x128

[this just keeps repeating over and over]

This appears to be in GlobalActivityTracker::RecordExceptionImpl trying to get the tracker.
Cool! I'm glad the local WER crash dumps helped. Pasting in the link to instructions 'cause I never miss a chance:

https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps

Also note that you need to redo this after every major (~6 months) Windows upgrade.

Can you share a crash dump?

Cc: mosescu@chromium.org
Also, relevant to the chromium-dev@ thread, any ideas on why breakpad didn't catch this? Understanding that seems important. Adding mosescu@ because I know he has looked at this issue.
http://shortn/_KPLuKfPssc is the dump. I don't think it's caught because it registers a VEH in RegisterStabilityVEH components/browser_watcher/stability_debugging.cc that sits before crashpad, but because of the exception loop, the crashpad one never gets called. We should probably make sure any additional vectored exception handlers try really hard to not crash.
Huh. TlsGetValue returned a value of 1. That's enough to get past the NULL check but not enough to work well as a pointer. Apparently the TLS array was marked as destroyed:

// A sentinel value to indicate that the TLS system has been destroyed.
void* const kDestroyed = reinterpret_cast<void*>(1);

https://cs.chromium.org/chromium/src/base/threading/thread_local_storage.cc?q=f:thread_local_storage.cc&sq=package:chromium&g=0&l=273

So this is basically a use-after-free crash. We either need to fix the root cause of the use-after-free or update ThreadLocalStorage::Slot::Get to check for kDestroyed as well as null.

Cc: erikc...@chromium.org
The stack managed 0xc63 (3,171) stack frames before overflowing, but here is where it all started:

c50 chrome!base::ThreadLocalStorage::Slot::Get
c51 chrome!base::debug::GlobalActivityTracker::RecordException
c52 chrome!browser_watcher::`anonymous namespace'::VectoredExceptionHandler
c53 ntdll!RtlpCallVectoredHandlers
c54 ntdll!RtlDispatchException
c55 ntdll!KiUserExceptionDispatch
c56 chrome!base::ThreadLocalStorage::Slot::Get
c57 chrome!base::debug::ScopedLockAcquireActivity::ScopedLockAcquireActivity
c58 chrome!base::internal::LockImpl::Lock
c59 chrome!heap_profiling::`anonymous namespace'::DoSend
c5a chrome!heap_profiling::AllocatorShimLogFree
c5b chrome!heap_profiling::`anonymous namespace'::HookFree
c5c chrome!thread_local_destructor
c5d ntdll!LdrpCallInitRoutine
c5e ntdll!LdrpCallTlsInitializers
c5f ntdll!LdrShutdownThread
c60 ntdll!RtlExitUserThread
c61 kernel32!BaseThreadInitThunk
c62 ntdll!RtlUserThreadStart

Looking at HookFree we see:

void HookFree(const AllocatorDispatch* self, void* address, void* context) {
  ScopedAllowFree allow_logging;

  const AllocatorDispatch* const next = self->next;
  next->free_function(next, address, context);

  if (LIKELY(allow_logging)) {
    AllocatorShimLogFree(address);
  }
}

and looking at ScopedAllowFree I see this comment:

// A ScopedAllow{Free,Alloc} instance must be instantiated in the scope of all
// hooks.
// AllocatorShimLogAlloc/AllocatorShimLogFree must only be called if it
// evaluates to true.
//
// There are two reasons why logging may be disabled.
//   1) To prevent reentrancy from logging code.
//   2) During thread destruction, Chrome TLS has been destroyed and it can no
//      longer be used to determine if reentrancy is occurring. Attempting to
//      access Chrome TLS after it has been destroyed is disallowed.

So it looks like the bug is that ScopedAllowFree was not notified that we were in the destruction phase. According to CanEnterAllocatorShim() this is supposed to be prevented by the TLS value (really, TLS?) owned by g_prevent_reentrancy_key which is set to 0x27, but I can't figure out where in @tib the actual value of that key for that thread is.

Maybe we need a global g_prevent_reentrancy flag? Anyway this looks like an allocator_shim bug, should probably be owned by erikchen@, CCing him.

Owner: erikc...@chromium.org
Looks like crrev.com/c/1076448, but that landed May 30th, so ???
Cc: bcwh...@chromium.org gab@chromium.org
Thanks for the diagnosis Bruce. I agreed with the assessment. 

It's not safe to call ThreadLocalStorage::Slot::Get post-thread TLS shut-down. 

In this case, the relevant control flow is:

1) boring_ssl's thread_local_destructor calls into
2) allocator shim's hook_free  which calls
3) base::Lock
4) Which calls base::debug::ScopedLockAcquireActivity::ScopedLockAcquireActivity
5) which calls base::ThreadLocalStorage::Slot::Get

So either:
  a) We need to not use base::Lock in allocator_shim [by rolling our own implementation], or else
  b) base::Lock should not call base::ThreadLocalStorage::Slot::Get, or needs to guard with base::ThreadLocalStorage::HasBeenDestroyed

beh. We're reaching the stage of Chromium where I want a simple base, which has a lock implementation which just locks and doesn't do fancy things.

I think I slightly prefer (a) over (b).
+ gab, do you have thoughts/preferences here?
I'm so pleased I work with awesome colleagues who can just dive into bugs like this and find root causes! eom.
If guarding Lock with base::ThreadLocalStorage::HasBeenDestroyed is doable that seems preferable (i.e. scales to other potential such callers who use Lock like it's just-a-lock(TM)..)
Project Member

Comment 13 by bugdroid1@chromium.org, Jul 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472

commit fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472
Author: erikchen <erikchen@chromium.org>
Date: Wed Jul 18 21:38:37 2018

Guard use of Chrome TLS in GlobalActivityTracker::GetOrCreateTracker.

The implementation of heap_profiling uses base::Lock. That uses
GlobalActivityTracker::GetOrCreateTracker, which uses Chrome TLS. Since
heap_profiling may be used post TLS destruction, all called code must also guard
against use of Chrome TLS by checking
base::ThreadLocalStorage::HasBeenDestroyed.

Bug:  864589 
Change-Id: I9b7b61d702a79062f847f17d18b6d30f3681b837
Reviewed-on: https://chromium-review.googlesource.com/1142347
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576224}
[modify] https://crrev.com/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472/base/debug/activity_tracker.h
[modify] https://crrev.com/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472/base/threading/thread_local_storage.h
[modify] https://crrev.com/fac6d427d8f345a6baf2cc3c64f8d0c7d8ec7472/components/services/heap_profiling/public/cpp/allocator_shim.cc

Cc: scottmg@chromium.org mark@chromium.org
Status: Fixed (was: Assigned)
Bug should be fixed. Note that crashpad not catching infinite stack recursion/overflow is probably something we want to fix separately. 

+scottmg, mark
Stack overflow is difficult to handle with 100% coverage on Windows because there’s no sigaltstack() equivalent.

We could make our UEF something “frameless” that trampolines out to a custom preallocated stack, but it’s possible that a stack overflow would have made it impossible even for the system to allocate the frame for our trampoline UEF.

As it stands now, our UEF allocates a small frame on the current stack.

There are a number of crash coverage improvements we’d like, and this is on that list. The best way to do it requires Microsoft’s help, and we’re talking to them.
Project Member

Comment 16 by bugdroid1@chromium.org, Jul 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/003b1896555d6fd72d72307ea36cb201fa7a6c6c

commit 003b1896555d6fd72d72307ea36cb201fa7a6c6c
Author: Brian White <bcwhite@chromium.org>
Date: Fri Jul 20 17:43:21 2018

Move TLS-destroyed test to base location.

Bug:  864589 
Change-Id: Idbb08e6c1da5897be06d5ab357dc5a32cfd03bb3
Reviewed-on: https://chromium-review.googlesource.com/1145132
Reviewed-by: Erik Chen <erikchen@chromium.org>
Commit-Queue: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576915}
[modify] https://crrev.com/003b1896555d6fd72d72307ea36cb201fa7a6c6c/base/debug/activity_tracker.cc
[modify] https://crrev.com/003b1896555d6fd72d72307ea36cb201fa7a6c6c/base/debug/activity_tracker.h

Sign in to add a comment