New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 821494 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Memory leak from failures in GlobalActivityTracker's ConstructTlsVector

Project Member Reported by etienneb@chromium.org, Mar 13 2018

Issue description

Chrome Version: 67.0.3360.0
OS: Windows

Out of Process Heap Profiling is a project that, for a very small set users, records heap allocations and sends back anonymized heap dumps. We've obtained a heap dump that suggests a memory leak due to a recursion in GlobalActivityTracker related to the VectoredExceptionHandler.

As shows in the attached images, there are 30209 objects allocated and still alive from the following stackframe:

    malloc
    operator new(unsigned __int64)
    `anonymous namespace'::ConstructTlsVector
    base::ThreadLocalStorage::Slot::Get()
    base::debug::GlobalActivityTracker::RecordException(...)
    browser_watcher::`anonymous namespace'::VectoredExceptionHandler

This is leaking about 150M of memory.

We received 4 slow-reports today, all from the same version: windows 67.0.3360.0

   76b5d6affd92ed00
   5c144f9b0e998d85
   860e2e7a31ddee14
   c781c67352f4675d


I suspect there is a infinite loop (or recursion) in this code:

https://cs.chromium.org/chromium/src/base/threading/thread_local_storage.cc?type=cs&sq=package:chromium&l=158

TlsVectorEntry* ConstructTlsVector() {

  [...]

    // Allocate an array to store our data.
    TlsVectorEntry* tls_data = new TlsVectorEntry[kThreadLocalStorageSize];
    memcpy(...);
    PlatformThreadLocalStorage::SetTLSValue(key, tls_data);
    return tls_data;


The tls_data is probably the object leaked. If an exception occurs when calling |SetTLSValue|, then the vectored exception handler is called again and a new instance of tls_data is allocated again (and leaked).
 
exception_handler.png
37.2 KB View Download
Cc: ellenpli@chromium.org
Description: Show this description
Cc: mark@chromium.org
Status: Available (was: Untriaged)
The simple fix would be to not create a TLS block if one doesn't already exist but there's no way for outside code to do that.

I'd have to add a parameter to Get() as so:
  void* ThreadLocalStorage::Slot::Get(bool create_if_necessary=true) const {...}
and then pass "false" in my call.

+mark, as a base/threading OWNER, what do you think of that?
Instead of
  ThreadLocalStorage::Slot::Get

We should have both semantic:
  ThreadLocalStorage::Slot::GetIfExist
  ThreadLocalStorage::Slot::GetOrCreate

But, that will be hard to change at large.

The other option is to have:

  ThreadLocalStorage::Slot::DoesExist()

which could be called first if important.

Comment 6 by mark@chromium.org, Mar 13 2018

thread_local_storage.h has instructions for a thread-safe lazily-allocated slot:

https://chromium.googlesource.com/chromium/src/+/fd6d802597d97fa206271a941575f0ebb5fd8df8/base/threading/thread_local_storage.h#99

 99    // A key representing one value stored in TLS. Use as a class member or a
100    // local variable. If you need a static storage duration variable, use the
101    // following pattern with a NoDestructor<Slot>:
102    // void MyDestructorFunc(void* value);
103    // ThreadLocalStorage::Slot& ImportantContentTLS() {
104    //   static NoDestructor<ThreadLocalStorage::Slot> important_content_tls(
105    //       &MyDestructorFunc);
106    //   return *important_content_tls;
107    // }

or have I misunderstood the question?
In this case, it's the TLS block itself that is being allocated multiple times, not the slots within it.

Slot::Get() always creates the block if it doesn't already exist but this can be a problem inside an exception handler if creating the block is what causes the exception.
We are getting a similar bug with the following stackframe:

  (anonymous namespace)::ConstructTlsVector()
  base::ThreadLocalStorage::Slot::Get() const
  profiling::(anonymous 
  namespace)::HookFreeDefiniteSize(base::allocator::AllocatorDispatch const*, 
  void*, unsigned long, void*)
  <...>

I think we should have a better error handling for this code:

  // Allocate an array to store our data.
  TlsVectorEntry* tls_data = new TlsVectorEntry[kThreadLocalStorageSize];
  memcpy(tls_data, stack_allocated_tls_data, sizeof(stack_allocated_tls_data));
  PlatformThreadLocalStorage::SetTLSValue(key, tls_data);
  return tls_data;


The tls_data may leak if there are "rare" error conditions. That can be related to startup/shutdown/memory full or other rare conditions.

If SetTLSValue is failing, the tls_data is leaked since the next call to "ThreadLocalStorage::Slot::Get" will create a new one.
exception.png
37.5 KB View Download
Summary: Memory leak from failures in ConstructTlsVector (was: Potential memory leak due to recursion in GlobalActivityTracker)
Labels: -Pri-3 Pri-1
This should have been filed as a P1.
Why?  By the original report, it only affects builds that have ActivityTracking (i.e. breadcrumbs) enabled which, I believe, doesn't go beyond Dev.
Labels: -Pri-1 Pri-2
Sorry, was doing a bulk sweep on memory bugs that weren't filed at the appropriate priority.

Bumping this down to P2.
Cc: siggi@chromium.org manzagop@chromium.org
Summary: Memory leak from failures in GlobalActivityTracker's ConstructTlsVector (was: Memory leak from failures in ConstructTlsVector)
+manzagop
+siggi
+pamg

Breadcrumbs now also runs on Beta and I thought the plan was to extend it to stable? cc'ing folks who can shed some light here.

I agree that if it's not affecting stable users, then P2 seems appropriate. But this should likely be a blocker to going to stable, if it's indeed causing 200MB memory leaks.
I'm happy to break it into two functions:
- Get() that provides the current functionality
- GetIfExists() that won't create a new one

But if Get() itself can be fixed, that seems a better solution as it would make it "just work".
Cc: gab@chromium.org
The functionality you're asking for exists now, see https://bugs.chromium.org/p/chromium/issues/detail?id=825218#c19 and ThreadLocalStorage::HasBeenDestroyed. 

> But if Get() itself can be fixed, that seems a better solution as it would make it "just work".

See discussion on the bug I linked. Chrome TLS's implementation does not support calls to Get() during thread destruction, after Chrome TLS has been torn down. This is a limitation of the implementation and how we hook into thread teardown on Windows.

I haven't looked to see why GlobalActivityTracker is calling Get() after Chrome TLS destruction. gab@ is the best contact on the base/ side of things.


We should fix the memory leak, but has anyone considered that the exception might be a real issue(TM)? If this is happening during DLL_THREAD_DETACH or its moral equivalents, then the loader's going to swallow and turf the exception.
GlobalActivityTracker is responding to a thrown exception and trying to record the details of such for posterity's sake.
Right, but what I'm saying is that the exception may be due to a bug in Chrome, which would go silently ignored because the loader will swallow exceptions generated at this time in a thread's lifetime. So apart from the memory leak being a bug, we may have a more serious issue in Chrome that's causing this as a byproduct.
Gab, the CL you linked adds HasBeenDestroyed() but what about the case where no TLS segment has yet been created?  I'm thinking that an exception handler should probably not cause it to be created.  If I check HasBeenDestroyed() then that'll only cover only one half of the problem space.

Siggi, you're saying that the exception in this case is occurring but not causing a crash and is so basically going undetected...  right?  If so, there may be breadcrumbs somewhere that already have this exception record even though they crashed later for a different reason.

Comment 20 by gab@chromium.org, Apr 11 2018

I'm inclined to add TLS::Slot::GetIfExists() for such use cases that don't want to create state if not set.
Project Member

Comment 21 by bugdroid1@chromium.org, Apr 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2f86076a210e6f38d7f4cbadc9fd2d0e3899f4ff

commit 2f86076a210e6f38d7f4cbadc9fd2d0e3899f4ff
Author: Brian White <bcwhite@chromium.org>
Date: Thu Apr 12 00:53:23 2018

Don't create TLS vector on Get.

Previously, TLS::Slot::Get() would create the TLS vector if it doesn't
yet exist and then return the slot value which of course is null since
it was just created.

Now, just return null instead of creating the vector.  The vector will
be created by Set() if it doesn't exist.

This allows clients to make use of an existing slot in dangerous
situations (such as exception handling) without necessarily creating
the underlying vector at that time.

Bug:  821494 
Change-Id: I404524ec7ccf0da4ea5451eea25b49783a711c04
Reviewed-on: https://chromium-review.googlesource.com/1007682
Commit-Queue: Brian White <bcwhite@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549979}
[modify] https://crrev.com/2f86076a210e6f38d7f4cbadc9fd2d0e3899f4ff/base/threading/thread_local_storage.cc

Status: Fixed (was: Available)
That should fix the memory leak from the activity tracker.

Siggi, I suggest you create a new issue to track the cause of the exception.

Sign in to add a comment