Memory leak from failures in GlobalActivityTracker's ConstructTlsVector |
|||||||||
Issue description
Chrome Version: 67.0.3360.0
OS: Windows
Out of Process Heap Profiling is a project that, for a very small set users, records heap allocations and sends back anonymized heap dumps. We've obtained a heap dump that suggests a memory leak due to a recursion in GlobalActivityTracker related to the VectoredExceptionHandler.
As shows in the attached images, there are 30209 objects allocated and still alive from the following stackframe:
malloc
operator new(unsigned __int64)
`anonymous namespace'::ConstructTlsVector
base::ThreadLocalStorage::Slot::Get()
base::debug::GlobalActivityTracker::RecordException(...)
browser_watcher::`anonymous namespace'::VectoredExceptionHandler
This is leaking about 150M of memory.
We received 4 slow-reports today, all from the same version: windows 67.0.3360.0
76b5d6affd92ed00
5c144f9b0e998d85
860e2e7a31ddee14
c781c67352f4675d
I suspect there is a infinite loop (or recursion) in this code:
https://cs.chromium.org/chromium/src/base/threading/thread_local_storage.cc?type=cs&sq=package:chromium&l=158
TlsVectorEntry* ConstructTlsVector() {
[...]
// Allocate an array to store our data.
TlsVectorEntry* tls_data = new TlsVectorEntry[kThreadLocalStorageSize];
memcpy(...);
PlatformThreadLocalStorage::SetTLSValue(key, tls_data);
return tls_data;
The tls_data is probably the object leaked. If an exception occurs when calling |SetTLSValue|, then the vectored exception handler is called again and a new instance of tls_data is allocated again (and leaked).
,
Mar 13 2018
,
Mar 13 2018
The simple fix would be to not create a TLS block if one doesn't already exist but there's no way for outside code to do that.
I'd have to add a parameter to Get() as so:
void* ThreadLocalStorage::Slot::Get(bool create_if_necessary=true) const {...}
and then pass "false" in my call.
+mark, as a base/threading OWNER, what do you think of that?
,
Mar 13 2018
Instead of ThreadLocalStorage::Slot::Get We should have both semantic: ThreadLocalStorage::Slot::GetIfExist ThreadLocalStorage::Slot::GetOrCreate But, that will be hard to change at large.
,
Mar 13 2018
The other option is to have: ThreadLocalStorage::Slot::DoesExist() which could be called first if important.
,
Mar 13 2018
thread_local_storage.h has instructions for a thread-safe lazily-allocated slot: https://chromium.googlesource.com/chromium/src/+/fd6d802597d97fa206271a941575f0ebb5fd8df8/base/threading/thread_local_storage.h#99 99 // A key representing one value stored in TLS. Use as a class member or a 100 // local variable. If you need a static storage duration variable, use the 101 // following pattern with a NoDestructor<Slot>: 102 // void MyDestructorFunc(void* value); 103 // ThreadLocalStorage::Slot& ImportantContentTLS() { 104 // static NoDestructor<ThreadLocalStorage::Slot> important_content_tls( 105 // &MyDestructorFunc); 106 // return *important_content_tls; 107 // } or have I misunderstood the question?
,
Mar 14 2018
In this case, it's the TLS block itself that is being allocated multiple times, not the slots within it. Slot::Get() always creates the block if it doesn't already exist but this can be a problem inside an exception handler if creating the block is what causes the exception.
,
Mar 19 2018
We are getting a similar bug with the following stackframe: (anonymous namespace)::ConstructTlsVector() base::ThreadLocalStorage::Slot::Get() const profiling::(anonymous namespace)::HookFreeDefiniteSize(base::allocator::AllocatorDispatch const*, void*, unsigned long, void*) <...> I think we should have a better error handling for this code: // Allocate an array to store our data. TlsVectorEntry* tls_data = new TlsVectorEntry[kThreadLocalStorageSize]; memcpy(tls_data, stack_allocated_tls_data, sizeof(stack_allocated_tls_data)); PlatformThreadLocalStorage::SetTLSValue(key, tls_data); return tls_data; The tls_data may leak if there are "rare" error conditions. That can be related to startup/shutdown/memory full or other rare conditions. If SetTLSValue is failing, the tls_data is leaked since the next call to "ThreadLocalStorage::Slot::Get" will create a new one.
,
Mar 19 2018
,
Mar 19 2018
This should have been filed as a P1.
,
Mar 19 2018
Why? By the original report, it only affects builds that have ActivityTracking (i.e. breadcrumbs) enabled which, I believe, doesn't go beyond Dev.
,
Mar 22 2018
Sorry, was doing a bulk sweep on memory bugs that weren't filed at the appropriate priority. Bumping this down to P2.
,
Apr 6 2018
+manzagop +siggi +pamg Breadcrumbs now also runs on Beta and I thought the plan was to extend it to stable? cc'ing folks who can shed some light here. I agree that if it's not affecting stable users, then P2 seems appropriate. But this should likely be a blocker to going to stable, if it's indeed causing 200MB memory leaks.
,
Apr 9 2018
I'm happy to break it into two functions: - Get() that provides the current functionality - GetIfExists() that won't create a new one But if Get() itself can be fixed, that seems a better solution as it would make it "just work".
,
Apr 9 2018
The functionality you're asking for exists now, see https://bugs.chromium.org/p/chromium/issues/detail?id=825218#c19 and ThreadLocalStorage::HasBeenDestroyed. > But if Get() itself can be fixed, that seems a better solution as it would make it "just work". See discussion on the bug I linked. Chrome TLS's implementation does not support calls to Get() during thread destruction, after Chrome TLS has been torn down. This is a limitation of the implementation and how we hook into thread teardown on Windows. I haven't looked to see why GlobalActivityTracker is calling Get() after Chrome TLS destruction. gab@ is the best contact on the base/ side of things.
,
Apr 9 2018
We should fix the memory leak, but has anyone considered that the exception might be a real issue(TM)? If this is happening during DLL_THREAD_DETACH or its moral equivalents, then the loader's going to swallow and turf the exception.
,
Apr 9 2018
GlobalActivityTracker is responding to a thrown exception and trying to record the details of such for posterity's sake.
,
Apr 9 2018
Right, but what I'm saying is that the exception may be due to a bug in Chrome, which would go silently ignored because the loader will swallow exceptions generated at this time in a thread's lifetime. So apart from the memory leak being a bug, we may have a more serious issue in Chrome that's causing this as a byproduct.
,
Apr 10 2018
Gab, the CL you linked adds HasBeenDestroyed() but what about the case where no TLS segment has yet been created? I'm thinking that an exception handler should probably not cause it to be created. If I check HasBeenDestroyed() then that'll only cover only one half of the problem space. Siggi, you're saying that the exception in this case is occurring but not causing a crash and is so basically going undetected... right? If so, there may be breadcrumbs somewhere that already have this exception record even though they crashed later for a different reason.
,
Apr 11 2018
I'm inclined to add TLS::Slot::GetIfExists() for such use cases that don't want to create state if not set.
,
Apr 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2f86076a210e6f38d7f4cbadc9fd2d0e3899f4ff commit 2f86076a210e6f38d7f4cbadc9fd2d0e3899f4ff Author: Brian White <bcwhite@chromium.org> Date: Thu Apr 12 00:53:23 2018 Don't create TLS vector on Get. Previously, TLS::Slot::Get() would create the TLS vector if it doesn't yet exist and then return the slot value which of course is null since it was just created. Now, just return null instead of creating the vector. The vector will be created by Set() if it doesn't exist. This allows clients to make use of an existing slot in dangerous situations (such as exception handling) without necessarily creating the underlying vector at that time. Bug: 821494 Change-Id: I404524ec7ccf0da4ea5451eea25b49783a711c04 Reviewed-on: https://chromium-review.googlesource.com/1007682 Commit-Queue: Brian White <bcwhite@chromium.org> Reviewed-by: Gabriel Charette <gab@chromium.org> Cr-Commit-Position: refs/heads/master@{#549979} [modify] https://crrev.com/2f86076a210e6f38d7f4cbadc9fd2d0e3899f4ff/base/threading/thread_local_storage.cc
,
Apr 12 2018
That should fix the memory leak from the activity tracker. Siggi, I suggest you create a new issue to track the cause of the exception. |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by etienneb@chromium.org
, Mar 13 2018