This is the 3rd time I stumble upon this (CrOS wide profiler, legacy heap profiler, oop heap profiler) so I feel it's worth writing down on a persistent medium
TL;DR
If you use thread-local storage for a heap profiler you really really want to make sure that the pthread_key_create() call (which in Chrome land translates to ThreadLocalStorage::Slot::ctor) happens as early as possible
erikchen: I think you want to make sure that all the TLS things that you use in the heap profiler are initialized early in chrome_main_delegate.cc or similar
dcheng: I think this should be mentioned in the documentation of the various base/thread_local because is quite subtle.
Full story:
When hooking malloc it is very common to have a thread-local storage variable to detect re-entrancy and keep around some context/structs.
https://chromium-review.googlesource.com/c/chromium/src/+/727466 is a very good example for that.
thread-local storage is extremely subtle on Linux/glibc because it can malloc.
Specifically, the implementation shipped in ubuntu at least up to Trusty (didn't check xenial) guarantees that no malloc happens *only* for the first PTHREAD_KEY_2NDLEVEL_SIZE keys.
When the malloc in libc happens, the following reentrancy will happen (ironically, while trying to detect reentrancy)
- some_code calls malloc()
- That enters the malloc hook
- the malloc hook has a TLS like tls_are_we_reentering_the_hook
- the TLS does a pthread_set_specific when initializing itself on the thread for the first time
- pthread_set_specific calls malloc
- the story repeats because we still don't have a tls.
- the stack explodes at some point.
A possible solution is here: https://codereview.chromium.org/2023133003/
essentially make sure that the pthread_key_create is called very early. that will create a slot < PTHREAD_KEY_2NDLEVEL_SIZE which doesn't hit the malloc in libc.
For the records, pthread_key_create is the global thing that has to be called once-per-process for TLS slot (essentially to get the priviledge of using TLS for a specific variable); pthread_set_specific is the thing that has to be called per-thread to set the TLS object.
pthread_set_specific is the thing that has the evil re-entrant malloc
pthread_key_create is the thing that needs to be called early to avoid the re-entrant malloc.
Now, teleporting this to base
pthread_key_create ==> ThreadLocal::Slot::ctor
pthread_set_specific ==> ThreadLocal::Slot::Set()
From: https://github.com/lattera/glibc/blob/master/nptl/pthread_setspecific.c#L72
int
__pthread_setspecific (key, value)
pthread_key_t key;
const void *value;
{
...
idx1st = key / PTHREAD_KEY_2NDLEVEL_SIZE;
idx2nd = key % PTHREAD_KEY_2NDLEVEL_SIZE;
/* This is the second level array. Allocate it if necessary. */
level2 = THREAD_GETMEM_NC (self, specific, idx1st);
if (level2 == NULL)
{
if (value == NULL)
/* We don't have to do anything. The value would in any case
be NULL. We can save the memory allocation. */
return 0;
level2
= (struct pthread_key_data *) calloc (PTHREAD_KEY_2NDLEVEL_SIZE,
sizeof (*level2));
if (level2 == NULL)
return ENOMEM;
THREAD_SETMEM_NC (self, specific, idx1st, level2);
}
/* Pointer to the right array element. */
level2 = &level2[idx2nd];
/* Remember that we stored at least one set of data. */
THREAD_SETMEM (self, specific_used, true);
}