Seeing TSAN lock order inversion reports when running a TSAN-enabled cryptohomed build:
==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=25813)
Cycle in lock order graph: M18 (0x7b0c00000608) => M83 (0x7b1c00000208) => M18
Mutex M83 acquired here while holding mutex M18 in main thread:
#0 0x5555555cc9af in pthread_mutex_lock ??:0:0
#1 0x7ffff72fa1ef in get_table_entry /build/cyan/tmp/portage/app-crypt/trousers-9999/work/trousers-9999/src/tspi/rpc/hosttable.c:173:3
#2 0x555555778fce in cryptohome::TpmImpl::GetVersionInfo(cryptohome::Tpm::TpmVersionInfo*) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/tpm_impl.cc:3067:8
#3 0x55555575fe46 in cryptohome::TpmInit::SetupTpm(bool) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/tpm_init.cc:175:20
#4 0x55555564d7b1 in cryptohome::Crypto::Init(cryptohome::TpmInit*) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/crypto.cc:128:16
#5 0x5555556216e6 in cryptohome::Service::Initialize() /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/service.cc:523:17
#6 0x55555561be3d in main /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/cryptohomed.cc:82:17
#7 0x7ffff66d3a94 in __libc_start_main /var/tmp/portage/cross-x86_64-cros-linux-gnu/glibc-2.27-r6/work/glibc-2.27/csu/../csu/libc-start.c:308:0
Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message
Mutex M18 acquired here while holding mutex M83 in main thread:
#0 0x5555555cc9af in pthread_mutex_lock ??:0:0
#1 0x7ffff72fa11a in remove_table_entry /build/cyan/tmp/portage/app-crypt/trousers-9999/work/trousers-9999/src/tspi/rpc/hosttable.c:137:2
#2 0x55555575fe46 in cryptohome::TpmInit::SetupTpm(bool) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/tpm_init.cc:175:20
#3 0x55555564d7b1 in cryptohome::Crypto::Init(cryptohome::TpmInit*) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/crypto.cc:128:16
#4 0x5555556216e6 in cryptohome::Service::Initialize() /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/service.cc:523:17
#5 0x55555561be3d in main /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/cryptohomed.cc:82:17
#6 0x7ffff66d3a94 in __libc_start_main /var/tmp/portage/cross-x86_64-cros-linux-gnu/glibc-2.27-r6/work/glibc-2.27/csu/../csu/libc-start.c:308:0
SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/usr/sbin/cryptohomed+0x789ae)
==================
Looking at it some more, I believe the problematic sequence is this: a CloseContext operation on one thread has acquired the context lock, while another thread, already holding the table lock, tries to lock that same context and queues on the context lock. The CloseContext operation then attempts to remove the context from the table and blocks trying to acquire the table lock -> deadlock.
Fortunately, we don't routinely destroy contexts that are accessed from multiple threads (it looks like we have one long-lived context used across multiple threads which we close only at shutdown and other contexts that are created/destroyed for servicing a single Tpm method on a specific thread).
Comment 1 by mnissler@chromium.org, Nov 27