New issue
Advanced search Search tips

Issue 908845 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

Potential deadlocks in trousers due to lock order inversions

Project Member Reported by mnissler@chromium.org, Nov 27

Issue description

Seeing TSAN lock order inversion reports when running a TSAN-enabled cryptohomed build:

==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=25813)
  Cycle in lock order graph: M18 (0x7b0c00000608) => M83 (0x7b1c00000208) => M18

  Mutex M83 acquired here while holding mutex M18 in main thread:
    #0 0x5555555cc9af in pthread_mutex_lock ??:0:0
    #1 0x7ffff72fa1ef in get_table_entry /build/cyan/tmp/portage/app-crypt/trousers-9999/work/trousers-9999/src/tspi/rpc/hosttable.c:173:3
    #2 0x555555778fce in cryptohome::TpmImpl::GetVersionInfo(cryptohome::Tpm::TpmVersionInfo*) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/tpm_impl.cc:3067:8
    #3 0x55555575fe46 in cryptohome::TpmInit::SetupTpm(bool) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/tpm_init.cc:175:20
    #4 0x55555564d7b1 in cryptohome::Crypto::Init(cryptohome::TpmInit*) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/crypto.cc:128:16
    #5 0x5555556216e6 in cryptohome::Service::Initialize() /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/service.cc:523:17
    #6 0x55555561be3d in main /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/cryptohomed.cc:82:17
    #7 0x7ffff66d3a94 in __libc_start_main /var/tmp/portage/cross-x86_64-cros-linux-gnu/glibc-2.27-r6/work/glibc-2.27/csu/../csu/libc-start.c:308:0

    Hint: use TSAN OPTIONS=second deadlock stack=1 to get more informative warning message

  Mutex M18 acquired here while holding mutex M83 in main thread:
    #0 0x5555555cc9af in pthread_mutex_lock ??:0:0
    #1 0x7ffff72fa11a in remove_table_entry /build/cyan/tmp/portage/app-crypt/trousers-9999/work/trousers-9999/src/tspi/rpc/hosttable.c:137:2
    #2 0x55555575fe46 in cryptohome::TpmInit::SetupTpm(bool) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/tpm_init.cc:175:20
    #3 0x55555564d7b1 in cryptohome::Crypto::Init(cryptohome::TpmInit*) /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/crypto.cc:128:16
    #4 0x5555556216e6 in cryptohome::Service::Initialize() /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/service.cc:523:17
    #5 0x55555561be3d in main /build/cyan/var/cache/portage/chromeos-base/cryptohome/out/Default/../../../../../../../tmp/portage/chromeos-base/cryptohome-9999/work/cryptohome-9999/platform2/cryptohome/cryptohomed.cc:82:17
    #6 0x7ffff66d3a94 in __libc_start_main /var/tmp/portage/cross-x86_64-cros-linux-gnu/glibc-2.27-r6/work/glibc-2.27/csu/../csu/libc-start.c:308:0

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/usr/sbin/cryptohomed+0x789ae)
==================

Looking at it some more, I believe the problematic sequence is a CloseContext operation on one thread having acquired the context lock, while another thread is trying to lock that context and queues for the context lock while holding the table lock. The CloseContext operation would now attempt to remove the context from the table, and attempts to acquire the table lock for doing so -> deadlock.

Fortunately, we don't routinely destroy contexts that are accessed from multiple threads (it looks like we have one long-lived context used across multiple threads which we close only at shutdown and other contexts that are created/destroyed for servicing a single Tpm method on a specific thread).

 
Here's a CL that seems to fix the issue for me (entirely untested otherwise): https://chromium-review.googlesource.com/c/chromiumos/third_party/trousers/+/1352276
Cc: zuan@chromium.org
Labels: Cros-Hwsec-Ready

Sign in to add a comment