"cryptohome --action=remove" frequently fails on betty-release |
|||
Issue description"cryptohome --action=remove" is failing frequently in betty-release builds when it's run by Tast tests that are performing login. The cryptohome process is segfaulting ( issue 877711 ), but it looks like the root cause is that its D-Bus request to cryptohomed is failing. In the messages file from the run at http://pantheon/storage/browser/chromeos-image-archive/betty-release/R70-10998.0.0/tast_vm_test_results_2/tast_vm_canary/, I see the following: 2018-08-24T02:34:49.683172-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect ... 2018-08-24T02:34:50.690144-05:00 INFO session_manager[2566]: [INFO:session_manager_main.cc(212)] Will wait 3s for graceful browser exit. 2018-08-24T02:34:50.691722-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect 2018-08-24T02:34:50.695196-05:00 INFO session_manager[2566]: [INFO:session_manager_service.cc(148)] SessionManagerService starting ... 2018-08-24T02:34:51.634190-05:00 WARNING crash_reporter[2690]: [user] Received crash notification for cryptohome[2683] sig 11, user 0 group 0 (developer build - not testing - always dumping) 2018-08-24T02:34:51.637153-05:00 INFO crash_reporter[2690]: State of crashed process [2683]: S (sleeping) 2018-08-24T02:34:51.637887-05:00 INFO crash_reporter[2690]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5' 2018-08-24T02:34:51.686951-05:00 INFO crash_reporter[2690]: Stored minidump to /var/spool/crash/cryptohome.20180824.023451.2683.dmp 2018-08-24T02:34:51.687785-05:00 INFO crash_reporter[2690]: Leaving core file at /proc/self/fd/5/cryptohome.20180824.023451.2683.core due to developer image 2018-08-24T02:34:51.697815-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect 2018-08-24T02:34:51.698148-05:00 ERR cryptohomed[2435]: Could not open the TPM 2018-08-24T02:34:51.698277-05:00 INFO cryptohomed[2435]: TPM error 0x3126 (Invalid handle): LoadKeyByUuid: failed LoadKeyByUUID 2018-08-24T02:34:51.698922-05:00 WARNING cryptohomed[2435]: Canceled creating cryptohome key - TPM is not ready. 2018-08-24T02:34:51.704367-05:00 WARNING cryptohomed[2435]: set_tpm() missing or disabled TPM provided. ... 2018-08-24T02:34:52.745247-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect 2018-08-24T02:34:52.745373-05:00 ERR cryptohomed[2435]: ConnectContextAsUser: Could not open the TPM 2018-08-24T02:34:52.745811-05:00 ERR cryptohomed[2435]: ReadPCR: Failed to connect to the TPM. 2018-08-24T02:34:52.745881-05:00 WARNING cryptohomed[2435]: Failed to read PCR1. 2018-08-24T02:34:52.745971-05:00 ERR cryptohomed[2435]: stat() of /mnt/stateful_partition/unencrypted/preserve/attestation.epb failed.: No such file or directory 2018-08-24T02:34:52.746064-05:00 ERR cryptohomed[2435]: Failed to read db.: No such file or directory I'm not sure which of those errors are expected for betty and which indicate a problem. Looking at Service::DoRemoveEx in cryptohome/service.cc and HomeDirs::Remove in cryptohome/homedirs.cc, it looks like the following can produce errors: - bad identifier parameter - empty account ID - failure to delete various directories Andrey, any ideas?
,
Aug 25
After a brief dive into fulll logs: Don't see any cryptohomed errors in the logs. These lines 2018-08-24T02:40:11.981031-05:00 WARNING cryptohomed[2435]: RemoveKeyset: key to remove not found normally go shortly after successful logins, and iirc those are WAI: we always attempt to remove the SmartUnlock keyset, letting it just fail if the key doesn't exist (and it doesn't if smartlock is not used). Other than that (and the fact that there's no TPM) I see no issues from cryptohomed. There are some chrome crashes and some cases when chrome is KILLed instead of controlled termination when the session is closed. But that shouldn't lead to cryptohome --action=remove failures. Will take another look later.
,
Aug 25
Ah, I guess, I know the answer. Tests often request removing the directory for a test user at the start of the test not checking first if it exists or not. And --action=remove fails in this case, which is normal. What's not normal is that cryptohome coredumps, but that's solved in issue 877711 .
,
Aug 25
Hmm. Experimenting locally on a caroline device with a ToT cryptohome, the behavior seems slightly different. --action=remove reports success when passed a nonexistent user: # cryptohome --action=remove --force --user=somebogususer@gmail.com Remove succeeded. The only way I've found to make it fail so far is by passing the currently-logged-in user: # cryptohome --action=remove --force --user=testuser@gmail.com error: CRYPTOHOME_ERROR_REMOVE_FAILED Remove failed. I think that tests generally avoid this by restarting session_manager before wiping the test user's directory.
,
Aug 25
After https://crrev.com/c/1188801 goes in to fix the segfault, we'll hopefully know more. The cryptohome command is logging the D-Bus error to stdout, but I suspect that it's currently getting dropped due to the process crashing before stdout is flushed.
,
Oct 12
,
Oct 31
Shall we close this?
,
Oct 31
Thanks for the ping! Yes, I haven't seen this happening anymore. |
|||
►
Sign in to add a comment |
|||
Comment 1 by apronin@google.com
, Aug 25