New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 877720 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Oct 31
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Chrome
Pri: 2
Type: Bug



Sign in to add a comment

"cryptohome --action=remove" frequently fails on betty-release

Project Member Reported by derat@chromium.org, Aug 25

Issue description

"cryptohome --action=remove" is failing frequently in betty-release builds when it's run by Tast tests that are performing login. The cryptohome process is segfaulting ( issue 877711 ), but it looks like the root cause is that its D-Bus request to cryptohomed is failing.

In the messages file from the run at http://pantheon/storage/browser/chromeos-image-archive/betty-release/R70-10998.0.0/tast_vm_test_results_2/tast_vm_canary/, I see the following:

2018-08-24T02:34:49.683172-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect
...
2018-08-24T02:34:50.690144-05:00 INFO session_manager[2566]: [INFO:session_manager_main.cc(212)] Will wait 3s for graceful browser exit.
2018-08-24T02:34:50.691722-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect
2018-08-24T02:34:50.695196-05:00 INFO session_manager[2566]: [INFO:session_manager_service.cc(148)] SessionManagerService starting
...
2018-08-24T02:34:51.634190-05:00 WARNING crash_reporter[2690]: [user] Received crash notification for cryptohome[2683] sig 11, user 0 group 0 (developer build - not testing - always dumping)
2018-08-24T02:34:51.637153-05:00 INFO crash_reporter[2690]: State of crashed process [2683]: S (sleeping)
2018-08-24T02:34:51.637887-05:00 INFO crash_reporter[2690]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-08-24T02:34:51.686951-05:00 INFO crash_reporter[2690]: Stored minidump to /var/spool/crash/cryptohome.20180824.023451.2683.dmp
2018-08-24T02:34:51.687785-05:00 INFO crash_reporter[2690]: Leaving core file at /proc/self/fd/5/cryptohome.20180824.023451.2683.core due to developer image
2018-08-24T02:34:51.697815-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect
2018-08-24T02:34:51.698148-05:00 ERR cryptohomed[2435]: Could not open the TPM
2018-08-24T02:34:51.698277-05:00 INFO cryptohomed[2435]: TPM error 0x3126 (Invalid handle): LoadKeyByUuid: failed LoadKeyByUUID
2018-08-24T02:34:51.698922-05:00 WARNING cryptohomed[2435]: Canceled creating cryptohome key - TPM is not ready.
2018-08-24T02:34:51.704367-05:00 WARNING cryptohomed[2435]: set_tpm() missing or disabled TPM provided.
...
2018-08-24T02:34:52.745247-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect
2018-08-24T02:34:52.745373-05:00 ERR cryptohomed[2435]: ConnectContextAsUser: Could not open the TPM
2018-08-24T02:34:52.745811-05:00 ERR cryptohomed[2435]: ReadPCR: Failed to connect to the TPM.
2018-08-24T02:34:52.745881-05:00 WARNING cryptohomed[2435]: Failed to read PCR1.
2018-08-24T02:34:52.745971-05:00 ERR cryptohomed[2435]: stat() of /mnt/stateful_partition/unencrypted/preserve/attestation.epb failed.: No such file or directory
2018-08-24T02:34:52.746064-05:00 ERR cryptohomed[2435]: Failed to read db.: No such file or directory

I'm not sure which of those errors are expected for betty and which indicate a problem.

Looking at Service::DoRemoveEx in cryptohome/service.cc and HomeDirs::Remove in cryptohome/homedirs.cc, it looks like the following can produce errors:

- bad identifier parameter
- empty account ID
- failure to delete various directories

Andrey, any ideas?
 
These are the actual errors:
2018-08-24T02:34:50.691722-05:00 ERR cryptohomed[2435]: TPM error 0x3011 (Communication failure): Error calling Tspi_Context_Connect

Well, w/o looking at full logs, first reaction to the errors from above: betty is a VM and doesn't have a real TPM, that inability to open the TPM device is WAI (either real errors when it tries & fails to find the device, or just built with USE=mocktpm and just returns an error w/o trying). So, these are not real issues.

For crashing cryptohome looks like you have the right CL already.

For why --action=remove fails, need to look at full logs.
After a brief dive into fulll logs: Don't see any cryptohomed errors in the logs.

These lines
2018-08-24T02:40:11.981031-05:00 WARNING cryptohomed[2435]: RemoveKeyset: key to remove not found
normally go shortly after successful logins, and iirc those are WAI: we always attempt to remove the SmartUnlock keyset, letting it just fail if the key doesn't exist (and it doesn't if smartlock is not used).

Other than that (and the fact that there's no TPM) I see no issues from cryptohomed. There are some chrome crashes and some cases when chrome is KILLed instead of controlled termination when the session is closed. But that shouldn't lead to cryptohome --action=remove failures.

Will take another look later.
Ah, I guess, I know the answer. Tests often request removing the directory for a test user at the start of the test not checking first if it exists or not. And --action=remove fails in this case, which is normal. What's not normal is that cryptohome coredumps, but that's solved in  issue 877711 .
Hmm. Experimenting locally on a caroline device with a ToT cryptohome, the behavior seems slightly different. --action=remove reports success when passed a nonexistent user:

  # cryptohome --action=remove --force --user=somebogususer@gmail.com
  Remove succeeded.

The only way I've found to make it fail so far is by passing the currently-logged-in user:

  # cryptohome --action=remove --force --user=testuser@gmail.com
  error: CRYPTOHOME_ERROR_REMOVE_FAILED
  Remove failed.

I think that tests generally avoid this by restarting session_manager before wiping the test user's directory.
After https://crrev.com/c/1188801 goes in to fix the segfault, we'll hopefully know more. The cryptohome command is logging the D-Bus error to stdout, but I suspect that it's currently getting dropped due to the process crashing before stdout is flushed.
Components: OS>Systems>Security
Shall we close this?
Status: WontFix (was: Assigned)
Thanks for the ping! Yes, I haven't seen this happening anymore.

Sign in to add a comment