cryptohomed broken on some DUTs due to interrupted TPM initialization |
||||||||||||
Issue descriptionThe ui.ChromeLogin and ui.ChromeCrashLoggedIn Tast tests both timed out in the rikku-release build at http://stainless/browse/chromeos-autotest-results/217503455-chromeos-test/ while waiting for user directories to be mounted: 2018/07/15 08:43:56 Started test ui.ChromeCrashLoggedIn 2018/07/15 08:43:56 [08:43:56.060] Restarting ui job 2018/07/15 08:43:57 [08:43:56.542] Waiting for org.chromium.SessionManager D-Bus service 2018/07/15 08:43:57 [08:43:56.557] Asking session_manager to enable Chrome testing 2018/07/15 08:43:57 [08:43:56.560] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/07/15 08:43:58 [08:43:57.273] Removing cryptohome for testuser@gmail.com 2018/07/15 08:43:58 [08:43:57.319] Finding OOBE DevTools target 2018/07/15 08:43:58 [08:43:57.643] Connecting to Chrome at ws://127.0.0.1:35071/devtools/page/9C5F22C8CDE07F9123CC40EF1D9F6368 2018/07/15 08:43:58 [08:43:57.736] Waiting for OOBE 2018/07/15 08:44:00 [08:44:00.014] Logging in as user "testuser@gmail.com" 2018/07/15 08:44:00 [08:44:00.077] Waiting for cryptohome /home/user/c37077ca708067994b028ceecfa10fc34571a404 2018/07/15 08:44:56 [08:44:56.065] Error at chrome_crash_logged_in.go:24: Chrome login failed: /home/user/c37077ca708067994b028ceecfa10fc34571a404 not mounted: context deadline exceeded 2018/07/15 08:44:58 Started test ui.ChromeLogin 2018/07/15 08:44:58 [08:44:58.039] Restarting ui job 2018/07/15 08:44:59 [08:44:58.485] Waiting for org.chromium.SessionManager D-Bus service 2018/07/15 08:44:59 [08:44:58.502] Asking session_manager to enable Chrome testing 2018/07/15 08:44:59 [08:44:58.504] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/07/15 08:45:00 [08:44:59.257] Removing cryptohome for testuser@gmail.com 2018/07/15 08:45:00 [08:44:59.299] Finding OOBE DevTools target 2018/07/15 08:45:00 [08:44:59.640] Connecting to Chrome at ws://127.0.0.1:49143/devtools/page/17771F1A6A3DAD8D1648B7697C3537F9 2018/07/15 08:45:00 [08:44:59.697] Waiting for OOBE 2018/07/15 08:45:03 [08:45:02.114] Logging in as user "testuser@gmail.com" 2018/07/15 08:45:03 [08:45:02.221] Waiting for cryptohome /home/user/c37077ca708067994b028ceecfa10fc34571a404 2018/07/15 08:45:58 [08:45:58.043] Error at chrome_login.go:69: Chrome login failed: /home/user/c37077ca708067994b028ceecfa10fc34571a404 not mounted: context deadline exceeded In both cases, I see suspicious-seeming cryptohomed errors in the messages file: 2018-07-15T08:44:00.068466-07:00 INFO cryptohomed[800]: HasAuthorization: TPM Owner password not available. 2018-07-15T08:44:00.068487-07:00 ERR cryptohomed[800]: Destroy() called with insufficient authorization. 2018-07-15T08:44:00.389144-07:00 INFO session_manager[5478]: [INFO:server_backed_state_key_generator.cc(126)] Stable device secret missing! 2018-07-15T08:45:02.210595-07:00 INFO cryptohomed[800]: HasAuthorization: TPM Owner password not available. 2018-07-15T08:45:02.210615-07:00 ERR cryptohomed[800]: Destroy() called with insufficient authorization. 2018-07-15T08:45:02.516675-07:00 INFO session_manager[6207]: [INFO:server_backed_state_key_generator.cc(126)] Stable device secret missing! In a different successful ui.ChromeLogin run on a rikku DUT, I see messages like these from cryptohomed instead: 2018-07-16T08:11:58.112420-07:00 INFO cryptohomed[835]: Migrated (or created) user directory: /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount 2018-07-16T08:11:58.113796-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/Cache 2018-07-16T08:11:58.114023-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/Downloads 2018-07-16T08:11:58.114219-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/GCache 2018-07-16T08:11:58.115666-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/GCache/v1 2018-07-16T08:11:58.116268-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/GCache/v2 2018-07-16T08:11:58.116839-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/GCache/v1/blobs 2018-07-16T08:11:58.117467-07:00 INFO cryptohomed[835]: Creating pass-through directories /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/GCache/v1/tmp 2018-07-16T08:11:58.118710-07:00 INFO cryptohomed[835]: RecursiveCopy: /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/log 2018-07-16T08:11:58.119253-07:00 INFO cryptohomed[835]: RecursiveCopy: /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/.ssh 2018-07-16T08:11:58.119770-07:00 INFO cryptohomed[835]: RecursiveCopy: /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/.pki 2018-07-16T08:11:58.120265-07:00 INFO cryptohomed[835]: RecursiveCopy: /home/.shadow/a62b9c49d923f441cf2a9e6e872563f83c68f682/mount/user/.pki/nssdb Presumably something on the DUT was in bad shape when these tests ran. Maybe an earlier test left them in a weird state. cryptohome owners, any guesses about what might be happening here and how other tests can work around it?
,
Jul 17
Looks like this is /var/log/messages from the device.
,
Jul 17
There is no encstateful key space defined in nvram (and no signs of communication errors from the driver + no encstateful key on disk, which would appear in case of tpm initialization interrupted before it is written to tpm). And there are no other objects that are typically persisted after tpm initialization - specifically, no salting key. Still the TPM is owned, what leads to the initialization errors. [mount-encrypted didn't find encstateful key in TPM and wiped encstateful => on-disk tpm state, including owner password temporarily stored in case of interrupted initializaton, is lost] 2018-07-15T14:45:00.891803+00:00 NOTICE mount-encrypted[1562]: [pid:270] VFS mount state sanity check ok. 2018-07-15T14:45:00.891814+00:00 NOTICE mount-encrypted[1562]: [0715/144450:INFO:tpm.cc(304)] TPM ready 2018-07-15T14:45:00.891825+00:00 NOTICE mount-encrypted[1562]: [0715/144450:INFO:tpm.cc(459)] No encstateful NVRAM area defined. 2018-07-15T14:45:00.891835+00:00 NOTICE mount-encrypted[1562]: [0715/144450:ERROR:tpm1.cc(495)] encstateful space contents absent or too short. 2018-07-15T14:45:00.891845+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:tpm.cc(103)] NVRAM area has been defined but not written. 2018-07-15T14:45:00.891855+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:tpm.cc(103)] NVRAM area has been defined but not written. 2018-07-15T14:45:00.891865+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:tpm.cc(437)] No Lockbox NVRAM area defined. 2018-07-15T14:45:00.891875+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:encryption_key.cc(169)] Using NVRAM as system key; finalization needed. 2018-07-15T14:45:00.891888+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:encryption_key.cc(233)] Attempting to generate fresh NVRAM system key. 2018-07-15T14:45:00.891898+00:00 NOTICE mount-encrypted[1562]: [0715/144451:ERROR:tpm1.cc(495)] encstateful space contents absent or too short. 2018-07-15T14:45:00.891908+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:encryption_key.cc(267)] No usable system key found. 2018-07-15T14:45:00.891918+00:00 NOTICE mount-encrypted[1562]: [0715/144451:ERROR:encryption_key.cc(51)] Data read failed from /mnt/stateful_partition/encrypted.needs-finalization 2018-07-15T14:45:00.891928+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:encryption_key.cc(280)] Generating new encryption key. [No salting key in TPM nvram] 2018-07-15T14:44:54.318434+00:00 INFO cryptohomed[895]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID 2018-07-15T14:44:54.318513+00:00 WARNING cryptohomed[895]: Canceled creating cryptohome key - TPM is not ready. ... [InstallAttributes are not ready] 2018-07-15T14:44:55.537465+00:00 INFO cryptohomed[895]: Load() called prior to a successful Store() 2018-07-15T14:44:55.537492+00:00 INFO cryptohomed[895]: Resuming interrupted InstallAttributes. (Store needed.) ... [Attempt to delete FWMP fails] 2018-07-15T14:46:28.358027+00:00 INFO cryptohomed[895]: HasAuthorization: TPM Owner password not available. 2018-07-15T14:46:28.358058+00:00 ERR cryptohomed[895]: Destroy() called with insufficient authorization. ... [Attempt to own with new password fails] 2018-07-15T15:01:37.142438+00:00 INFO kernel: [ 1014.247697] tpm_tis tpm_tis: command 0x66 (size 55) returned code 0x1 [GetCapabilityOwner -> AUTHFAIL] 2018-07-15T15:01:37.172403+00:00 INFO kernel: [ 1014.277667] tpm_tis tpm_tis: command 0xba (size 18) returned code 0x22 2018-07-15T15:01:37.178295+00:00 INFO cryptohomed[895]: Configuring TPM, ownership taken: 0. 2018-07-15T15:01:39.703898+00:00 INFO cryptohomed[895]: HasAuthorization: TPM Owner password not available.
,
Jul 17
err correction, it's TPM 1.2, so, of course, no salting key there. :) what it can't find is the legacy cryptohome key in that failed LoadKeyByUuid... so, the only part that's definitely missing from TPM is the encstateful nvram space.
,
Jul 17
Thanks for digging into this, Andrey! Will having a missing encstateful key prevent logins until the DUT is reimaged (or even longer)? And if so, is there any way for tests to detect and recover from this state? Just guessing: Will GetStatusString report tpm.owned as true but tpm.has_cryptohome_key as false, or something similar?
,
Jul 18
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/2c85a6582b4d92d09171722f2a6f09b2faf2117f commit 2c85a6582b4d92d09171722f2a6f09b2faf2117f Author: Daniel Erat <derat@chromium.org> Date: Wed Jul 18 11:08:10 2018 tast-tests: Make cryptohome.WaitForUserMount log status. Make cryptohome.WaitForUserMount log the output of "cryptohome --action=status" if the user dir fails to show up. This is intended to aid in debugging a TPM issue that's making some tests fail. Also update cryptohome.RemoveUserDir to honor the passed context's deadline. BUG= chromium:864282 TEST=manual: used short timeout to force failure and verified that status is logged Change-Id: Ifdd2e65bb7166e5b620b8e74423bdd70a68fe9e8 Reviewed-on: https://chromium-review.googlesource.com/1141334 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [modify] https://crrev.com/2c85a6582b4d92d09171722f2a6f09b2faf2117f/src/chromiumos/tast/local/cryptohome/cryptohome.go
,
Jul 21
Andrey, here's the cryptohome status from a failed run at http://stainless/browse/chromeos-autotest-results/219248229-chromeos-test/ : 2018/07/20 21:45:25 Started test ui.ChromeCrashLoggedIn 2018/07/20 21:45:25 [21:45:25.000] Restarting ui job 2018/07/20 21:45:26 [21:45:25.498] Waiting for org.chromium.SessionManager D-Bus service 2018/07/20 21:45:26 [21:45:25.512] Asking session_manager to enable Chrome testing 2018/07/20 21:45:26 [21:45:25.513] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/07/20 21:45:26 [21:45:26.234] Removing cryptohome for testuser@gmail.com 2018/07/20 21:45:26 [21:45:26.273] Finding OOBE DevTools target 2018/07/20 21:45:27 [21:45:26.518] Connecting to Chrome at ws://127.0.0.1:58280/devtools/page/314BA59B593934F4505FE0CAA3588ED5 2018/07/20 21:45:27 [21:45:26.635] Waiting for OOBE 2018/07/20 21:45:29 [21:45:28.885] Logging in as user "testuser@gmail.com" 2018/07/20 21:45:29 [21:45:28.950] Waiting for cryptohome /home/user/9003073363b3235a481f0a73b4dc4bbf6640a6c9 2018/07/20 21:46:22 [21:46:22.043] cryptohome status: { "installattrs": { "first_install": false, "initialized": false, "invalid": false, "lockbox_index": 536870916, "lockbox_nvram_version": 2, "secure": false, "size": 0, "version": 1 }, "mounts": [ ], "tpm": { "being_owned": false, "can_connect": true, "can_decrypt": false, "can_encrypt": false, "can_load_srk": false, "can_load_srk_pubkey": false, "enabled": true, "has_context": true, "has_cryptohome_key": false, "has_key_handle": false, "last_error": 8224, "owned": false } } 2018/07/20 21:46:22 [21:46:22.043] Error at chrome_crash_logged_in.go:24: Chrome login failed: /home/user/9003073363b3235a481f0a73b4dc4bbf6640a6c9 not mounted: context deadline exceeded Please let me know if there's anything I can do from within the test code to detect the problem and repair the system.
,
Jul 25
Yes, looked at /var/log/messages as well (generally, messages is more informative for this kind of issues). The TPM is not owned at boot, and by the time the test runs yet (and the output of cryptohome --action=status in #7 tells the same). Here's when the test calls attestation --action=status (note it's still on GMT TZ, so 7 hrs difference in datetimes): 2018-07-21T04:41:12.830257+00:00 NOTICE ag[2504]: autotest server[stack::__init__|_get_cryptohome_status|run] -> ssh_run(cryptohome --action=status) ... 2018-07-21T04:43:57.462046+00:00 NOTICE ag[10787]: autotest server[stack::__init__|_get_cryptohome_status|run] -> ssh_run(cryptohome --action=status) And here's when the ownership is taken: 2018-07-21T04:46:27.442899+00:00 ERR cryptohomed[890]: Taking TPM ownership took 2650ms and evetyually full initialization completes: 2018-07-21T04:46:35.426799+00:00 INFO cryptohomed[890]: Attestation: Prepared successfully (7270 ms). Also, chrome crashes at this point: 2018-07-21T04:46:23.387505+00:00 WARNING crash_reporter[12357]: [user] Received crash notification for chrome[12270] sig 11, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2018-07-21T04:46:23.411299+00:00 INFO session_manager[12253]: [INFO:session_manager_service.cc(297)] Browser process 12270 exited with signal 11 (Segmentation fault) And only after it restarts we start taking TPM ownership (note: Chrome needs to explicitly tell to start taking the ownership - and that happens either during OOBE or after the user logs in). So, it's possible the test doesn't know how to handle the situation of an unowned TPM. If that should never happen, jobs like Repair should make sure that the TPM is left in the owned state before Provision & then tests are run. If that's ok to see the test running on a freshly powerwashed/repaired device, then it should make sure that Chrome instructs cryptohomed to own the TPM and wait till the process is over [it's possible that Repair does reset the TPM owner but doesn't wipe the disk, so doesn't delete /home/chronos/.oobe_completed, for example, which prevents Chrome from doing OOBE on an unowned machine on start; or that the telemetry user session that autotest initiates doesn't do OOBE in any case] Or, alternatively, check if the TPM is owned & initiate it if not through cryptohome CLI interface. And since there's more than one test, it should probably be a common step. If having the TPM unowned at this point is unexpected, then that also deserves investigation.
,
Jul 25
how do we see the history of jobs on that DUT? would be good to check first why the TPM is unowned, and if that's expected.
,
Jul 25
And here's full 'messages' from #7 for this bug's reference.
,
Jul 25
Re #5: 1) a proper recovery from unexpectedly losing stateful partition is going through powerwash, i.e. resetting the TPM owner + wiping out the disk. The reason: Chrome keeps some state in the lost stateful partition, so is left in the inconsistent state, which may lead to side effects (example: b/73351086, e.g. see comments 87 and 90 there). We don't want those side effects to affect test results a few days down the road, when nobody can trace it back to this point. 2) I don't know a reliable way to detect that situation for losing encstateful w/o resetting tpm owner. tpm.owned as true but tpm.has_cryptohome_key as false sounds like a good indicator, but can also happen while the tpm is still in the process of being initialized (owned can be true quite early on, bu the cryptohome key is created after that on the first access to it, i.e. when the user logs in). We could put something onto the stateful partition each time after we clear the TPM in test lab jobs. Similarly, mount-encrypted could create a special file on encstateful, if the tpm was owned when it started, but there was no encstateful key, but that can happen in case of interrupted initialization - if we use that file in autotests, they would have to make sure they don't reboot after owning the tpm before the initialization is complete, or they might get this flag from mount-encrypted. That's a good idea in general: see bugs like issue 652565 and issue 692134 (and there were more iirc). There are also older devices that have no ability to clear TPMs at all, but they will soon be at AUE point, if not already. See issue 762334 , for example. I can't think of other scenarios when the tpm is owned at boot, and there's no encstateful key, but I may be wrong. +mnissler: I doubt we have it on the 'stateful preservation' path, but please correct me if needed.
,
Jul 27
Thanks for digging into the logs! That Chrome segfault is expected and unrelated; a later test is sending SEGV to Chrome to verify crash reporting: 2018/07/20 21:46:23 [21:46:23.283] Sending SIGSEGV to root Chrome process 12270 These tests just want Chrome to be in a state where they can log in using Oobe.loginForTesting(). They don't care about the state of the TPM as long as they can log in and interact with Chrome. The calls at 21:41:12 and 21:43:57 (in PDT) that you quoted in #8 must be from some earlier Autotest code; the Tast run didn't start until 21:45:25. You can see the jobs that ran before this one at http://stainless/search?view=list&first_date=2018-07-20&last_date=2018-07-21&hostname=chromeos4-row3-rack5-host4&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=false&exclude_acts=false&exclude_retried=false&exclude_non_production=false (sort by Started Time). This DUT looks like it was in a broken state for a long time. At 2018-07-20 05:19:28 UTC, login_CryptohomeIncognito failed with "Unhandled LoginException: Failed to restart browser in guest mode (pid 19918)." There are more cryptohome-related failures in other Autotests that run after this one. Going back even farther, I see a failure in a video_VideoReload.vp8 run at 2018-07-19 15:40:15 UTC (http://stainless/browse/chromeos-autotest-results/218735701-chromeos-test/): "Unhandled LoginException: Timed out going through login screen. Cryptohome not mounted. OOBE not dismissed." I didn't look back beyond that. It looks to me like we need to do more to identify and repair DUTs with TPMs that are in a bad state before running tests on them. This seems like it's a cause of general flakiness in existing Autotest-based tests that need to log in. Xixuan, I think that you worked on the TPMStatusVerifier class in server/hosts/cros_repair.py most recently. Do you have any ideas about how to detect and handle this state?
,
Jul 27
,
Jul 27
> Xixuan, I think that you worked on the TPMStatusVerifier class in > server/hosts/cros_repair.py most recently. Do you have any ideas > about how to detect and handle this state? Unfortunately, all xixuan@ did was to move the verifier code from an old location to its current location. I doubt she has any insights into its behavior. I've also worked with the code, but I don't know much more than xixuan@. However, if I've understood the code correctly, it simply extracts state from TPM using a query on cryptohome. The state is reasonably easy to parse, so if the exact settings that we consider to be suspect can be described, there's a decent chance that just about anyone can enhance the function to detect that condition. For experts in the deeper mysteries of the TPM, we may have to turn to dkrahn@ or lgoodby@.
,
Jul 27
,
Jul 29
Laurence, can you tell me which additional check(s) I can add to this code (in Autotest's TPMStatusVerifier class) to identify this case so we don't get unavoidable test failures?
try:
tpm = status['tpm']
if not tpm['enabled']:
raise hosts.AutoservVerifyError(
'TPM is not enabled -- Hardware is not working.')
if not tpm['can_connect']:
raise hosts.AutoservVerifyError(
('TPM connect failed -- '
'last_error=%d.' % tpm['last_error']))
if tpm['owned'] and not tpm['can_load_srk']:
raise hosts.AutoservVerifyError(
'Cannot load the TPM SRK')
if tpm['can_load_srk'] and not tpm['can_load_srk_pubkey']:
raise hosts.AutoservVerifyError(
'Cannot load the TPM SRK public key')
except KeyError:
logging.info('Cannot determine the Crytohome valid status - '
'skipping check.')
Here's the status from a run on a then-broken rikku DUT (see #7):
"tpm": {
"being_owned": false,
"can_connect": true,
"can_decrypt": false,
"can_encrypt": false,
"can_load_srk": false,
"can_load_srk_pubkey": false,
"enabled": true,
"has_context": true,
"has_cryptohome_key": false,
"has_key_handle": false,
"last_error": 8224,
"owned": false
}
And here's what the same now-no-longer-broken DUT (chromeos4-row3-rack5-host4) reports now:
"tpm": {
"being_owned": false,
"can_connect": true,
"can_decrypt": true,
"can_encrypt": true,
"can_load_srk": true,
"can_load_srk_pubkey": true,
"enabled": true,
"has_context": true,
"has_cryptohome_key": true,
"has_key_handle": true,
"last_error": 0,
"owned": true
}
So there are a bunch of differences, but I don't know which (if any) indicate a problem vs. being expected due to the TPM not being owned in the former case (which I assume is expected at times on DUTs in the lab).
,
Jul 31
Re #16: Right, in the first case it's just a seemingly unowned device, which is fine. The problem is: if I read the mount-encrypted log from #3 right, TPM is actually owned. This line says it: 2018-07-15T14:45:00.891814+00:00 NOTICE mount-encrypted[1562]: [0715/144450:INFO:tpm.cc(304)] TPM ready That log also says that a space in tpm nvram where encstateful key is saved was created, but nothing was written to it. Sounds like an interrupted TPM initialization. 2018-07-15T14:45:00.891845+00:00 NOTICE mount-encrypted[1562]: [0715/144451:INFO:tpm.cc(103)] NVRAM area has been defined but not written. One idea: If, when we are in this state, tpm.owned reported by cryptohome --action=status is indeed false, then we can detect it by comparing that tpm.owned to /sys/class/tpm/tpm0/device/owned (on older devices /sys/class/misc/tpm0/device/owned). If that one is true, and tpm.being_owned is false we are probably in this state. And let's restrict it to TPM1.2 (since 2.0 initialization is different). So: - if it is TPM 1.2 (e.g. "tpmc tpmver" returns "1.2" or alt [ -f /usr/sbin/tcsd ]) - if tpm.owned = false, tpm.being_owned = false - if /sys/class/(tpm|misc)/tpm0/device/owned contains 1 we are in this condition.
,
Jul 31
,
Aug 1
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/996075f2639a844bb642079dc2b8f2200819f888 commit 996075f2639a844bb642079dc2b8f2200819f888 Author: Daniel Erat <derat@chromium.org> Date: Wed Aug 01 17:02:00 2018 tast-tests: Make cryptohome log TPM ownership status. Update cryptohome.WaitForUserMount log the TPM device's ownership status on failure. This will help test a theory that TPM initialization is being interrupted on lab devices. BUG= chromium:864282 TEST=manual: ownership is logged after calling logStatus Change-Id: Id99978db0dcc27f8600e5f881ae9d392ff9d364d Reviewed-on: https://chromium-review.googlesource.com/1157916 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Shuhei Takahashi <nya@chromium.org> [modify] https://crrev.com/996075f2639a844bb642079dc2b8f2200819f888/src/chromiumos/tast/local/cryptohome/cryptohome.go
,
Aug 7
http://stainless/browse/chromeos-autotest-results/223846952-chromeos-test/ contains a more-recent failure in ui.ChromeCrashLoggedIn: 2018/08/04 10:32:05 Started test ui.ChromeCrashLoggedIn 2018/08/04 10:32:05 [10:32:05.043] Restarting ui job ... 2018/08/04 10:32:39 [10:32:38.982] Waiting for cryptohome /home/user/59e4b5160f3449717f83b917bcbb5950ee349ffe 2018/08/04 10:33:05 [10:33:05.044] Failed to get cryptohome status 2018/08/04 10:33:05 [10:33:05.044] External command failed: signal: killed 2018/08/04 10:33:05 [10:33:05.045] Command: 'cryptohome' '--action=status' 2018/08/04 10:33:05 [10:33:05.045] Uncaptured output: 2018/08/04 10:33:05 [10:33:05.046] Error at chrome_crash_logged_in.go:24: Chrome login failed: /home/user/59e4b5160f3449717f83b917bcbb5950ee349ffe not mounted: context deadline exceeded It looks like the "cryptohome --action=status" call that was intended to collect more info timed out. This is unexpected -- cryptohome.WaitForUserMount reserves three seconds for it to run. I'll hold off on trying to commit https://crrev.com/c/1157533 until I see a failure indicating that we're actually seeing the state described in #17.
,
Aug 8
To give some context, "cryptohome --action=status" is not guaranteed to finish in 3 seconds: 1) That request to cryptohomed triggers several TPM commands. 2) If there's parallel TPm activity, each TPM command may have to wait on one or more other TPM commands already submitted by other daemons to run on TPM and waiting for their turn (TPM can process one command at a time). 3) There are TPM commands that take multiple seconds. E.g. RSA keypair generation is non-deterministic. For TPM2 it typically takes <10s, but can take 30+s in extreme cases (see e.g. http://b/68167013, go/cr50-keygen-speedup). 4) Those lengthy commands are rare (and commands are rarely sent to TPM in general), but they are concentrated around boot (especially first OOBE boot!) & login time. So, if "cryptohome --action=status" is unlucky, it may have to wait for a couple of lengthy 10+ second commands plus a few faster commands, so a 1+ minute wait is not impossible. See, for example, issue 772187, especially comments 28-29 there, for an example of similar delay in a different context.
,
Aug 17
,
Sep 26
Fixing my embarrassing cc mistake. Sorry. :-(
,
Sep 26
,
Oct 12
,
Oct 17
Is it fixed or does it need more attention?
,
Oct 17
I'll WontFix this for now since I don't think that I've seen this in a few months (and can't test any mitigations as a result). |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by apronin@chromium.org
, Jul 17