Cannot load the TPM SRK public key, completed successfully |
|||||||||||||||||||||||||
Issue descriptionTracking issues for failures with the following symptom: Cannot load the TPM SRK public key, completed successfully
,
May 11 2016
From issue 607068: Looking at the logs, I see there are some errors related to cryptohome: 2016-05-05T22:06:22.825332+00:00 INFO cryptohomed[833]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID 2016-05-05T22:06:23.269094+00:00 ERR shill[885]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-05-05T22:06:23.273142+00:00 INFO kernel: [ 7.280648] tpm_tis tpm_tis: command 0x21 (size 59) returned code 0x803 2016-05-05T22:06:23.291513+00:00 ERR shill[885]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-05-05T22:06:23.291887+00:00 ERR shill[885]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-05-05T22:06:23.308159+00:00 INFO cryptohomed[833]: TPM error 0x803 (TPM is defending against dictionary attacks and is in some time-out period): WrapRsaKey: Cannot load SRK pub key 2016-05-05T22:06:23.308173+00:00 ERR cryptohomed[833]: Couldn't wrap cryptohome key 2016-05-05T22:06:23.308178+00:00 WARNING cryptohomed[833]: Could not load the device policy file. 2016-05-05T22:06:23.308249+00:00 ERR cryptohomed[833]: Creating new salt at /home/.shadow/salt (0, 0)
,
May 11 2016
Also From issue 607068:
The problem is that the TPM gets into defending mode and could not be used properly after AU (TPM error 0x803 line in #2). Unfortunately, I could not find enough logs to see how it gets into the state. Usually, TPM should work again (get out of defending mode) after some time out. If it keeps failing, we might want to poke at the device to see what is going on.
TPM status from auto test log:
05/05 15:07:22.193 INFO | repair:0279| Verifying this condition: The host's TPM is available and working
05/05 15:07:22.194 DEBUG| ssh_host:0180| Running (ssh) 'cryptohome --action=status'
05/05 15:07:22.542 DEBUG| base_utils:0269| [stdout] {
05/05 15:07:22.543 DEBUG| base_utils:0269| [stdout] "installattrs": {
05/05 15:07:22.543 DEBUG| base_utils:0269| [stdout] "first_install": false,
05/05 15:07:22.543 DEBUG| base_utils:0269| [stdout] "initialized": false,
05/05 15:07:22.543 DEBUG| base_utils:0269| [stdout] "invalid": true,
05/05 15:07:22.544 DEBUG| base_utils:0269| [stdout] "lockbox_index": 536870916,
05/05 15:07:22.544 DEBUG| base_utils:0269| [stdout] "lockbox_nvram_version": 2,
05/05 15:07:22.544 DEBUG| base_utils:0269| [stdout] "secure": true,
05/05 15:07:22.545 DEBUG| base_utils:0269| [stdout] "size": 0,
05/05 15:07:22.545 DEBUG| base_utils:0269| [stdout] "version": 1
05/05 15:07:22.545 DEBUG| base_utils:0269| [stdout] },
05/05 15:07:22.545 DEBUG| base_utils:0269| [stdout] "mounts": [ ],
05/05 15:07:22.545 DEBUG| base_utils:0269| [stdout] "tpm": {
05/05 15:07:22.546 DEBUG| base_utils:0269| [stdout] "being_owned": false,
05/05 15:07:22.546 DEBUG| base_utils:0269| [stdout] "can_connect": true,
05/05 15:07:22.546 DEBUG| base_utils:0269| [stdout] "can_decrypt": false,
05/05 15:07:22.546 DEBUG| base_utils:0269| [stdout] "can_encrypt": false,
05/05 15:07:22.546 DEBUG| base_utils:0269| [stdout] "can_load_srk": true,
05/05 15:07:22.547 DEBUG| base_utils:0269| [stdout] "can_load_srk_pubkey": false,
05/05 15:07:22.547 DEBUG| base_utils:0269| [stdout] "enabled": true,
05/05 15:07:22.547 DEBUG| base_utils:0269| [stdout] "has_context": true,
05/05 15:07:22.547 DEBUG| base_utils:0269| [stdout] "has_cryptohome_key": false,
05/05 15:07:22.547 DEBUG| base_utils:0269| [stdout] "has_key_handle": false,
05/05 15:07:22.548 DEBUG| base_utils:0269| [stdout] "last_error": 1,
05/05 15:07:22.548 DEBUG| base_utils:0269| [stdout] "owned": true
05/05 15:07:22.548 DEBUG| base_utils:0269| [stdout] }
05/05 15:07:22.548 DEBUG| base_utils:0269| [stdout] }
05/05 15:07:22.548 DEBUG| base_utils:0269| [stdout]
05/05 15:07:22.550 ERROR| repair:0284| Failed: The host's TPM is available and working
,
May 11 2016
,
May 11 2016
From issue 607026: Another PFQ failure on tricky due to TPM, similar to the one in issue 607068, but this one with error codes 0x2020 and 0xC. 2016-05-09T21:08:52.317350+00:00 INFO cryptohomed[827]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID 2016-05-09T21:08:52.318018+00:00 INFO kernel: [ 7.325556] tpm_tis tpm_tis: command 0x65 (size 22) returned code 0x0 2016-05-09T21:08:52.325417+00:00 INFO chapsd[819]: Importing opencryptoki objects. 2016-05-09T21:08:52.325498+00:00 WARNING chapsd[819]: Did not find any opencryptoki objects to import. 2016-05-09T21:08:52.331880+00:00 INFO chapsd[819]: Slot 0 ready for token at /var/lib/chaps 2016-05-09T21:08:52.331905+00:00 INFO chapsd[819]: Initializing key hierarchy for token at /var/lib/chaps 2016-05-09T21:08:52.362019+00:00 INFO kernel: [ 7.369603] tpm_tis tpm_tis: command 0x46 (size 14) returned code 0x0 2016-05-09T21:08:52.386067+00:00 INFO kernel: [ 7.393683] tpm_tis tpm_tis: command 0xa (size 10) returned code 0x0 2016-05-09T21:08:52.440046+00:00 INFO kernel: [ 7.447689] tpm_tis tpm_tis: command 0xb (size 36) returned code 0x0 2016-05-09T21:08:52.513044+00:00 INFO kernel: [ 7.520518] tpm_tis tpm_tis: command 0x21 (size 59) returned code 0xc 2016-05-09T21:08:52.542380+00:00 INFO cryptohomed[827]: TPM error 0xc (Invalid keyhandle): WrapRsaKey: Cannot load SRK pub key 2016-05-09T21:08:52.542413+00:00 ERR cryptohomed[827]: Couldn't wrap cryptohome key 2016-05-09T21:08:52.542449+00:00 WARNING cryptohomed[827]: Could not load the device policy file. 2016-05-09T21:08:52.542673+00:00 ERR cryptohomed[827]: Creating new salt at /home/.shadow/salt (0, 0) 2016-05-09T21:08:52.543045+00:00 INFO kernel: [ 7.550485] tpm_tis tpm_tis: command 0xba (size 18) returned code 0x22 2016-05-09T21:08:52.579038+00:00 INFO kernel: [ 7.586484] EXT4-fs (dm-1): resizing filesystem from 4096 to 823368 blocks 2016-05-09T21:08:52.585015+00:00 INFO kernel: [ 7.592868] EXT4-fs (dm-1): resized filesystem to 823368 2016-05-09T21:08:52.686064+00:00 ERR firewalld[851]: Failed to call method: org.freedesktop.DBus.ObjectManager.GetManagedObjects: object_path= /org/chromium/PermissionBroker/ObjectManager: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.PermissionBroker was not provided by any .service files 2016-05-09T21:08:52.686116+00:00 WARNING firewalld[851]: org.chromium.PermissionBroker /org/chromium/PermissionBroker/ObjectManager: Failed to get managed objects 2016-05-09T21:08:52.697393+00:00 INFO permission_broker[859]: firewalld instance created. Putting permission_broker object on D-Bus. 2016-05-09T21:08:52.802017+00:00 INFO kernel: [ 7.809696] tpm_tis tpm_tis: command 0x1f (size 146) returned code 0x0 2016-05-09T21:08:52.806463+00:00 ERR shill[880]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-05-09T21:08:52.818020+00:00 INFO kernel: [ 7.826139] cfg80211: Calling CRDA for country: US 2016-05-09T21:08:52.820045+00:00 INFO kernel: [ 7.827715] cfg80211: Regulatory domain changed to country: US 2016-05-09T21:08:52.820054+00:00 INFO kernel: [ 7.827725] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) 2016-05-09T21:08:52.820056+00:00 INFO kernel: [ 7.827735] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm) 2016-05-09T21:08:52.820059+00:00 INFO kernel: [ 7.827743] cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm) 2016-05-09T21:08:52.820061+00:00 INFO kernel: [ 7.827751] cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm) 2016-05-09T21:08:52.820063+00:00 INFO kernel: [ 7.827760] cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2300 mBm) 2016-05-09T21:08:52.820065+00:00 INFO kernel: [ 7.827768] cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm) 2016-05-09T21:08:52.820067+00:00 INFO kernel: [ 7.827777] cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm) 2016-05-09T21:08:52.827726+00:00 ERR shill[880]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key 2016-05-09T21:08:52.828038+00:00 ERR shill[880]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
,
May 11 2016
Note, this failure does not seem to be associated with a particular DUT.
,
May 25 2016
Possibly the DUT TPM is not owned but the test plows forward anyways?
,
May 27 2016
,
Jun 2 2016
Darren, can you help find an owner? Not an infra issue, removing our component
,
Jun 3 2016
I think it is an infra issue. If you're running a test on a device, that device should be in a reasonable state. If a device's TPM is actively defending against attack the infra should self-repair. For tests that expect the TPM to be owned but do not take any steps to ensure this is the case it's debatable. Some tests take this into their own hands and put the DUT in the state they expect. Probably it's best to continue this practice and the bug can be filed on the owner of the test. stevenjb@ -- can you split this into issues for specific tests vs issues where infra is using bad DUTs (+jrbarnette).
,
Jun 3 2016
Darren: Is there a tpm status command to see when the TPM gets into this state (defending mode?). Before launching a test, I can do a check and at least throw a reasonable exception so this is more discoverable.
,
Jun 3 2016
> stevenjb@ -- can you split this into issues for specific > tests vs issues where infra is using bad DUTs (+jrbarnette). Let's not do that. The failures here are all happening in the verify step after provisioning. The problem with the TPM state was present at the start, before the test ran. Whatever caused it was in the test that _preceded_ the failed provision. We need to look at history prior to the test failures to find what's causing this.
,
Jun 3 2016
> Before launching a test, I can do a check and at least > throw a reasonable exception so this is more discoverable. achuith@ - Following up: Don't do this either. We already check for this TPM error before every test. What we're seeing here is the failure of that very check.
,
Jun 7 2016
,
Jun 10 2016
Issue 616944 has been merged into this issue.
,
Jun 13 2016
,
Jun 15 2016
akeshet@ - this looks like something that should fall into the "DUT task force" list? If so, let's remove the Hotlist-CrOS-Gardener label in favor of whatever we are using to track those.
,
Jun 16 2016
Yes, this would be in that task force. We haven't got a label yet for it. For now I'll make one up.
,
Jun 16 2016
Adding some information: A little while ago, I made a spot check of the history of a few devices with the failure. I found that the problem seemed to occur after running AU tests. N.B. I don't have true hard data to say that AU tests cause this, just a vague, anecdotal memory.
,
Jun 16 2016
Great, thanks! Before you go too crazy with 'dut-health-taskforce' (although it's easy to bulk update later), I would ping ketakid@ about getting an official Hotlist- label (e.g. Hotlist-DUT-Health) in the bug tracker so that auto complete works. It's much easier to just type, e.g. 'DUT' and have the label autocomplete.
,
Jun 18 2016
,
Jun 21 2016
,
Jun 27 2016
,
Jul 1 2016
Richard, did you have a strategy about avoiding this using a verifier? This is still happening.
,
Jul 1 2016
,
Jul 1 2016
<sigh> No, this whole symptom is _because_ of a verifier.
Long run, an ideal solution is to run a reset or verify
task after the last test in a suite.
Short term, we can paper over this problem by running this
command on the DUT prior to one of the reboots during
provisioning:
crossystem clear_tpm_owner_request=1
This should make the problem not appear during provisioning
jobs on most (but not all) boards.
,
Jul 6 2016
And again on tricky: https://uberchromegw.corp.google.com/i/chromeos/builders/tricky-chrome-pfq/builds/2079 Looks similar to comment #3: 07/06 14:36:43.353 INFO | repair:0279| Verifying this condition: The host's TPM is available and working 07/06 14:36:43.354 DEBUG| ssh_host:0180| Running (ssh) 'cryptohome --action=status' 07/06 14:36:43.716 DEBUG| base_utils:0278| [stdout] { 07/06 14:36:43.717 DEBUG| base_utils:0278| [stdout] "installattrs": { 07/06 14:36:43.717 DEBUG| base_utils:0278| [stdout] "first_install": false, 07/06 14:36:43.718 DEBUG| base_utils:0278| [stdout] "initialized": false, 07/06 14:36:43.718 DEBUG| base_utils:0278| [stdout] "invalid": true, 07/06 14:36:43.719 DEBUG| base_utils:0278| [stdout] "lockbox_index": 536870916, 07/06 14:36:43.719 DEBUG| base_utils:0278| [stdout] "lockbox_nvram_version": 2, 07/06 14:36:43.719 DEBUG| base_utils:0278| [stdout] "secure": true, 07/06 14:36:43.720 DEBUG| base_utils:0278| [stdout] "size": 0, 07/06 14:36:43.720 DEBUG| base_utils:0278| [stdout] "version": 1 07/06 14:36:43.721 DEBUG| base_utils:0278| [stdout] }, 07/06 14:36:43.721 DEBUG| base_utils:0278| [stdout] "mounts": [ ], 07/06 14:36:43.721 DEBUG| base_utils:0278| [stdout] "tpm": { 07/06 14:36:43.722 DEBUG| base_utils:0278| [stdout] "being_owned": false, 07/06 14:36:43.722 DEBUG| base_utils:0278| [stdout] "can_connect": true, 07/06 14:36:43.722 DEBUG| base_utils:0278| [stdout] "can_decrypt": false, 07/06 14:36:43.722 DEBUG| base_utils:0278| [stdout] "can_encrypt": false, 07/06 14:36:43.723 DEBUG| base_utils:0278| [stdout] "can_load_srk": true, 07/06 14:36:43.723 DEBUG| base_utils:0278| [stdout] "can_load_srk_pubkey": false, 07/06 14:36:43.723 DEBUG| base_utils:0278| [stdout] "enabled": true, 07/06 14:36:43.724 DEBUG| base_utils:0278| [stdout] "has_context": true, 07/06 14:36:43.724 DEBUG| base_utils:0278| [stdout] "has_cryptohome_key": false, 07/06 14:36:43.724 DEBUG| base_utils:0278| [stdout] "has_key_handle": false, 07/06 14:36:43.725 DEBUG| base_utils:0278| [stdout] "last_error": 1, 07/06 14:36:43.725 DEBUG| base_utils:0278| [stdout] "owned": true 07/06 14:36:43.726 DEBUG| base_utils:0278| [stdout] } 07/06 14:36:43.726 DEBUG| base_utils:0278| [stdout] } 07/06 14:36:43.726 DEBUG| base_utils:0278| [stdout] 07/06 14:36:43.729 ERROR| repair:0284| Failed: The host's TPM is available and working Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 281, in _verify_host self.verify(host) File "/usr/local/autotest/server/hosts/cros_repair.py", line 171, in verify 'Cannot load the TPM SRK public key') AutoservVerifyError: Cannot load the TPM SRK public key
,
Jul 7 2016
,
Jul 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/fc3d743701f73b1ced8cc435b5275c6e9f402c75 commit fc3d743701f73b1ced8cc435b5275c6e9f402c75 Author: Richard Barnette <jrbarnette@chromium.org> Date: Thu Jul 07 17:21:31 2016 [autotest] Prevent TPM problems from causing Provision failures. Sometimes, the TPM on a DUT gets into a bad state, such that it can cause test failures. Host verification catches and repairs such problems, but the problem frequently causes provision failures in the process. This changes CrosHost.machine_install() to clear the TPM as part of the standard AU process, which will eliminate all the provisioning failures, at the cost of papering over whatever is causing the problem. BUG= chromium:611121 TEST=test using machine_install on a local instance Change-Id: I6b44e8c71b99839245c08d25cb101575bc47a803 Reviewed-on: https://chromium-review.googlesource.com/358840 Commit-Ready: Richard Barnette <jrbarnette@chromium.org> Tested-by: Richard Barnette <jrbarnette@chromium.org> Reviewed-by: Shuqian Zhao <shuqianz@chromium.org> [modify] https://crrev.com/fc3d743701f73b1ced8cc435b5275c6e9f402c75/server/hosts/cros_host.py
,
Jul 11 2016
I've asked shuqianz@ for a push to prod for the CL above. After a push, the provision failures should stop (mostly, I think), but the underlying problem could still sometimes cause reset/verify failures and repair during testing.
,
Jul 11 2016
Push is done. $ git log --oneline 321cd4e..11f8aed | grep autotest 9d0968d Make touch_UpdateErrors autotest ignore autotest and touchview failures 254d5b4 [autotest] Include DUT totals in lab inventory log messages. 8e13b68 autotest: record logs after forcing reset when repairing fc20d3d [autotest] Force teststation command to abort when the ssh command is timed out fc3d743 [autotest] Prevent TPM problems from causing Provision failures.
,
Jul 14 2016
,
Aug 29 2016
,
Oct 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/d43814e6cc0047c67db8992bfc4a13253788e35d commit d43814e6cc0047c67db8992bfc4a13253788e35d Author: xixuan <xixuan@chromium.org> Date: Fri Sep 23 00:02:02 2016 auto_updater: Add crossystem command in rootfs update to fix TPM issues. Sometimes, the TPM on a DUT gets into a bad state, such that it can cause test failures. Host verification catches and repairs such problems, but the problem frequently causes provision failures in the process. This changes ChromiumOSUpdater.PostCheckRootfsUpdate() to clear the TPM as part of the cros-flash based AU process, which will eliminate all the provisioning failures, at the cost of papering over whatever is causing the problem. BUG=chromium:649543, chromium:611121 TEST=Run provision & repair in local autotest with local DUT and devserver. Change-Id: I9b1736ca04c67df800a48c19a6b173b8c18bcb29 Reviewed-on: https://chromium-review.googlesource.com/388914 Commit-Ready: Xixuan Wu <xixuan@chromium.org> Tested-by: Xixuan Wu <xixuan@chromium.org> Reviewed-by: Xixuan Wu <xixuan@chromium.org> [modify] https://crrev.com/d43814e6cc0047c67db8992bfc4a13253788e35d/lib/auto_updater.py
,
Oct 7 2016
,
Nov 19 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Aug 3 2017
Closing. Please reopen it if its not fixed. Thanks! |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by steve...@chromium.org
, May 11 2016