New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 611121 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Last visit > 30 days ago
Closed: Jul 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 605749
issue 605952
issue 607026
issue 607068
issue 609376



Sign in to add a comment

Cannot load the TPM SRK public key, completed successfully

Project Member Reported by steve...@chromium.org, May 11 2016

Issue description

Tracking issues for failures with the following symptom:

Cannot load the TPM SRK public key, completed successfully


 
Labels: -Pri-3 Pri-2
Not a super high frequency bug, but there are a total of 22 autofiled issues with this symptom across multiple boards.

Cc: xiy...@chromium.org
From issue 607068:

Looking at the logs, I see there are some errors related to cryptohome:

2016-05-05T22:06:22.825332+00:00 INFO cryptohomed[833]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID
2016-05-05T22:06:23.269094+00:00 ERR shill[885]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
2016-05-05T22:06:23.273142+00:00 INFO kernel: [    7.280648] tpm_tis tpm_tis: command 0x21 (size 59) returned code 0x803
2016-05-05T22:06:23.291513+00:00 ERR shill[885]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
2016-05-05T22:06:23.291887+00:00 ERR shill[885]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
2016-05-05T22:06:23.308159+00:00 INFO cryptohomed[833]: TPM error 0x803 (TPM is defending against dictionary attacks and is in some time-out period): WrapRsaKey: Cannot load SRK pub key
2016-05-05T22:06:23.308173+00:00 ERR cryptohomed[833]: Couldn't wrap cryptohome key
2016-05-05T22:06:23.308178+00:00 WARNING cryptohomed[833]: Could not load the device policy file.
2016-05-05T22:06:23.308249+00:00 ERR cryptohomed[833]: Creating new salt at /home/.shadow/salt (0, 0)

Also From issue 607068:

The problem is that the TPM gets into defending mode and could not be used properly after AU (TPM error 0x803 line in #2). Unfortunately, I could not find enough logs to see how it gets into the state. Usually, TPM should work again (get out of defending mode) after some time out. If it keeps failing, we might want to poke at the device to see what is going on.

TPM status from auto test log:
05/05 15:07:22.193 INFO |            repair:0279| Verifying this condition: The host's TPM is available and working
05/05 15:07:22.194 DEBUG|          ssh_host:0180| Running (ssh) 'cryptohome --action=status'
05/05 15:07:22.542 DEBUG|        base_utils:0269| [stdout] {
05/05 15:07:22.543 DEBUG|        base_utils:0269| [stdout]    "installattrs": {
05/05 15:07:22.543 DEBUG|        base_utils:0269| [stdout]       "first_install": false,
05/05 15:07:22.543 DEBUG|        base_utils:0269| [stdout]       "initialized": false,
05/05 15:07:22.543 DEBUG|        base_utils:0269| [stdout]       "invalid": true,
05/05 15:07:22.544 DEBUG|        base_utils:0269| [stdout]       "lockbox_index": 536870916,
05/05 15:07:22.544 DEBUG|        base_utils:0269| [stdout]       "lockbox_nvram_version": 2,
05/05 15:07:22.544 DEBUG|        base_utils:0269| [stdout]       "secure": true,
05/05 15:07:22.545 DEBUG|        base_utils:0269| [stdout]       "size": 0,
05/05 15:07:22.545 DEBUG|        base_utils:0269| [stdout]       "version": 1
05/05 15:07:22.545 DEBUG|        base_utils:0269| [stdout]    },
05/05 15:07:22.545 DEBUG|        base_utils:0269| [stdout]    "mounts": [  ],
05/05 15:07:22.545 DEBUG|        base_utils:0269| [stdout]    "tpm": {
05/05 15:07:22.546 DEBUG|        base_utils:0269| [stdout]       "being_owned": false,
05/05 15:07:22.546 DEBUG|        base_utils:0269| [stdout]       "can_connect": true,
05/05 15:07:22.546 DEBUG|        base_utils:0269| [stdout]       "can_decrypt": false,
05/05 15:07:22.546 DEBUG|        base_utils:0269| [stdout]       "can_encrypt": false,
05/05 15:07:22.546 DEBUG|        base_utils:0269| [stdout]       "can_load_srk": true,
05/05 15:07:22.547 DEBUG|        base_utils:0269| [stdout]       "can_load_srk_pubkey": false,
05/05 15:07:22.547 DEBUG|        base_utils:0269| [stdout]       "enabled": true,
05/05 15:07:22.547 DEBUG|        base_utils:0269| [stdout]       "has_context": true,
05/05 15:07:22.547 DEBUG|        base_utils:0269| [stdout]       "has_cryptohome_key": false,
05/05 15:07:22.547 DEBUG|        base_utils:0269| [stdout]       "has_key_handle": false,
05/05 15:07:22.548 DEBUG|        base_utils:0269| [stdout]       "last_error": 1,
05/05 15:07:22.548 DEBUG|        base_utils:0269| [stdout]       "owned": true
05/05 15:07:22.548 DEBUG|        base_utils:0269| [stdout]    }
05/05 15:07:22.548 DEBUG|        base_utils:0269| [stdout] }
05/05 15:07:22.548 DEBUG|        base_utils:0269| [stdout] 
05/05 15:07:22.550 ERROR|            repair:0284| Failed: The host's TPM is available and working

Cc: keta...@chromium.org steve...@chromium.org jen...@chromium.org afakhry@chromium.org dkrahn@chromium.org sbasi@chromium.org
From issue 607026:

Another PFQ failure on tricky due to TPM, similar to the one in issue 607068, but this one with error codes 0x2020 and 0xC.

2016-05-09T21:08:52.317350+00:00 INFO cryptohomed[827]: TPM error 0x2020 (Key not found in persistent storage): LoadKeyByUuid: failed LoadKeyByUUID
2016-05-09T21:08:52.318018+00:00 INFO kernel: [    7.325556] tpm_tis tpm_tis: command 0x65 (size 22) returned code 0x0
2016-05-09T21:08:52.325417+00:00 INFO chapsd[819]: Importing opencryptoki objects.
2016-05-09T21:08:52.325498+00:00 WARNING chapsd[819]: Did not find any opencryptoki objects to import.
2016-05-09T21:08:52.331880+00:00 INFO chapsd[819]: Slot 0 ready for token at /var/lib/chaps
2016-05-09T21:08:52.331905+00:00 INFO chapsd[819]: Initializing key hierarchy for token at /var/lib/chaps
2016-05-09T21:08:52.362019+00:00 INFO kernel: [    7.369603] tpm_tis tpm_tis: command 0x46 (size 14) returned code 0x0
2016-05-09T21:08:52.386067+00:00 INFO kernel: [    7.393683] tpm_tis tpm_tis: command 0xa (size 10) returned code 0x0
2016-05-09T21:08:52.440046+00:00 INFO kernel: [    7.447689] tpm_tis tpm_tis: command 0xb (size 36) returned code 0x0
2016-05-09T21:08:52.513044+00:00 INFO kernel: [    7.520518] tpm_tis tpm_tis: command 0x21 (size 59) returned code 0xc
2016-05-09T21:08:52.542380+00:00 INFO cryptohomed[827]: TPM error 0xc (Invalid keyhandle): WrapRsaKey: Cannot load SRK pub key
2016-05-09T21:08:52.542413+00:00 ERR cryptohomed[827]: Couldn't wrap cryptohome key
2016-05-09T21:08:52.542449+00:00 WARNING cryptohomed[827]: Could not load the device policy file.
2016-05-09T21:08:52.542673+00:00 ERR cryptohomed[827]: Creating new salt at /home/.shadow/salt (0, 0)
2016-05-09T21:08:52.543045+00:00 INFO kernel: [    7.550485] tpm_tis tpm_tis: command 0xba (size 18) returned code 0x22
2016-05-09T21:08:52.579038+00:00 INFO kernel: [    7.586484] EXT4-fs (dm-1): resizing filesystem from 4096 to 823368 blocks
2016-05-09T21:08:52.585015+00:00 INFO kernel: [    7.592868] EXT4-fs (dm-1): resized filesystem to 823368
2016-05-09T21:08:52.686064+00:00 ERR firewalld[851]: Failed to call method: org.freedesktop.DBus.ObjectManager.GetManagedObjects: object_path= /org/chromium/PermissionBroker/ObjectManager: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.PermissionBroker was not provided by any .service files
2016-05-09T21:08:52.686116+00:00 WARNING firewalld[851]: org.chromium.PermissionBroker /org/chromium/PermissionBroker/ObjectManager: Failed to get managed objects
2016-05-09T21:08:52.697393+00:00 INFO permission_broker[859]: firewalld instance created. Putting permission_broker object on D-Bus.
2016-05-09T21:08:52.802017+00:00 INFO kernel: [    7.809696] tpm_tis tpm_tis: command 0x1f (size 146) returned code 0x0
2016-05-09T21:08:52.806463+00:00 ERR shill[880]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
2016-05-09T21:08:52.818020+00:00 INFO kernel: [    7.826139] cfg80211: Calling CRDA for country: US
2016-05-09T21:08:52.820045+00:00 INFO kernel: [    7.827715] cfg80211: Regulatory domain changed to country: US
2016-05-09T21:08:52.820054+00:00 INFO kernel: [    7.827725] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
2016-05-09T21:08:52.820056+00:00 INFO kernel: [    7.827735] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm)
2016-05-09T21:08:52.820059+00:00 INFO kernel: [    7.827743] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm)
2016-05-09T21:08:52.820061+00:00 INFO kernel: [    7.827751] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm)
2016-05-09T21:08:52.820063+00:00 INFO kernel: [    7.827760] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2300 mBm)
2016-05-09T21:08:52.820065+00:00 INFO kernel: [    7.827768] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm)
2016-05-09T21:08:52.820067+00:00 INFO kernel: [    7.827777] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm)
2016-05-09T21:08:52.827726+00:00 ERR shill[880]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
2016-05-09T21:08:52.828038+00:00 ERR shill[880]: [ERROR:crypto_des_cbc.cc(103)] Unable to load key matter from /var/lib/whitelist/owner.key
Note, this failure does not seem to be associated with a particular DUT.

Comment 7 by dkrahn@chromium.org, May 25 2016

Possibly the DUT TPM is not owned but the test plows forward anyways?
Components: -Build Infra>Client>ChromeOS
Components: -Infra>Client>ChromeOS
Owner: dkrahn@chromium.org
Darren, can you help find an owner? Not an infra issue, removing our component 
Cc: jrbarnette@chromium.org
Owner: steve...@chromium.org
I think it is an infra issue. If you're running a test on a device, that device should be in a reasonable state. If a device's TPM is actively defending against attack the infra should self-repair.

For tests that expect the TPM to be owned but do not take any steps to ensure this is the case it's debatable. Some tests take this into their own hands and put the DUT in the state they expect. Probably it's best to continue this practice and the bug can be filed on the owner of the test.

stevenjb@ -- can you split this into issues for specific tests vs issues where infra is using bad DUTs (+jrbarnette).
Darren: Is there a tpm status command to see when the TPM gets into this state (defending mode?). 

Before launching a test, I can do a check and at least throw a reasonable exception so this is more discoverable.
> stevenjb@ -- can you split this into issues for specific
> tests vs issues where infra is using bad DUTs (+jrbarnette).

Let's not do that.

The failures here are all happening in the verify step after
provisioning.  The problem with the TPM state was present at
the start, before the test ran.  Whatever caused it was in the
test that _preceded_ the failed provision.

We need to look at history prior to the test failures to find
what's causing this.

> Before launching a test, I can do a check and at least
> throw a reasonable exception so this is more discoverable.

achuith@ - Following up:  Don't do this either.  We already
check for this TPM error before every test.  What we're
seeing here is the failure of that very check.

Status: Available (was: Untriaged)
 Issue 616944  has been merged into this issue.
Labels: Hotlist-CrOS-Gardener
Owner: akes...@chromium.org
akeshet@ - this looks like something that should fall into the "DUT task force" list? If so, let's remove the Hotlist-CrOS-Gardener label in favor of whatever we are using to track those.


Labels: dut-health-taskforce
Yes, this would be in that task force. We haven't got a label yet for it. For now I'll make one up.
Adding some information:  A little while ago, I made a spot check
of the history of a few devices with the failure.  I found that
the problem seemed to occur after running AU tests.

N.B. I don't have true hard data to say that AU tests cause this,
just a vague, anecdotal memory.

Great, thanks!

Before you go too crazy with 'dut-health-taskforce' (although it's easy to bulk update later), I would ping ketakid@ about getting an official Hotlist- label (e.g. Hotlist-DUT-Health) in the bug tracker so that auto complete works. It's much easier to just type, e.g. 'DUT' and have the label autocomplete.

Labels: -dut-health-taskforce dut-health
Labels: -dut-health Hotlist-CrOS-DutHealth
Labels: -Pri-2 Pri-1
Owner: jrbarnette@chromium.org
Richard, did you have a strategy about avoiding this using a verifier?

This is still happening.
Summary: Cannot load the TPM SRK public key, completed successfully (was: PFQ Failures: Cannot load the TPM SRK public key, completed successfully)
<sigh> No, this whole symptom is _because_ of a verifier.

Long run, an ideal solution is to run a reset or verify
task after the last test in a suite.

Short term, we can paper over this problem by running this
command on the DUT prior to one of the reboots during
provisioning:
    crossystem clear_tpm_owner_request=1

This should make the problem not appear during provisioning
jobs on most (but not all) boards.

And again on tricky:
https://uberchromegw.corp.google.com/i/chromeos/builders/tricky-chrome-pfq/builds/2079

Looks similar to comment #3:

07/06 14:36:43.353 INFO |            repair:0279| Verifying this condition: The host's TPM is available and working
07/06 14:36:43.354 DEBUG|          ssh_host:0180| Running (ssh) 'cryptohome --action=status'
07/06 14:36:43.716 DEBUG|        base_utils:0278| [stdout] {
07/06 14:36:43.717 DEBUG|        base_utils:0278| [stdout]    "installattrs": {
07/06 14:36:43.717 DEBUG|        base_utils:0278| [stdout]       "first_install": false,
07/06 14:36:43.718 DEBUG|        base_utils:0278| [stdout]       "initialized": false,
07/06 14:36:43.718 DEBUG|        base_utils:0278| [stdout]       "invalid": true,
07/06 14:36:43.719 DEBUG|        base_utils:0278| [stdout]       "lockbox_index": 536870916,
07/06 14:36:43.719 DEBUG|        base_utils:0278| [stdout]       "lockbox_nvram_version": 2,
07/06 14:36:43.719 DEBUG|        base_utils:0278| [stdout]       "secure": true,
07/06 14:36:43.720 DEBUG|        base_utils:0278| [stdout]       "size": 0,
07/06 14:36:43.720 DEBUG|        base_utils:0278| [stdout]       "version": 1
07/06 14:36:43.721 DEBUG|        base_utils:0278| [stdout]    },
07/06 14:36:43.721 DEBUG|        base_utils:0278| [stdout]    "mounts": [  ],
07/06 14:36:43.721 DEBUG|        base_utils:0278| [stdout]    "tpm": {
07/06 14:36:43.722 DEBUG|        base_utils:0278| [stdout]       "being_owned": false,
07/06 14:36:43.722 DEBUG|        base_utils:0278| [stdout]       "can_connect": true,
07/06 14:36:43.722 DEBUG|        base_utils:0278| [stdout]       "can_decrypt": false,
07/06 14:36:43.722 DEBUG|        base_utils:0278| [stdout]       "can_encrypt": false,
07/06 14:36:43.723 DEBUG|        base_utils:0278| [stdout]       "can_load_srk": true,
07/06 14:36:43.723 DEBUG|        base_utils:0278| [stdout]       "can_load_srk_pubkey": false,
07/06 14:36:43.723 DEBUG|        base_utils:0278| [stdout]       "enabled": true,
07/06 14:36:43.724 DEBUG|        base_utils:0278| [stdout]       "has_context": true,
07/06 14:36:43.724 DEBUG|        base_utils:0278| [stdout]       "has_cryptohome_key": false,
07/06 14:36:43.724 DEBUG|        base_utils:0278| [stdout]       "has_key_handle": false,
07/06 14:36:43.725 DEBUG|        base_utils:0278| [stdout]       "last_error": 1,
07/06 14:36:43.725 DEBUG|        base_utils:0278| [stdout]       "owned": true
07/06 14:36:43.726 DEBUG|        base_utils:0278| [stdout]    }
07/06 14:36:43.726 DEBUG|        base_utils:0278| [stdout] }
07/06 14:36:43.726 DEBUG|        base_utils:0278| [stdout] 
07/06 14:36:43.729 ERROR|            repair:0284| Failed: The host's TPM is available and working
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 281, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 171, in verify
    'Cannot load the TPM SRK public key')
AutoservVerifyError: Cannot load the TPM SRK public key

Project Member

Comment 29 by bugdroid1@chromium.org, Jul 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/fc3d743701f73b1ced8cc435b5275c6e9f402c75

commit fc3d743701f73b1ced8cc435b5275c6e9f402c75
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Thu Jul 07 17:21:31 2016

[autotest] Prevent TPM problems from causing Provision failures.

Sometimes, the TPM on a DUT gets into a bad state, such that it can
cause test failures.  Host verification catches and repairs such
problems, but the problem frequently causes provision failures in
the process.

This changes CrosHost.machine_install() to clear the TPM as part of
the standard AU process, which will eliminate all the provisioning
failures, at the cost of papering over whatever is causing the
problem.

BUG= chromium:611121 
TEST=test using machine_install on a local instance

Change-Id: I6b44e8c71b99839245c08d25cb101575bc47a803
Reviewed-on: https://chromium-review.googlesource.com/358840
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Shuqian Zhao <shuqianz@chromium.org>

[modify] https://crrev.com/fc3d743701f73b1ced8cc435b5275c6e9f402c75/server/hosts/cros_host.py

Owner: shuqianz@chromium.org
I've asked shuqianz@ for a push to prod for the CL above.

After a push, the provision failures should stop (mostly, I think),
but the underlying problem could still sometimes cause reset/verify
failures and repair during testing.

Push is done.
$ git log --oneline 321cd4e..11f8aed | grep autotest
9d0968d Make touch_UpdateErrors autotest ignore autotest and touchview failures
254d5b4 [autotest] Include DUT totals in lab inventory log messages.
8e13b68 autotest: record logs after forcing reset when repairing
fc20d3d [autotest] Force teststation command to abort when the ssh command is timed out
fc3d743 [autotest] Prevent TPM problems from causing Provision failures.
Status: Fixed (was: Available)
Labels: VerifyIn-54
Project Member

Comment 34 by bugdroid1@chromium.org, Oct 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d43814e6cc0047c67db8992bfc4a13253788e35d

commit d43814e6cc0047c67db8992bfc4a13253788e35d
Author: xixuan <xixuan@chromium.org>
Date: Fri Sep 23 00:02:02 2016

auto_updater: Add crossystem command in rootfs update to fix TPM issues.

Sometimes, the TPM on a DUT gets into a bad state, such that it can
cause test failures. Host verification catches and repairs such
problems, but the problem frequently causes provision failures in
the process.

This changes ChromiumOSUpdater.PostCheckRootfsUpdate() to clear the TPM as
part of the cros-flash based AU process, which will eliminate all the
provisioning failures, at the cost of papering over whatever is causing the
problem.

BUG=chromium:649543,  chromium:611121 
TEST=Run provision & repair in local autotest with local DUT and devserver.

Change-Id: I9b1736ca04c67df800a48c19a6b173b8c18bcb29
Reviewed-on: https://chromium-review.googlesource.com/388914
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Xixuan Wu <xixuan@chromium.org>

[modify] https://crrev.com/d43814e6cc0047c67db8992bfc4a13253788e35d/lib/auto_updater.py

Labels: VerifyIn-55

Comment 36 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 37 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 38 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 39 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 40 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61
Status: Verified (was: Fixed)
Closing. Please reopen it if its not fixed. Thanks!

Sign in to add a comment