
Issue 599686

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Apr 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Moblab fails the TPM SRK check in verify

Reported by jrbarnette@chromium.org, Mar 31 2016

Issue description

This is a follow-up from
    /b2/27923610

Four out of the eight moblab units in the lab are failing
with this symptom:

START	----	repair	timestamp=1459462979	localtime=Mar 31 15:22:59	
	GOOD	----	verify.ssh	timestamp=1459463002	localtime=Mar 31 15:23:22	
	GOOD	----	verify.power	timestamp=1459463002	localtime=Mar 31 15:23:22	
	GOOD	----	verify.cros	timestamp=1459463040	localtime=Mar 31 15:24:00	
	FAIL	----	verify.tpm	timestamp=1459463040	localtime=Mar 31 15:24:00	Cannot load the TPM SRK public key
END FAIL	----	repair	timestamp=1459463040	localtime=Mar 31 15:24:00	
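
For triage across several units, the failing step can be pulled out of a status log like the one above with a short script. This is only a sketch: it assumes the tab-separated layout shown here (status, subdir, step name, key=value fields, then an optional reason), and `failed_steps` is a hypothetical helper, not part of autotest.

```python
def failed_steps(status_log):
    """Return (step, reason) pairs for every FAIL line in a status log.

    Assumes the tab-separated layout seen in this bug; other autotest
    log variants may need different handling.
    """
    failures = []
    for line in status_log.splitlines():
        fields = line.strip().split("\t")
        # "END FAIL" summary lines have a different first field, so
        # only the individual failing steps match here.
        if len(fields) < 3 or fields[0] != "FAIL":
            continue
        # Anything after the step name that is not a known key=value
        # field is treated as the human-readable failure reason.
        extras = [f for f in fields[3:]
                  if not f.startswith(("timestamp=", "localtime="))]
        failures.append((fields[2], extras[-1] if extras else ""))
    return failures


if __name__ == "__main__":
    sample = ("\tGOOD\t----\tverify.cros\ttimestamp=1459463040\t"
              "localtime=Mar 31 15:24:00\t\n"
              "\tFAIL\t----\tverify.tpm\ttimestamp=1459463040\t"
              "localtime=Mar 31 15:24:00\t"
              "Cannot load the TPM SRK public key\n")
    for step, reason in failed_steps(sample):
        print(step, "->", reason)
```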

As of the most recent Repair/Verify changes, Moblab can't fix
this problem, because its repair flow no longer tries powerwash.

We have three options:
 1) Drop the TPM check from Moblab verify.
 2) Add powerwash back to the Moblab repair flow.
 3) Add a specialized TPM repair procedure; this could
    be used for both Moblab and CrOS.

Options 2) and 3) take real work and may not be necessary, so
let's try 1) first.

 

Comment 1 by bugdroid1@chromium.org, Mar 31 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/084063ca2f0b424baae2354b1345a162eb5b9fd4

commit 084063ca2f0b424baae2354b1345a162eb5b9fd4
Author: J. Richard Barnette <jrbarnette@chromium.org>
Date: Thu Mar 31 22:47:21 2016

[autotest] Drop the TPM verifier from Moblab.

The TPM verifier fails on some Moblab units.  However, we have no
repair procedure to fix the problem and at first blush, the verifier
isn't needed for Moblab anyway.  So, we're deleting the check, to
allow Moblab units to go back into service.

BUG= chromium:599686 
TEST=From python shell, call create_moblab_repair_strategy()

Change-Id: Ief770ac9e28360c6b9073a00485f8193b890b314
Reviewed-on: https://chromium-review.googlesource.com/336444
Commit-Queue: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/084063ca2f0b424baae2354b1345a162eb5b9fd4/server/hosts/cros_repair.py
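
To illustrate the kind of change described in the commit message, here is a toy model of removing one verifier from a dependency list. It is not the code in cros_repair.py: the `(name, dependencies)` tuples, the dependency edges, and `drop_verifier` are all assumptions made for illustration. Only the step names (ssh, power, cros, tpm) come from the status log in this bug.

```python
def drop_verifier(verify_dag, name):
    """Return a copy of verify_dag without `name`.

    Refuses to drop a verifier that other verifiers still depend on,
    since that would leave a dangling dependency.
    """
    dependents = [n for n, deps in verify_dag if name in deps]
    if dependents:
        raise ValueError("%s is still required by %s" % (name, dependents))
    return [(n, deps) for n, deps in verify_dag if n != name]


# Hypothetical verifier list; the dependency edges are assumed.
MOBLAB_VERIFIERS = [
    ("ssh", []),
    ("power", ["ssh"]),
    ("cros", ["ssh"]),
    ("tpm", ["ssh"]),
]

if __name__ == "__main__":
    remaining = drop_verifier(MOBLAB_VERIFIERS, "tpm")
    print([name for name, _ in remaining])
```

Because nothing in this toy list depends on the TPM check, it can be removed without touching the other entries, which matches the "delete the check" approach taken in the commit.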

Comment 2 by krk@chromium.org, Mar 31 2016

Cc: krk@chromium.org
Labels: Proj-Moblab
The change above will allow the Moblab DUTs to pass verify.
However, it may be that there's some test or other feature
in Moblab that can fail when the TPM has the underlying problem.
In that case, it would be necessary to reinstate the TPM verifier,
and add a repair action.  The best option for repair is described
in bug 599702.

All DUTs for guado_moblab are broken in moblab due to 'Cannot load the TPM SRK public key'.

~$ dut-status -b guado_moblab -p bvt -g
chromeos2-row5-rack10-host7
    2016-04-01 06:22:32  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host7/53226909-repair/
    2016-04-01 06:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host7/53226743-reset/
    2016-04-01 05:16:08  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/58616985-chromeos-test/
    2016-04-01 05:03:56  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host7/53223197-provision/
chromeos2-row5-rack10-host8
    2016-04-01 06:33:26  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host8/53227429-repair/
    2016-04-01 06:30:45  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host8/53227311-reset/
    2016-04-01 06:25:56  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/58617090-chromeos-test/
    2016-04-01 06:22:33  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host8/53226915-reset/
chromeos2-row5-rack10-host9
    2016-04-01 06:33:26  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host9/53227428-repair/
    2016-04-01 06:30:45  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host9/53227310-reset/
    2016-04-01 06:25:16  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/58617085-chromeos-test/
    2016-04-01 06:22:33  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host9/53226914-reset/
chromeos2-row5-rack10-host10
    2016-04-03 04:37:17  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host10/53351598-repair/
    2016-04-03 04:32:53  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host10/53351380-reset/
    2016-04-03 04:13:23  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/58839818-chromeos-test/
    2016-04-03 04:10:09  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row5-rack10-host10/53350287-reset/

Is it the same issue?
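
For a longer host list, the units currently stuck in failed repair can be picked out of output like the above with a small script. This is a sketch under assumptions: it relies on the layout shown here (an unindented hostname followed by indented event lines, newest first) and treats a status of NO on the newest event as "failed". `hosts_failing_repair` is a hypothetical helper, not part of dut-status.

```python
def hosts_failing_repair(dut_status_output):
    """Return hostnames whose most recent event has status 'NO'."""
    failing = []
    current_host = None
    seen_event = False
    for line in dut_status_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new host section. A shell
            # prompt line also lands here but is harmlessly replaced
            # by the next hostname before any event is read.
            current_host = line.strip()
            seen_event = False
        elif current_host and not seen_event:
            # First indented line is assumed to be the newest event:
            # date, time, status, URL.
            parts = line.split()
            seen_event = True
            if len(parts) >= 3 and parts[2] == "NO":
                failing.append(current_host)
    return failing


if __name__ == "__main__":
    sample = (
        "chromeos2-row5-rack10-host7\n"
        "    2016-04-01 06:22:32  NO http://cautotest/tko/retrieve_logs.cgi"
        "?job=/results/hosts/chromeos2-row5-rack10-host7/53226909-repair/\n"
    )
    print(hosts_failing_repair(sample))
```
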
Cc: xixuan@chromium.org
Looks like it; the fix Richard made hasn't been pushed out yet.
Cc: dkrahn@chromium.org
Yeah, without the fix, any time a DUT gets into this state, it'll
fail repair again.

There's a cheap and easy manual fix while we wait for the push;
it's described in bug 599702.  Basically, just run this command:
    ssh $DUT "crossystem clear_tpm_owner_request=1 ; reboot"

Then reverify all the DUTs.
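
If several DUTs need the manual fix at once, the command above can be wrapped in a small loop. The following is a minimal sketch: the remote command string is the one quoted in this comment, while `clear_tpm_owner`, its `dry_run` flag, and the assumption of working passwordless ssh to each DUT are mine. It defaults to printing the commands rather than running them.

```python
import subprocess

# Remote command taken from the comment above.
CLEAR_TPM_CMD = "crossystem clear_tpm_owner_request=1 ; reboot"


def build_clear_tpm_command(dut):
    """Build the ssh argument list for one DUT hostname."""
    return ["ssh", dut, CLEAR_TPM_CMD]


def clear_tpm_owner(duts, dry_run=True):
    """Run (or, with dry_run, just print) the fix on each DUT.

    Returns the list of commands so callers can inspect or log them.
    """
    commands = [build_clear_tpm_command(dut) for dut in duts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # The reboot is likely to drop the ssh session, so the
            # exit status is deliberately not checked here.
            subprocess.run(cmd)
    return commands


if __name__ == "__main__":
    clear_tpm_owner(["chromeos2-row5-rack10-host7"], dry_run=True)
```

The DUTs still need to be reverified afterwards; this sketch does not do that step.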

One question is:  Why do these Moblab DUTs get into this state in
the first place?  Really, I think that's not supposed to happen,
although it could be normal, in which case we should rethink our
verification tests.

Thanks! DUTs are back now.
Status: Fixed (was: Assigned)

Comment 9 by benhenry@google.com, Apr 27 2016

Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS
Status: Verified (was: Fixed)
Bulk verified
