
Issue 878012

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug


Participants' hotlists:
SIE-infra-request



New device deployment should be equivalent to repair.

Project Member Reported by nsanders@chromium.org, Aug 27

Issue description

Currently, device repair supports USB recovery but not reflashing corrupted firmware.

For this to work reliably within ATL:

* FAFT's servo flashing infrastructure would need generalization to the lab, since the flashing commands depend on the device type and servo type.

* The serial number, HWID, and VPD are stored in firmware and not backed up elsewhere. Autotest infra references them, and if they are deleted they can't easily be recovered. So the corrupted firmware should be saved before recovery, and the data restored after repair. Alternatively, the known good firmware should be backed up during deployment for easy restore.
 
> Currently, device repair supports USB recovery but not reflashing corrupted firmware.

There _is_ a repair procedure that will reflash corrupted firmware.  The procedure
is only applied to FAFT devices, because those are the only ones expected to need it.

We could enable the procedure more broadly, but without more sophistication,
the result would be that any device that went offline would see its firmware
re-flashed every few hours until the device was fixed.  That could go on for
days or weeks, even months.  That would mean that some devices, after failing,
could find themselves undergoing 500-1000 (or more) extra FPROM write cycles.
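
As a rough check on that estimate, here is a minimal sketch of the arithmetic. The function name, the repair intervals, and the outage lengths are all assumptions chosen for illustration, not measured lab values.

```python
def extra_write_cycles(reflash_interval_hours: float, offline_days: float) -> int:
    """Count reflash attempts over an outage, assuming one per repair interval."""
    if reflash_interval_hours <= 0:
        raise ValueError("reflash interval must be positive")
    return int(offline_days * 24 // reflash_interval_hours)


# Hypothetical outages: neither the intervals nor the durations come from lab data.
for hours, days in [(4, 90), (3, 120)]:
    cycles = extra_write_cycles(hours, days)
    print(f"every {hours}h for {days} days: {cycles} extra writes")
```

Under these assumed numbers, a device reflashed every 4 hours for 90 days sees 540 extra writes, and one reflashed every 3 hours for 120 days sees 960, which matches the 500-1000 range above.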

Cc: dchan@chromium.org
Labels: labstation
Full recovery from any level of disaster can be performed by:

* extracting a good firmware from a known good image through chromeos-firmwareupdate
* extracting the broken firmware from the DUT
* copying the vpd and hwid to the known good firmware using vpd -f file, gbb_utility
* flashing the good firmware (maybe ec too) back
* booting from recovery and clearing TPM
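
The identity-copy step in the list above can be sketched as a toy model. This does not call vpd, gbb_utility, or any flashing tool. `FirmwareImage` and `restore_identity` are hypothetical names, and real firmware images are binary blobs rather than dataclasses. The point is only the ordering: the device identity comes from the broken image, everything else from the known good one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FirmwareImage:
    """Toy stand-in for a firmware image (hypothetical, for illustration only)."""
    version: str
    hwid: str
    vpd: dict


def restore_identity(known_good: FirmwareImage, broken: FirmwareImage) -> FirmwareImage:
    """Return the known good image carrying the broken image's HWID and VPD."""
    if not broken.hwid or not broken.vpd:
        # Without a readable identity, flashing would erase data that
        # cannot easily be recovered, so stop instead.
        raise ValueError("device identity unreadable; refusing to build an image")
    return replace(known_good, hwid=broken.hwid, vpd=dict(broken.vpd))


# Placeholder values, not real device data.
good = FirmwareImage(version="lab-standard", hwid="", vpd={})
broken = FirmwareImage(version="corrupted", hwid="EXAMPLE-HWID",
                       vpd={"serial_number": "EXAMPLE123"})
to_flash = restore_identity(good, broken)
```

Refusing to proceed when the identity is unreadable is a design choice in this sketch, not something stated in the thread. It reflects the earlier point that serial number, hwid, and vpd are not backed up elsewhere.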


Owner: ----
Status: Available (was: Untriaged)
Summary: New device deployment should be equivalent to repair. (was: ATL: firmware recovery via servo should be automated)
> Full recovery from any level of disaster can be performed by:

As noted, there's existing code in Autotest that knows how to perform
all of the necessary steps to update firmware from a known good
version to a DUT using servo.  The code is in use for FAFT DUTs now.
Based on the state of existing FAFT DUTs, that update process already
preserves VPD and HWID.

To be clear:  The work required here has very little to do with code
for flashing firmware via servo:  All of the necessary pieces are
available, in use, and apparently stable.  What's needed is basically the
following:
  * Add code capable of detecting when firmware is corrupted.
  * To the standard repair flow, add a check so that if firmware is
    corrupted, it will use the existing FAFT repair procedure to install
    the lab-standard firmware version for the DUT.  Then it must invoke
    installation from USB, which already triggers recovery mode and clears
    the TPM.
  * Change the `deploy` script to trigger repair, rather than a custom
    installation flow.

The above summary is slightly simplified, but it captures the most
important work.
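
The three items above can be sketched as control flow. This is a simplified illustration, not Autotest code: `repair`, `deploy`, and the callables passed in are hypothetical names standing in for the existing detection, FAFT reflash, USB install, and standard repair steps.

```python
from typing import Callable, List


def repair(firmware_is_corrupted: Callable[[], bool],
           reflash_via_servo: Callable[[], None],
           install_from_usb: Callable[[], None],
           standard_repair: Callable[[], None]) -> List[str]:
    """Run the proposed repair order and return the names of the steps taken."""
    if firmware_is_corrupted():
        # Reflash only when corruption is detected, so a device that is
        # merely offline does not accumulate extra write cycles.
        reflash_via_servo()
        # USB install triggers recovery mode and clears the TPM.
        install_from_usb()
        return ["reflash_via_servo", "install_from_usb"]
    standard_repair()
    return ["standard_repair"]


def deploy(*steps: Callable) -> List[str]:
    """Deployment simply triggers repair instead of a custom install flow."""
    return repair(*steps)


def noop() -> None:
    """Placeholder for a real repair action."""


print(deploy(lambda: True, noop, noop, noop))
print(deploy(lambda: False, noop, noop, noop))
```

Gating the reflash on the detection check is what would keep this from re-flashing a failed device every repair cycle.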

Cc: englab-sys-cros@google.com
Owner: stagenut@chromium.org
Status: Assigned (was: Available)
Planned for Q4.
