New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 848443 link

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Feature
ec



Sign in to add a comment

Dogfood devices can be bricked if EC-RO update is interrupted

Project Member Reported by dnojiri@chromium.org, May 31 2018

Issue description

I have seen a couple of cases I suspect the units were bricked because RO update didn't complete. I guess we haven't cared so much as this doesn't happen in normal use cases.

After failed RO update, EC is still running in RW. So, the device can keep running until it's cold-rebooted. The EC has a chance to detect RO  corruption.

AP reboots should be blocked while flashrom is writing EC_RO. I don't know in what context AU script runs. If it runs in the current user process space, logout should be blocked as well. 

Alternatively and preferably, we can do RO update in software sync context. We show 'critical update' screen and AP and EC are under control. So the chance of RO corruption is much smaller.
 
See go/ec-ro-sync for how we could do RO update in software sync.

Cc: reinauer@chromium.org briannorris@chromium.org jwer...@chromium.org
We've been discussing this problem on Scarlet recently as well. Our idea to fix it was just to only update EC-RO, and let the EC-RW be handled by software sync. The advantage here would be that it is much easier to implement (only needs change to the updater, not the firmware, so we don't need separate handling for older devices), and that we would be dogfooding the exact same update flow in firmware that we'd later have on production devices as well.

Two catches are when the EC is already running in RO (i.e. device is in recovery mode) or the disable software sync GBB flag is set... we'd have to make the updater detect that and just fall back to the current mechanism in that case. All of the problems with bricked we've seen have been on AU, not recovery, so that's probably fine.

Comment 3 by hungte@chromium.org, Jun 11 2018

> we'd have to make the updater detect that and just fall back to the current mechanism in that case

Can we have that implemented in flashrom or by some callback from EC? I'd like to prevent the need of using a more complicated logic in firmware updater script since many (external) developers today still simply do flashrom, and we don't want to create a dependency on 'ectool' program (I'm fine if the condition can be retrieved from something like /sys).
Re#2: If you're making changes to EC-RO, then by definition WP is disabled so you can also change the GBB.  You could clear the GBB flag to re-enable software sync, do the update, then restore the GBB flag.

That seems more straightforward than adding a more complex fallback mechanism (or alternatively, adding some NV flag like gbb_disabled_sw_sync_but_we_know_better_so_do_it_anyway_next_boot)

For the recovery case, just update AP RO+RW.  EC RO+RW will get updated on the next non-recovery boot.  

go/ec-ro-sync intentionally does the EC-RO update from AP-RW to keep AP-RO simple and allow patching bugs in the update process.

> Can we have that implemented in flashrom or by some callback from EC? I'd like to prevent the need of using a more complicated logic in firmware updater script since many (external) developers today still simply do flashrom, and we don't want to create a dependency on 'ectool' program (I'm fine if the condition can be retrieved from something like /sys).

I don't think many people are calling flashrom directly for the EC, and if they do, they probably still want it to flash immediately. I don't think it's a good idea to overload a tool that was really just meant to be a dumb SPI ROM flasher with more hidden magic behavior.

We shouldn't need a dependency on ectool. Whether the EC is in RO and whether software sync is enabled can all be queried from vboot directly. At most, we may need to add some more crossystem flags.

> Re#2: If you're making changes to EC-RO, then by definition WP is disabled so you can also change the GBB.  You could clear the GBB flag to re-enable software sync, do the update, then restore the GBB flag.

Sorry, I misremembered some detail here... I think the actual concern was more about software sync being configured out at compile time, less about GBB flags. In that case, we could still query it from VbSharedData via crossystem.

I'm not sure if this if even a case we need to worry about, tbh. I think systems that are mature enough to be dogfooded and AUed should probably also be able to run software sync already. But if there is any concern in that area, we'd have options to deal with it.

> For the recovery case, just update AP RO+RW.  EC RO+RW will get updated on the next non-recovery boot.

I don't think the EC-RO sync feature is in a useable state at the moment. At least I don't see the ebuilds actually adding the EC RO image to CBFS right now. It's also gated by a GBB flag.

Considering that the updater needs to continue working on all old devices (at least all that support updater4.sh, which has been in use for a long time), and that the recovery use case isn't a problem at the moment anyway, I think it's much easier to just leave recovery working as it is. We don't want to backport the RO sync feature to a hundred old firmware branches just for this. It's better to run the same software sync for dogfooders that we'll later use in production anyway.
> I don't think the EC-RO sync feature is in a useable state at the moment. At least I don't see the ebuilds actually adding the EC RO image to CBFS right now. It's also gated by a GBB flag.

The code is there (and even has unit tests), but I don't think anyone's used it.

> Considering that the updater needs to continue working on all old devices (at least all that support updater4.sh, which has been in use for a long time), and that the recovery use case isn't a problem at the moment anyway, I think it's much easier to just leave recovery working as it is. We don't want to backport the RO sync feature to a hundred old firmware branches just for this. It's better to run the same software sync for dogfooders that we'll later use in production anyway.

SGTM for recovery.  Agree backporting to launched devices is a waste of time; we can't change RO on those unless they're RMA'd, and we're not going to change the factory/RMA image for those devices.

Sign in to add a comment