vboot aux-FW update gets into a reboot loop when TCPC is busy and VBOOT_OPROM_MATTERS |
|||
Issue descriptionThe basic vboot aux-FW update flow works like this: 1. check all TCPCs if they need an update (check_vboot_aux_fw()) 2. update all TCPCs that need an update (update_vboot_aux_fw()) 2.1 protect TCPC from access until next reboot after updating If CONFIG_VBOOT_OPROM_MATTERS is selected in coreboot (i.e. display init is skipped on normal mode boot), this flow changes to the following: 1. check all TCPCs if they need an update (check_vboot_aux_fw()) 2. if any TCPC needs a SLOW_UPDATE and oprom_loaded is 0 (i.e. display init was skipped this boot), set oprom_needed flag in NVRAM and reboot 3. update all TCPCs that need an update (update_vboot_aux_fw()) 3.1 protect TCPC from access until next reboot after updating 4. if oprom_loaded is 1 (for no other reason than TCPC slow update, i.e. we're not in dev mode), clear oprom_needed flag in NVRAM and reboot If one of the TCPCs is currently powering the device and there's not enough juice in the battery to survive a power glitch, the EC will refuse updating that TCPC. The basic flow changes to this: 1. check all TCPCs if they need an update (check_vboot_aux_fw()) 2. update all TCPCs that need an update (update_vboot_aux_fw()) 2.1 if TCPC's update_image() returns VBERROR_PERIPHERAL_BUSY, ignore that TCPC and just continue as if update was successful for this boot (will attempt update again on next boot) 2.2 protect TCPC from access until next reboot after updating I don't think these two edge cases play well with each other. When you add them both together, you get: 1. check all TCPCs if they need an update (check_vboot_aux_fw()) 2. if any TCPC needs a SLOW_UPDATE and oprom_loaded is 0 (i.e. display init was skipped this boot), set oprom_needed flag in NVRAM and reboot 3. update all TCPCs that need an update (update_vboot_aux_fw()) 3.1 if TCPC's update_image() returns VBERROR_PERIPHERAL_BUSY, ignore that TCPC and just continue as if update was successful for this boot (will attempt update again on next boot) 3.2 protect TCPC from access until next reboot after updating 4. if oprom_loaded is 1 (for no other reason than TCPC slow update, i.e. we're not in dev mode), clear oprom_needed flag in NVRAM and reboot Meaning if your TCPC is always busy (e.g. battery dead/disconnected and you're powered through a port with outdated TCPC firmware), you get this: 1. boot without display init 2. oh look, my TCPC firmware is outdated! 3. reboot with display init to update TCPC 4. try to update TCPC... oh, wait, it's busy 5. reboot without display init to boot normally 6. goto 2 VBOOT_OPROM_MATTERS is set on Apollolake (which I think includes GLK?)... so I think those platforms should all be affected by this. Has nobody ever noticed this? I guess once you have batteries on most devices it's a very unlikely case to hit, and will always resolve itself eventually after trickle-charging... The fix is a little tricky... I think the cleanest solution would be if we could already return VBERROR_PERIPHERAL_BUSY from check_hash(), not just on update_hash(). That means we either have to send a quick pd_suspend()/pd_resume() to the EC in every check_hash() function (if the version differs), or we'll have to invent some new command to read the BUSY status from the EC. (We can probably not include it in ec_tunnel_status(), which the check_hash() functions already test, because that's only specific to the bus but not to the individual TCPC.)
,
Nov 2
,
Nov 6
In the mixed scenario (VBERROR_PERIPHERAL_BUSY && CONFIG_VBOOT_OPROM_MATTERS), why do we need to reboot into normal mode again after "finishing" the aux_fw update (i.e. skipping the aux_fx due to a busy return). Why can't we just continue the boot into the OS at that point with display drivers loaded into the FW? (it is most likely something we cannot change, I just want to understand better)
,
Nov 6
> Why can't we just continue the boot into the OS at that point with display drivers loaded into the FW? This is a remnant of back when OPROM_MATTERS still meant an actual x86 PCI Option ROM (i.e. a blob of code provided by the graphics adapter vendor that we had no control over). As an extra security precaution, we didn't want to boot the OS in verified mode after we had loaded such potentially less trustworthy code. These days, I'm not sure if that situation still applies anywhere, so maybe we could take it out. But it doesn't really hurt anything either since slow EC updates are rare enough that we don't really care about the extra second. It would fix this bug, but I still think a better fix would be to just never go into the update path in the first place if the TCPC is busy.
,
Nov 6
Thanks for the explanation in #4, that makes sense. I think doing an earlier pd_suspense and use that result to decide whether or not to reboot with graphics makes sense to me.
,
Nov 10
|
|||
►
Sign in to add a comment |
|||
Comment 1 by reinauer@chromium.org
, Nov 2