CCD: servod disconnect and reboot on cr50 update |
|||||||||||
Issue descriptionIf cr50 is updated by an AU or recovery, servod will disconnect and the DUT will reboot. Lab infra should support this usecase. It's likely this will require changes to both CCD and servo_micro/v3 flows. cr50 can be updated in recovery, chromeos_install, AU.
,
Apr 17 2018
cr50 is acting as the servo in this case, so when it's updated, it's the equivalent of unplugging and repluggung servo's USB. cr50's security model involves cold reset of ec and ap on reboot/au. This is usually deferred until a reboot happens naturally, after au is applied. A likely effect of this would be that powe_state: off/on will report fail since servod will disconnect on the indicated reboot. I think you are correct in that ssh won't be affected.
,
Apr 17 2018
You can use power_state:ccd_reset after the update to reinitialize all cr50 the usb endpoints in servod. SSH will be fine.
,
May 9 2018
,
May 10 2018
OK. It's not clear to me what the visible impact of a cr50 update is, or whether that impact would belong under the rubric "bug". IIUC, the cr50 update is part of the AU process, so when we run AU (which is periodically), the cr50 will update during the flow. Only the DUT knows when this happens, and the only visible impacts are 1) The reboot after the update will take longer than normal. 2) Some servod impact that I haven't yet grasped. Can someone characterize what servod will do after the cr50 update? What I want to hear is that servod will fail, and shut down. If that's so, than that's good enough for now. Eventually, we may want to be smarter, but I think we'll need some practical experience to discover what "smarter" really means. If the response from servod is somewhat less graceful that "shut down", then we'll probably need changes...
,
May 10 2018
... passing to mruthven@ to answer the question posed on comment #5.
,
May 10 2018
The servo impact is that for ccd, servo uses the cr50 usb endpoints. When cr50 reboots the usb resets and servod can't communicate with the consoles anymore. Servod doesn't automatically recover from cr50 resets at all. All of the controls/consoles just stop responding. It doesn't crash. if you run dut-control power_state:ccd_reset, usb will be reinitialized.
,
May 10 2018
> Servod doesn't automatically recover from cr50 resets at all. > All of the controls/consoles just stop responding. It doesn't crash. If the controls stop responding, does reading them at least return failure? In particular, what if you try to read the 'power_button' control after a cr50 reset? Also, in the 'Servod' class, there's a method named 'hwinit()'. The source is in servo/servo_server.py. That method is designed to reinitialize servo to a known, sane state. Is there any reason that the 'hwinit()' function couldn't include a call to 'self.reinitialize()'? If it did, then the Autotest lab would be guaranteed to recover from a cr50 reset eventually, since we call 'hwinit()' quite regularly. Finally, if you stop and restart servod, will that also fix anything hanging over from a cr50 reset?
,
May 10 2018
I think servod fails with 'Timeout waiting for response' I think hwinit could be modified to run reinitialize and it should work. Yeah restarting servod will reinitialize usb, so it fixes it.
,
May 11 2018
> I think hwinit could be modified to run reinitialize and it should work. Let's start with that: No matter what else we need to do to deal with this problem, making hwinit() fix the problem is going to be a required part of the solution. Once we've made that change, I'd recommend we close this bug, then wait and see. The change would mean that the lab will revert to working order in ordinary operation, regardless of the particulars of how cr50 updates manifest. It's possible (even likely) that cr50 updates will be rare enough that we won't need to do more than that.
,
Jun 26 2018
,
Jul 26
+jrbarnette to update status
,
Aug 8
,
Sep 11
,
Oct 12
Part of 2019 planning. Updating priority.
,
Oct 16
,
Oct 16
,
Oct 16
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by jrbarnette@chromium.org
, Apr 17 2018