Seen on devices in the FAFT pool. The host was stuck in a loop of Verify -> Repair -> Verify...
The cause is that reading the lid_open control failed.
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack10-host6/1343227-repair/20171408135152/
08/14 13:52:15.388 ERROR| repair:0332| Failed: lid_open control is normal
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 249, in verify
    lid_open = host.get_servo().get('lid_open')
  File "/usr/local/autotest/server/cros/servo/servo.py", line 484, in get
    raise error.TestFail(err_msg)
TestFail: Getting 'lid_open' :: Timeout waiting for response.
The lid_open value is obtained via an EC command. The EC UART was probably not working at that time (reason unknown). Manually resetting the EC (having the servo reset the DUT) recovers from this issue.
In the current CrOS repair strategy, resetting the DUT (via the servo) depends on a healthy servo, and the servo is only considered healthy if lid_open works. So when lid_open fails, there is no way to reset the DUT and recover from this issue.
To fix it, I'd like to add a new servo repair rule that cold_resets the DUT if lid_open fails.
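A minimal sketch of the proposed rule, to make the intended flow concrete. It does not use the real autotest or servo APIs: FakeServo, ServoTimeout, cold_reset_dut, verify_lid and verify_with_cold_reset are all hypothetical names, and the model assumes that a cold reset does not itself depend on the EC UART and that it restores the UART.

```python
class ServoTimeout(Exception):
    """Stand-in for the TestFail raised when a servo control times out."""


class FakeServo:
    """Hypothetical servo model: lid_open fails until the DUT is cold-reset."""

    def __init__(self, ec_uart_ok=False):
        self.ec_uart_ok = ec_uart_ok
        self.reset_count = 0

    def get(self, control):
        # lid_open is read through the EC, so it times out if the EC UART is down.
        if control == 'lid_open' and not self.ec_uart_ok:
            raise ServoTimeout(
                "Getting 'lid_open' :: Timeout waiting for response.")
        return 'yes'

    def cold_reset_dut(self):
        # Assumption: cold reset works without the EC UART and brings it back.
        self.reset_count += 1
        self.ec_uart_ok = True


def verify_lid(servo):
    """Return True if lid_open can be read and reports an open lid."""
    try:
        return servo.get('lid_open') == 'yes'
    except ServoTimeout:
        return False


def verify_with_cold_reset(servo, max_resets=1):
    """Verify lid_open; on failure, cold-reset the DUT and verify again."""
    for attempt in range(max_resets + 1):
        if verify_lid(servo):
            return True
        if attempt < max_resets:
            servo.cold_reset_dut()
    return False
```

With this rule, a servo whose lid_open read times out gets one cold reset and then passes verification, instead of cycling through Verify -> Repair -> Verify with no repair action able to run. The max_resets bound is an illustrative choice to keep the repair from looping if the reset does not help.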
Comment 1 by bugdroid1@chromium.org, Aug 17 2017