Issue 755678

Issue metadata

Status: Fixed
Closed: Aug 2017
Pri: 2
Type: Bug



Servo Verify failed in getting lid_open; current Repair strategy can't fix it

Project Member Reported by waihong@chromium.org, Aug 15 2017

Issue description

I saw this happen on devices in the FAFT pool. The host was stuck in a loop of Verify -> Repair -> Verify...

The cause is that getting lid_open failed.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack10-host6/1343227-repair/20171408135152/

08/14 13:52:15.388 ERROR|            repair:0332| Failed: lid_open control is normal
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 249, in verify
    lid_open = host.get_servo().get('lid_open')
  File "/usr/local/autotest/server/cros/servo/servo.py", line 484, in get
    raise error.TestFail(err_msg)
TestFail: Getting 'lid_open' :: Timeout waiting for response.


The lid_open value is obtained via an EC command. The EC UART was probably not working at that time (reason unknown). Manually resetting the EC (a servo reset of the DUT) recovers from this issue.

In the current CrOS repair strategy, resetting the DUT via servo depends on a healthy servo, which in turn requires lid_open to work. So there is no way to reset the DUT to recover from this issue.

To fix it, I'd like to add a new servo repair rule that cold_resets the DUT if lid_open fails.
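As a rough illustration of the proposed rule, here is a minimal, self-contained sketch. It does not use the autotest repair framework; FakeServo, verify_lid_open, repair_by_cold_reset and verify_with_repair are hypothetical names made up for this example, and it assumes that a cold reset restores the EC UART. The point is only that the repair action is triggered by the lid_open failure instead of depending on lid_open already passing.

```python
class FakeServo:
    """Hypothetical stand-in for a servo whose EC UART may be unresponsive."""

    def __init__(self):
        self.ec_responsive = False
        self.cold_resets = 0

    def get(self, control):
        # lid_open is read through the EC, so it times out while the EC is stuck.
        if control == 'lid_open' and not self.ec_responsive:
            raise RuntimeError(
                "Getting 'lid_open' :: Timeout waiting for response.")
        return 'yes'

    def cold_reset(self):
        # Assumption: a cold reset restarts the EC and brings its UART back.
        self.cold_resets += 1
        self.ec_responsive = True


def verify_lid_open(servo):
    """Verifier: fail if lid_open cannot be read or is not 'yes'."""
    if servo.get('lid_open') != 'yes':
        raise RuntimeError('lid_open is not yes')


def repair_by_cold_reset(servo):
    """Repair action: only needs the servo itself, not a working lid_open."""
    servo.cold_reset()


def verify_with_repair(servo, verifier, repair, max_repairs=1):
    """Run the verifier; on failure run the repair and verify again.

    Returns True if the verifier eventually passes, False otherwise.
    """
    for attempt in range(max_repairs + 1):
        try:
            verifier(servo)
            return True
        except RuntimeError:
            if attempt == max_repairs:
                return False
            repair(servo)
    return False
```

Calling verify_with_repair(FakeServo(), verify_lid_open, repair_by_cold_reset) performs one cold reset and then passes verification. With a repair action that cannot run until lid_open already works, the same call would keep failing, which is the Verify -> Repair -> Verify loop described above.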
 
Project Member

Comment 1 by bugdroid1@chromium.org, Aug 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/bf6f23c1d9818d3f2e277aa5ee8b475fd337b102

commit bf6f23c1d9818d3f2e277aa5ee8b475fd337b102
Author: Wai-Hong Tam <waihong@google.com>
Date: Thu Aug 17 08:56:58 2017

[autotest] Add rule to reboot DUT when failing to get the lid_open value

Some servo controls, like lid_open, require communicating with the DUT
through the EC UART console. Failures of these kinds of controls can be
recovered by rebooting the DUT.

BUG= chromium:755678 
TEST=manual
Injected code to set lid_open to 'no' in order to trigger the failure
of verifying lid_open, executed the repair process, saw the newly added
repair action executed to reboot the DUT, and the state of the servo host
became good.

Change-Id: I44b14d89c6872e8b5620b320249403f69896132e
Reviewed-on: https://chromium-review.googlesource.com/617447
Commit-Ready: Wai-Hong Tam <waihong@google.com>
Tested-by: Wai-Hong Tam <waihong@google.com>
Reviewed-by: Shelley Chen <shchen@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/bf6f23c1d9818d3f2e277aa5ee8b475fd337b102/server/hosts/servo_repair.py

Status: Fixed (was: Available)
