Issue 624966


Issue metadata

Status: Verified
Closed: Jul 2016
OS: Chrome
Pri: 2
Type: Bug




[infra] Attempted servo repair fails servo-based tests and causes false-positive results.

Project Member Reported by ka...@chromium.org, Jun 30 2016

Issue description

Servo-based tests fail because a servo repair is attempted for no evident reason.

Running the test locally from the chroot:

17:27:22 INFO | autoserv| Host (ssh) chromeos1-row1-rack3-host6-servo is alive
17:27:23 INFO | autoserv| Applying an update to the servo host, if necessary.
17:27:26 INFO | autoserv| servo host chromeos1-row1-rack3-host6-servo does not require an update.
17:27:26 INFO | autoserv| servod is running, PID=24447
17:27:26 INFO | autoserv| Attempting to repair servo host chromeos1-row1-rack3-host6-servo.
17:27:26 INFO | autoserv| START	----	reboot	timestamp=1467325646	localtime=Jun 30 17:27:26
17:27:26 INFO | autoserv| GOOD	----	reboot.start	timestamp=1467325646	localtime=Jun 30 17:27:26
17:28:22 INFO | autoserv| [stderr] mux_client_request_session: read from master failed: Broken pipe
17:28:22 INFO | autoserv| [stderr] Warning: Permanently added 'chromeos1-row1-rack3-host6-servo,172.27.212.76' (RSA) to the list of known hosts.
17:28:22 INFO | autoserv| [stderr] Warning: Permanently added 'chromeos1-row1-rack3-host6-servo,172.27.212.76' (RSA) to the list of known hosts.
17:28:23 INFO | autoserv| Master ssh connection to chromeos1-row1-rack3-host6-servo is down.
17:28:23 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_sBFBwxssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack3-host6-servo'
17:28:24 INFO | autoserv| GOOD	----	reboot.verify	timestamp=1467325704	localtime=Jun 30 17:28:24
17:28:24 INFO | autoserv| END GOOD	----	reboot	kernel=4.4.4	localtime=Jun 30 17:28:24	timestamp=1467325704
17:28:44 INFO | autoserv| Pinging host chromeos1-row1-rack3-host6-servo
17:28:44 INFO | autoserv| Host (ssh) chromeos1-row1-rack3-host6-servo is alive
17:28:44 INFO | autoserv| Applying an update to the servo host, if necessary.
17:28:52 INFO | autoserv| servo host chromeos1-row1-rack3-host6-servo does not require an update.
17:28:52 INFO | autoserv| servod is running, PID=598
17:28:52 INFO | autoserv| Failed to repair servo: [Errno 111] Connection refused
17:28:52 INFO | autoserv| Attempting repair via PoE powercycle.
17:29:56 INFO | autoserv| Failed to change outlet status for host: chromeos1-row1-rack3-host6-servo to state: CYCLE.
17:29:56 INFO | autoserv| Failed to repair servo: Power cycling chromeos1-row1-rack3-host6-servo failed: Failed to change outlet status for host: chromeos1-row1-rack3-host6-servo to state: CYCLE.
17:29:56 INFO | autoserv| Exception escaped control file, job aborting:
17:29:56 INFO | autoserv| Traceback (most recent call last):
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/server_job.py", line 690, in run
17:29:56 INFO | autoserv| self._execute_code(server_control_file, namespace)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/server_job.py", line 1187, in _execute_code
17:29:56 INFO | autoserv| execfile(code_file, namespace, namespace)
17:29:56 INFO | autoserv| File "/tmp/test_that_results_abi71L/results-1-platform_ExternalUsbPeripherals.detect/control.srv", line 86, in <module>
17:29:56 INFO | autoserv| parallel_simple(run, machines)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/subcommand.py", line 93, in parallel_simple
17:29:56 INFO | autoserv| function(arg)
17:29:56 INFO | autoserv| File "/tmp/test_that_results_abi71L/results-1-platform_ExternalUsbPeripherals.detect/control.srv", line 44, in run
17:29:56 INFO | autoserv| host = hosts.create_host(machine, servo_args=servo_args)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/factory.py", line 160, in create_host
17:29:56 INFO | autoserv| host_instance = host_class(hostname, **args)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/base_classes.py", line 55, in __init__
17:29:56 INFO | autoserv| super(Host, self).__init__(*args, **dargs)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 70, in __init__
17:29:56 INFO | autoserv| self._initialize(*args, **dargs)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/cros_host.py", line 306, in _initialize
17:29:56 INFO | autoserv| try_lab_servo=try_lab_servo)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/servo_host.py", line 803, in create_servo_host
17:29:56 INFO | autoserv| required_by_test=required_by_test)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/base_classes.py", line 55, in __init__
17:29:56 INFO | autoserv| super(Host, self).__init__(*args, **dargs)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 70, in __init__
17:29:56 INFO | autoserv| self._initialize(*args, **dargs)
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/servo_host.py", line 173, in _initialize
17:29:56 INFO | autoserv| self.repair()
17:29:56 INFO | autoserv| File "/build/squawks/usr/local/build/autotest/server/hosts/servo_host.py", line 718, in repair
17:29:56 INFO | autoserv| '\n'.join(errors))
17:29:56 INFO | autoserv| ServoHostRepairTotalFailure: All attempts at repairing the servo failed:
17:29:56 INFO | autoserv| [Errno 111] Connection refused
17:29:56 INFO | autoserv| Power cycling chromeos1-row1-rack3-host6-servo failed: Failed to change outlet status for host: chromeos1-row1-rack3-host6-servo to state: CYCLE.
17:29:56 INFO | autoserv| INFO	----	----	timestamp=1467325796	job_abort_reason=All attempts at repairing the servo failed: [Errno 111] Connection refused Power cycling chromeos1-row1-rack3-host6-servo failed: Failed to change outlet status for host: chromeos1-row1-rack3-host6-servo to state: CYCLE.	localtime=Jun 30 17:29:56	All attempts at repairing the servo failed:
17:29:56 INFO | autoserv| [Errno 111] Connection refused
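The traceback ends in repair() raising ServoHostRepairTotalFailure with one message per failed attempt, joined by newlines ('\n'.join(errors)). A minimal sketch of that fall-through pattern, assuming each repair strategy is a zero-argument callable that raises on failure. The helper repair_servo_host and the two strategy functions are hypothetical, and the exception class is a stand-in, not autotest's own definition:

```python
class ServoHostRepairTotalFailure(Exception):
    """Stand-in for the exception named in the log, not autotest's own class."""


def repair_servo_host(strategies):
    """Try each (name, callable) repair strategy in order (hypothetical helper).

    Returns the name of the first strategy that completes without raising.
    If every strategy raises, the error messages are joined with newlines
    into a single ServoHostRepairTotalFailure, mirroring the log above.
    """
    errors = []
    for name, strategy in strategies:
        try:
            strategy()
            return name
        except Exception as e:  # any failure means: fall through to the next strategy
            errors.append(str(e))
    raise ServoHostRepairTotalFailure(
        'All attempts at repairing the servo failed:\n' + '\n'.join(errors))


# Usage: two failing strategies reproduce the two kinds of message seen in this issue.
def reboot_then_check_servod():
    # Hypothetical: the post-reboot servod check is refused, as in the log.
    raise ConnectionRefusedError(111, 'Connection refused')


def poe_powercycle():
    # Hypothetical: the PoE outlet cycle fails, as in the log.
    raise RuntimeError('Power cycling failed')


try:
    repair_servo_host([('reboot', reboot_then_check_servod),
                       ('poe', poe_powercycle)])
except ServoHostRepairTotalFailure as e:
    print(e)
```

The pattern only helps if the check that triggers repair is accurate; in the log above, repair starts right after "servod is running" is reported.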


The same happens with scheduled jobs:
The results look good: https://screenshot.googleplex.com/5VWD7cYm6FD

When drilling down, the tests show boot.0 as PASSED: https://screenshot.googleplex.com/sg08FxsjuAK

The logs look the same (except for the failure to start the XMLRPC server, which should probably not be attempted): https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/68147789-chromeos-test/chromeos1-row1-rack4-host4/debug/
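The false positive is that boot.0 reads PASSED although the job aborted with job_abort_reason. A minimal sketch of a check that refuses to call a job passed when an abort reason is present, assuming the tab-separated status layout visible in the autoserv output (status, subdir, test name, then key=value fields). The helper job_verdict is hypothetical, and the rule that an abort overrides every GOOD line is an assumption about how results should be reported, not a description of the existing results pipeline:

```python
def job_verdict(status_lines):
    """Summarize autoserv-style status lines (hypothetical helper).

    Returns ('ABORTED', reason) as soon as any line carries job_abort_reason,
    otherwise ('FAIL', None) if any status is FAIL/ERROR/ABORT,
    otherwise ('PASS', None).
    """
    failed = False
    for line in status_lines:
        fields = line.strip().split('\t')  # strip() also drops nesting tabs
        if not fields[0]:
            continue  # blank line
        status = fields[0].split()[-1]  # 'END GOOD' -> 'GOOD'
        kv = dict(f.split('=', 1) for f in fields[3:] if '=' in f)
        if 'job_abort_reason' in kv:
            return 'ABORTED', kv['job_abort_reason']
        if status in ('FAIL', 'ERROR', 'ABORT'):
            failed = True
    return ('FAIL', None) if failed else ('PASS', None)


log = [
    'START\t----\treboot\ttimestamp=1467325646',
    'END GOOD\t----\treboot\tkernel=4.4.4',
    'INFO\t----\t----\ttimestamp=1467325796\t'
    'job_abort_reason=All attempts at repairing the servo failed',
]
print(job_verdict(log))
```

Returning as soon as an abort is seen means GOOD lines recorded before or after it cannot turn the verdict into a pass.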


 

Comment 1 by ka...@chromium.org, Jun 30 2016

Checking the servo host for the tested device:

$ ssh root@chromeos1-row1-rack3-host6-servo
The authenticity of host 'chromeos1-row1-rack3-host6-servo (172.27.212.76)' can't be established.
RSA key fingerprint is SHA256:KCePOPnn92zC0xSXCpe0rWawnrvEDJhTNGPy1PP+V1I.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'chromeos1-row1-rack3-host6-servo,172.27.212.76' (RSA) to the list of known hosts.
localhost ~ # cat /etc/lsb-release 
CHROMEOS_RELEASE_APPID={1BB651DD-C762-3FCF-2A66-CEB4C1096BB1}
CHROMEOS_BOARD_APPID={1BB651DD-C762-3FCF-2A66-CEB4C1096BB1}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
DEVICETYPE=OTHER
CHROMEOS_RELEASE_BOARD=beaglebone_servo
CHROMEOS_DEVSERVER=
GOOGLE_RELEASE=8489.0.0
CHROMEOS_RELEASE_BUILD_NUMBER=8489
CHROMEOS_RELEASE_BRANCH_NUMBER=0
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_PATCH_NUMBER=0
CHROMEOS_RELEASE_TRACK=dev-channel
CHROMEOS_RELEASE_DESCRIPTION=8489.0.0 (Official Build) dev-channel beaglebone_servo test
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_VERSION=8489.0.0
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
localhost ~ # ps ux | grep servod
root     11413  101  7.6  29168 18776 ?        Rsl  23:21   0:08 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board samus --port 9999
root     11424  0.0  0.1   1500   352 pts/0    S+   23:22   0:00 grep --colour=auto servod
localhost ~ # 

All seems good, but servo-stat reports servod as failed:
$ servo-stat chromeos1-row1-rack3-host6
chromeos1-row1-rack3-host6 ...ABDEFGH servod failed BOARD=samus CHROMEOS_RELEASE_VERSION=8489.0.0
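ps lists a servod process, yet the repair log shows [Errno 111] Connection refused (ECONNREFUSED on Linux), so a live process is not the same as a servod that accepts connections. A minimal sketch that tells those cases apart with a plain TCP connect, assuming servod listens on port 9999 as in the ps command line above. The helper servod_port_state is hypothetical, and whether servo-stat performs this exact check is not known from this issue:

```python
import socket


def servod_port_state(host, port=9999, timeout=5.0):
    """Classify a TCP connect to servod's port (hypothetical helper).

    Returns 'open' if something accepted the connection, 'refused' for
    ECONNREFUSED (errno 111 on Linux, as in the repair log), and
    'unreachable' for timeouts, DNS failures, or other socket errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 'open'
    except ConnectionRefusedError:
        return 'refused'
    except OSError:  # socket.timeout and socket.gaierror are OSError subclasses
        return 'unreachable'
```

Under these assumptions, servod_port_state('chromeos1-row1-rack3-host6-servo') returning 'refused' while ps still shows servod would be consistent with what this issue describes.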

Comment 2 by ka...@chromium.org, Jul 1 2016

Labels: -Pri-1 Pri-2
Status: Verified (was: Untriaged)
Shrawan re-cabled and restarted the servo and the DUT. Now it looks good and the test is running fine.
Thanks Shrawan.

$ ssh root@chromeos1-row1-rack3-host6-servo
localhost ~ # ps ux | grep servod
root       412  1.5  7.8  30708 19292 ttyO1    Ssl+ 23:44   0:11 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board samus --port 9999
root       434  0.1  6.0  29672 14888 ttyO1    S+   23:44   0:00 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board samus --port 9999
root       435  0.1  6.1  29684 15032 ttyO1    S+   23:44   0:00 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board samus --port 9999
root       610  0.0  0.1   1504   408 pts/1    S+   23:56   0:00 grep --colour=auto servod
localhost ~ # dut-control lid_open
lid_open:yes
localhost ~ # exit
logout
Connection to chromeos1-row1-rack3-host6-servo closed.
kalin@kalin:~$ servo-stat chromeos1-row1-rack3-host6
chromeos1-row1-rack3-host6 ...ABCDEFG is up BOARD=samus CHROMEOS_RELEASE_VERSION=8489.0.0
kalin@kalin:~$ 
