Change global_config to enable ssh tunnel for servo by default
Issue description
Not sure when this started. I recovered my Goobuntu station a few weeks ago and was OOO for the past week. I tried from other Test team members' machines, and it failed the same way. The DUT and servo are pingable and I am able to log in through ssh successfully.
test_that fails:
(cr) ((321e2f8...)) kalin@kalin ~/trunk/src/scripts $ test_that --autotest_dir ~/trunk/src/third_party/autotest/files/ --board=peppy chromeos1-row1-rack4-host2 platform_ExternalUsbPeripherals.detect.login_unplug_plug
INFO:root:Identity added: /tmp/test_that_results_jWIJfB/testing_rsa (/tmp/test_that_results_jWIJfB/testing_rsa)
12:36:00 INFO | Began logging to /tmp/test_that_results_jWIJfB
Adding labels [u'cros-version:ad_hoc_build', u'board:peppy'] to host chromeos1-row1-rack4-host2
14:36:00 INFO | Fetching suite for job named platform_ExternalUsbPeripherals.detect.login_unplug_plug...
14:36:05 INFO | Scheduling suite for job named platform_ExternalUsbPeripherals.detect.login_unplug_plug...
14:36:05 INFO | ... scheduled 1 job(s).
14:36:06 INFO | autoserv| Results placed in /tmp/test_that_results_jWIJfB/results-1-platform_ExternalUsbPeripherals.detect.login_unplug_plug
14:36:06 INFO | autoserv| Logged pid 12915 to /tmp/test_that_results_jWIJfB/results-1-platform_ExternalUsbPeripherals.detect.login_unplug_plug/.autoserv_execute
14:36:06 INFO | autoserv| Starting new HTTP connection (1): metadata.google.internal
14:36:06 INFO | autoserv| Configuration file does not exist, ignoring: /etc/chrome-infra/ts-mon.json
14:36:06 INFO | autoserv| ts_mon monitoring is disabled because the endpoint provided is invalid or not supported:
14:36:06 INFO | autoserv| ts_mon was set up.
14:36:06 INFO | autoserv| I am PID 12915
14:36:06 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_toD8OWssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack4-host2'
14:36:11 INFO | autoserv| get_network_stats: at-start RXbytes 6948907 TXbytes 871208
14:36:11 INFO | autoserv| Not checking if job_repo_url contains autotest packages on ['chromeos1-row1-rack4-host2']
14:36:11 INFO | autoserv| Processing control file
14:36:11 INFO | autoserv| Verifying this condition: host is available via ssh
14:36:11 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_dJ3Fdvssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack4-host2-servo'
14:36:12 INFO | autoserv| No failed triggers, skipping repair: Power cycle the host with RPM
14:36:12 INFO | autoserv| Verifying this condition: servo BOARD setting is correct
14:36:12 INFO | autoserv| Verifying this condition: servo SERIAL setting is correct
14:36:12 INFO | autoserv| Verifying this condition: servod upstart job is running
14:36:12 INFO | autoserv| Verifying this condition: servod service is taking calls
14:36:12 INFO | autoserv| Failed: servod service is taking calls
14:36:12 INFO | autoserv| Traceback (most recent call last):
14:36:12 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/client/common_lib/hosts/repair.py", line 329, in _verify_host
14:36:12 INFO | autoserv| self.verify(host)
14:36:12 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/server/hosts/servo_repair.py", line 211, in verify
14:36:12 INFO | autoserv| host.connect_servo()
14:36:12 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/server/hosts/servo_host.py", line 133, in connect_servo
14:36:12 INFO | autoserv| timeout_sec=self.INITIALIZE_SERVO_TIMEOUT_SECS)
14:36:12 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/client/common_lib/cros/retry.py", line 123, in timeout
14:36:12 INFO | autoserv| default_result = func(*args, **kwargs)
14:36:12 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/server/cros/servo/servo.py", line 218, in initialize_dut
14:36:12 INFO | autoserv| self._server.hwinit()
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1240, in __call__
14:36:12 INFO | autoserv| return self.__send(self.__name, args)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1599, in __request
14:36:12 INFO | autoserv| verbose=self.__verbose
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1280, in request
14:36:12 INFO | autoserv| return self.single_request(host, handler, request_body, verbose)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1308, in single_request
14:36:12 INFO | autoserv| self.send_content(h, request_body)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1456, in send_content
14:36:12 INFO | autoserv| connection.endheaders(request_body)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 1049, in endheaders
14:36:12 INFO | autoserv| self._send_output(message_body)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 893, in _send_output
14:36:12 INFO | autoserv| self.send(msg)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 855, in send
14:36:12 INFO | autoserv| self.connect()
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 832, in connect
14:36:12 INFO | autoserv| self.timeout, self.source_address)
14:36:12 INFO | autoserv| File "/usr/lib64/python2.7/socket.py", line 575, in create_connection
14:36:12 INFO | autoserv| raise err
14:36:12 INFO | autoserv| error: [Errno 113] No route to host
14:36:12 INFO | autoserv| Skipping this operation: pwr_button control is normal
14:36:12 INFO | autoserv| Skipping this operation: lid_open control is normal
14:36:12 INFO | autoserv| Attempting this repair action: Start servod with the proper config settings.
14:36:13 INFO | autoserv| Board for DUT is unknown; starting servod assuming a pre-configured board.
14:36:33 INFO | autoserv| Verifying this condition: host is available via ssh
14:36:33 INFO | autoserv| Verifying this condition: servo BOARD setting is correct
14:36:33 INFO | autoserv| Verifying this condition: servo SERIAL setting is correct
14:36:33 INFO | autoserv| Verifying this condition: servod upstart job is running
14:36:34 INFO | autoserv| Verifying this condition: servod service is taking calls
14:36:34 INFO | autoserv| Failed: servod service is taking calls
14:36:34 INFO | autoserv| Traceback (most recent call last):
14:36:34 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/client/common_lib/hosts/repair.py", line 329, in _verify_host
14:36:34 INFO | autoserv| self.verify(host)
14:36:34 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/server/hosts/servo_repair.py", line 211, in verify
14:36:34 INFO | autoserv| host.connect_servo()
14:36:34 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/server/hosts/servo_host.py", line 133, in connect_servo
14:36:34 INFO | autoserv| timeout_sec=self.INITIALIZE_SERVO_TIMEOUT_SECS)
14:36:34 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/client/common_lib/cros/retry.py", line 123, in timeout
14:36:34 INFO | autoserv| default_result = func(*args, **kwargs)
14:36:34 INFO | autoserv| File "/mnt/host/source/src/third_party/autotest/files/server/cros/servo/servo.py", line 218, in initialize_dut
14:36:34 INFO | autoserv| self._server.hwinit()
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1240, in __call__
14:36:34 INFO | autoserv| return self.__send(self.__name, args)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1599, in __request
14:36:34 INFO | autoserv| verbose=self.__verbose
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1280, in request
14:36:34 INFO | autoserv| return self.single_request(host, handler, request_body, verbose)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1308, in single_request
14:36:34 INFO | autoserv| self.send_content(h, request_body)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/xmlrpclib.py", line 1456, in send_content
14:36:34 INFO | autoserv| connection.endheaders(request_body)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 1049, in endheaders
14:36:34 INFO | autoserv| self._send_output(message_body)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 893, in _send_output
14:36:34 INFO | autoserv| self.send(msg)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 855, in send
14:36:34 INFO | autoserv| self.connect()
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/httplib.py", line 832, in connect
14:36:34 INFO | autoserv| self.timeout, self.source_address)
14:36:34 INFO | autoserv| File "/usr/lib64/python2.7/socket.py", line 575, in create_connection
14:36:34 INFO | autoserv| raise err
14:36:34 INFO | autoserv| error: [Errno 113] No route to host
...
14:40:53 INFO | autoserv| File "/home/kalin/trunk/src/third_party/autotest/files/server/autoserv", line 558, in run_autoserv
14:40:53 INFO | autoserv| sys.exit(exit_code)
14:40:53 INFO | autoserv| SystemExit: 1
14:40:53 INFO | autoserv| record_state_duration failed: job_or_task_id=None, hostname=chromeos1-row1-rack4-host2, status=Running
------------------------------------------------------------------------------------------------------------
/tmp/test_that_results_jWIJfB/results-1-platform_ExternalUsbPeripherals.detect.login_unplug_plug [ FAILED ]
------------------------------------------------------------------------------------------------------------
Total PASS: 0/1 (0%)
14:40:54 ERROR| Autoserv encountered unexpected errors when executing jobs.
14:40:54 INFO | Finished running tests. Results can be found in /tmp/test_that_results_jWIJfB or /tmp/test_that_latest
Full test log is attached
,
Aug 14 2017
,
Aug 14 2017
,
Aug 15 2017
Starting with this:
14:36:12 INFO | autoserv| Verifying this condition: servod service is taking calls
14:36:12 INFO | autoserv| Failed: servod service is taking calls
Also checking this:
$ servo-stat chromeos1-row1-rack4-host2
chromeos1-row1-rack4-host2 ...ABDEFGH not running servod BOARD=peppy CHROMEOS_RELEASE_VERSION=9672.0.0
Both of those indicate that servod has failed on the DUT's servo.
There's a reasonable chance that restarting servod on the DUT will
fix the problem, at least for a while. That can be done manually;
it should also happen if you click 'Repair' for the DUT.
CrOS Infra doesn't maintain either the DUTs or the servos, so whatever
work is done, it needs to come from whoever maintains the 'chromeos1'
lab.
,
Aug 15 2017
I remember seeing servod running on this servo host. Now I can confirm it, without restarting anything:
$ ssh root@chromeos1-row1-rack4-host2-servo.cros
localhost ~ # ps ux | grep servod
root 1036 0.1 7.9 48116 19552 ttyO1 Ssl+ Aug14 1:36 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board peppy --port 9999
root 1103 0.0 5.9 29336 14680 ttyO1 S+ Aug14 0:06 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board peppy --port 9999
root 1104 0.0 6.0 29348 14892 ttyO1 S+ Aug14 0:03 /usr/bin/python2.7 /usr/lib/python-exec/python2.7/servod --host 0.0.0.0 --board peppy --port 9999
root 15087 0.0 0.1 1424 372 pts/1 S+ 16:14 0:00 grep --colour=auto servod
localhost ~ # dut-control lid_open
lid_open:yes
,
Aug 15 2017
,
Aug 15 2017
Also, it is not isolated to this specific host. The same is observed when running test_that against other boards (e.g. chromeos1-row1-rack3-host2), where servod is running.
,
Aug 15 2017
Looks like the servo-stat script is not correct and needs to be fixed. 'status servod' returns a different string than expected:
# status servod
servod (9999) start/running, process 15425
The expected string is 'servod start/running'. I am not sure if test_that uses the status command to determine servod status.
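To illustrate the mismatch described above, here is a minimal Python sketch of a check that tolerates the instance suffix in the upstart output. It is not the servo-stat implementation; the function name and the regular expression are hypothetical, and only the two sample strings come from this thread.

```python
import re

# Hypothetical pattern (not taken from servo-stat): the job name, an optional
# "(instance)" such as "(9999)", the job state, and an optional
# ", process <pid>" tail.
_STATUS_RE = re.compile(
    r'^servod(?:\s+\((?P<instance>[^)]*)\))?'
    r'\s+(?P<state>\S+?)'
    r'(?:,\s+process\s+(?P<pid>\d+))?\s*$')


def servod_is_running(status_output):
    """Return True if the `status servod` output reports start/running."""
    match = _STATUS_RE.match(status_output.strip())
    return bool(match) and match.group('state') == 'start/running'
```

A check written as an exact prefix match on 'servod start/running' would reject the instance form, which may be why servo-stat reports "not running servod" here.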
,
Aug 15 2017
Looks like the problem is this:
$ dut-control -s chromeos1-row1-rack4-host2-servo -p 9999 pwr_button
No route to host
but it works fine when run over ssh on the servo host:
$ ssh root@chromeos1-row1-rack4-host2-servo dut-control pwr_button
pwr_button:release
Somehow dut-control can't locate chromeos1-row1-rack4-host2-servo. When running test_that, could there be a way to bypass the servod check and just run the tests?
,
Aug 15 2017
Looks like dut-control is trying to set up an XML-RPC connection, and basically:
$ wget http://chromeos1-row1-rack4-host2-servo.cros.corp.google.com:9999
--2017-08-15 12:32:29-- http://chromeos1-row1-rack4-host2-servo.cros.corp.google.com:9999/
Resolving chromeos1-row1-rack4-host2-servo.cros.corp.google.com (chromeos1-row1-rack4-host2-servo.cros.corp.google.com)... 172.27.212.80
Connecting to chromeos1-row1-rack4-host2-servo.cros.corp.google.com (chromeos1-row1-rack4-host2-servo.cros.corp.google.com)|172.27.212.80|:9999... failed: No route to host.
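The same reachability check can be done without wget. This is a small Python sketch, assuming only that servod listens on TCP port 9999 as shown above; the helper name `can_connect` is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection; return (True, None) or (False, error)."""
    try:
        # create_connection resolves the name and connects, which is the
        # same step that fails in the traceback in the issue description.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as err:
        return False, err
```

Calling `can_connect('chromeos1-row1-rack4-host2-servo.cros.corp.google.com', 9999)` from the workstation would be expected to return the "No route to host" error seen above, while ssh on port 22 still works.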
,
Aug 15 2017
This might work:
$ ssh -N -L 9999:localhost:9999 root@chromeos1-row1-rack4-host2-servo.cros.corp.google.com
$ dut-control pwr_button
pwr_button:release
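As a rough sketch of what the manual tunnel amounts to, the Python below builds the same ssh port-forward command and points an XML-RPC client at the local end. The helper names are hypothetical and this is not how autotest implements its tunnel; it uses the Python 3 `xmlrpc.client` module rather than the `xmlrpclib` seen in the traceback.

```python
import xmlrpc.client


def servo_tunnel_command(servo_host, local_port=9999, remote_port=9999,
                         user='root'):
    """Build the ssh argv that forwards local_port to servod on the host."""
    forward = '%d:localhost:%d' % (local_port, remote_port)
    return ['ssh', '-N', '-L', forward, '%s@%s' % (user, servo_host)]


def servo_proxy(local_port=9999):
    """Return an XML-RPC proxy for the local end of the tunnel.

    Constructing the proxy does not open a connection; a request is only
    sent when a method is called on it.
    """
    return xmlrpc.client.ServerProxy('http://localhost:%d' % local_port)
```

The command list can be passed to `subprocess.Popen` to hold the tunnel open while tests run, and terminated afterwards.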
,
Aug 15 2017
But I am not sure how to tie that back to test_that. Any ideas?
,
Aug 21 2017
Pinging this bug. There is still no way to run servo-based server-side tests.
,
Aug 24 2017
> this might work
> ssh -N -L 9999:localhost:9999 root@chromeos1-row1-rack4-host2-servo.cros.corp.google.com
Autotest will do this automatically. In the autotest directory from where you run 'test_that', add this to shadow_config.ini (or to global_config.ini):
enable_ssh_tunnel_for_servo: True
Assuming that works, we need to figure out why that setting isn't present for you already.
,
Aug 24 2017
<sigh> In global_config.ini, there's this:
# Flags to enable/disable SSH tunnel connection for servo host.
enable_ssh_tunnel_for_servo: False
Which would explain how it's set wrong when running test_that.
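For readers unfamiliar with the override, here is a sketch of how a shadow_config.ini value would win over the global_config.ini default. It uses Python's standard configparser rather than autotest's own config module, and the `[CROS]` section name is an assumption, since the thread does not show which section holds the setting.

```python
import configparser

# Assumed layout: the section name [CROS] is a guess for illustration.
GLOBAL_INI = """\
[CROS]
# Flags to enable/disable SSH tunnel connection for servo host.
enable_ssh_tunnel_for_servo: False
"""

SHADOW_INI = """\
[CROS]
enable_ssh_tunnel_for_servo: True
"""


def tunnel_enabled(global_text, shadow_text=''):
    """Read the global defaults, then let shadow values override them."""
    parser = configparser.ConfigParser()
    parser.read_string(global_text)
    if shadow_text:
        parser.read_string(shadow_text)
    return parser.getboolean('CROS', 'enable_ssh_tunnel_for_servo')
```

With only the global text the setting reads as False; adding the shadow text flips it to True, which matches the workaround described in this thread.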
,
Oct 9 2017
jrbarnette@ - should a change be submitted to update global_config.ini with enable_ssh_tunnel_for_servo: True? I am running into the same issue, and updating this value fixes it. https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/global_config.ini?l=381
,
Nov 27 2017
OK. I'm hearing that the workaround of changing the setting in global_config.ini works. As a fix, it's imperfect, but I think that that change is the right thing to do for now.
,
Nov 28 2017
How are folks supposed to find out about this workaround? Can we at least log something that makes this obvious?
,
Nov 29 2017
> How are folks supposed to find out about this workaround?
Realistically, they aren't; that's why we have to fix this bug.
> Can we at least log something that makes this obvious?
That's (substantially) more expensive than the fix:
crosreview.com/797290
,
Nov 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/20e1c47af7b77f14cafe35a42d5ec57d1ec808c0

commit 20e1c47af7b77f14cafe35a42d5ec57d1ec808c0
Author: Richard Barnette <jrbarnette@chromium.org>
Date: Thu Nov 30 09:13:15 2017

[autotest] Enable servo ssh tunnel by default

This changes the config setting for `enable_ssh_tunnel_for_servo` to
default to True. This setting is required if the target DUT is in
the test lab. For DUTs at a user's desktop, the setting still works,
although it's unnecessary.

BUG= chromium:755267
TEST=None

Change-Id: I009fed8d83210f198f362ca6ae1b0ea8c4a82cba
Reviewed-on: https://chromium-review.googlesource.com/797290
Commit-Ready: Richard Barnette <jrbarnette@chromium.org>
Tested-by: Richard Barnette <jrbarnette@chromium.org>
Reviewed-by: danny chan <dchan@chromium.org>

[modify] https://crrev.com/20e1c47af7b77f14cafe35a42d5ec57d1ec808c0/global_config.ini
,
Nov 30 2017
,
Jul 30
Comment 1 by ka...@chromium.org
, Aug 14 2017