test_that attempts servo update for servo tests that use lab DUTs
Issue description
(cr) ((d880765...)) kalin@kalin ~/trunk/src/scripts $ test_that --autotest_dir ~/trunk/src/third_party/autotest/files/ --board=squawks chromeos2-row10-rack10-host13 platform_SuspendResumeTiming
WARNING:root:Failed to import ts_mon, monitoring is disabled: No module named urllib.parse
INFO:root:Identity added: /tmp/test_that_results_H19VJ3/testing_rsa (/tmp/test_that_results_H19VJ3/testing_rsa)
18:29:06 INFO | Began logging to /tmp/test_that_results_H19VJ3
Adding labels [u'cros-version:ad_hoc_build', u'board:squawks'] to host chromeos2-row10-rack10-host13
20:29:07 INFO | Fetching suite for job named platform_SuspendResumeTiming...
20:29:10 INFO | Scheduling suite for job named platform_SuspendResumeTiming...
20:29:10 INFO | ... scheduled 1 job(s).
20:29:10 INFO | autoserv| WARNING:root:Failed to import ts_mon, monitoring is disabled: No module named urllib.parse
20:29:10 INFO | autoserv| Results placed in /tmp/test_that_results_H19VJ3/results-1-platform_SuspendResumeTiming
20:29:10 INFO | autoserv| Logged pid 329 to /tmp/test_that_results_H19VJ3/results-1-platform_SuspendResumeTiming/.autoserv_execute
20:29:10 INFO | autoserv| I am PID 329
20:29:10 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_TFNpPwssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmpJrzlUN -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos2-row10-rack10-host13'
20:29:15 INFO | autoserv| Chameleon chromeos2-row10-rack10-host13-chameleon is not accessible.
Please file a bug to test lab
20:29:15 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_JjMvZissh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row10-rack10-host13'
20:29:15 INFO | autoserv| get_network_stats: at-start RXbytes 83941685 TXbytes 18149066
20:29:15 INFO | autoserv| Not checking if job_repo_url contains autotest packages on ['chromeos2-row10-rack10-host13']
20:29:15 INFO | autoserv| Processing control file
20:29:15 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_taUmHxssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmpFSK9Fb -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos2-row10-rack10-host13'
20:29:17 INFO | autoserv| Verifying this condition: host is available via ssh
20:29:17 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_n6306Pssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row10-rack10-host13-servo'
20:29:17 INFO | autoserv| No failed triggers, skipping repair: Power cycle the host with RPM
20:29:17 INFO | autoserv| Verifying this condition: servo BOARD setting is correct
20:29:17 INFO | autoserv| Verifying this condition: servo SERIAL setting is correct
20:29:17 INFO | autoserv| Verifying this condition: servod upstart job is running
20:29:18 INFO | autoserv| Verifying this condition: servod service is taking calls
20:29:18 INFO | autoserv| Waiting 60 seconds for XMLRPC server to start.
20:29:18 INFO | autoserv| <class 'socket.error'>([Errno 111] Connection refused)
20:29:18 INFO | autoserv| Retrying in 1.430969 seconds...
20:29:19 INFO | autoserv| XMLRPC server started successfully.
20:29:24 INFO | autoserv| Setting usb_mux_oe1 to on
20:29:26 INFO | autoserv| Verifying this condition: pwr_button control is normal
20:29:26 INFO | autoserv| Verifying this condition: lid_open control is normal
20:29:27 INFO | autoserv| No failed triggers, skipping repair: Start servod with the proper config settings.
20:29:27 INFO | autoserv| No failed triggers, skipping repair: Wait for update, then reboot servo host.
20:29:27 INFO | autoserv| Verifying this condition: servo host software is up-to-date
20:29:27 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:27 INFO | autoserv| * Command:
20:29:27 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:27 INFO | autoserv| Exit status: 255
20:29:27 INFO | autoserv| Duration: 0.251973867416
20:29:27 INFO | autoserv| )
20:29:27 INFO | autoserv| Retrying in 2.231108 seconds...
20:29:30 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:30 INFO | autoserv| * Command:
20:29:30 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:30 INFO | autoserv| Exit status: 255
20:29:30 INFO | autoserv| Duration: 0.252958059311
20:29:30 INFO | autoserv| )
20:29:30 INFO | autoserv| Retrying in 3.515122 seconds...
20:29:33 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:33 INFO | autoserv| * Command:
20:29:33 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:33 INFO | autoserv| Exit status: 255
20:29:33 INFO | autoserv| Duration: 0.24486207962
20:29:33 INFO | autoserv| )
20:29:33 INFO | autoserv| Retrying in 2.322010 seconds...
20:29:36 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:36 INFO | autoserv| * Command:
20:29:36 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:36 INFO | autoserv| Exit status: 255
20:29:36 INFO | autoserv| Duration: 0.246438980103
20:29:36 INFO | autoserv| )
20:29:36 INFO | autoserv| Retrying in 3.505334 seconds...
20:29:40 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:40 INFO | autoserv| * Command:
20:29:40 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:40 INFO | autoserv| Exit status: 255
20:29:40 INFO | autoserv| Duration: 0.24378991127
20:29:40 INFO | autoserv| )
20:29:40 INFO | autoserv| Retrying in 2.897331 seconds...
^C20:29:41 WARNI| Received SIGINT or SIGTERM. Cleaning up and exiting.
20:29:41 WARNI| Sending SIGINT to autoserv process. Waiting up to 5 seconds for cleanup.

As noted before, this seems to be an issue local to my repo/chroot/setup, since harpreet@ successfully runs a test against the same local DUT and servo hosts.

ssh to the devserver works:
(cr) ((d880765...)) kalin@kalin ~/trunk/src/scripts $ ssh 100.115.219.133
Last login: Wed Dec 7 12:49:49 2016 from jrbarnette.mtv.corp.google.com
chromeos-test@chromeos4-devserver5:~$

My ~/.ssh/config content:
Host *
  ForwardAgent yes
  PreferredAuthentications publickey,gssapi-with-mic,hostbased,keyboard-interactive,password
  VerifyHostKeyDNS no
  StrictHostKeyChecking no
  LogLevel quiet
  UserKnownHostsFile /dev/null

Host 100.107.160.* 100.115.245.* 172.17.40.* 100.115.219.* 100.115.245.199
  User chromeos-test
  IdentityFile %d/.ssh/chromium
  Protocol 2
  StrictHostKeyChecking no
  LogLevel quiet
  UserKnownHostsFile /dev/null
Dec 19 2016
Jan 4 2017
Jan 31 2017
We already have the ability to distinguish "running in the lab"; I will fix this.
Feb 6 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/10ad79bfccf045d0ff3978ac63241ca374da0bc5

commit 10ad79bfccf045d0ff3978ac63241ca374da0bc5
Author: Simran Basi <sbasi@google.com>
Date: Mon Feb 06 20:47:04 2017

[autotest] Don't run verify on servo_host when run via test_that

Users of test_that against lab duts will hit failures due to the servo_repair _UpdateVerifier verification process requiring access to the devservers in the lab which is not easily do-able due to subnet restrictions. Therefore these checks will be skipped when not run in a lab environment.

BUG= chromium:675284
TEST=test_that chromeos4-row1-rack10-host15.cros platform_SuspendResumeTiming

Change-Id: I971b3ba89208cc64bf4178362b6267deade6a7d7
Reviewed-on: https://chromium-review.googlesource.com/435498
Commit-Ready: Simran Basi <sbasi@chromium.org>
Tested-by: Simran Basi <sbasi@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/10ad79bfccf045d0ff3978ac63241ca374da0bc5/server/hosts/servo_repair.py
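
For readers who have not opened the change, the idea in the commit message (skip the update check when not running in a lab environment) can be sketched roughly as follows. This is only an illustration and not the code from servo_repair.py: the class name UpdateCheckSketch, the in_lab flag, and the check_update callable are hypothetical stand-ins.

```python
class UpdateCheckSketch(object):
    """Hypothetical stand-in for the "servo host software is up-to-date" check.

    Assumption: the caller already knows whether it is running in the lab
    and passes that in as a plain boolean.
    """

    def __init__(self, in_lab, check_update):
        self.in_lab = in_lab
        # check_update is any callable that talks to a devserver and raises
        # on failure; it is never invoked outside the lab.
        self.check_update = check_update

    def verify(self):
        if not self.in_lab:
            # test_that run from a workstation: devserver access is not
            # assumed, so the check is skipped rather than failed.
            return 'skipped'
        self.check_update()
        return 'checked'
```

With in_lab set to False, verify() returns without ever touching the devserver, which is the behavior a local test_that run needs.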
Feb 6 2017
Mar 4 2017
May 30 2017
Aug 1 2017
Jan 22 2018
Comment 1 by jrbarnette@chromium.org, Dec 19 2016
Summary: test_that attempts servo update servo tests using lab DUTs (was: [servo autotest] Unable to run test against DUTs in lab. Failing servo host verification with 'Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed')
The basic problem is that test_that is attempting to update the servo. This happens because of a chain of requirements:
* The test requires servo.
* Because a servo is required, a ServoHost is created, and verify() is called.
* verify() includes a check for "servo software up-to-date".

Due to ACL restrictions, talking to a devserver now requires ssh access to the devserver. Although that can be configured user by user, that's not a scalable solution, and not really an appropriate requirement for running test_that in the first place.

Some potential options:
* Somehow distinguish "running in the lab" from "running test_that", and skip the update check in the latter case.
* Don't retry access to the devserver when the ssh exit code is 255.
* Don't try to resolve the devserver unless we determine that an update is needed (but this would still fail if the servo is actually out-of-date).
* Work around this specific failure by figuring out why devserver access is failing in this case but not others.
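
As a rough illustration of the second option (no retry when the ssh exit code is 255), here is a minimal sketch. Nothing in it is autotest code: CommandFailed, retry_unless_ssh_failure, and their parameters are hypothetical names chosen for the example. It relies on the documented ssh behavior of exiting with 255 when ssh itself hits an error; a remote command could in principle also return 255, so this is a heuristic.

```python
import time

# ssh exits with the remote command's status, or 255 if ssh itself failed.
SSH_ERROR_EXIT_STATUS = 255


class CommandFailed(Exception):
    """Hypothetical error type carrying the exit status of a failed command."""

    def __init__(self, exit_status):
        super(CommandFailed, self).__init__('exit status %d' % exit_status)
        self.exit_status = exit_status


def retry_unless_ssh_failure(func, attempts=3, delay_sec=1.0, sleep=time.sleep):
    """Call func(), retrying on CommandFailed except when ssh itself failed."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except CommandFailed as e:
            # Give up at once on an ssh-level failure, or when out of attempts.
            if e.exit_status == SSH_ERROR_EXIT_STATUS or attempt == attempts:
                raise
            sleep(delay_sec)
```

Applied to the log in the description, the first rc=255 from the check_health call would surface immediately instead of being retried every few seconds.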