
Issue 675284


Issue metadata

Status: Archived
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug


test_that attempts servo update for servo tests that use lab DUTs

Project Member Reported by ka...@chromium.org, Dec 17 2016

Issue description

(cr) ((d880765...)) kalin@kalin ~/trunk/src/scripts $ test_that --autotest_dir ~/trunk/src/third_party/autotest/files/ --board=squawks chromeos2-row10-rack10-host13 platform_SuspendResumeTiming
WARNING:root:Failed to import ts_mon, monitoring is disabled: No module named urllib.parse
INFO:root:Identity added: /tmp/test_that_results_H19VJ3/testing_rsa (/tmp/test_that_results_H19VJ3/testing_rsa)
18:29:06 INFO | Began logging to /tmp/test_that_results_H19VJ3
Adding labels [u'cros-version:ad_hoc_build', u'board:squawks'] to host chromeos2-row10-rack10-host13
20:29:07 INFO | Fetching suite for job named platform_SuspendResumeTiming...
20:29:10 INFO | Scheduling suite for job named platform_SuspendResumeTiming...
20:29:10 INFO | ... scheduled 1 job(s).
20:29:10 INFO | autoserv| WARNING:root:Failed to import ts_mon, monitoring is disabled: No module named urllib.parse
20:29:10 INFO | autoserv| Results placed in /tmp/test_that_results_H19VJ3/results-1-platform_SuspendResumeTiming
20:29:10 INFO | autoserv| Logged pid 329 to /tmp/test_that_results_H19VJ3/results-1-platform_SuspendResumeTiming/.autoserv_execute
20:29:10 INFO | autoserv| I am PID 329
20:29:10 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_TFNpPwssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmpJrzlUN -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos2-row10-rack10-host13'
20:29:15 INFO | autoserv| Chameleon chromeos2-row10-rack10-host13-chameleon is not accessible. Please file a bug to test lab
20:29:15 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_JjMvZissh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row10-rack10-host13'
20:29:15 INFO | autoserv| get_network_stats: at-start RXbytes 83941685 TXbytes 18149066
20:29:15 INFO | autoserv| Not checking if job_repo_url contains autotest packages on ['chromeos2-row10-rack10-host13']
20:29:15 INFO | autoserv| Processing control file
20:29:15 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_taUmHxssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmpFSK9Fb -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos2-row10-rack10-host13'
20:29:17 INFO | autoserv| Verifying this condition: host is available via ssh
20:29:17 INFO | autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_n6306Pssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row10-rack10-host13-servo'
20:29:17 INFO | autoserv| No failed triggers, skipping repair:  Power cycle the host with RPM
20:29:17 INFO | autoserv| Verifying this condition: servo BOARD setting is correct
20:29:17 INFO | autoserv| Verifying this condition: servo SERIAL setting is correct
20:29:17 INFO | autoserv| Verifying this condition: servod upstart job is running
20:29:18 INFO | autoserv| Verifying this condition: servod service is taking calls
20:29:18 INFO | autoserv| Waiting 60 seconds for XMLRPC server to start.
20:29:18 INFO | autoserv| <class 'socket.error'>([Errno 111] Connection refused)
20:29:18 INFO | autoserv| Retrying in 1.430969 seconds...
20:29:19 INFO | autoserv| XMLRPC server started successfully.
20:29:24 INFO | autoserv| Setting usb_mux_oe1 to on
20:29:26 INFO | autoserv| Verifying this condition: pwr_button control is normal
20:29:26 INFO | autoserv| Verifying this condition: lid_open control is normal
20:29:27 INFO | autoserv| No failed triggers, skipping repair:  Start servod with the proper config settings.
20:29:27 INFO | autoserv| No failed triggers, skipping repair:  Wait for update, then reboot servo host.
20:29:27 INFO | autoserv| Verifying this condition: servo host software is up-to-date
20:29:27 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:27 INFO | autoserv| * Command:
20:29:27 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:27 INFO | autoserv| Exit status: 255
20:29:27 INFO | autoserv| Duration: 0.251973867416
20:29:27 INFO | autoserv| )
20:29:27 INFO | autoserv| Retrying in 2.231108 seconds...
20:29:30 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:30 INFO | autoserv| * Command:
20:29:30 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:30 INFO | autoserv| Exit status: 255
20:29:30 INFO | autoserv| Duration: 0.252958059311
20:29:30 INFO | autoserv| )
20:29:30 INFO | autoserv| Retrying in 3.515122 seconds...
20:29:33 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:33 INFO | autoserv| * Command:
20:29:33 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:33 INFO | autoserv| Exit status: 255
20:29:33 INFO | autoserv| Duration: 0.24486207962
20:29:33 INFO | autoserv| )
20:29:33 INFO | autoserv| Retrying in 2.322010 seconds...
20:29:36 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:36 INFO | autoserv| * Command:
20:29:36 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:36 INFO | autoserv| Exit status: 255
20:29:36 INFO | autoserv| Duration: 0.246438980103
20:29:36 INFO | autoserv| )
20:29:36 INFO | autoserv| Retrying in 3.505334 seconds...
20:29:40 INFO | autoserv| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
20:29:40 INFO | autoserv| * Command:
20:29:40 INFO | autoserv| ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'
20:29:40 INFO | autoserv| Exit status: 255
20:29:40 INFO | autoserv| Duration: 0.24378991127
20:29:40 INFO | autoserv| )
20:29:40 INFO | autoserv| Retrying in 2.897331 seconds...
^C20:29:41 WARNI| Received SIGINT or SIGTERM. Cleaning up and exiting.
20:29:41 WARNI| Sending SIGINT to autoserv process. Waiting up to 5 seconds for cleanup.


As noted before, this seems to be an issue local to my repo/chroot/setup, since harpreet@ successfully runs a test against the same local DUT and servo hosts.

ssh to the devserver works:
(cr) ((d880765...)) kalin@kalin ~/trunk/src/scripts $ ssh 100.115.219.133 
Last login: Wed Dec  7 12:49:49 2016 from jrbarnette.mtv.corp.google.com
chromeos-test@chromeos4-devserver5:~$

My ~/.ssh/config content:
Host *
  ForwardAgent yes
  PreferredAuthentications publickey,gssapi-with-mic,hostbased,keyboard-interactive,password
  VerifyHostKeyDNS no
  StrictHostKeyChecking no
  LogLevel quiet
  UserKnownHostsFile /dev/null

Host 100.107.160.* 100.115.245.* 172.17.40.*  100.115.219.* 100.115.245.199
  User chromeos-test
  IdentityFile %d/.ssh/chromium
  Protocol 2
  StrictHostKeyChecking no
  LogLevel quiet
  UserKnownHostsFile /dev/null

 
 
Status: Available (was: Untriaged)
Summary: test_that attempts servo update servo tests using lab DUTs (was: [servo autotest] Unable to run test against DUTs in lab. Failing servo host verification with 'Command <ssh 100.115.219.133 'curl "http://100.115.219.133:8082/check_health?"'> failed')
The basic problem is that test_that is attempting to update
the servo.  This happens because of a chain of requirements:
  * The test requires servo.
  * Because a servo is required a ServoHost is created, and
    verify() is called.
  * verify() includes a check for "servo software up-to-date".

Due to ACL restrictions, talking to a devserver now requires
ssh access to the devserver.  Although that can be configured user
by user, that's not a scalable solution, and not really an appropriate
requirement for running test_that in the first place.

Some potential options:
  * Somehow distinguish "running in the lab" from "running test_that",
    and skip the update check in the latter case.
  * Don't retry access to the devserver when the ssh exit code is 255.
  * Don't try to resolve the devserver unless we determine that an
    update is needed (but this would still fail if the servo is actually
    out-of-date).
  * Work around this specific failure by figuring out why devserver
    access is failing in this case but not others.
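
For illustration, the "don't retry when the ssh exit code is 255" option
could look roughly like the sketch below.  This is not autotest code:
CommandError and call_with_retries are hypothetical names, and the
attempt count and backoff range are assumptions.  The only fact relied
on is that ssh itself exits with 255 when it cannot connect or
authenticate, so repeating the same command is unlikely to help.

```python
import random
import time

# ssh(1) exits with 255 when ssh itself fails (connection, auth, ...),
# as opposed to returning the remote command's own exit status.
SSH_ERROR_EXIT_STATUS = 255


class CommandError(Exception):
    """Hypothetical stand-in for a failed command with an exit status."""

    def __init__(self, command, exit_status):
        super().__init__("%s failed, rc=%d" % (command, exit_status))
        self.command = command
        self.exit_status = exit_status


def call_with_retries(func, max_attempts=5, sleep=time.sleep):
    """Call func(), retrying on CommandError unless ssh itself failed.

    max_attempts and the backoff range are illustrative values only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except CommandError as error:
            out_of_attempts = attempt == max_attempts
            if error.exit_status == SSH_ERROR_EXIT_STATUS or out_of_attempts:
                # Give up immediately: either ssh could not reach the
                # host at all, or the retry budget is exhausted.
                raise
            sleep(random.uniform(1.0, 4.0))
```

In the log above, every attempt failed with rc=255 after about 0.25
seconds, so with a rule like this the first failure would have been
reported right away instead of being retried.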

Summary: test_that attempts servo update for servo tests that use lab DUTs (was: test_that attempts servo update servo tests using lab DUTs)
Labels: Hotlist-Fixit

Comment 4 by sbasi@chromium.org, Jan 31 2017

Owner: sbasi@chromium.org
Status: Started (was: Available)
We already have the ability to distinguish "running in the lab", I will fix this.

Comment 5 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/10ad79bfccf045d0ff3978ac63241ca374da0bc5

commit 10ad79bfccf045d0ff3978ac63241ca374da0bc5
Author: Simran Basi <sbasi@google.com>
Date: Mon Feb 06 20:47:04 2017

[autotest] Don't run verify on servo_host when run via test_that

Users of test_that against lab duts will hit failures due to the
servo_repair _UpdateVerifier verification process requiring access
to the devservers in the lab which is not easily do-able due
to subnet restrictions.

Therefore these checks will be skipped when not run in a lab
environment.

BUG= chromium:675284 
TEST=test_that chromeos4-row1-rack10-host15.cros platform_SuspendResumeTiming

Change-Id: I971b3ba89208cc64bf4178362b6267deade6a7d7
Reviewed-on: https://chromium-review.googlesource.com/435498
Commit-Ready: Simran Basi <sbasi@chromium.org>
Tested-by: Simran Basi <sbasi@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/10ad79bfccf045d0ff3978ac63241ca374da0bc5/server/hosts/servo_repair.py
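
As a rough picture of the behavior the commit message describes (skip
the update check when not running in the lab), here is a minimal Python
sketch.  It is not the actual change to servo_repair.py: FakeServoHost,
its in_lab flag, check_devserver_for_update() and this UpdateVerifier
class are all hypothetical names used only for illustration.

```python
class FakeServoHost:
    """Hypothetical servo host model; `in_lab` is an assumed flag."""

    def __init__(self, in_lab):
        self.in_lab = in_lab
        self.devserver_calls = 0

    def check_devserver_for_update(self):
        # Stands in for the step that needs access to a lab devserver,
        # which is what failed in the report above.
        self.devserver_calls += 1


class UpdateVerifier:
    """Hypothetical verifier for 'servo host software is up-to-date'."""

    description = "servo host software is up-to-date"

    def verify(self, host):
        """Return True if the check ran, False if it was skipped."""
        if not host.in_lab:
            # Outside the lab (e.g. a local test_that run) the devserver
            # may be unreachable, so do not attempt the check at all.
            return False
        host.check_devserver_for_update()
        return True
```

With a host created as FakeServoHost(in_lab=False), verify() returns
without touching the devserver, so a local run never reaches the
check_health call that was failing.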

Comment 6 by sbasi@chromium.org, Feb 6 2017

Labels: cros-infra-fixedit-q117
Status: Fixed (was: Started)

Comment 8 by dchan@google.com, May 30 2017

Labels: VerifyIn-60

Comment 9 by dchan@chromium.org, Aug 1 2017

Labels: VerifyIn-61

Comment 10 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
