servo label detection is flaky
Issue description
It seems that the current servo detection mechanism is flaky: the servo label gets removed and re-added multiple times for the same host (one example: chromeos1-row5-rack6-host1). We need a smarter (or more robust) label detection mechanism.
,
May 4 2016
And that's because the reset job doesn't pass in 'try_lab_servo=True' when creating the machine. https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/server/control_segments/reset?rcl=7b271d0305b5f6ba1c746f3b6098695e9377f038&l=17
,
May 4 2016
I expect that the fix here is to change the servo label detection to call 'utils.host_is_in_lab_zone()' on the servo host name.
,
May 4 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d9dfa585eb054fb726cd7db409d48996aeebf7c0

commit d9dfa585eb054fb726cd7db409d48996aeebf7c0
Author: Kevin Cheng <kevcheng@chromium.org>
Date: Wed May 04 16:37:34 2016

[autotest] Update Servo label exists method

Previously ServoLabel checked if the _servo_host object was not None. That wasn't always true since during the reset job, the _servo_host object does not get created since 'try_lab_servo' isn't true when creating the host. This uses the method that servo_host module uses to check for prior to creating the _servo_host object.

BUG= chromium:609143
TEST=locally with chromeos1-row5-rack6-host1

Change-Id: Ic4b65a28351f01d129e37b84ac4a9f097171fa0b
Reviewed-on: https://chromium-review.googlesource.com/342153
Commit-Ready: Kevin Cheng <kevcheng@chromium.org>
Tested-by: Kevin Cheng <kevcheng@chromium.org>
Reviewed-by: Kalin Stoyanov <kalin@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/d9dfa585eb054fb726cd7db409d48996aeebf7c0/server/hosts/servo_host.py
[modify] https://crrev.com/d9dfa585eb054fb726cd7db409d48996aeebf7c0/server/hosts/cros_label.py
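For readers following the thread, here is a rough sketch of the difference between the two checks described in the commit message. It is not the actual autotest code: `FakeHost`, `make_servo_hostname`, the `-servo` suffix, the `cros` zone name, and the injectable `resolver` parameter are all assumptions made for illustration, and the local `host_is_in_lab_zone` is only a stand-in for the real utility mentioned above.

```python
import socket

# Assumptions for illustration only: the suffix and zone are hypothetical.
SERVO_HOST_SUFFIX = '-servo'
LAB_ZONE = 'cros'


def make_servo_hostname(dut_hostname):
    """Derive a (hypothetical) servo host name from a DUT host name."""
    parts = dut_hostname.split('.')
    parts[0] = parts[0] + SERVO_HOST_SUFFIX
    return '.'.join(parts)


def host_is_in_lab_zone(hostname, resolver=socket.gethostbyname):
    """Stand-in check: True if the short name resolves inside LAB_ZONE."""
    short_name = hostname.split('.')[0]
    try:
        resolver('%s.%s' % (short_name, LAB_ZONE))
        return True
    except socket.error:
        return False


class FakeHost(object):
    """Minimal host object; _servo_host stays None when not created."""

    def __init__(self, hostname, servo_host=None):
        self.hostname = hostname
        self._servo_host = servo_host


def servo_label_exists_old(host):
    # Old behaviour: depends on whether the servo host object was built,
    # which varies with how the host was created (e.g. by the reset job).
    return host._servo_host is not None


def servo_label_exists_new(host, resolver=socket.gethostbyname):
    # New behaviour (sketch): depends only on the servo host name.
    return host_is_in_lab_zone(make_servo_hostname(host.hostname), resolver)
```

The point of the sketch is that the second check gives the same answer no matter which job created the host object, so the label should stop toggling between jobs.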
,
May 6 2016
The fix has been pushed, and so far hosts seem to be keeping their servo label. Kalin, please verify and let me know if any other outliers are losing their servo label.
,
May 11 2016
It is good for now. Will re-open if issues occur.
,
May 24 2016
I think this change broke one of our use cases. We use a BeagleBone connected to a Pixel C; basically, the BeagleBone is our DUT, which happens to run servod. My current workaround is to skip the checks in update_image (basically return immediately). +waihong to explain more...

Here is the test_that output: https://storage.googleapis.com/chromiumos-test-logs/bugfiles/cros/609143/a.tar.gz

Stack trace:
05/24 15:45:48.518 INFO | test_runner_utils:0198| autoserv| Pinging host 100.96.49.67
05/24 15:45:48.529 INFO | test_runner_utils:0198| autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_5V85Hnssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 100.96.49.67'
05/24 15:45:48.915 INFO | test_runner_utils:0198| autoserv| Host (ssh) 100.96.49.67 is alive
05/24 15:45:48.915 INFO | test_runner_utils:0198| autoserv| Applying an update to the servo host, if necessary.
05/24 15:45:49.083 INFO | test_runner_utils:0198| autoserv| Exception escaped control file, job aborting:
05/24 15:45:49.084 INFO | test_runner_utils:0198| autoserv| Traceback (most recent call last):
05/24 15:45:49.084 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/server_job.py", line 684, in run
05/24 15:45:49.084 INFO | test_runner_utils:0198| autoserv| self._execute_code(server_control_file, namespace)
05/24 15:45:49.085 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/server_job.py", line 1182, in _execute_code
05/24 15:45:49.086 INFO | test_runner_utils:0198| autoserv| execfile(code_file, namespace, namespace)
05/24 15:45:49.086 INFO | test_runner_utils:0198| autoserv| File "/tmp/test_that_results_VQkBAT/results-1-firmware_ECBootTime/control.srv", line 30, in <module>
05/24 15:45:49.086 INFO | test_runner_utils:0198| autoserv| parallel_simple(run_ecboottime, machines)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/subcommand.py", line 93, in parallel_simple
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| function(arg)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/tmp/test_that_results_VQkBAT/results-1-firmware_ECBootTime/control.srv", line 26, in run_ecboottime
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| host = hosts.create_host(machine, servo_args=servo_args)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/factory.py", line 160, in create_host
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| host_instance = host_class(hostname, **args)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/base_classes.py", line 58, in __init__
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| super(Host, self).__init__(*args, **dargs)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 70, in __init__
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| self._initialize(*args, **dargs)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/cros_host.py", line 306, in _initialize
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| try_lab_servo=try_lab_servo)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 785, in create_servo_host
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| return ServoHost(required_by_test=True, is_in_lab=False, **servo_args)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/base_classes.py", line 58, in __init__
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| super(Host, self).__init__(*args, **dargs)
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 70, in __init__
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| self._initialize(*args, **dargs)
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 167, in _initialize
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| self.verify()
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 255, in verify
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| self.verify_software()
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 588, in verify_software
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| self.update_image(wait_for_update=False)
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/usr/lib64/python2.7/site-packages/statsd/timer.py", line 95, in _decorator
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| return function(*args, **kwargs)
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 524, in update_image
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| target_build)[3]
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/site_utils.py", line 86, in ParseBuildName
05/24 15:45:49.091 INFO | test_runner_utils:0198| autoserv| raise ParseBuildNameException('%s is a malformed build name.' % name)
05/24 15:45:49.091 INFO | test_runner_utils:0198| autoserv| ParseBuildNameException: beaglebone_servo-release/beaglebone_servo-release/R53-8368.0.0 is a malformed build name.
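The build name in the last log line repeats the `beaglebone_servo-release/` prefix. As a rough illustration of why a parser that expects a single `board-type/Rmilestone-version` path might reject it, here is a minimal sketch. The regular expression, `MalformedBuildName`, and `parse_build_name` are hypothetical and are not the real `ParseBuildName` from site_utils.py; the actual pattern may differ.

```python
import re

# Hypothetical pattern: exactly one "<board>-<type>/R<milestone>-<version>".
_BUILD_NAME_RE = re.compile(
    r'^(?P<board>[\w-]+)-(?P<type>\w+)/'
    r'R(?P<milestone>\d+)-(?P<version>[\d.]+)$')


class MalformedBuildName(ValueError):
    """Raised when a build name does not match the assumed format."""


def parse_build_name(name):
    """Return (board, type, milestone, version) or raise MalformedBuildName."""
    match = _BUILD_NAME_RE.match(name)
    if not match:
        raise MalformedBuildName('%s is a malformed build name.' % name)
    return (match.group('board'), match.group('type'),
            match.group('milestone'), match.group('version'))
```

Under this assumed format, `beaglebone_servo-release/R53-8368.0.0` parses, while the doubled name from the log raises, because the second path segment does not start with `R`.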
,
May 24 2016
I think it came from a recent push (see crbug.com/614500), and I don't think it has anything to do with the CLs in this bug.
,
Jun 27 2016
Closing... please feel free to reopen if it's not fixed.
,
Jun 27 2016
Comment 1 by kevcheng@chromium.org, May 4 2016