
Issue 609143

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




servo label detection is flaky

Project Member Reported by kevcheng@chromium.org, May 4 2016

Issue description

The current servo detection mechanism appears to be flaky: the servo label is being removed and re-added multiple times for the same host (one example: chromeos1-row5-rack6-host1).

We need a smarter, more robust label detection mechanism.
 
Something about the reset job leaves the servo host not properly initialized (although the chameleon host is initialized).

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row5-rack6-host1/737831-reset/20160405091546/debug/
That is because the reset job doesn't pass 'try_lab_servo=True' when creating the host.

https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/server/control_segments/reset?rcl=7b271d0305b5f6ba1c746f3b6098695e9377f038&l=17
I expect that the fix here is to change the servo label detection
to call 'utils.host_is_in_lab_zone()' on the servo host name.

Project Member

Comment 4 by bugdroid1@chromium.org, May 4 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d9dfa585eb054fb726cd7db409d48996aeebf7c0

commit d9dfa585eb054fb726cd7db409d48996aeebf7c0
Author: Kevin Cheng <kevcheng@chromium.org>
Date: Wed May 04 16:37:34 2016

[autotest] Update Servo label exists method

Previously ServoLabel checked if the _servo_host object was not None.
That wasn't always true since during the reset job, the _servo_host
object does not get created since 'try_lab_servo' isn't true when
creating the host.  This uses the method that servo_host module uses
to check for prior to creating the _servo_host object.

BUG= chromium:609143 
TEST=locally with chromeos1-row5-rack6-host1

Change-Id: Ic4b65a28351f01d129e37b84ac4a9f097171fa0b
Reviewed-on: https://chromium-review.googlesource.com/342153
Commit-Ready: Kevin Cheng <kevcheng@chromium.org>
Tested-by: Kevin Cheng <kevcheng@chromium.org>
Reviewed-by: Kalin Stoyanov <kalin@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/d9dfa585eb054fb726cd7db409d48996aeebf7c0/server/hosts/servo_host.py
[modify] https://crrev.com/d9dfa585eb054fb726cd7db409d48996aeebf7c0/server/hosts/cros_label.py
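
For readers without access to the change, here is a minimal sketch of the difference between the two checks described in the commit message. It is not the autotest code: `FakeHost`, `make_servo_hostname`, the `-servo` naming convention, and the injected `host_is_in_lab_zone` callable are all assumptions made for illustration. Only `_servo_host`, `try_lab_servo`, and `utils.host_is_in_lab_zone()` are names taken from this thread.

```python
# Hypothetical stand-ins for illustration; not the real autotest classes.
class FakeHost(object):
    def __init__(self, hostname, servo_host=None):
        self.hostname = hostname
        # Only populated when the host was created with try_lab_servo=True.
        self._servo_host = servo_host


def make_servo_hostname(dut_hostname):
    """Assumed naming convention: the servo host is '<dut>-servo'."""
    return dut_hostname + '-servo'


def servo_label_exists_old(host):
    """Old-style check: the answer depends on how the host was constructed."""
    return host._servo_host is not None


def servo_label_exists_new(host, host_is_in_lab_zone):
    """New-style check: the answer depends only on the servo host name.

    host_is_in_lab_zone is injected; it stands in for the lab lookup that
    the issue names (utils.host_is_in_lab_zone).
    """
    return host_is_in_lab_zone(make_servo_hostname(host.hostname))


# Toy "lab zone" containing a single servo host.
LAB_ZONE = {'chromeos1-row5-rack6-host1-servo'}


def in_lab(name):
    return name in LAB_ZONE


# The reset job builds the host without a servo object; a test job builds it with one.
reset_job_host = FakeHost('chromeos1-row5-rack6-host1')
test_job_host = FakeHost('chromeos1-row5-rack6-host1', servo_host=object())

old_results = [servo_label_exists_old(h) for h in (reset_job_host, test_job_host)]
new_results = [servo_label_exists_new(h, in_lab) for h in (reset_job_host, test_job_host)]
print(old_results)  # [False, True] -> label flips depending on the job type
print(new_results)  # [True, True]  -> label is stable across job types
```

Under these assumptions, the old check gives different answers for the same machine depending on which job created the host object, which matches the add/remove flapping reported above. The name-based check does not depend on the host object's construction.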

Status: Fixed (was: Assigned)
The fix has been pushed, and so far hosts seem to be keeping their servo label.

Kalin, please verify and let me know if there are any other outliers that are losing their servo label.

Comment 6 by ka...@chromium.org, May 11 2016

Status: Verified (was: Fixed)
It is good for now. Will re-open if issues occur.

Comment 7 by dchan@google.com, May 24 2016

Cc: waihong@chromium.org shchen@chromium.org
Status: Assigned (was: Verified)
I think this change broke one of our use cases. We use a beaglebone connected to a Pixel C; basically, the beaglebone is our DUT, which happens to run servod. My current workaround is to not do any check in update_image (basically, return immediately).
+waihong to explain more...

here is the test_that output:
https://storage.googleapis.com/chromiumos-test-logs/bugfiles/cros/609143/a.tar.gz


Stack trace:
05/24 15:45:48.518 INFO | test_runner_utils:0198| autoserv| Pinging host 100.96.49.67
05/24 15:45:48.529 INFO | test_runner_utils:0198| autoserv| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_5V85Hnssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 100.96.49.67'
05/24 15:45:48.915 INFO | test_runner_utils:0198| autoserv| Host (ssh) 100.96.49.67 is alive
05/24 15:45:48.915 INFO | test_runner_utils:0198| autoserv| Applying an update to the servo host, if necessary.
05/24 15:45:49.083 INFO | test_runner_utils:0198| autoserv| Exception escaped control file, job aborting:
05/24 15:45:49.084 INFO | test_runner_utils:0198| autoserv| Traceback (most recent call last):
05/24 15:45:49.084 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/server_job.py", line 684, in run
05/24 15:45:49.084 INFO | test_runner_utils:0198| autoserv| self._execute_code(server_control_file, namespace)
05/24 15:45:49.085 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/server_job.py", line 1182, in _execute_code
05/24 15:45:49.086 INFO | test_runner_utils:0198| autoserv| execfile(code_file, namespace, namespace)
05/24 15:45:49.086 INFO | test_runner_utils:0198| autoserv| File "/tmp/test_that_results_VQkBAT/results-1-firmware_ECBootTime/control.srv", line 30, in <module>
05/24 15:45:49.086 INFO | test_runner_utils:0198| autoserv| parallel_simple(run_ecboottime, machines)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/subcommand.py", line 93, in parallel_simple
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| function(arg)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/tmp/test_that_results_VQkBAT/results-1-firmware_ECBootTime/control.srv", line 26, in run_ecboottime
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| host = hosts.create_host(machine, servo_args=servo_args)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/factory.py", line 160, in create_host
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| host_instance = host_class(hostname, **args)
05/24 15:45:49.087 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/base_classes.py", line 58, in __init__
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| super(Host, self).__init__(*args, **dargs)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 70, in __init__
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| self._initialize(*args, **dargs)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/cros_host.py", line 306, in _initialize
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| try_lab_servo=try_lab_servo)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 785, in create_servo_host
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| return ServoHost(required_by_test=True, is_in_lab=False, **servo_args)
05/24 15:45:49.088 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/base_classes.py", line 58, in __init__
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| super(Host, self).__init__(*args, **dargs)
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 70, in __init__
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| self._initialize(*args, **dargs)
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 167, in _initialize
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| self.verify()
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/client/common_lib/hosts/base_classes.py", line 255, in verify
05/24 15:45:49.089 INFO | test_runner_utils:0198| autoserv| self.verify_software()
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 588, in verify_software
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| self.update_image(wait_for_update=False)
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/usr/lib64/python2.7/site-packages/statsd/timer.py", line 95, in _decorator
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| return function(*args, **kwargs)
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/hosts/servo_host.py", line 524, in update_image
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| target_build)[3]
05/24 15:45:49.090 INFO | test_runner_utils:0198| autoserv| File "/build/smaug/usr/local/build/autotest/server/site_utils.py", line 86, in ParseBuildName
05/24 15:45:49.091 INFO | test_runner_utils:0198| autoserv| raise ParseBuildNameException('%s is a malformed build name.' % name)
05/24 15:45:49.091 INFO | test_runner_utils:0198| autoserv| ParseBuildNameException: beaglebone_servo-release/beaglebone_servo-release/R53-8368.0.0 is a malformed build name.
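
To illustrate why the string in the last line of the trace is rejected, here is a small sketch of a build-name parser. The regular expression is an assumed, simplified pattern of the form `<board>-<type>/R<milestone>-<version>`, not the one in `site_utils.py`, and `parse_build_name` is a hypothetical stand-in for `ParseBuildName`. It shows only that a name with the `board-type/` prefix repeated cannot match such a pattern; it says nothing about where the repeated prefix came from.

```python
import re

# Assumed, simplified pattern: <board>-<type>/R<milestone>-<version>.
# The real pattern in site_utils.py may differ.
BUILD_NAME_RE = re.compile(
    r'^(?P<board>[\w-]+)-(?P<type>\w+)/R(?P<milestone>\d+)-(?P<version>[\d.]+)$')


class ParseBuildNameException(Exception):
    """Raised when a build name does not fit the expected shape."""


def parse_build_name(name):
    """Return (board, type, milestone, version) or raise on a bad name."""
    match = BUILD_NAME_RE.match(name)
    if not match:
        raise ParseBuildNameException('%s is a malformed build name.' % name)
    return (match.group('board'), match.group('type'),
            match.group('milestone'), match.group('version'))


# A single prefix parses; index 3 is the version, as used in update_image.
good = parse_build_name('beaglebone_servo-release/R53-8368.0.0')
print(good[3])  # 8368.0.0

# The name from the trace has the prefix twice, so the pattern cannot match.
try:
    parse_build_name(
        'beaglebone_servo-release/beaglebone_servo-release/R53-8368.0.0')
    failed = False
except ParseBuildNameException as e:
    failed = True
    print(e)
```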



Status: Fixed (was: Assigned)
I think it came from a recent push (see crbug.com/614500), and I don't think it has anything to do with the CLs in this bug.
Closing. Please feel free to reopen if it's not fixed.
Status: Verified (was: Fixed)
