Can't reach the servo chromeos1-row1-rack9-host6-servo
Issue description

The host chromeos1-row1-rack9-host6 (pool:faft-test-tot) failed to repair. According to the log, the repair job could not access the servo.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack9-host6/272165-repair/20162308113556/debug/

08/23 12:03:40.089 INFO | servo_host:0687| Attempting to repair servo host chromeos1-row1-rack9-host6-servo.
08/23 12:03:40.089 INFO | server_job:0129| START ---- reboot timestamp=1471979020 localtime=Aug 23 12:03:40
08/23 12:03:40.090 INFO | server_job:0129| GOOD ---- reboot.start timestamp=1471979020 localtime=Aug 23 12:03:40
08/23 12:03:40.097 INFO | abstract_ssh:0757| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_CPYAxassh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack9-host6-servo'
08/23 12:04:41.533 WARNI| base_utils:0910| run process timeout (60) fired on: /usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_CPYAxassh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack9-host6-servo " if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::wait_down|get_boot_id|run] -> ssh_run(if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi)\";fi; if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi"
08/23 12:04:43.767 ERROR| base_utils:0278| [stderr] mux_client_request_session: read from master failed: Broken pipe
08/23 12:04:44.873 INFO | abstract_ssh:0743| Master ssh connection to chromeos1-row1-rack9-host6-servo is down.

However, I can ping and log in to the same servo, chromeos1-row1-rack9-host6-servo, from my desktop. Dan, have there been any changes recently?
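For triage, the warning and error entries can be pulled out of a log like the one above with a small script. This is only a sketch: the line layout (date, time, level, "module:line", message) is inferred from the excerpt quoted here, and parse_entry and problems are hypothetical helper names, not part of autotest.

```python
import re

# Layout inferred from the excerpt above (an assumption, not an autotest spec):
#   "MM/DD HH:MM:SS.mmm LEVEL| module:line| message"
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+)\s*\|\s*"
    r"(?P<module>\w+):(?P<lineno>\d+)\|\s*"
    r"(?P<msg>.*)$"
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def problems(lines):
    """Keep only the entries logged at WARNI or ERROR level."""
    found = []
    for line in lines:
        entry = parse_entry(line)
        if entry and entry["level"] in ("WARNI", "ERROR"):
            found.append(entry)
    return found
```

Run against the excerpt above, this would keep the "run process timeout (60)" warning and the "Broken pipe" error, which are the two entries that matter here.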
,
Aug 24 2016
All dut-control commands stall, and it looks like the EC is spewing a bunch of stuff on its console:

# miniterm.py -b 115200 -p /dev/pts/0
...
+[2830530.597052+[2830530.608430 HC err 1]
+[2830530.609281 HC err 1]
+[2830530.610130 HC err 1]
+[2830530.610979 HC err 1]
+[2830530.631863 HC err 1]
+[2830530.632714 HC err 1]
+[2830530.633563 HC err 1]
+[2830530.6344+[2830537.888500 HC err 1]
+[2830537.889349 HC err 1]
+[2830537.890198 HC err 1]
+[2830537.891047 HC err 1]
+[2830537.891896 HC err 1]
+[2830537.892745 HC err 1]
+[2830537.893594 HC err 1]
+[2830537.894443 HC err 1]
+[2830537.895292 HC err 1]
+[2830+[2830537.944571 HC err 1]
+[2830537.945420 HC err 1]
+[2830537.946269 HC err 1]
+[2830537.947118 HC err 1]
+[2830537.947967 HC err 1]
+[2830537.948816 HC err 1]
+[2830537.949665 HC err 1]
+[2830537.950514 HC err 1]
+[2830537.951363 HC err 1]
+[2830537.952212 HC err 1]
+[2830537.953061 HC err 1]
+[2830537.953910 HC err 1]
+[2830537.954759 HC err 1]
+[2830537.955608 HC err 1]
+[2830537.956457 HC
--- exit ---

Perhaps the EC is in a weird state causing issues for servod?
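To gauge how badly the console is flooding, the error count and rate can be estimated from a saved capture of the output above. This is a rough sketch under two assumptions: the bracketed number is the EC uptime in seconds, and truncated entries can simply be skipped. hc_err_rate is a hypothetical helper, not a servo or EC tool.

```python
import re

# Matches complete entries such as "[2830530.608430 HC err 1]".
# Entries cut off mid-line by the console flood are deliberately not matched.
HC_ERR_RE = re.compile(r"\[(\d+\.\d{6}) HC err (\d+)\]")


def hc_err_rate(capture):
    """Return (count, errors_per_second) for complete "HC err" entries.

    Assumes the bracketed value is a timestamp in seconds. The rate is None
    when there are too few entries, or no time span, to compute it.
    """
    stamps = [float(ts) for ts, _code in HC_ERR_RE.findall(capture)]
    if len(stamps) < 2:
        return len(stamps), None
    span = max(stamps) - min(stamps)
    if span <= 0:
        return len(stamps), None
    return len(stamps), (len(stamps) - 1) / span
```

Within a burst the entries are under a millisecond apart, so a rate computed over a single burst is far higher than one computed over the whole capture, which includes the gaps between bursts.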
,
Aug 24 2016
+haoweiw Can you help us take a look at the DUT regarding #2?
,
Aug 24 2016
Could you try resetting the EC? (I think you do that by pressing the power button and the refresh button together.)
,
Aug 24 2016
It is in the B40 lab. It is stuck in a loop: rebooting -> firmware screen -> OS splash screen -> rebooting. Pressing Power + F3 makes no difference. This host was fine before, so it is probably a hardware issue.
,
Aug 24 2016
It does not seem worth spending more time on. Danny, please swap it for a good one.
,
Aug 25 2016
,
Aug 26 2016
Unit removed; will replace with a jerry.
,
Nov 9 2016
I just checked the status of this and noticed that it hadn't run in a while. I went to the lab and the machine was missing. Is there any status on the jerry replacement? Can we replace it with an existing jerry? Right now we are not running FAFT on ToT at all.
,
Nov 9 2016
Hi, I just located an extra unit that we had and placed it there. Asset #C036086
Comment 1 by dshi@chromium.org
, Aug 24 2016