
Issue 640297

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Can't reach the servo chromeos1-row1-rack9-host6-servo

Project Member Reported by waihong@chromium.org, Aug 23 2016

Issue description

The host chromeos1-row1-rack9-host6 (pool:faft-test-tot) failed to repair.

According to the log, it failed to access the servo.
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack9-host6/272165-repair/20162308113556/debug/

08/23 12:03:40.089 INFO |        servo_host:0687| Attempting to repair servo host chromeos1-row1-rack9-host6-servo.
08/23 12:03:40.089 INFO |        server_job:0129| 	START	----	reboot	timestamp=1471979020	localtime=Aug 23 12:03:40	
08/23 12:03:40.090 INFO |        server_job:0129| 		GOOD	----	reboot.start	timestamp=1471979020	localtime=Aug 23 12:03:40	
08/23 12:03:40.097 INFO |      abstract_ssh:0757| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_CPYAxassh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack9-host6-servo'
08/23 12:04:41.533 WARNI|        base_utils:0910| run process timeout (60) fired on: /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_CPYAxassh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=180 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos1-row1-rack9-host6-servo " if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::wait_down|get_boot_id|run] -> ssh_run(if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi)\";fi; if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi"
08/23 12:04:43.767 ERROR|        base_utils:0278| [stderr] mux_client_request_session: read from master failed: Broken pipe
08/23 12:04:44.873 INFO |      abstract_ssh:0743| Master ssh connection to chromeos1-row1-rack9-host6-servo is down.

However, I can ping and log in to the same servo, chromeos1-row1-rack9-host6-servo, from my desktop.

Dan, have there been any recent changes?

 

Comment 1 by dshi@chromium.org, Aug 24 2016

Cc: kevcheng@chromium.org
First finding: servod failed to start.
Second finding: /var/lib/servod/config_9999 is empty; it should have a "BOARD=..." setting.

I manually updated the config_9999 file, and servod is running now. Next step is to reverify the DUT.
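For anyone hitting the same symptom, the empty-config check above could be scripted roughly as follows. This is only a sketch: the sole thing known from this issue is that the file should contain a "BOARD=..." setting, so the helper names (`has_board_setting`, `check_servod_config`) are hypothetical and the rest of the file format is not assumed.

```python
import re
from pathlib import Path


def has_board_setting(config_text):
    """Return True if any line sets BOARD to a non-empty value."""
    # Only the "BOARD=..." line is known from this issue; other keys are ignored.
    return any(
        re.match(r"\s*BOARD=\S+", line) is not None
        for line in config_text.splitlines()
    )


def check_servod_config(path="/var/lib/servod/config_9999"):
    """Return True if the config file exists and has a BOARD setting."""
    config = Path(path)
    if not config.is_file():
        return False
    return has_board_setting(config.read_text())


if __name__ == "__main__":
    # An empty file, as found on this servo host, fails the check.
    print(has_board_setting(""))
    # "someboard" is a placeholder value, not the real board name.
    print(has_board_setting("BOARD=someboard\n"))
```

Running `check_servod_config()` on the servo host before restarting servod would flag the empty-file case without having to read the servod logs.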
All dut-control commands stall, and it looks like the EC is spewing a stream of errors on its console:

# miniterm.py -b 115200 -p /dev/pts/0 
...
+[2830530.597052+[2830530.608430 HC err 1]
+[2830530.609281 HC err 1]
+[2830530.610130 HC err 1]
+[2830530.610979 HC err 1]
+[2830530.631863 HC err 1]
+[2830530.632714 HC err 1]
+[2830530.633563 HC err 1]
+[2830530.6344+[2830537.888500 HC err 1]
+[2830537.889349 HC err 1]
+[2830537.890198 HC err 1]
+[2830537.891047 HC err 1]
+[2830537.891896 HC err 1]
+[2830537.892745 HC err 1]
+[2830537.893594 HC err 1]
+[2830537.894443 HC err 1]
+[2830537.895292 HC err 1]
+[2830+[2830537.944571 HC err 1]
+[2830537.945420 HC err 1]
+[2830537.946269 HC err 1]
+[2830537.947118 HC err 1]
+[2830537.947967 HC err 1]
+[2830537.948816 HC err 1]
+[2830537.949665 HC err 1]
+[2830537.950514 HC err 1]
+[2830537.951363 HC err 1]
+[2830537.952212 HC err 1]
+[2830537.953061 HC err 1]
+[2830537.953910 HC err 1]
+[2830537.954759 HC err 1]
+[2830537.955608 HC err 1]
+[2830537.956457 HC 
--- exit ---
 

Perhaps the EC is in a weird state causing issues for servod?
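To get a rough sense of how fast the EC is emitting these errors, a captured console dump could be summarized with something like the sketch below. It assumes only the `[<timestamp> HC err <code>]` record shape visible in the output above; the function name `summarize_hc_errors` is hypothetical, and truncated records (which appear in the capture) are simply skipped.

```python
import re

# Matches complete records such as "[2830530.608430 HC err 1]".
HC_ERR = re.compile(r"\[(\d+\.\d+) HC err (\d+)\]")


def summarize_hc_errors(console_text):
    """Summarize complete 'HC err' records in captured EC console text.

    Returns None when no complete record is found.
    """
    matches = HC_ERR.findall(console_text)
    if not matches:
        return None
    times = [float(timestamp) for timestamp, _ in matches]
    return {
        "count": len(matches),
        "first": min(times),
        "last": max(times),
        "codes": sorted({int(code) for _, code in matches}),
    }


if __name__ == "__main__":
    sample = (
        "+[2830530.597052+[2830530.608430 HC err 1]\n"
        "+[2830530.609281 HC err 1]\n"
        "+[2830537.956457 HC \n"  # truncated record, ignored
    )
    print(summarize_hc_errors(sample))
```

Dividing `count` by `last - first` gives an approximate error rate, which may help judge whether the console traffic is heavy enough to explain the stalled dut-control commands.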

Comment 3 by dshi@chromium.org, Aug 24 2016

Cc: haoweiw@google.com
+haoweiw
Can you help us take a look at the DUT regarding #2?
Could you try to reset the EC? (I think you can do that by pressing the power button and the refresh button together.)
Cc: dshi@chromium.org
Owner: dchan@chromium.org
It is in the B40 lab. It is stuck in a loop: rebooting -> firmware screen -> OS splash screen -> rebooting. Pressing Power + F3 does not change this.

This host was fine before, so it is probably a hardware issue.
It does not seem worth spending more time on it.

Danny, please swap it for a good unit.

Comment 7 by dchan@google.com, Aug 25 2016

Status: Assigned (was: Untriaged)

Comment 8 by dchan@google.com, Aug 26 2016

Unit removed; will replace it with a jerry.

Comment 9 by shchen@google.com, Nov 9 2016

I just checked the status of this and noticed that it hadn't run in a while. I went to the lab and the machine was missing. Is there any status on the jerry replacement? Can we replace it with an existing jerry? Right now we are not running FAFT on ToT at all.
Status: Fixed (was: Assigned)
Hi, I just located an extra unit that we had and placed it there.

asset #C036086
