
Issue 594176

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Mar 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




daisy_skate-chrome-pfq provision failing repeatedly

Project Member Reported by sha...@chromium.org, Mar 11 2016

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/daisy_skate-chrome-pfq

provision                                 FAIL: Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: 

Output below this line is for buildbot consumption:
@@@STEP_LINK@provision: 17 reports, FAIL: Unhandled AutoservSSHTimeout: ('ssh timed out', * Command:@https://code.google.com/p/chromium/issues/detail?id=589367@@@
@@@STEP_LINK@Flaky test dashboard view for test provision@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=provision@@@
Will return from run_suite with status: INFRA_FAILURE


Is this a case of a recent update poisoning daisy_skate devices, or some kind of infra problem?
 

Comment 1 by sha...@chromium.org, Mar 11 2016

Labels: Infra-ChromeOS
Cc: jrbarnette@chromium.org
logs:
03/10 19:42:53.427 DEBUG|          ssh_host:0153| Running (ssh) '/usr/bin/update_engine_client -status 2>&1 | grep CURRENT_OP'
03/10 19:42:53.696 DEBUG|        base_utils:0268| [stdout] CURRENT_OP=UPDATE_STATUS_IDLE
03/10 19:42:53.705 INFO |        servo_host:0510| servo host chromeos4-row9-rack7-host11-servo does not require an update.
03/10 19:42:53.707 DEBUG|          ssh_host:0153| Running (ssh) 'test -f /var/lib/servod/config'
03/10 19:42:53.885 DEBUG|          ssh_host:0153| Running (ssh) 'pgrep servod'
03/10 19:42:54.101 DEBUG|        base_utils:0268| [stdout] 476
03/10 19:42:54.102 DEBUG|        base_utils:0268| [stdout] 547
03/10 19:42:54.102 DEBUG|        base_utils:0268| [stdout] 548
03/10 19:42:54.106 INFO |        servo_host:0381| servod is running, PID=476,547,548
03/10 19:42:55.310 INFO |             servo:0496| Setting usb_mux_oe1 to on
03/10 19:42:55.653 INFO |             servo:0496| Setting prtctl4_pwren to off
03/10 19:42:57.915 DEBUG|             servo:0225| Servo initialized, version is servo_v3
03/10 19:42:57.915 INFO |        servo_host:0540| Sanity checks pass on servo host chromeos4-row9-rack7-host11-servo
03/10 19:42:57.982 DEBUG|          ssh_host:0153| Running (ssh) 'test ! -e /var/log/messages || cp -f /var/log/messages /var/tmp/messages.autotest_start'
03/10 19:42:57.983 INFO |      abstract_ssh:0749| Starting master ssh connection '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_1XvZtNssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row9-rack7-host11'
03/10 19:42:57.984 DEBUG|        base_utils:0177| Running '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_1XvZtNssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row9-rack7-host11'
03/10 19:43:03.052 INFO |      abstract_ssh:0764| Timed out waiting for master-ssh connection to be established.
03/10 19:45:06.505 ERROR|        base_utils:0268| [stderr] ssh: connect to host chromeos4-row9-rack7-host11 port 22: Connection timed out
03/10 19:45:06.506 INFO |            remote:0074| Failed to copy /var/log/messages at startup: ('ssh timed out', * Command: 
    /usr/bin/ssh -a -x    -o ControlPath=/tmp/_autotmp_1XvZtNssh-
    master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
    -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
    ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22
    chromeos4-row9-rack7-host11 "export LIBC_FATAL_STDERR_=1; test ! -e
    /var/log/messages || cp -f /var/log/messages
    /var/tmp/messages.autotest_start"
Exit status: 255
Duration: 123.423555851

+@jrbarnette: could the ssh timeout also be caused by the servo update?
Here's the error that triggered the failure:
    03/10 19:45:06.505 ERROR|        base_utils:0268| [stderr] ssh: connect to host chromeos4-row9-rack7-host11 port 22: Connection timed out

That's a problem with the DUT, not the servo.

I think (but I haven't checked in depth) that most servo
failures will be ignored in this context.
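
To illustrate the distinction drawn above, here is a minimal Python sketch that pulls the unreachable host out of an ssh stderr line and labels it as a DUT or a servo host. The function name classify_ssh_failure is hypothetical and not part of autotest, and the "-servo" hostname suffix is an assumption inferred only from the hostnames in the log above.

```python
import re

# Matches the stderr text that OpenSSH prints when it cannot connect, e.g.
# "ssh: connect to host <host> port 22: Connection timed out".
SSH_CONNECT_FAILURE = re.compile(
    r"ssh: connect to host (?P<host>\S+) port \d+: (?P<reason>.+)$"
)


def classify_ssh_failure(log_line):
    """Return (kind, host, reason) for an ssh connect failure, else None.

    kind is "servo" when the hostname ends in "-servo" (an assumed lab
    naming convention), and "dut" otherwise.
    """
    match = SSH_CONNECT_FAILURE.search(log_line)
    if match is None:
        return None
    host = match.group("host")
    kind = "servo" if host.endswith("-servo") else "dut"
    return kind, host, match.group("reason").strip()
```

Fed the ERROR line quoted above, this reports the failing host as chromeos4-row9-rack7-host11, which carries no "-servo" suffix and is therefore classified as the DUT.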


Adding more information: I think many of the failures in question
happen because the DUT is offline when provision starts.

I don't know why the DUT is offline; that needs investigation.
For the specific failure on chromeos4-row9-rack7-host11, I checked
the history: the DUT was offline at the start of provisioning.
The provisioning failure triggered repair, and the servo repaired
the DUT by power cycling it.

The DUT is in service now, and seems to be running tests successfully.
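
One way to detect the offline-at-start case before provisioning begins would be a quick TCP probe of the DUT's ssh port. The sketch below is only an illustration using the Python standard library; ssh_port_reachable is a hypothetical helper, not an existing autotest function, and the 5-second default timeout is an arbitrary choice.

```python
import socket


def ssh_port_reachable(hostname, port=22, timeout=5.0):
    """Return True if a TCP connection to hostname:port can be opened.

    This only checks that something accepts connections on the port; it
    does not authenticate or confirm that sshd is healthy.
    """
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return False
```

A caller could run ssh_port_reachable("chromeos4-row9-rack7-host11") ahead of provision and report "DUT offline" directly, instead of waiting about two minutes for the ssh command itself to time out as in the log above.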

Comment 6 by autumn@chromium.org, Mar 14 2016

Owner: jrbarnette@chromium.org
Status: Started (was: Untriaged)
@richard: is there any more work needed on this? Should we use this as the feature request (FR) for making provisioning smarter in the future?
Labels: OS-Chrome
Status: WontFix (was: Started)
I've filed bug 594828 for improvements to Provision task
failure diagnosis.

I don't think there's anything else to be done for this
failure.

Comment 8 by benhenry@google.com, Apr 27 2016

Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS
