Issue 693787

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Mar 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




winky-paladin failed at HWTest stage due to SSHConnectionError

Project Member Reported by shuqianz@chromium.org, Feb 17 2017

Issue description

winky-paladin failed at the HWTest stage due to SSHConnectionError twice in a row:
https://uberchromegw.corp.google.com/i/chromeos/builders/winky-paladin/builds/766
https://uberchromegw.corp.google.com/i/chromeos/builders/winky-paladin/builds/765

In build 766:
chromeos4-row3-rack12-host7 failed to run the provision_AutoUpdate.double test due to a provisioning failure:
DevServerException: CrOS auto-update failed for host chromeos4-row3-rack12-host7: SSHConnectionError: ssh: connect to host chromeos4-row3-rack12-host7 port 22: Connection timed out

In build 765:
chromeos4-row3-rack12-host15 failed to run the login_RemoteOwnership test, also due to SSHConnectionError:
ssh: connect to host chromeos4-row3-rack12-host15 port 22: Connection timed out
  Traceback (most recent call last):
    File "/usr/local/autotest/client/common_lib/test.py", line 804, in _call_test_function
      return func(*args, **dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 461, in execute
      dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
      postprocess_profiled_run, args, dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 376, in _call_run_once
      self.run_once(*args, **dargs)
    File "/usr/local/autotest/server/site_tests/provision_AutoUpdate/provision_AutoUpdate.py", line 113, in run_once
      force_full_update=force)
    File "/usr/local/autotest/server/afe_utils.py", line 232, in machine_install_and_update_labels
      *args, **dargs)
    File "/usr/local/autotest/server/hosts/cros_host.py", line 728, in machine_install_by_devserver
      full_update=force_full_update)
    File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2013, in auto_update
      raise DevServerException(error_msg % (host_name, error_list[0]))
  DevServerException: CrOS auto-update failed for host chromeos4-row3-rack12-host15: SSHConnectionError: ssh: connect to host chromeos4-row3-rack12-host15 port 22: Connection timed out

Why do SSH connections to the winky-paladin DUTs keep failing during tests? Is this network flakiness? It happened twice in a row, though.
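
To check by hand whether a DUT is answering on port 22 at all, a plain TCP probe is enough. This is a minimal sketch and not part of autotest; the function name `ssh_port_reachable` and the 5 second default timeout are assumptions made for illustration.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not an autotest API. A False result covers
    timeouts, refused connections, and name resolution failures alike.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and socket.gaierror
        # are all subclasses of OSError.
        return False
```

Calling `ssh_port_reachable("chromeos4-row3-rack12-host15")` from a machine on the lab network would show whether the port answers. It does not by itself distinguish a network fault from a DUT that is down.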
 

Comment 1 by xixuan@chromium.org, Feb 17 2017

The DUT could not reboot itself after provisioning. Repair also could not pass the ssh check until the servo reset:

FAIL	----	verify.ssh	timestamp=1487359572	localtime=Feb 17 11:26:12	No answer to ping from chromeos4-row3-rack12-host15
	START	----	repair.rpm	timestamp=1487359572	localtime=Feb 17 11:26:12	
		FAIL	----	repair.rpm	timestamp=1487359817	localtime=Feb 17 11:30:17	chromeos4-row3-rack12-host15 is still offline after powercycling
	END FAIL	----	repair.rpm	timestamp=1487359817	localtime=Feb 17 11:30:17	
	START	----	repair.sysrq	timestamp=1487359817	localtime=Feb 17 11:30:17	
		FAIL	----	repair.sysrq	timestamp=1487360106	localtime=Feb 17 11:35:06	chromeos4-row3-rack12-host15 is still offline after reset.
	END FAIL	----	repair.sysrq	timestamp=1487360106	localtime=Feb 17 11:35:06	
	START	----	repair.servoreset	timestamp=1487360106	localtime=Feb 17 11:35:06	
		GOOD	----	verify.ssh	timestamp=1487360137	localtime=Feb 17 11:35:37	
	END GOOD	----	repair.servoreset	timestamp=1487360137	localtime=Feb 17 11:35:37
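
The repair log above is tab-separated, and its `timestamp=` fields are epoch seconds, so the time spent in each repair step can be pulled out mechanically. A rough sketch, assuming the field layout seen in this excerpt; `parse_status_line` and `repair_step_durations` are hypothetical helper names, not autotest APIs.

```python
# Excerpt of the repair log above, with tabs written out explicitly.
SAMPLE_LOG = (
    "FAIL\t----\tverify.ssh\ttimestamp=1487359572\tlocaltime=Feb 17 11:26:12\t"
    "No answer to ping from chromeos4-row3-rack12-host15\n"
    "\tSTART\t----\trepair.rpm\ttimestamp=1487359572\tlocaltime=Feb 17 11:26:12\t\n"
    "\tEND FAIL\t----\trepair.rpm\ttimestamp=1487359817\tlocaltime=Feb 17 11:30:17\t\n"
    "\tSTART\t----\trepair.sysrq\ttimestamp=1487359817\tlocaltime=Feb 17 11:30:17\t\n"
    "\tEND FAIL\t----\trepair.sysrq\ttimestamp=1487360106\tlocaltime=Feb 17 11:35:06\t\n"
    "\tSTART\t----\trepair.servoreset\ttimestamp=1487360106\tlocaltime=Feb 17 11:35:06\t\n"
    "\tEND GOOD\t----\trepair.servoreset\ttimestamp=1487360137\tlocaltime=Feb 17 11:35:37\n"
)


def parse_status_line(line):
    """Split one status line into a dict, or return None if it is too short."""
    fields = line.strip().split("\t")
    if len(fields) < 3:
        return None
    record = {"status": fields[0], "operation": fields[2], "reason": ""}
    for field in fields[3:]:
        key, sep, value = field.partition("=")
        if sep and key in ("timestamp", "localtime"):
            record[key] = value
        elif field:
            record["reason"] = field
    if "timestamp" in record:
        record["timestamp"] = int(record["timestamp"])
    return record


def repair_step_durations(text):
    """Return (operation, result, seconds) for each START/END pair."""
    starts = {}
    results = []
    for line in text.splitlines():
        rec = parse_status_line(line)
        if rec is None or "timestamp" not in rec:
            continue
        if rec["status"] == "START":
            starts[rec["operation"]] = rec["timestamp"]
        elif rec["status"].startswith("END ") and rec["operation"] in starts:
            began = starts.pop(rec["operation"])
            results.append(
                (rec["operation"], rec["status"][4:], rec["timestamp"] - began))
    return results


for operation, result, seconds in repair_step_durations(SAMPLE_LOG):
    print(operation, result, seconds)
```

On this excerpt the sketch reports 245 s for repair.rpm and 289 s for repair.sysrq, both ending in FAIL, and 31 s for repair.servoreset, which ends in GOOD.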


I think it's the same as Issue 692342. We solved that one by upgrading the firmware, right?
jrbarnette@, is this the same cause?

Bug 692342 is hardware-specific, and can only occur on kevin.
In this case, something caused the DUT to crash and stay down
during provisioning.  One possible cause would be a bug in the
Chrome OS image we installed, but we'd need more detail about
when the DUT crashed to guess at what happened.

> Why do SSH connections to the winky-paladin DUTs keep failing during tests?
> Is this network flakiness? It happened twice in a row, though.

I checked provision jobs on CQ DUTs for the past 24 hours.  There were
two provision failures in two different builds:

chromeos4-row3-rack12-host7
    2017-02-17 13:47:50  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row3-rack12-host7/307906-provision/

chromeos4-row3-rack12-host15
    2017-02-17 10:38:40  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row3-rack12-host15/307693-provision/

Repair showed both DUTs had the same symptoms:
  * Offline at the outset
  * Toggling AC power and keyboard sysrq didn't get the device's
    attention.
  * The devices rebooted cleanly after servo reset.

The symptoms are somewhat consistent with bug 677572, but the winky
devices seem to be using smsc75xx devices for ethernet, and that bug
complains about a problem with smsc95xx devices.

Eventlog from chromeos4-row3-rack12-host15, truncated around
the failure event:
133 | 2017-02-16 10:12:38 | System Reset
134 | 2017-02-16 11:07:36 | Kernel Event | Clean Shutdown
135 | 2017-02-16 11:07:37 | System boot | 10399
136 | 2017-02-16 11:07:37 | System Reset
137 | 2017-02-16 11:12:06 | Kernel Event | Clean Shutdown
138 | 2017-02-16 11:12:06 | System boot | 10400
139 | 2017-02-16 11:12:06 | System Reset
140 | 2017-02-16 11:13:08 | Kernel Event | Clean Shutdown
141 | 2017-02-16 11:13:08 | System boot | 10401
142 | 2017-02-16 11:13:08 | System Reset
143 | 2017-02-16 13:58:30 | Kernel Event | Clean Shutdown


The same information for chromeos4-row3-rack12-host7:
99 | 2017-02-16 13:58:21 | System Reset
100 | 2017-02-16 14:02:41 | Kernel Event | Clean Shutdown
101 | 2017-02-16 14:02:42 | System boot | 7423
102 | 2017-02-16 14:02:42 | System Reset
103 | 2017-02-16 14:03:36 | Kernel Event | Clean Shutdown
104 | 2017-02-16 14:03:37 | System boot | 7424
105 | 2017-02-16 14:03:37 | System Reset
106 | 2017-02-16 16:33:44 | Kernel Event | Clean Shutdown

I suspect there's nothing much to see here...
The information for chromeos4-row3-rack12-host7 was truncated too much.
Here's what it should have been:
96 | 2017-02-16 11:12:45 | System Reset
97 | 2017-02-16 13:58:21 | Kernel Event | Clean Shutdown
98 | 2017-02-16 13:58:21 | System boot | 7422
99 | 2017-02-16 13:58:21 | System Reset
100 | 2017-02-16 14:02:41 | Kernel Event | Clean Shutdown
101 | 2017-02-16 14:02:42 | System boot | 7423
102 | 2017-02-16 14:02:42 | System Reset
103 | 2017-02-16 14:03:36 | Kernel Event | Clean Shutdown
104 | 2017-02-16 14:03:37 | System boot | 7424
105 | 2017-02-16 14:03:37 | System Reset
106 | 2017-02-16 16:33:44 | Kernel Event | Clean Shutdown
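
Eventlog excerpts like the ones above can be scanned for boots that were not preceded by a clean shutdown, which is one way to check the "nothing much to see" impression. This is a sketch only, assuming the `index | timestamp | event` layout shown in this thread; `parse_eventlog` and `unclean_boots` are hypothetical names.

```python
from datetime import datetime

# The corrected chromeos4-row3-rack12-host7 excerpt from above.
HOST7_EVENTLOG = """\
96 | 2017-02-16 11:12:45 | System Reset
97 | 2017-02-16 13:58:21 | Kernel Event | Clean Shutdown
98 | 2017-02-16 13:58:21 | System boot | 7422
99 | 2017-02-16 13:58:21 | System Reset
100 | 2017-02-16 14:02:41 | Kernel Event | Clean Shutdown
101 | 2017-02-16 14:02:42 | System boot | 7423
102 | 2017-02-16 14:02:42 | System Reset
103 | 2017-02-16 14:03:36 | Kernel Event | Clean Shutdown
104 | 2017-02-16 14:03:37 | System boot | 7424
105 | 2017-02-16 14:03:37 | System Reset
106 | 2017-02-16 16:33:44 | Kernel Event | Clean Shutdown
"""


def parse_eventlog(text):
    """Parse 'index | timestamp | event [| detail]' lines into dicts."""
    entries = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or unrecognized lines
        entries.append({
            "index": int(parts[0]),
            "time": datetime.strptime(parts[1], "%Y-%m-%d %H:%M:%S"),
            "event": " | ".join(parts[2:]),
        })
    return entries


def unclean_boots(entries):
    """Return indexes of 'System boot' entries not directly preceded
    by a 'Kernel Event | Clean Shutdown' entry."""
    flagged = []
    for prev, cur in zip(entries, entries[1:]):
        if (cur["event"].startswith("System boot")
                and prev["event"] != "Kernel Event | Clean Shutdown"):
            flagged.append(cur["index"])
    return flagged


print(unclean_boots(parse_eventlog(HOST7_EVENTLOG)))
```

For the host7 excerpt this prints an empty list: every recorded boot follows a clean shutdown.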

Comment 9 by xixuan@chromium.org, Mar 19 2018

Status: WontFix (was: Untriaged)
