
Issue 701693

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug




SSH connection fails for veyron_speedy-paladin/veyron_mighty-paladin

Project Member Reported by shunhsingou@chromium.org, Mar 15 2017

Issue description

There are many SSH connection failures for veyron_speedy-paladin and veyron_mighty-paladin, and also for veyron_minnie-android-pfq and samus-android-pfq.

e.g. https://luci-milo.appspot.com/buildbot/chromeos/veyron_mighty-paladin/4672


[Test-Logs]: Suite job: ABORT
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row4-rack13-host15: SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host15 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host11: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host11 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host17 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host4 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host5 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host6: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host6 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host10: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host10 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host14: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host14 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host17 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host19: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host19 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host2: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host2 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host4 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host5 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host7: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host7 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host9: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host9 port 22: Connection timed out
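
To get a quick list of the affected DUTs from logs like the ones above, a small script can pull the hostnames out of the SSHConnectionError lines. This is only a minimal sketch: the regular expression is based solely on the line format shown in this report, and `failed_hosts` is a hypothetical helper, not part of autotest or the devserver.

```python
import re

# Pattern derived from the log lines in this report (assumption: other
# builders may format the error differently).
_SSH_TIMEOUT_RE = re.compile(
    r"SSHConnectionError: ssh: connect to host (\S+) port 22: Connection timed out"
)


def failed_hosts(log_text):
    """Return the sorted, de-duplicated hostnames that timed out on port 22."""
    return sorted(set(_SSH_TIMEOUT_RE.findall(log_text)))


if __name__ == "__main__":
    sample = (
        "[Test-Logs]: Suite job: ABORT\n"
        "[Test-Logs]: provision: FAIL: Unhandled DevServerException: "
        "CrOS auto-update failed for host chromeos4-row4-rack13-host15: "
        "SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host15 "
        "port 22: Connection timed out\n"
    )
    for host in failed_hosts(sample):
        print(host)
```

Grouping the resulting hostnames by board would show whether the timeouts cluster on particular boards rather than on particular racks.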
 
Labels: -Pri-3 Pri-0

Comment 2 by xixuan@chromium.org, Mar 15 2017

Cc: englab-sys-cros@google.com

Comment 3 by xixuan@chromium.org, Mar 15 2017

Cc: jrbarnette@chromium.org

Comment 4 by xixuan@chromium.org, Mar 15 2017

Do we have a bad build?

Comment 5 by xixuan@chromium.org, Mar 15 2017

Cc: semenzato@chromium.org

Comment 6 by xixuan@chromium.org, Mar 15 2017

It looks like guado_moblab DUTs also fail to come back after rebooting:

https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin

Comment 7 by xixuan@chromium.org, Mar 15 2017

Same for https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin

veyron_minnie DUTs (chromeos4-row9-rack9-host1, for example) also cannot be provisioned because they are offline after rebooting. The symptom shows up as "No hosts available for tests".

Comment 8 by xixuan@chromium.org, Mar 15 2017

Cc: -smbar...@chromium.org akes...@chromium.org
The builds starting from which DUTs have this issue:

veyron_minnie-paladin/R59-9367.0.0-rc6
veyron_mighty-paladin/R59-9367.0.0-rc6
veyron_speedy-paladin/R59-9367.0.0-rc6
guado_moblab-paladin/R59-9367.0.0-rc6

Comment 9 by xixuan@chromium.org, Mar 15 2017

Cc: smbar...@chromium.org
Cc: dgarr...@chromium.org
Owner: xixuan@chromium.org
Status: Assigned (was: Untriaged)
Cc: diand...@chromium.org
Owner: smbar...@chromium.org
I don't know what happened, but my guess is that a bad build is the cause. Handing this to one of our sheriffs, @smbarber, since he is already looking for the smoking gun.

Comment 12 Deleted

Cc: -akes...@chromium.org edjee@google.com
Thanks #12. Your contribution is valued.

These CLs went in yesterday to move depmod from cros-kernel2.eclass to build_image:
https://chromium-review.googlesource.com/c/446890/
https://chromium-review.googlesource.com/c/446687/

Canaries are fine, but we're running into a situation where the paladins aren't running depmod. As a result, shill fails to start on the affected boards (it tries to talk to the cfg80211 module), and they never get an IP address.
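
One way to check an image for this condition is to look for the module in `modules.dep`, the file depmod generates under `lib/modules/<kernel release>/`. The following is a rough sketch, assuming a mounted or unpacked rootfs; `module_in_modules_dep` and its arguments are hypothetical names, not an existing Chromium OS tool.

```python
import os


def module_in_modules_dep(rootfs, kernel_release, module_name):
    """Return True if modules.dep under rootfs lists the given module.

    Returns False when modules.dep is missing, which is what one would
    expect to see if depmod never ran for this kernel.
    """
    dep_path = os.path.join(
        rootfs, "lib", "modules", kernel_release, "modules.dep"
    )
    if not os.path.isfile(dep_path):
        return False
    wanted = module_name.replace("-", "_")
    with open(dep_path) as dep_file:
        for line in dep_file:
            # Each line has the form "<path/to/module.ko>: <dependencies...>".
            module_path = line.split(":", 1)[0].strip()
            base = os.path.basename(module_path)
            # Drop ".ko" and any compression suffix such as ".ko.gz".
            name = base.split(".ko", 1)[0].replace("-", "_")
            if name == wanted:
                return True
    return False
```

For example, `module_in_modules_dep("/mnt/rootfs", "4.4.0", "cfg80211")` would return False on an image where depmod output is absent; both the mount point and the kernel release here are placeholders.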
Cc: -smbar...@chromium.org akes...@chromium.org

Comment 15 Deleted

This seems to have also brought down most of the chromiumos.chromium waterfall [1], example at [2]. No logs to confirm, but CLs match up.

1: https://build.chromium.org/p/chromiumos.chromium/waterfall
2: https://build.chromium.org/p/chromiumos.chromium/builders/x86-generic-tot-chromium-pfq-informational/builds/11351 
Project Member

Comment 17 by bugdroid1@chromium.org, Mar 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/a9fc48daf46ed9e3dbb79d1736308af83f297dc4

commit a9fc48daf46ed9e3dbb79d1736308af83f297dc4
Author: Stephen Barber <smbarber@chromium.org>
Date: Wed Mar 15 22:02:27 2017

Revert "cros-kernel2: remove the outputs of "depmod""

This reverts commit a1e0da70dcea4e0f28e85cda038b876b9ec958e9.

Reason for revert:  http://crbug.com/701693 

Original change's description:
> cros-kernel2: remove the outputs of "depmod"
> 
> Since CL:446687, "build_image" script will run "depmod". So kernel
> packages should not own the outputs of "depmod".
> 
> BUG= chromium:695675 
> TEST=See CL:446687
> CQ-DEPEND=CL:446687,CL:451798
> 
> Change-Id: I3909503599f9af0e95a02613fb43384c6badc270
> Reviewed-on: https://chromium-review.googlesource.com/446890
> Commit-Ready: Edward Jee <edjee@google.com>
> Tested-by: Edward Jee <edjee@google.com>
> Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
> 

TBR=vapier@chromium.org,keescook@chromium.org,edjee@google.com,chirantan@chromium.org,andreyu@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= chromium:695675 , chromium:701693 

Change-Id: I88c8832b9232a1d73dbe0f729b62053dd0213881
Reviewed-on: https://chromium-review.googlesource.com/455341
Commit-Queue: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/a9fc48daf46ed9e3dbb79d1736308af83f297dc4/eclass/cros-kernel2.eclass

Comment 18 by mqg@chromium.org, Mar 16 2017

Not sure if this is related at all, but when I updated the kernel from the chroot today, my DUT (on my desk) completely lost internet/wifi/ethernet after rebooting.
If you see in dmesg that shill is SIGABRT'ing over and over, this is that issue. Reverting the CL in #17 and rebuilding your kernel should fix it.
> [ ... ] Reverting the CL in #17 and rebuilding your kernel should fix it.

I assume we've tested that building with the revert above fixes
the problem.  I'm also seeing that the latest paladin builds
have the revert, but still fail...

Looking at the CL, I see it just changes the kernel eclass file.
I'm wondering if we need to bump some kernel ebuild version number?


Re #21: the commit in #20 uprevs the kernel ebuilds, so the next binpkgs should have depmod output again.
Status: Fixed (was: Assigned)
Have we found any reason why the original CL got merged?
> Have we found any reason why the original CL got merged?

I suspect that it's because the original CL didn't include the
kernel ebuild rev bumps.  That should have meant that the CL
wasn't actually built in at the time of the CQ run.  I can't
explain how we went from there to a failing CQ, and I can't
explain why the failures didn't hit the canary.
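
The suspicion above (an eclass change landing without any ebuild revision bump) could be expressed as a simple check over the list of files a CL touches. This is a hypothetical heuristic for illustration only; it is not how the CQ actually decides what to rebuild, and `eclass_changed_without_ebuild` is an invented name.

```python
def eclass_changed_without_ebuild(changed_paths):
    """Return True if some .eclass file changed but no .ebuild file did.

    Hypothetical heuristic: such a change may not trigger a rebuild of the
    packages that inherit the eclass, so it might go untested.
    """
    touched_eclass = any(path.endswith(".eclass") for path in changed_paths)
    touched_ebuild = any(path.endswith(".ebuild") for path in changed_paths)
    return touched_eclass and not touched_ebuild


if __name__ == "__main__":
    # The revert in this bug modified only this one file.
    print(eclass_changed_without_ebuild(["eclass/cros-kernel2.eclass"]))
```

A presubmit-style warning built on a check like this would at least prompt the author to confirm that the affected packages were rebuilt during testing.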
