Starred by 2 users
Status: Fixed
Owner:
Closed: Mar 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug



SSH connection fails for veyron_speedy-paladin/veyron_mighty-paladin
Project Member Reported by shunhsingou@chromium.org, Mar 15 2017
There are many SSH connection failures on veyron_speedy-paladin and veyron_mighty-paladin, and also on veyron_minnie-android-pfq and samus-android-pfq.

e.g. https://luci-milo.appspot.com/buildbot/chromeos/veyron_mighty-paladin/4672


[Test-Logs]: Suite job: ABORT
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row4-rack13-host15: SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host15 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host11: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host11 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host17 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host4 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host5 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host6: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host6 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host10: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host10 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host14: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host14 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host17 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host19: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host19 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host2: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host2 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host4 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host5 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host7: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host7 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host9: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host9 port 22: Connection timed out
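
To see at a glance which DUTs and racks are affected, the timeout lines above can be summarized with a short script. This is only a sketch: the function names `timed_out_hosts` and `hosts_per_rack` are hypothetical, and the grouping assumes the `chromeosN-rowN-rackN-hostN` naming seen in this log.

```python
import re
from collections import Counter

# Matches the SSHConnectionError text exactly as it appears in the log above.
SSH_TIMEOUT = re.compile(
    r"SSHConnectionError: ssh: connect to host (?P<host>\S+) "
    r"port 22: Connection timed out"
)


def timed_out_hosts(log_text):
    """Return hostnames that hit an SSH timeout, in order of appearance."""
    return [m.group("host") for m in SSH_TIMEOUT.finditer(log_text)]


def hosts_per_rack(hosts):
    """Count hosts per rack by dropping the trailing "-hostN" part (assumed naming)."""
    return Counter(host.rsplit("-host", 1)[0] for host in hosts)


# Three lines copied from the report, used as sample input.
SAMPLE = "\n".join([
    "[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row4-rack13-host15: SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host15 port 22: Connection timed out",
    "[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host11: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host11 port 22: Connection timed out",
    "[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host17 port 22: Connection timed out",
])

for rack, count in sorted(hosts_per_rack(timed_out_hosts(SAMPLE)).items()):
    print(rack, count)
```

Failures spread across many racks, as they are here, point toward the image rather than toward a single rack's network.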
 
Labels: -Pri-3 Pri-0
Comment 2 by xixuan@chromium.org, Mar 15 2017
Cc: englab-sys-cros@google.com
Comment 3 by xixuan@chromium.org, Mar 15 2017
Cc: jrbarnette@chromium.org
Comment 4 by xixuan@chromium.org, Mar 15 2017
Do we have a bad build?
Comment 5 by xixuan@chromium.org, Mar 15 2017
Cc: semenzato@chromium.org
Comment 6 by xixuan@chromium.org, Mar 15 2017
It looks like guado_moblab DUTs also fail to come back after rebooting:

https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin
Comment 7 by xixuan@chromium.org, Mar 15 2017
Same for https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin

veyron_minnie DUTs (chromeos4-row9-rack9-host1, for example) also cannot be provisioned because they are offline after rebooting. The symptom is reported as "No hosts available for tests".
Comment 8 by xixuan@chromium.org, Mar 15 2017
Cc: -smbar...@chromium.org akes...@chromium.org
DUTs have had this issue starting with these builds:

veyron_minnie-paladin/R59-9367.0.0-rc6
veyron_mighty-paladin/R59-9367.0.0-rc6
veyron_speedy-paladin/R59-9367.0.0-rc6
guado_moblab-paladin/R59-9367.0.0-rc6
Comment 9 by xixuan@chromium.org, Mar 15 2017
Cc: smbar...@chromium.org
Cc: dgarr...@chromium.org
Owner: xixuan@chromium.org
Status: Assigned
Cc: diand...@chromium.org
Owner: smbar...@chromium.org
I don't know what happened, but my guess is that a bad build is the cause. Handing this to one of our sheriffs, @smbarber, since he is looking for the smoking gun.
Comment 12 Deleted
Cc: -akes...@chromium.org edjee@google.com
Thanks #12. Your contribution is valued.

These CLs went in yesterday to move depmod from cros-kernel2.eclass to build_image:
https://chromium-review.googlesource.com/c/446890/
https://chromium-review.googlesource.com/c/446687/

Canaries are fine, but we're running into a situation where the paladins aren't running depmod. As a result, the affected boards fail to start shill (which tries to talk to the cfg80211 module) and never get an IP address.
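
One way to confirm that an image is missing depmod output is to look for `modules.dep`, the dependency file that depmod writes under `lib/modules/<kernel version>/`. The sketch below assumes the image's rootfs is already mounted or unpacked at some local path; the function name `kernels_missing_depmod` is hypothetical and not part of any Chromium OS tooling.

```python
from pathlib import Path


def kernels_missing_depmod(rootfs):
    """Return kernel version directories under <rootfs>/lib/modules that lack modules.dep.

    An empty list means every kernel directory has depmod output, or that
    the rootfs has no lib/modules directory at all.
    """
    modules_root = Path(rootfs) / "lib" / "modules"
    if not modules_root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in modules_root.iterdir()
        if entry.is_dir() and not (entry / "modules.dep").is_file()
    )
```

For example, `kernels_missing_depmod("/path/to/mounted/rootfs")` (a placeholder path) would list the kernel versions for which module loading is likely to fail for lack of dependency information.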
Cc: -smbar...@chromium.org akes...@chromium.org
Comment 15 Deleted
This seems to have also brought down most of the chromiumos.chromium waterfall [1], example at [2]. No logs to confirm, but CLs match up.

1: https://build.chromium.org/p/chromiumos.chromium/waterfall
2: https://build.chromium.org/p/chromiumos.chromium/builders/x86-generic-tot-chromium-pfq-informational/builds/11351 
Project Member Comment 17 by bugdroid1@chromium.org, Mar 15 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/a9fc48daf46ed9e3dbb79d1736308af83f297dc4

commit a9fc48daf46ed9e3dbb79d1736308af83f297dc4
Author: Stephen Barber <smbarber@chromium.org>
Date: Wed Mar 15 22:02:27 2017

Revert "cros-kernel2: remove the outputs of "depmod""

This reverts commit a1e0da70dcea4e0f28e85cda038b876b9ec958e9.

Reason for revert:  http://crbug.com/701693 

Original change's description:
> cros-kernel2: remove the outputs of "depmod"
> 
> Since CL:446687, "build_image" script will run "depmod". So kernel
> packages should not own the outputs of "depmod".
> 
> BUG= chromium:695675 
> TEST=See CL:446687
> CQ-DEPEND=CL:446687,CL:451798
> 
> Change-Id: I3909503599f9af0e95a02613fb43384c6badc270
> Reviewed-on: https://chromium-review.googlesource.com/446890
> Commit-Ready: Edward Jee <edjee@google.com>
> Tested-by: Edward Jee <edjee@google.com>
> Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
> 

TBR=vapier@chromium.org,keescook@chromium.org,edjee@google.com,chirantan@chromium.org,andreyu@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= chromium:695675 , chromium:701693 

Change-Id: I88c8832b9232a1d73dbe0f729b62053dd0213881
Reviewed-on: https://chromium-review.googlesource.com/455341
Commit-Queue: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/a9fc48daf46ed9e3dbb79d1736308af83f297dc4/eclass/cros-kernel2.eclass

Comment 18 by mqg@chromium.org, Mar 16 2017
Not sure if this is related at all, but when I updated the kernel from the chroot today, my DUT (on my desk) completely lost internet/wifi/ethernet after rebooting.
If you see in dmesg that shill is SIGABRT'ing over and over, this is that issue. Reverting the CL in #17 and rebuilding your kernel should fix it.
> [ ... ] Reverting the CL in #17 and rebuilding your kernel should fix it.

I assume we've tested that building with the revert above fixes
the problem.  I'm also seeing that the latest paladin builds
have the revert, but still fail...

Looking at the CL, I see it just changes the kernel eclass file.
I'm wondering if we need to bump some kernel ebuild version number?


Re #21: the commit in #20 uprevs the kernel ebuilds, so the next binpkgs should have depmod output again.
Status: Fixed
Do we know why the original CL was able to merge?
> Do we know why the original CL was able to merge?

I suspect that it's because the original CL didn't include the
kernel ebuild rev bumps.  That should have meant that the CL
wasn't actually built in at the time of the CQ run.  I can't
explain how we went from there to a failing CQ, and I can't
explain why the failures didn't hit the canary.
