SSH connection fails for veyron_speedy-paladin/veyron_mighty-paladin
Issue description

There are many SSH connection failures for veyron_speedy-paladin/veyron_mighty-paladin, and also for veyron_minnie-android-pfq and samus-android-pfq, e.g. https://luci-milo.appspot.com/buildbot/chromeos/veyron_mighty-paladin/4672

[Test-Logs]: Suite job: ABORT
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row4-rack13-host15: SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host15 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host11: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host11 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host17 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host4 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host5 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host6: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host6 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host10: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host10 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host14: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host14 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host17 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host19: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host19 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host2: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host2 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host4 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host5 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host7: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host7 port 22: Connection timed out
[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host9: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host9 port 22: Connection timed out
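
For triage, it can help to pull the affected hostnames out of a log dump like the one above. The following is a minimal sketch, assuming the log lines keep the "[Test-Logs]" wording quoted in this report; the function name `hosts_with_ssh_failure` is hypothetical and not part of any lab tooling.

```python
import re

# Matches the provision failure wording quoted in this report.
# The hostname character class is an assumption based on the names seen here.
FAIL_RE = re.compile(
    r"CrOS auto-update failed for host (?P<host>[\w.-]+): SSHConnectionError"
)


def hosts_with_ssh_failure(log_text):
    """Return the sorted, de-duplicated hosts that failed with SSHConnectionError."""
    return sorted({match.group("host") for match in FAIL_RE.finditer(log_text)})
```

Grouping the result by rack or row prefix is one way to see whether the failures cluster on particular lab hardware or follow the build instead.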
,
Mar 15 2017
,
Mar 15 2017
,
Mar 15 2017
do we have a bad build?
,
Mar 15 2017
,
Mar 15 2017
Looks like guado_moblab DUTs also cannot get back from rebooting: https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin
,
Mar 15 2017
Same for https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin. The veyron_minnie DUTs (take chromeos4-row9-rack9-host1 for example) also cannot be provisioned because they are offline after rebooting. The symptom shows up as "No hosts available for tests".
,
Mar 15 2017
The builds from which the DUTs started having this issue:
veyron_minnie-paladin/R59-9367.0.0-rc6
veyron_mighty-paladin/R59-9367.0.0-rc6
veyron_speedy-paladin/R59-9367.0.0-rc6
guado_moblab-paladin/R59-9367.0.0-rc6
,
Mar 15 2017
,
Mar 15 2017
,
Mar 15 2017
I don't know what happened; my guess is that the problem is a bad build. Handing it to one of our sheriffs, @smbarber, since he is looking for the smoking gun.
,
Mar 15 2017
Thanks #12. Your contribution is valued.

These CLs went in yesterday to move depmod from cros-kernel2.eclass to build_image:
https://chromium-review.googlesource.com/c/446890/
https://chromium-review.googlesource.com/c/446687/

Canaries are fine, but we're running into some situation where the paladins aren't running depmod. As a result, the affected boards fail to start shill (which tries to talk to the cfg80211 module) and never get an IP.
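
One way to confirm that an image was built without depmod output is to look for `modules.dep`, one of the files depmod generates, under each kernel version directory. This is a minimal sketch, assuming the image's root filesystem is mounted or unpacked at a local path; the function name and the idea of checking a mounted rootfs are illustrative assumptions, not an existing tool.

```python
from pathlib import Path


def kernels_missing_depmod_output(rootfs):
    """Return kernel version dirs under <rootfs>/lib/modules that lack modules.dep."""
    modules_root = Path(rootfs) / "lib" / "modules"
    if not modules_root.is_dir():
        # No modules directory at all: nothing to report.
        return []
    return sorted(
        version_dir.name
        for version_dir in modules_root.iterdir()
        if version_dir.is_dir() and not (version_dir / "modules.dep").is_file()
    )
```

A non-empty result would be consistent with the failure described here, since without the dependency index the kernel modules cannot be resolved and loaded by name.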
,
Mar 15 2017
,
Mar 15 2017
This seems to have also brought down most of the chromiumos.chromium waterfall [1], example at [2]. No logs to confirm, but CLs match up.
1: https://build.chromium.org/p/chromiumos.chromium/waterfall
2: https://build.chromium.org/p/chromiumos.chromium/builders/x86-generic-tot-chromium-pfq-informational/builds/11351
,
Mar 15 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/a9fc48daf46ed9e3dbb79d1736308af83f297dc4

commit a9fc48daf46ed9e3dbb79d1736308af83f297dc4
Author: Stephen Barber <smbarber@chromium.org>
Date: Wed Mar 15 22:02:27 2017

Revert "cros-kernel2: remove the outputs of "depmod""

This reverts commit a1e0da70dcea4e0f28e85cda038b876b9ec958e9.

Reason for revert: http://crbug.com/701693

Original change's description:
> cros-kernel2: remove the outputs of "depmod"
>
> Since CL:446687, "build_image" script will run "depmod". So kernel
> packages should not own the outputs of "depmod".
>
> BUG= chromium:695675
> TEST=See CL:446687
> CQ-DEPEND=CL:446687,CL:451798
>
> Change-Id: I3909503599f9af0e95a02613fb43384c6badc270
> Reviewed-on: https://chromium-review.googlesource.com/446890
> Commit-Ready: Edward Jee <edjee@google.com>
> Tested-by: Edward Jee <edjee@google.com>
> Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>
>

TBR=vapier@chromium.org,keescook@chromium.org,edjee@google.com,chirantan@chromium.org,andreyu@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= chromium:695675 , chromium:701693

Change-Id: I88c8832b9232a1d73dbe0f729b62053dd0213881
Reviewed-on: https://chromium-review.googlesource.com/455341
Commit-Queue: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/a9fc48daf46ed9e3dbb79d1736308af83f297dc4/eclass/cros-kernel2.eclass
,
Mar 16 2017
Not sure if related at all, but when I updated kernel from chroot today, my DUT (on my desk) completely lost internet/wifi/ethernet after rebooting
,
Mar 16 2017
If you see in dmesg that shill is SIGABRT'ing over and over, this is that issue. Reverting the CL in #17 and rebuilding your kernel should fix it.
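
To check a saved dmesg dump for this symptom, a small filter is enough. This is a rough sketch: the exact wording of the log line is not quoted anywhere in this thread, so the pattern below (a line mentioning shill together with "SIGABRT" or "ABRT signal") is an assumption and may need adjusting to what the DUT actually prints.

```python
import re

# Assumed wording: the real dmesg line for a crashing shill may differ.
ABORT_RE = re.compile(r"shill.*(?:SIGABRT|ABRT signal)")


def count_shill_aborts(dmesg_text):
    """Count dmesg lines that look like shill being killed by SIGABRT."""
    return sum(1 for line in dmesg_text.splitlines() if ABORT_RE.search(line))
```

A count that keeps growing between two dumps taken a minute apart would match the "over and over" respawn behavior described above.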
,
Mar 16 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/25a8e98baf18ca240d4b0d87034a6ebb979a42ef

commit 25a8e98baf18ca240d4b0d87034a6ebb979a42ef
Author: Stephen Barber <smbarber@chromium.org>
Date: Thu Mar 16 00:25:18 2017

manual uprev for kernel ebuilds

We want our depmod output again, so force rebuilding the kernels from scratch.

BUG= chromium:701693
TEST=kernel builds

Change-Id: I4fd0ed826c544362833751d88fbd26d2a246c98e
Reviewed-on: https://chromium-review.googlesource.com/455349
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Commit-Queue: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>

[rename] https://crrev.com/25a8e98baf18ca240d4b0d87034a6ebb979a42ef/sys-kernel/chromeos-kernel-3_10/chromeos-kernel-3_10-3.10.18-r965.ebuild
[rename] https://crrev.com/25a8e98baf18ca240d4b0d87034a6ebb979a42ef/sys-kernel/chromeos-kernel-3_8/chromeos-kernel-3_8-3.8.11-r549.ebuild
[rename] https://crrev.com/25a8e98baf18ca240d4b0d87034a6ebb979a42ef/sys-kernel/chromeos-kernel-3_18/chromeos-kernel-3_18-3.18-r1710.ebuild
[rename] https://crrev.com/25a8e98baf18ca240d4b0d87034a6ebb979a42ef/sys-kernel/chromeos-kernel-4_4/chromeos-kernel-4_4-4.4.52-r680.ebuild
[rename] https://crrev.com/25a8e98baf18ca240d4b0d87034a6ebb979a42ef/sys-kernel/chromeos-kernel-3_14/chromeos-kernel-3_14-3.14-r1754.ebuild
,
Mar 16 2017
> [ ... ] Reverting the CL in #17 and rebuilding your kernel should fix it.

I assume we've tested that building with the revert above fixes the problem. I'm also seeing that the latest paladin builds have the revert, but still fail... Looking at the CL, I see it just changes the kernel eclass file. I'm wondering if we need to bump some kernel ebuild version number?
,
Mar 16 2017
Re #21: the commit in #20 uprevs the kernel ebuilds, so the next binpkgs should have depmod output again.
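
For readers unfamiliar with a manual uprev: it amounts to renaming the ebuild so that its `-rN` revision suffix increases, which makes the package manager treat it as a new version and rebuild it instead of reusing the old binary package. The sketch below only computes the new filename; the helper name is hypothetical, and the example input is one of the filenames listed in the uprev commit, used purely as sample data.

```python
import re

# Matches names such as "chromeos-kernel-4_4-4.4.52-r680.ebuild".
_REV_RE = re.compile(r"^(?P<stem>.+)-r(?P<rev>\d+)\.ebuild$")


def bump_ebuild_revision(filename):
    """Return the ebuild filename with its -rN revision incremented by one."""
    match = _REV_RE.match(filename)
    if match is None:
        raise ValueError(f"no -rN revision suffix in {filename!r}")
    return f"{match.group('stem')}-r{int(match.group('rev')) + 1}.ebuild"
```

Called on `chromeos-kernel-4_4-4.4.52-r680.ebuild`, this yields a name ending in `-r681.ebuild`; performing the rename itself would still be a separate step in the overlay checkout.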
,
Mar 16 2017
,
Mar 17 2017
Have we found any reason why the original CL got merged?
,
Mar 17 2017
> Have we found any reason why the original CL got merged?

I suspect that it's because the original CL didn't include the kernel ebuild rev bumps. That should have meant that the CL wasn't actually built in at the time of the CQ run. I can't explain how we went from there to a failing CQ, and I can't explain why the failures didn't hit the canary.
Comment 1 by shunhsingou@chromium.org
, Mar 15 2017