Issue 738520 link

Issue metadata

Status: Duplicate
Merged: issue 739583
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

CQ failure: whirlwind provision failure

Project Member Reported by ayatane@chromium.org, Jun 30 2017

Issue description

Tracking bug for failure: https://luci-milo.appspot.com/buildbot/chromeos/whirlwind-paladin/8436


1 DUT failed to reboot after provision: chromeos4-row10-jetstream-host8
 
Cc: lgoo...@chromium.org
Labels: -Pri-2 Pri-1
Owner: ayatane@chromium.org
Status: Assigned (was: Untriaged)
The failure sequence suggests it was a bad CL.

Assigning to the primary deputy to triage and determine
which CL caused it.

lgoodby@ noted this at crosoncall; he may know more by now.

Two successive whirlwind-paladin CQ runs last night failed with all DUTs failing to provision (whirlwind-paladin builds 8436 and 8435).

The provisions generally failed due to system-services not running after provisioning (verify.cros legacy host verification failed).

The services most likely did start eventually, because verify.jetstream passed after multiple retries. That check verifies that a jetstream service is up and running. It typically does not need to retry, but in this case it retried for close to 1 minute before succeeding.
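The retry behavior described above can be sketched as a simple poll-until-pass loop. This is only an illustration, not the actual verify.jetstream implementation: `poll_until`, its parameters, and the 60 second / 5 second defaults are all hypothetical. It returns how long the check took to pass, which is the number that looked abnormal here (close to a minute instead of near zero).

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Call check() until it returns True or timeout_s elapses.

    Hypothetical helper: returns the elapsed seconds at the first
    success, or None if the check never passed within timeout_s.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in check that passes on the third attempt.
    attempts = []

    def service_is_up() -> bool:
        attempts.append(1)
        return len(attempts) >= 3

    elapsed = poll_until(service_is_up, timeout_s=10.0, interval_s=0.1)
    print("passed" if elapsed is not None else "timed out", elapsed)
```

Recording the elapsed time rather than only pass/fail makes a slow boot visible even when the check eventually succeeds.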

In the case of chromeos4-row10-jetstream-host8, the host was not reachable after provisioning. Possibly whatever was slowing down boot on the other DUTs made this DUT boot slowly enough to time out during SSH setup.

This view is useful for seeing the provisioning failures around 5:30 PM and 7:00 PM yesterday:

  https://viceroy.corp.google.com/chromeos/suite_details?build_id=1633110

Provisioning appears normal today.

Looking through logs from chromeos4-row10-jetstream-host8:

The host became SSHable in the subsequent repair, after a repair.servoreset.

Looking at the logs pulled after the reset, I noticed some bluetooth issues; it is unclear whether they are related:

1970-01-01T00:01:08.750285+00:00 WARNING kernel: [   68.773063] udevd[144]: seq 1024 '/devices/soc.2/usb30.5/10000000.dwc3/xhci-hcd.1.auto/usb3/3-1/3-1:1.0/bluetooth/hci0' is taking a long time

2017-06-27T14:16:13.282770+00:00 ERR kernel: [  188.773128] udevd[144]: seq 1024 '/devices/soc.2/usb30.5/10000000.dwc3/xhci-hcd.1.auto/usb3/3-1/3-1:1.0/bluetooth/hci0' killed

2017-06-27T14:16:13.282823+00:00 ERR kernel: [  188.774317] udevd[144]: worker [713] failed while handling '/devices/soc.2/usb30.5/10000000.dwc3/xhci-hcd.1.auto/usb3/3-1/3-1:1.0/bluetooth/hci0'
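The wall-clock timestamps in these lines jump from 1970 to 2017 between the warning and the kill, so they cannot be subtracted directly; the bracketed kernel uptime values can. A minimal sketch of that comparison follows. The helper name `kernel_uptime` and the regular expression are assumptions for illustration, and the device path is abbreviated to `...` in the sample strings.

```python
import re

# Matches the bracketed kernel uptime, e.g. "kernel: [   68.773063]".
KERNEL_STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def kernel_uptime(line: str) -> float:
    """Return the kernel uptime in seconds from a syslog-style line.

    Hypothetical helper; raises ValueError if no uptime stamp is found.
    """
    match = KERNEL_STAMP.search(line)
    if match is None:
        raise ValueError("no kernel timestamp in line")
    return float(match.group(1))


# Log lines from this report, with the device path abbreviated.
warning_line = ("1970-01-01T00:01:08.750285+00:00 WARNING kernel: "
                "[   68.773063] udevd[144]: seq 1024 '...' is taking a long time")
killed_line = ("2017-06-27T14:16:13.282770+00:00 ERR kernel: "
               "[  188.773128] udevd[144]: seq 1024 '...' killed")

delta = kernel_uptime(killed_line) - kernel_uptime(warning_line)
print(f"seconds between warning and kill: {delta:.3f}")
```

By kernel uptime, the udevd worker for hci0 was killed about 120 seconds after the "taking a long time" warning.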

Cc: akes...@chromium.org xixuan@chromium.org
Owner: akes...@chromium.org
Status: WontFix (was: Assigned)
Maybe a flake; +current deputy FYI.
I don't believe this was a flake: something went into R61-9718.0.0 that greatly increased the time for whirlwinds to become fully operational. This in turn has caused whirlwind host verification to fail 100% of the time since R61-9718. See crbug/739583.
Mergedinto: 739583
Status: Duplicate (was: WontFix)
This was due to a real failure related to https://chromium-review.googlesource.com/c/437525/