Issue 771684

Issue metadata

Status: Assigned
Owner:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Android devices drop offline; always brought back by host reboot

Project Member Reported by bpastene@chromium.org, Oct 4 2017

Issue description

Usually 2-3 devices on a host will disappear and then reappear after the next host reboot. This pollutes device-repair tickets, since the fault lies with the host and the device itself doesn't need any help. Ideally this type of failure would be detected and either healed directly on the bot by some means, or trigger a host reboot at the next available opportunity.

Scanning through the logs, it looks like something goes awry on the host when the devices drop offline.

- At some point, *all* devices drop offline simultaneously:
Oct  3 16:41:05 build380-m1 kernel: [15208.029805] usb 1-4.4.3: USB disconnect, device number 51
Oct  3 16:41:05 build380-m1 kernel: [15208.285344] usb 1-4.3: USB disconnect, device number 45
Oct  3 16:41:06 build380-m1 kernel: [15208.796439] usb 1-4.1: USB disconnect, device number 42
Oct  3 16:41:06 build380-m1 kernel: [15208.924012] usb 1-4.2: USB disconnect, device number 54
Oct  3 16:41:06 build380-m1 kernel: [15209.051814] usb 1-4.4.2: USB disconnect, device number 48
Oct  3 16:41:06 build380-m1 kernel: [15209.307613] usb 1-4.4.1: USB disconnect, device number 31
Oct  3 16:41:07 build380-m1 kernel: [15210.074365] usb 1-4.4.4: USB disconnect, device number 57

As they reconnect, some spit out errors like:
Oct  3 16:41:19 build380-m1 kernel: [15222.182994] hub 1-4:1.0: hub_port_status failed (err = -71)
Oct  3 16:41:19 build380-m1 kernel: [15222.343259] hub 1-4:1.0: port 3 disabled by hub (EMI?), re-enabling...

I suspect the scheduled, synchronized launch of all 7 containers is somehow making the hub fall over. We may want to stagger or fuzz the launch timing.
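A rough sketch of what staggering plus fuzzing could look like, assuming each container launch can simply be delayed by a per-container offset. The helper name and the spacing and jitter values are hypothetical and not taken from android_docker.

```python
import random


def staggered_delays(num_containers, spacing_s=5.0, jitter_s=2.0, seed=None):
    """Return one launch delay (in seconds) per container.

    Container i waits i * spacing_s plus a random jitter in [0, jitter_s],
    so no two launches are scheduled for the same instant. With
    jitter_s < spacing_s the delays are strictly increasing.
    """
    rng = random.Random(seed)
    return [i * spacing_s + rng.uniform(0, jitter_s)
            for i in range(num_containers)]
```

With the defaults, 7 containers would be spread over roughly 30 seconds instead of all starting at once; whether that is enough to keep the hub stable would need to be checked on a real host.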
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jun 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/712941a462ea0a98f3e341c5b8dbbfe1945d55f2

commit 712941a462ea0a98f3e341c5b8dbbfe1945d55f2
Author: Ben Pastene <bpastene@chromium.org>
Date: Fri Jun 01 16:58:39 2018

android_docker: Reboot the host if no devices are seen and uptime > 1hr.

Bug: 771684
Change-Id: I757ee140ea63d1b58d3bb4e8c8734139c1f51591
Reviewed-on: https://chromium-review.googlesource.com/1073638
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>

[modify] https://crrev.com/712941a462ea0a98f3e341c5b8dbbfe1945d55f2/infra/services/android_docker/__main__.py
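The condition named in the commit subject can be sketched roughly as follows. This is not the code from `__main__.py`; the function names and the use of `/proc/uptime` are assumptions made for illustration.

```python
def read_uptime_seconds(path='/proc/uptime'):
    """Read host uptime in seconds (Linux only; first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def should_reboot_host(num_devices_seen, uptime_s, min_uptime_s=3600):
    """Return True when no devices are visible and the host has been up
    for more than min_uptime_s seconds (1 hour by default).

    The uptime guard keeps a host with genuinely missing devices from
    rebooting in a tight loop.
    """
    return num_devices_seen == 0 and uptime_s > min_uptime_s
```

For example, `should_reboot_host(0, read_uptime_seconds())` would only request a reboot once the host has been up for over an hour with no devices visible.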
