Usually 2-3 devices on a host will disappear, then reappear after the next host reboot. This pollutes device-repair tickets, since the device itself doesn't need any help; the host does. Ideally this type of failure would be detected and either healed directly by some means on the bot, or would trigger a host reboot at the next available opportunity (a detection sketch follows the logs below).
Scanning through the logs, it looks like something goes awry on the host when the devices drop offline.
- At some point, *all* devices drop offline simultaneously:
Oct 3 16:41:05 build380-m1 kernel: [15208.029805] usb 1-4.4.3: USB disconnect, device number 51
Oct 3 16:41:05 build380-m1 kernel: [15208.285344] usb 1-4.3: USB disconnect, device number 45
Oct 3 16:41:06 build380-m1 kernel: [15208.796439] usb 1-4.1: USB disconnect, device number 42
Oct 3 16:41:06 build380-m1 kernel: [15208.924012] usb 1-4.2: USB disconnect, device number 54
Oct 3 16:41:06 build380-m1 kernel: [15209.051814] usb 1-4.4.2: USB disconnect, device number 48
Oct 3 16:41:06 build380-m1 kernel: [15209.307613] usb 1-4.4.1: USB disconnect, device number 31
Oct 3 16:41:07 build380-m1 kernel: [15210.074365] usb 1-4.4.4: USB disconnect, device number 57
- As they reconnect, some spit out errors like:
Oct 3 16:41:19 build380-m1 kernel: [15222.182994] hub 1-4:1.0: hub_port_status failed (err = -71)
Oct 3 16:41:19 build380-m1 kernel: [15222.343259] hub 1-4:1.0: port 3 disabled by hub (EMI?), re-enabling...
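Given that signature (a handful of "USB disconnect" lines landing within a couple of seconds), the detection half could be a simple kernel-log scan that flags the host for reboot. A minimal Python sketch; the log path, thresholds, and the reboot-request flag file are assumptions, not existing bot plumbing:

```python
import re
from datetime import datetime, timedelta

DISCONNECT_RE = re.compile(r'usb [0-9.-]+: USB disconnect')
BURST_WINDOW = timedelta(seconds=5)  # the burst above spans ~2 seconds
BURST_THRESHOLD = 5                  # well above normal churn for one host

def mass_disconnect_detected(log_path='/var/log/kern.log'):
    """Return True if the kernel log shows a burst of USB disconnects."""
    stamps = []
    with open(log_path) as f:
        for line in f:
            if DISCONNECT_RE.search(line):
                # Parse the syslog prefix, e.g. "Oct 3 16:41:05".
                month, day, clock = line.split()[:3]
                stamps.append(datetime.strptime(
                    '%s %s %s' % (month, day, clock), '%b %d %H:%M:%S'))
    # Any BURST_THRESHOLD disconnects inside one BURST_WINDOW is a burst.
    # (Ignores year rollover; fine for a heuristic like this.)
    for i in range(len(stamps) - BURST_THRESHOLD + 1):
        if stamps[i + BURST_THRESHOLD - 1] - stamps[i] <= BURST_WINDOW:
            return True
    return False

if mass_disconnect_detected():
    # Hypothetical hook: touch a flag file the bot checks between tasks
    # to schedule a reboot at the next idle opportunity.
    open('/var/lib/bot/reboot_requested', 'w').close()
```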
I think the scheduled, synchronized launch of all 7 containers is somehow making the hub fall over. I might want to stagger or fuzz the timing of these launches.
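If staggering helps, the change could be as small as a random delay between launches. A minimal sketch, assuming the containers are started with docker; the container names and launch command are placeholders for whatever actually runs on these hosts:

```python
import random
import subprocess
import time

# Placeholder names; substitute the real per-device containers.
CONTAINERS = ['device_container_%d' % i for i in range(7)]
MAX_STAGGER_S = 60  # spread all launches across up to a minute

def launch_container(name):
    # Placeholder for the real launch command on these hosts.
    subprocess.check_call(['docker', 'start', name])

random.shuffle(CONTAINERS)
for name in CONTAINERS:
    launch_container(name)
    # Random gap before the next launch so the USB enumeration
    # storms from each container's devices don't overlap.
    time.sleep(random.uniform(1.0, MAX_STAGGER_S / len(CONTAINERS)))
```

Shuffling the order plus a random gap also means no two hosts, and no two reboots of the same host, hit the hub with the same timing pattern.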
Comment 1 by bugdroid1@chromium.org, Jun 1 2018