New issue
Advanced search Search tips
Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: ----

Sign in to add a comment

provision failures: DUT cannot recover from reboot at post check of rootfs update

Project Member Reported by, Nov 15 2017

Issue description


Due to there's a '\n' at the end of the error messages, builder page only show the first provision error. Will have a fix for that.

We can find useful error messages here:

Another one
Project Member

Comment 2 by, Nov 18 2017

The following revision refers to this bug:

commit d0e6c9442800fec993b880936fcf0fba376c3315
Author: Xixuan Wu <>
Date: Sat Nov 18 04:18:48 2017

autotest: remove line breaks in provision error messages.

TEST=Ran unittest.

Change-Id: Ic747f0d412fef5e5b9bc8b1f0768e939576424c0
Commit-Ready: Ben Chan <>
Tested-by: Xixuan Wu <>
Reviewed-by: Ben Chan <>


Comment 4 by, Nov 20 2017

the last CQ just failed:

Will the latest push_to_prod fix this ?

Comment 5 by, Nov 20 2017

No, the fix is only for better error reporting.

Assign to current sheriff for further checking whether there's any bad CL/bug in ToT with this failed build.

Comment 6 by, Nov 21 2017

I don't see a CL that relates to coral specifically.

Also seeing this dump:

[1;31m15:06:52: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.failures_lib.TestLabFailure'>: ** HWTest did not complete due to infrastructure issues (code 3) **
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/", line 442, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/", line 701, in Run
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/", line 319, in PerformStage
    raise cmd_result.to_raise
TestLabFailure: ** HWTest did not complete due to infrastructure issues (code 3) **
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/builders/", line 120, in _RunParallelStages
  File "/b/c/cbuild/repository/chromite/lib/", line 679, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/c/cbuild/repository/chromite/lib/", line 676, in RunParallelSteps
  File "/usr/lib/python2.7/", line 24, in __exit__
  File "/b/c/cbuild/repository/chromite/lib/", line 562, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.failures_lib.TestLabFailure'>: ** HWTest did not complete due to infrastructure issues (code 3) **
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/", line 442, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/", line 701, in Run
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/", line 319, in PerformStage
    raise cmd_result.to_raise
TestLabFailure: ** HWTest did not complete due to infrastructure issues (code 3) **

Comment 7 by, Nov 21 2017

HWTest fails due to DUT cannot get back to online in provision and loses connection "forever" until next repair:

Could sheriff find a proper owner to check why DUT loses connection in reboot? Maybe it's another case of ethernet dongle problem like Issue 768542, Issue 777920.

Comment 8 by, Nov 21 2017

Hi Ben,

How did you verify the ethernet dongle error issue in the runs last week?

*** The provision_AutoUpdate log indicated that issue happened around 11/20/17 13:55:51

START provision_AutoUpdate  provision_AutoUpdate  timestamp=1511214951  localtime=Nov 20 13:55:51 
    FAIL  provision_AutoUpdate  provision_AutoUpdate  timestamp=1511217103  localtime=Nov 20 14:31:43 Unhandled DevServerException: CrOS auto-update failed for host chromeos2-row4-rack10-host12: 0) ChromiumOSUpdateError: chromeos2-row4-rack10-host12 cannot recover from reboot at post check of stateful update, 1) SSHConnectionError: ssh: connect to host port 22: Connection timed out

*** It wasn't clear what happened around that time frame, but the Ethernet adapter issue did appear on 11/21/17 07:57:03. The fix was to flip the adapter back to normal:

*** The Linksys Ethernet adapter went into the UMS mode (13b1:0040)

2017-11-21T07:57:03.752833+09:00 INFO kernel: [    4.137747] usb 2-6: new SuperSpeed USB device number 4 using xhci_hcd
2017-11-21T07:57:03.752837+09:00 INFO kernel: [    4.149265] usb 2-6: New USB device found, idVendor=13b1, idProduct=0040
2017-11-21T07:57:03.752838+09:00 INFO kernel: [    4.149284] usb 2-6: New USB device strings: Mfr=1, Product=2, SerialNumber=6
2017-11-21T07:57:03.752839+09:00 INFO kernel: [    4.149293] usb 2-6: Product: Linksys USB3GIGV1
2017-11-21T07:57:03.752841+09:00 INFO kernel: [    4.149301] usb 2-6: Manufacturer: Linksys
2017-11-21T07:57:03.752842+09:00 INFO kernel: [    4.149308] usb 2-6: SerialNumber: 000001000000
2017-11-21T07:57:03.752844+09:00 INFO kernel: [    4.150731] usb-storage 2-6:1.0: USB Mass Storage device detected
2017-11-21T07:57:03.752849+09:00 INFO kernel: [    4.152463] scsi host0: usb-storage 2-6:1.0


*** mist was invoked to switch the Ethernet adapter back Ethernet mode. It did so by performing an USB port reset. You may ignore the mist error for now. The adapter disconnected from USB after port reset (see below).

2017-11-21T07:57:04.309199+09:00 ERR mist[344]: Could not perform a USB port reset: ErrorNotFound
2017-11-21T07:57:04.309227+09:00 ERR mist[344]: Could not switch device '/sys/devices/pci0000:00/0000:00:15.0/usb2/2-6' into the modem mode.


*** The Ethernet adapter went back to Ethernet mode (13b1:0041) and eth1 came up.

2017-11-21T07:57:04.666075+09:00 INFO kernel: [    5.736506] usb 2-6: new SuperSpeed USB device number 5 using xhci_hcd
2017-11-21T07:57:04.677641+09:00 INFO kernel: [    5.747937] usb 2-6: New USB device found, idVendor=13b1, idProduct=0041
2017-11-21T07:57:04.677676+09:00 INFO kernel: [    5.747948] usb 2-6: New USB device strings: Mfr=1, Product=2, SerialNumber=6
2017-11-21T07:57:04.677679+09:00 INFO kernel: [    5.747951] usb 2-6: Product: Linksys USB3GIGV1
2017-11-21T07:57:04.677681+09:00 INFO kernel: [    5.747954] usb 2-6: Manufacturer: Linksys
2017-11-21T07:57:04.677682+09:00 INFO kernel: [    5.747957] usb 2-6: SerialNumber: 000001000000
2017-11-21T07:57:04.786091+09:00 INFO kernel: [    5.856737] usb 2-6: reset SuperSpeed USB device number 5 using xhci_hcd
2017-11-21T07:57:04.826574+09:00 INFO kernel: [    5.895002] r8152 2-6:1.0 eth1: v1.08.3


*** shill picked up eth1 and DHCP went through

2017-11-21T07:57:05.331061+09:00 INFO shill[1154]: [] Device created: eth1 index 4
2017-11-21T07:57:05.333486+09:00 INFO shill[1154]: [] ethernet service 2 constructed.
2017-11-21T07:57:05.334367+09:00 INFO shill[1154]: [] Device eth1 registered.
2017-11-21T07:57:05.334812+09:00 INFO shill[1154]: [] Device eth1 updated: enabled
2017-11-21T07:57:05.391153+09:00 INFO shill[1154]: [] Registering eth1 with manager.
2017-11-21T07:57:05.391476+09:00 INFO kernel: [    6.461892] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
2017-11-21T07:57:09.100050+09:00 INFO kernel: [   10.171072] r8152 2-6:1.0 eth1: carrier on
2017-11-21T07:57:09.100087+09:00 INFO kernel: [   10.171115] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
2017-11-21T07:57:09.921730+09:00 INFO dhcpcd[1321]: eth1: rebinding lease of
2017-11-21T07:57:09.921944+09:00 INFO dhcpcd[1321]: eth1: sending REQUEST (xid 0xd5237bc5), next in 4.1 seconds
I logged onto chromeos2-row4-rack10-host12 but couldn't find any relevant log on 11/20/17. Seems like the stateful was wiped after a full recovery. Subsequent log did indicate that we were able to recover the Ethernet adapter that went into mass storage mode, so I'm not quite sure what led to the AU error on 11/20/17.

Have we seen other instances today?
Thanks for all the detailed analysis, Ben!

The commit queue was green for 4 consecutive times last night.  That seems to imply that the provisioning failure wasn't due to ethernet flakiness?

Ben, how often would the ethernet dongle issue come up in the CQ last week?  Do you think after 4 straight clean runs you would call it fixed?
It's not exactly the same reason, both DUTs failed to reboot at post check of stateful update.

Comment 14 by, Nov 22 2017

I think we saw another failure this afternoon

Comment 15 by, Nov 22 2017

Labels: Chase-Pending
Labels: -Chase-Pending
Ben any status update on this?
The SSHConnectionError could be a symptom of a similar issue or completely different issue (e.g. issue 788584, 777152, 733425). The fix to  issue 452686  specifically targets the Ethernet adapters mentioned above when they are enumerated as a USB storage. Other issues like 788584 seem to indicate that the adapter failed to enumerate over USB, which will require a different fix.

Since Chrome OS build 10141.0.0, we should be able to switch the following Ethernet adapters back to Ethernet mode once we detect they are in the USB mass storage mode (more details can be found in  issue 452686 ):

- Samsung USB Ethernet Adapter AA-AE2N12B [04e8:a100 -> 04e8:a101]
- Linksys USB3GIGV1 USB 3.0 Gigabit Ethernet Adapter [13b1:0040 -> 13b1:0041]
- Realtek RTL8153 based Ethernet adapters [0bda:8152 -> 0bda:8153]

From what I observed in my local reboot stress tests, these adapters were switched back to Ethernet mode as expected. The logs (in comment #9) seemed to also confirm that the switching did happen. We can examine the before-repair logs for the DUTs used in those failed paladin runs to confirm. Is that any easy to pull those logs out?

You can run dut-status:

xixuan@xixuan0:~/chromiumos/src/third_party/autotest/files$ dut-status chromeos2-row4-rack10-host12 -s '2017-11-20 12:00:00' -u '2017-11-20 17:00:00' -f
2017-11-20 16:57:47  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61825265-reset/
    2017-11-20 16:53:49  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/157458549-chromeos-test/
    2017-11-20 16:41:41  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61825216-provision/
    2017-11-20 14:37:08  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61824851-repair/
    2017-11-20 13:55:17  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61824131-provision/ ---the failed provision_AutoUpdate in #9
    2017-11-20 12:11:04  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/157415526-chromeos-test/
    2017-11-20 12:09:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61823915-reset/
    2017-11-20 12:07:02  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/157415520-chromeos-test/
    2017-11-20 12:05:58  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61823904-reset/
    2017-11-20 12:03:15  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/157415515-chromeos-test/
    2017-11-20 12:02:15  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61823879-reset/
    2017-11-20 12:00:19  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/157415503-chromeos-test/
Re #18: Thanks!

According to http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row4-rack10-host12/61824851-repair:

The Linksys Ethernet adapter failed to enumerate over USB:

2017-11-21T07:45:42.017981+09:00 ERR kernel: [ 1138.446978] usb 2-6: device not accepting address 113, error -71
2017-11-21T07:45:42.038304+09:00 ERR kernel: [ 1138.467844] usb usb2-port6: unable to enumerate USB device

# FAIL	----	verify.ssh	timestamp=1511218042	localtime=Nov 20 14:47:22	No answer to ping from chromeos2-row4-rack10-host12

# START	----	repair.servoreset	timestamp=1511218528	localtime=Nov 20 14:55:28

2017-11-21T07:55:31.352744+09:00 INFO kernel: [    2.250611] usb 2-6: new SuperSpeed USB device number 3 using xhci_hcd
2017-11-21T07:55:31.352746+09:00 INFO kernel: [    2.262477] usb 2-6: New USB device found, idVendor=13b1, idProduct=0041
2017-11-21T07:55:31.352752+09:00 INFO kernel: [    2.262497] usb 2-6: New USB device strings: Mfr=1, Product=2, SerialNumber=6
2017-11-21T07:55:31.352755+09:00 INFO kernel: [    2.262506] usb 2-6: Product: Linksys USB3GIGV1
2017-11-21T07:55:31.352756+09:00 INFO kernel: [    2.262514] usb 2-6: Manufacturer: Linksys
2017-11-21T07:55:31.352757+09:00 INFO kernel: [    2.262526] usb 2-6: SerialNumber: 000001000000
w.r.t. #19: this looks like issue 788584
Components: Infra>Client>ChromeOS>CI
Components: -Infra>Client>ChromeOS
Owner: ----

Sign in to add a comment