"Repair" puts DUT in boot loop
Reported by
chasing....@gmail.com,
Nov 14
Issue description
1. Note that the connection between moblab and DUT is moblab -> ethernet -> access point -> wifi -> DUT.
2. I can't describe exactly how to reproduce this behavior, but so far I have observed that the boot loop happens after a "repair" job.
3. It doesn't happen after every repair; I have experienced it once on nocturne and twice on nami.
4. While in the boot loop, provisioning can't happen, so the DUT can't run jobs.
,
Nov 14
Are there any logs you can share?
,
Nov 14
Just as a note: if the DUT has lost network connectivity, it will reboot every 3 minutes. You can remove the file /mnt/stateful_partition/.labmachine to stop this reboot. I fear that with your unusual networking you might be triggering this network recovery loop. ggrundler can help with how this is triggered, as he implemented it.
,
Nov 14
That does seem to be the reboot cadence that I am getting.
,
Nov 14
(On the DUT) rm /mnt/stateful_partition/.labmachine worked. While the boot loop was happening, it continued even after I took the DUT off the moblab network.
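For anyone who wants to script the workaround above, here is a minimal Python sketch. It assumes only the marker path quoted in this thread; the function name `remove_lab_marker` is made up for illustration and is not part of any Chrome OS tool. It must run on the DUT with enough privilege to delete the file.

```python
from pathlib import Path

# Marker path quoted in this thread. While this file exists, the test image
# treats the DUT as a lab machine and runs the network recovery described here.
LAB_MARKER = Path("/mnt/stateful_partition/.labmachine")


def remove_lab_marker(marker: Path = LAB_MARKER) -> bool:
    """Delete the marker file.

    Returns True if the file existed and was removed, False if it was
    already absent. Other errors (e.g. permissions) are left to propagate.
    """
    try:
        marker.unlink()
    except FileNotFoundError:
        return False
    return True
```

Calling `remove_lab_marker()` with no argument targets the path from this thread; passing another `Path` lets you try it against a scratch file first.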
,
Nov 14
Yes, the boot loop has nothing to do with moblab. Test images that are deemed to be in the lab try to recover their USB network connection if it drops out, and the last step in that recovery is to reboot the device. There are logs in /var/log/messages showing the other steps it takes. Sadly, all of this is necessary because we are not able to get a truly stable USB Ethernet connection when there are regular reboots.
,
Nov 14
Ah yes ... our friend recover_duts: https://chromium.googlesource.com/chromiumos/platform/crostestutils/+/master/recover_duts/README You should see something like this in syslog: "All ethernet recovery methods have failed. Rebooting."
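To confirm that a boot loop comes from this recovery path, one option is to search the log for the message quoted above. The following is a rough Python sketch, assuming the message text and the /var/log/messages path mentioned in this thread; the exact wording may differ between images, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Message text as quoted in this thread; treat the exact wording as an assumption.
REBOOT_MESSAGE = "All ethernet recovery methods have failed. Rebooting."


def find_recovery_reboots(lines: Iterable[str],
                          needle: str = REBOOT_MESSAGE) -> List[str]:
    """Return every log line that contains the recovery-reboot message."""
    return [line.rstrip("\n") for line in lines if needle in line]


def scan_log(path: Path = Path("/var/log/messages")) -> List[str]:
    """Scan a log file on the DUT (path taken from this thread)."""
    with path.open(errors="replace") as log:
        return find_recovery_reboots(log)
```

If `scan_log()` returns matching lines at roughly the reboot cadence you observe, that points at the network recovery described in this thread and not at moblab.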
,
Nov 27
Sounds like the mystery is solved here.
,
Nov 27
Just want to add one more observation: once /mnt/stateful_partition/.labmachine exists and the DUT is in a boot loop, "repair" from moblab gets stuck in the middle of the process.
,
Nov 27
Are there logs you can share for the failure beyond 'reboot loop'? Let's either attach those here and re-open after changing the summary, or file a new bug to address that.
Comment 1 by mqg@chromium.org, Nov 14
Owner: haddowk@chromium.org