
Issue 905092

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




"Repair" puts DUT in boot loop

Reported by chasing....@gmail.com, Nov 14

Issue description

1. Note that the connection between moblab and the DUT is moblab -> ethernet -> access point -> wifi -> DUT.
2. I can't describe exactly how to reproduce this behavior, but so far I have observed that the boot loop happens after a "repair" job.
3. It doesn't happen after every repair. I have experienced it once on nocturne and twice on nami.
4. When the DUT is in the boot loop, provisioning can't complete, so the DUT can't run jobs.
 
Cc: puthik@chromium.org mqg@chromium.org tbroch@chromium.org mattmallett@chromium.org
Owner: haddowk@chromium.org
Are there any logs you can share?
Just as a note: if the DUT has lost network connectivity, it will reboot every 3 minutes. You can remove the file /mnt/stateful_partition/.labmachine to stop this reboot.

I fear that with your unusual networking setup you might be triggering this network recovery loop. ggrundler can explain how it is triggered, as he implemented it.


That does seem to be the reboot cadence I am getting.
Running rm /mnt/stateful_partition/.labmachine (on the DUT) worked.
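The workaround above can be wrapped in a small helper so it is safe to run whether or not the marker exists. This is a minimal sketch, assuming Python 3 is available where it runs; `disable_lab_recovery` is a hypothetical name, and the only fact taken from this thread is that deleting /mnt/stateful_partition/.labmachine stops the reboot loop.

```python
import os

# Marker file named in this thread; deleting it stops the recovery reboot loop.
LAB_MARKER = "/mnt/stateful_partition/.labmachine"


def disable_lab_recovery(marker_path: str = LAB_MARKER) -> bool:
    """Delete the lab-machine marker if present (hypothetical helper).

    Returns True if a marker was removed, False if none existed.
    """
    try:
        os.remove(marker_path)
    except FileNotFoundError:
        # Nothing to do: the marker is already gone.
        return False
    return True
```

Calling `disable_lab_recovery()` on the DUT is equivalent to the `rm` above, except that a missing marker is reported as `False` instead of an error.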

When the boot loop was happening, it happened even when I took the DUT off the moblab network.

Yes, the boot loop has nothing to do with moblab. Test images that are deemed to be in the lab try to recover their USB network connection if it drops out, and the last step in that recovery is to reboot the device. /var/log/messages has logs showing the other steps it takes.

Sadly, all of this is necessary because we are not able to get a truly stable USB Ethernet connection when there are regular reboots.
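The escalation described above (only on images deemed to be in the lab, try recovery steps, reboot only when all of them fail) can be modeled roughly as follows. This is a simplified Python sketch, not the actual recover_duts code: the thread does not name the individual recovery steps, so they are passed in as hypothetical callables, and `recover_network`, `is_lab_machine`, `is_connected` and `reboot` are likewise illustrative names.

```python
from typing import Callable, Sequence

# Hypothetical model of the lab network-recovery escalation described in this
# thread; it is NOT the recover_duts implementation.
RECOVERY_REBOOT_MSG = "All ethernet recovery methods have failed. Rebooting."


def recover_network(
    is_lab_machine: Callable[[], bool],
    is_connected: Callable[[], bool],
    recovery_steps: Sequence[Callable[[], None]],
    reboot: Callable[[], None],
) -> str:
    """Run one recovery pass and return a label for the outcome."""
    # Only images deemed to be in the lab (e.g. the .labmachine marker exists) recover.
    if not is_lab_machine():
        return "skipped"
    if is_connected():
        return "healthy"
    # Try each (hypothetical) recovery step in order, re-checking the link after each.
    for index, step in enumerate(recovery_steps, start=1):
        step()
        if is_connected():
            return f"recovered after step {index}"
    # Last resort: log and reboot, matching the syslog message quoted in this thread.
    print(RECOVERY_REBOOT_MSG)
    reboot()
    return "rebooted"
```

Because the check repeats roughly every 3 minutes (per this thread), a link that none of the steps can restore leads to a reboot on every pass, which is the loop reported here. Returning "skipped" when the lab check fails mirrors the workaround of deleting /mnt/stateful_partition/.labmachine.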

Ah yes... our friend recover_duts:

https://chromium.googlesource.com/chromiumos/platform/crostestutils/+/master/recover_duts/README

You should see something like this in syslog:

"All ethernet recovery methods have failed. Rebooting."
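To confirm that recover_duts is behind a reboot loop, one can count how often that message appears in the system log. This is a minimal sketch, assuming the message text is exactly as quoted above and that syslog is written to /var/log/messages (the log location mentioned earlier in this thread); `count_recovery_reboots` is a hypothetical helper name.

```python
# Message text as quoted in this thread.
RECOVERY_REBOOT_MSG = "All ethernet recovery methods have failed. Rebooting."


def count_recovery_reboots(log_path: str = "/var/log/messages") -> int:
    """Return how many log lines contain the recovery-reboot message."""
    count = 0
    # errors="replace" keeps the scan going if the log has stray non-UTF-8 bytes.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if RECOVERY_REBOOT_MSG in line:
                count += 1
    return count
```

Only the current log file is scanned; older rotated logs would need to be checked separately.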


Status: Fixed (was: Untriaged)
Sounds like the mystery is solved here.

Just want to add one more observation: once /mnt/stateful_partition/.labmachine exists and the DUT is in the boot loop, a "repair" from moblab gets stuck in the middle of the process.

Are there logs you can share for the failure beyond the reboot loop? Let's either attach them here and reopen after changing the summary, or file a new bug to address that.
