
Issue 918161

Starred by 3 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




nyan_big-paladin rebooted by check_ethernet Autotest hook during testing

Project Member Reported by dhanyaganesh@chromium.org, Dec 28

Issue description

Sheriff here. I don't see any changes in the build that would harm video.WebRTCCamera. derat@, could you please take a look?

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8925920109826078720

Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 800, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 464, in execute
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 371, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/tast/tast.py", line 158, in run_once
    self._parse_results(run_failed)
  File "/usr/local/autotest/server/site_tests/tast/tast.py", line 369, in _parse_results
    raise error.TestFail(failure_msg)
TestFail: 1 failed: video.WebRTCCamera; 2 missing: video.WebRTCPeerConnCameraH264 video.WebRTCPeerConnCameraVP8
 
Cc: akes...@chromium.org dgarr...@chromium.org derat@chromium.org ihf@chromium.org
Components: -Tests>Tast Infra>Client>ChromeOS>Test
Owner: dhanyaganesh@chromium.org
Summary: nyan_big-paladin rebooted by check_ethernet Autotest hook during testing (was: nyan_big-paladin fails video.WebRTCCamera)
The message shown in the waterfall is correct here: "Lost SSH connection to chromeos4-row5-rack10-host3:22: EOF, Test did not finish"

So the DUT went away in the middle of testing. From full.txt in the logs dir:

...
2018/12/27 23:29:04 [23:29:04.268] Connecting to Chrome at ws://127.0.0.1:43698/devtools/page/E11B487492CE7C4F76EA34D59474E9CF
2018/12/27 23:32:19 Got global error mid-test: timed out after waiting 3m15.138s for next message
2018/12/27 23:32:30 Failed to run tests: context deadline exceeded
2018/12/27 23:32:30 Lost SSH connection to chromeos4-row5-rack10-host3:22: EOF
2018/12/27 23:32:30 Collecting system information
2018/12/27 23:32:30 Connecting to chromeos4-row5-rack10-host3:22
2018/12/27 23:32:31 Copying /tmp/tast_logs.893059439 from host to /usr/local/autotest/results/lxc_job_folder/tast/results/system_logs
2018/12/27 23:32:31 Cleaning /tmp/tast_logs.893059439 on host
2018/12/27 23:32:31 Copying /tmp/tast_crashes.070222338 from host to /usr/local/autotest/results/lxc_job_folder/tast/results/crashes
2018/12/27 23:32:31 Cleaning /tmp/tast_crashes.070222338 on host
...

If you look at the messages file in the logs dir, you can see that Autotest apparently decided to reboot the DUT:

...
2018-12-28T07:30:13.999154+00:00 NOTICE check_ethernet.hook[11175]: Attempting recovery method "toggle_usb_ports"
2018-12-28T07:30:14.088031+00:00 NOTICE check_ethernet.hook[11207]: Rescanning  /sys/bus/usb/drivers/hub/[0-9]*-0:1.0
2018-12-28T07:30:14.139271+00:00 NOTICE check_ethernet.hook[11229]: Rescanning  /sys/bus/usb/drivers/hub/[0-9]*-0:1.0
2018-12-28T07:30:14.148237+00:00 NOTICE check_ethernet.hook[11233]: Attempting recovery method "reload_ethernet_drivers"
2018-12-28T07:30:14.191206+00:00 NOTICE check_ethernet.hook[11253]: Rescanning  /sys/bus/usb/drivers/hub/[0-9]*-0:1.0
2018-12-28T07:30:14.200061+00:00 NOTICE check_ethernet.hook[11257]: All ethernet recovery methods have failed. Rebooting.
...
2018-12-28T07:30:16.717123+00:00 NOTICE pre-shutdown[11344]: Shutting down for reboot: other-request-to-powerd
2018-12-28T07:30:16.732386+00:00 INFO chapsd[1232]: Shutdown triggered by signal 15.
2018-12-28T07:30:16.732474+00:00 INFO chapsd[1232]: chapsd Daemon::OnShutdown invoked.
2018-12-28T07:30:16.734021+00:00 INFO metrics_daemon[3034]: [INFO:metrics_daemon.cc(494)] Got org.freedesktop.DBus.NameOwnerChanged D-Bus signal
2018-12-28T07:30:16.734583+00:00 INFO chapsd[1232]: SlotManagerImpl is shutting down.
2018-12-28T07:30:16.734611+00:00 INFO chapsd[1232]: Waiting for worker thread for slot 0 to exit.
2018-12-28T07:30:16.734636+00:00 INFO chapsd[1232]: Unloading keys for slot 0.
2018-12-28T07:30:16.735264+00:00 INFO btdispatch[2533]: Power manager becomes not available
2018-12-27T23:30:21.702496-08:00 INFO kernel: [    0.000000] Booting Linux on physical CPU 0x0
2018-12-27T23:30:21.702697-08:00 INFO kernel: [    0.000000] Initializing cgroup subsys cpu
2018-12-27T23:30:21.702706-08:00 NOTICE kernel: [    0.000000] Linux version 3.10.18 (chrome-bot@swarm-cros-425) (gcc version 4.9.x 20150123 (prerelease) (4.9.2_cos_gg_4.9.2-r199-ac6128e0a17a52f011797f33ac3e7d6273a9368d_4.9.2-r199) ) #1 SMP Thu Dec 27 22:30:41 PST 2018
...

Someone on the infra side might know more about what triggered this.
See also the corresponding powerd log file:

[1228/073014.528298:INFO:daemon.cc(1006)] Got RequestRestart message from :1.243 with reason other-request-to-powerd (recover_duts check_ethernet hook failed)
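
The two excerpts above use different time zones, which makes the ordering easy to misread: the messages file logs in UTC (+00:00) before the reboot, while the full.txt lines carry no zone at all. Below is a minimal sketch that puts three of the quoted timestamps on one timeline. It assumes the full.txt timestamps are in Pacific time (-08:00), matching the offset on the post-reboot kernel lines; the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: full.txt timestamps are Pacific Standard Time (UTC-8),
# the same offset the post-reboot kernel lines in messages use.
PST = timezone(timedelta(hours=-8))


def parse_tast(ts):
    """Parse a full.txt timestamp such as '2018/12/27 23:29:04' (assumed PST)."""
    return datetime.strptime(ts, "%Y/%m/%d %H:%M:%S").replace(tzinfo=PST)


def parse_syslog(ts):
    """Parse a messages timestamp; these already carry a UTC offset."""
    return datetime.fromisoformat(ts)


# Timestamps copied from the log excerpts quoted in this issue.
last_tast_msg = parse_tast("2018/12/27 23:29:04")    # "Connecting to Chrome at ..."
reboot_decision = parse_syslog("2018-12-28T07:30:14.200061+00:00")  # "... Rebooting."
tast_timeout = parse_tast("2018/12/27 23:32:19")     # "Got global error mid-test"

print("reboot decision in PST:", reboot_decision.astimezone(PST).isoformat())
print("after last tast message:", reboot_decision - last_tast_msg)
print("before tast timeout:", tast_timeout - reboot_decision)
```

Under that assumption, the hook's reboot decision falls between the last test message and the timeout, which is consistent with the DUT going away mid-test.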
Cc: dhanyaganesh@chromium.org
Owner: pprabhu@chromium.org
Forwarding to lab deputy. I'm holding off on marking nyan_big experimental. 
Labels: Hotlist-Deputy
Owner: ----
Status: Untriaged (was: Assigned)
It was late in the day, so I'm punting this to this week's deputy list.

Autotest isn't rebooting the DUT. All DUTs in the lab have the check-ethernet script installed, which reboots the DUT if it cannot reach a well-known internet IP for long enough. This ensures we have a chance of reclaiming DUTs that lose ethernet connectivity.

So the question is, why did the DUT not have ethernet connectivity for an extended period during that test?
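
For readers unfamiliar with this kind of hook, the escalation visible in the messages excerpt (try each recovery method, then reboot) can be sketched roughly as follows. This is not the actual check-ethernet script: the function name, its parameters, and the return values are hypothetical, and the sketch does not model the "for long enough" timing or how connectivity is probed. Only the two log message formats are taken from the excerpt above.

```python
from typing import Callable, Iterable, Tuple


def check_and_recover(
    is_connected: Callable[[], bool],
    recovery_methods: Iterable[Tuple[str, Callable[[], None]]],
    reboot: Callable[[], None],
    log: Callable[[str], None] = print,
) -> str:
    """Hypothetical escalation loop; not the real hook.

    Returns "connected" if no action was needed, the name of the recovery
    method that restored connectivity, or "reboot" if every method failed.
    """
    if is_connected():
        return "connected"
    for name, method in recovery_methods:
        # Message format mirrors the check_ethernet.hook lines in messages.
        log(f'Attempting recovery method "{name}"')
        method()
        if is_connected():
            return name
    log("All ethernet recovery methods have failed. Rebooting.")
    reboot()
    return "reboot"
```

Passing the probe, the recovery steps, and the reboot action in as callables keeps the sketch testable without touching real hardware; in the incident above, both toggle_usb_ports and reload_ethernet_drivers were attempted before the reboot.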

