New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 801718 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

samus: Host did not return from reboot

Project Member Reported by linben@chromium.org, Jan 12 2018

Issue description

samus-nyc-android-pfq failed:
https://luci-milo.appspot.com/buildbot/chromeos/samus-nyc-android-pfq/1429

There's no clear reason from the test logs (the error logs are both blank for both the original run and the re-run), but the build failure does say this:

cheets_ClobberStateful: ABORT: Host did not return from reboot

Need to check in DUT history to see what went wrong when reboot was attempted.
 

Comment 1 by linben@chromium.org, Jan 13 2018

Was able to get some more information by looking at the DUT's history.

The test failure happened around 6:47 am:
START	cheets_ClobberStateful	cheets_ClobberStateful	timestamp=1515768440	localtime=Jan 12 06:47:20	
	START	----	----	timestamp=1515768461	localtime=Jan 12 06:47:41	
		GOOD	----	sysinfo.before	timestamp=1515768461	localtime=Jan 12 06:47:41	
	END GOOD	----	----	timestamp=1515768461	localtime=Jan 12 06:47:41	
	START	----	----	timestamp=1515768473	localtime=Jan 12 06:47:53	
		GOOD	----	sysinfo.iteration.before	timestamp=1515768473	localtime=Jan 12 06:47:53	
	END GOOD	----	----	timestamp=1515768473	localtime=Jan 12 06:47:53	
	START	----	reboot	timestamp=1515768480	localtime=Jan 12 06:48:00	
		GOOD	----	reboot.start	timestamp=1515768480	localtime=Jan 12 06:48:00	
		GOOD	----	reboot.verify	timestamp=1515768494	localtime=Jan 12 06:48:14	
	END GOOD	----	reboot	kernel=3.14.0	localtime=Jan 12 06:48:15	timestamp=1515768495	
	START	----	reboot	timestamp=1515768535	localtime=Jan 12 06:48:55	
		GOOD	----	reboot.start	timestamp=1515768535	localtime=Jan 12 06:48:55	
		ABORT	----	reboot.verify	timestamp=1515768858	localtime=Jan 12 06:54:18	Host did not return from reboot
	END FAIL	----	reboot	timestamp=1515768858	localtime=Jan 12 06:54:18	Host did not return from reboot

Looking at the DUT's logs, I was able to find these:https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row3-rack10-host15/1697110-provision/
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row3-rack10-host15/1697154-repair

Which are the closest tasks that ran near that timeframe.

Provision stack:
START	----	provision	timestamp=1515769286	localtime=Jan 12 07:01:26	
	GOOD	----	verify.servo_ssh	timestamp=1515769292	localtime=Jan 12 07:01:32	
	GOOD	----	verify.update	timestamp=1515769301	localtime=Jan 12 07:01:41	
	GOOD	----	verify.brd_config	timestamp=1515769302	localtime=Jan 12 07:01:42	
	GOOD	----	verify.ser_config	timestamp=1515769303	localtime=Jan 12 07:01:43	
	GOOD	----	verify.job	timestamp=1515769304	localtime=Jan 12 07:01:44	
	GOOD	----	verify.servod	timestamp=1515769309	localtime=Jan 12 07:01:49	
	GOOD	----	verify.pwr_button	timestamp=1515769309	localtime=Jan 12 07:01:49	
	GOOD	----	verify.lid_open	timestamp=1515769309	localtime=Jan 12 07:01:49	
	GOOD	----	verify.PASS	timestamp=1515769309	localtime=Jan 12 07:01:49	
	START	provision_AutoUpdate	provision_AutoUpdate	timestamp=1515769309	localtime=Jan 12 07:01:49	
		FAIL	provision_AutoUpdate	provision_AutoUpdate	timestamp=1515769477	localtime=Jan 12 07:04:37	Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: 
      /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_WGOt33ssh-master/socket
      -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
      -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
      ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22
      chromeos2-row3-rack10-host15 "export LIBC_FATAL_STDERR_=1; if type
      \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack:
      :get_chromeos_release_milestone|_get_lsb_release_content|run] ->
      ssh_run(cat \\\"/etc/lsb-release\\\")\";fi; cat \"/etc/lsb-release\""
  Exit status: 255
  Duration: 63.5739560127
  

Repair stack:
START	----	repair	timestamp=1515769777	localtime=Jan 12 07:09:37	
	GOOD	----	verify.servo_ssh	timestamp=1515769781	localtime=Jan 12 07:09:41	
	GOOD	----	verify.brd_config	timestamp=1515769782	localtime=Jan 12 07:09:42	
	GOOD	----	verify.ser_config	timestamp=1515769783	localtime=Jan 12 07:09:43	
	GOOD	----	verify.job	timestamp=1515769784	localtime=Jan 12 07:09:44	
	GOOD	----	verify.servod	timestamp=1515769787	localtime=Jan 12 07:09:47	
	GOOD	----	verify.pwr_button	timestamp=1515769787	localtime=Jan 12 07:09:47	
	GOOD	----	verify.lid_open	timestamp=1515769787	localtime=Jan 12 07:09:47	
	GOOD	----	verify.update	timestamp=1515769792	localtime=Jan 12 07:09:52	
	GOOD	----	verify.PASS	timestamp=1515769792	localtime=Jan 12 07:09:52	
	FAIL	----	verify.ssh	timestamp=1515770380	localtime=Jan 12 07:19:40	No answer to ping from chromeos2-row3-rack10-host15
	START	----	repair.rpm	timestamp=1515770380	localtime=Jan 12 07:19:40	
		FAIL	----	repair.rpm	timestamp=1515770624	localtime=Jan 12 07:23:44	chromeos2-row3-rack10-host15 is still offline after powercycling
	END FAIL	----	repair.rpm	timestamp=1515770624	localtime=Jan 12 07:23:44	
	START	----	repair.sysrq	timestamp=1515770624	localtime=Jan 12 07:23:44	
		FAIL	----	repair.sysrq	timestamp=1515770924	localtime=Jan 12 07:28:44	Host chromeos2-row3-rack10-host15 is still offline after sysrq.
	END FAIL	----	repair.sysrq	timestamp=1515770924	localtime=Jan 12 07:28:44	
	START	----	repair.servoreset	timestamp=1515770924	localtime=Jan 12 07:28:44	
		INFO	----	----	timestamp=1515770985	localtime=Jan 12 07:29:45	Start crashcollection record
		INFO	----	New Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/usr/local/autotest/results/hosts/chromeos2-row3-rack10-host15/1697154-repair/20181201070931/crashinfo.chromeos2-row3-rack10-host15/chrome.20180112.072441.1566.dmp
		INFO	----	Orphaned Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/var/spool/crash/chrome.20180112.072441.1566.core
		INFO	----	Orphaned Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/var/spool/crash/service_failure.20180112.072451.0.log
		INFO	----	Orphaned Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/var/spool/crash/chrome.20180112.072441.1566.dmp
		INFO	----	Orphaned Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/var/spool/crash/chrome.20180112.072441.1566.meta
		INFO	----	Orphaned Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/var/spool/crash/chrome.20180112.072441.1566.log
		INFO	----	Orphaned Crash Dump	timestamp=1515770985	localtime=Jan 12 07:29:45	/var/spool/crash/service_failure.20180112.072451.0.meta
		INFO	----	----	timestamp=1515770985	localtime=Jan 12 07:29:45	End crashcollection record
		GOOD	----	verify.ssh	timestamp=1515771012	localtime=Jan 12 07:30:12	
	END GOOD	----	repair.servoreset	timestamp=1515771012	localtime=Jan 12 07:30:12	
	GOOD	----	verify.fwstatus	timestamp=1515771013	localtime=Jan 12 07:30:13	
	GOOD	----	verify.good_au	timestamp=1515771013	localtime=Jan 12 07:30:13	
	GOOD	----	verify.devmode	timestamp=1515771014	localtime=Jan 12 07:30:14	
	GOOD	----	verify.writable	timestamp=1515771015	localtime=Jan 12 07:30:15	
	GOOD	----	verify.tpm	timestamp=1515771016	localtime=Jan 12 07:30:16	
	GOOD	----	verify.ext4	timestamp=1515771018	localtime=Jan 12 07:30:18	
	GOOD	----	verify.power	timestamp=1515771018	localtime=Jan 12 07:30:18	
	GOOD	----	verify.rwfw	timestamp=1515771019	localtime=Jan 12 07:30:19	
	GOOD	----	verify.python	timestamp=1515771019	localtime=Jan 12 07:30:19	
	GOOD	----	verify.cros	timestamp=1515771028	localtime=Jan 12 07:30:28	
	GOOD	----	verify.hwid	timestamp=1515771030	localtime=Jan 12 07:30:30	
	GOOD	----	verify.PASS	timestamp=1515771030	localtime=Jan 12 07:30:30	
	START	----	reboot	timestamp=1515771031	localtime=Jan 12 07:30:31	
		GOOD	----	reboot.start	timestamp=1515771031	localtime=Jan 12 07:30:31	
		GOOD	----	reboot.verify	timestamp=1515771051	localtime=Jan 12 07:30:51	
	END GOOD	----	reboot	kernel=3.14.0	localtime=Jan 12 07:30:55	timestamp=1515771055	
	INFO	----	repair	timestamp=1515771055	localtime=Jan 12 07:30:55	Can't repair label 'pool:bvt'.
	INFO	----	repair	timestamp=1515771055	localtime=Jan 12 07:30:55	Can't repair label 'board:samus'.
	INFO	----	repair	timestamp=1515771055	localtime=Jan 12 07:30:55	Can't repair label 'cros-version:samus-release/R65-10302.0.0'.
END GOOD	----	repair	timestamp=1515771055	localtime=Jan 12 07:30:55	chromeos2-row3-rack10-host15 repaired successfully

Comment 2 by linben@chromium.org, Jan 13 2018

Also, stack of ClobberStateful failure:

  Traceback (most recent call last):
    File "/usr/local/autotest/server/server_job.py", line 1033, in run_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 160, in reboot
      **dargs)
    File "/usr/local/autotest/server/hosts/remote.py", line 229, in wait_for_restart
      self.log_op(self.OP_REBOOT, op_func)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 566, in log_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 228, in op_func
      super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
      raise error.AutoservRebootError("Host did not return from reboot")
  AutoservRebootError: Host did not return from reboot
	FAIL	cheets_ClobberStateful	cheets_ClobberStateful	timestamp=1515769112	localtime=Jan 12 06:58:32	Unhandled AutoservRebootError: Host did not return from reboot
  Traceback (most recent call last):
    File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
      return func(*args, **dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
      dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
      postprocess_profiled_run, args, dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
      self.run_once(*args, **dargs)
    File "/usr/local/autotest/server/site_tests/cheets_ClobberStateful/cheets_ClobberStateful.py", line 36, in run_once
      self.client.reboot()
    File "/usr/local/autotest/server/hosts/cros_host.py", line 1520, in reboot
      super(CrosHost, self).reboot(**dargs)
    File "/usr/local/autotest/server/hosts/remote.py", line 164, in reboot
      self.log_op(self.OP_REBOOT, reboot)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 562, in log_op
      self.job.run_op(op, op_func, self.get_kernel_ver)
    File "/usr/local/autotest/server/server_job.py", line 1033, in run_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 160, in reboot
      **dargs)
    File "/usr/local/autotest/server/hosts/remote.py", line 229, in wait_for_restart
      self.log_op(self.OP_REBOOT, op_func)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 566, in log_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 228, in op_func
      super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
      raise error.AutoservRebootError("Host did not return from reboot")
  AutoservRebootError: Host did not return from reboot
END FAIL	cheets_ClobberStateful	cheets_ClobberStateful	timestamp=1515769112	localtime=Jan 12 06:58:32

Comment 3 by linben@chromium.org, Jan 13 2018

Summary: samus: Host did not return from reboot (was: cheets_ClobberStateful: Host did not return from reboot)
Found another similar case for platform_PowerWash test.
https://luci-milo.appspot.com/buildbot/chromeos/samus-release/5017

Looking at DUT history, I see the following reset + repair combo:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row3-rack10-host13/1694699-reset/20181101211402/
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos2-row3-rack10-host13/1694717-repair/20181101211629/

And the repair also had to do some powercycling:
START	----	repair	timestamp=1515734197	localtime=Jan 11 21:16:37	
	GOOD	----	verify.servo_ssh	timestamp=1515734200	localtime=Jan 11 21:16:40	
	GOOD	----	verify.brd_config	timestamp=1515734202	localtime=Jan 11 21:16:42	
	GOOD	----	verify.ser_config	timestamp=1515734202	localtime=Jan 11 21:16:42	
	GOOD	----	verify.job	timestamp=1515734204	localtime=Jan 11 21:16:44	
	GOOD	----	verify.servod	timestamp=1515734209	localtime=Jan 11 21:16:49	
	GOOD	----	verify.pwr_button	timestamp=1515734209	localtime=Jan 11 21:16:49	
	GOOD	----	verify.lid_open	timestamp=1515734209	localtime=Jan 11 21:16:49	
	GOOD	----	verify.update	timestamp=1515734223	localtime=Jan 11 21:17:03	
	GOOD	----	verify.PASS	timestamp=1515734223	localtime=Jan 11 21:17:03	
	FAIL	----	verify.ssh	timestamp=1515734812	localtime=Jan 11 21:26:52	No answer to ping from chromeos2-row3-rack10-host13
	START	----	repair.rpm	timestamp=1515734812	localtime=Jan 11 21:26:52	
		FAIL	----	repair.rpm	timestamp=1515735056	localtime=Jan 11 21:30:56	chromeos2-row3-rack10-host13 is still offline after powercycling
	END FAIL	----	repair.rpm	timestamp=1515735056	localtime=Jan 11 21:30:56	
	START	----	repair.sysrq	timestamp=1515735056	localtime=Jan 11 21:30:56	
		FAIL	----	repair.sysrq	timestamp=1515735352	localtime=Jan 11 21:35:52	Host chromeos2-row3-rack10-host13 is still offline after sysrq.
	END FAIL	----	repair.sysrq	timestamp=1515735352	localtime=Jan 11 21:35:52	
	START	----	repair.servoreset	timestamp=1515735353	localtime=Jan 11 21:35:53	
		GOOD	----	verify.ssh	timestamp=1515735377	localtime=Jan 11 21:36:17	
	END GOOD	----	repair.servoreset	timestamp=1515735377	localtime=Jan 11 21:36:17	
	GOOD	----	verify.fwstatus	timestamp=1515735377	localtime=Jan 11 21:36:17	
	GOOD	----	verify.good_au	timestamp=1515735378	localtime=Jan 11 21:36:18	
	GOOD	----	verify.devmode	timestamp=1515735379	localtime=Jan 11 21:36:19	
	GOOD	----	verify.writable	timestamp=1515735380	localtime=Jan 11 21:36:20	
	GOOD	----	verify.tpm	timestamp=1515735381	localtime=Jan 11 21:36:21	
	GOOD	----	verify.ext4	timestamp=1515735382	localtime=Jan 11 21:36:22	
	GOOD	----	verify.power	timestamp=1515735383	localtime=Jan 11 21:36:23	
	GOOD	----	verify.rwfw	timestamp=1515735384	localtime=Jan 11 21:36:24	
	FAIL	----	verify.python	timestamp=1515735385	localtime=Jan 11 21:36:25	Python is missing; may be caused by powerwash
	GOOD	----	verify.cros	timestamp=1515735395	localtime=Jan 11 21:36:35	
	START	----	repair.au	timestamp=1515735395	localtime=Jan 11 21:36:35	
		GOOD	----	verify.ssh	timestamp=1515735752	localtime=Jan 11 21:42:32	
		GOOD	----	verify.power	timestamp=1515735756	localtime=Jan 11 21:42:36	
		GOOD	----	verify.rwfw	timestamp=1515735757	localtime=Jan 11 21:42:37	
		GOOD	----	verify.python	timestamp=1515735758	localtime=Jan 11 21:42:38	
		GOOD	----	verify.cros	timestamp=1515735767	localtime=Jan 11 21:42:47	
	END GOOD	----	repair.au	timestamp=1515735767	localtime=Jan 11 21:42:47	
	GOOD	----	verify.hwid	timestamp=1515735769	localtime=Jan 11 21:42:49	
	GOOD	----	verify.PASS	timestamp=1515735769	localtime=Jan 11 21:42:49	
	START	----	reboot	timestamp=1515735769	localtime=Jan 11 21:42:49	
		GOOD	----	reboot.start	timestamp=1515735769	localtime=Jan 11 21:42:49	
		GOOD	----	reboot.verify	timestamp=1515735790	localtime=Jan 11 21:43:10	
	END GOOD	----	reboot	kernel=3.14.0	localtime=Jan 11 21:43:15	timestamp=1515735795	
	INFO	----	repair	timestamp=1515735795	localtime=Jan 11 21:43:15	Can't repair label 'pool:bvt'.
	INFO	----	repair	timestamp=1515735795	localtime=Jan 11 21:43:15	Can't repair label 'board:samus'.
	INFO	----	repair	timestamp=1515735795	localtime=Jan 11 21:43:15	Can't repair label 'cros-version:samus-release/R65-10301.0.0'.
END GOOD	----	repair	timestamp=1515735795	localtime=Jan 11 21:43:15	chromeos2-row3-rack10-host13 repaired successfully

Sign in to add a comment