
Issue 690678

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Jul 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




generic_RebootTest Flake

Project Member Reported by sbasi@chromium.org, Feb 9 2017

Issue description

A number of test runs are hitting flakes in generic_RebootTest. What's strange is that this is the only test experiencing it.

Will track them here.

http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=99575830
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/99575830-chromeos-test/chromeos6-row2-rack11-host7/debug

02/03 17:04:16.230 DEBUG|      abstract_ssh:0744| Restarting master ssh connection
02/03 17:04:54.694 ERROR|           metrics:0429| Caught exception while flushing: No module named pyasn1.codec.ber
02/03 17:05:05.483 WARNI|        base_utils:0912| run process timeout (49) fired on: /usr/bin/ssh -a -x     -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos6-row2-rack11-host7 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::wait_up|is_up|ssh_ping] -> ssh_run(true)\";fi; true"
02/03 17:05:07.503 DEBUG|      abstract_ssh:0599| Host chromeos6-row2-rack11-host7 is still down after waiting 343 seconds
02/03 17:05:07.504 INFO |        server_job:0183| 		ABORT	----	reboot.verify	timestamp=1486170307	localtime=Feb 03 17:05:07	Host did not return from reboot
02/03 17:05:07.506 INFO |        server_job:0183| 	END FAIL	----	reboot	timestamp=1486170307	localtime=Feb 03 17:05:07	Host did not return from reboot
  Traceback (most recent call last):
    File "/usr/local/autotest/server/server_job.py", line 937, in run_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 150, in reboot
      **dargs)
    File "/usr/local/autotest/server/hosts/remote.py", line 219, in wait_for_restart
      self.log_op(self.OP_REBOOT, op_func)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 548, in log_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 218, in op_func
      super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 309, in wait_for_restart
      raise error.AutoservRebootError("Host did not return from reboot")
  AutoservRebootError: Host did not return from reboot



What is interesting is this error: "Caught exception while flushing: No module named pyasn1.codec.ber"

I'll add more instances of this failure as I go through the failed CQ runs.
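
For collecting further instances, the failure signature in the excerpt above can be matched mechanically. The following is a minimal sketch, assuming the debug log has already been read into a string; the function names `find_reboot_timeouts` and `is_reboot_flake` are hypothetical helpers, not part of autotest.

```python
import re

# Matches the abstract_ssh debug line seen in the log excerpt above.
STILL_DOWN_RE = re.compile(
    r"Host (?P<host>\S+) is still down after waiting (?P<seconds>\d+) seconds")
# Message carried by both the status-log lines and the AutoservRebootError.
REBOOT_FAIL_MSG = "Host did not return from reboot"


def find_reboot_timeouts(log_text):
    """Return (host, seconds_waited) for each 'still down' line in log_text."""
    return [(m.group("host"), int(m.group("seconds")))
            for m in STILL_DOWN_RE.finditer(log_text)]


def is_reboot_flake(log_text):
    """True if the log shows both the wait timeout and the reboot failure."""
    return bool(find_reboot_timeouts(log_text)) and REBOOT_FAIL_MSG in log_text


# Sample assembled from the lines quoted in this issue (whitespace simplified).
sample = (
    "02/03 17:05:07.503 DEBUG|      abstract_ssh:0599| Host "
    "chromeos6-row2-rack11-host7 is still down after waiting 343 seconds\n"
    "02/03 17:05:07.506 INFO |        server_job:0183| END FAIL ---- reboot "
    "Host did not return from reboot\n"
)
print(find_reboot_timeouts(sample))
print(is_reboot_flake(sample))
```

Matching on both lines keeps unrelated ssh timeouts from being counted as instances of this flake.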
 
I'm 90% sure the pyasn1 thing is an unrelated and benign message. It's what I tried to fix in crbug.com/676696. Looking at your timestamps, I think they are from before my fix. In any case, it only comes from the monarch metrics stuff and shouldn't have caused the failure.
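
The wording of the log line ("Caught exception while flushing") suggests the metrics flush catches and logs the error rather than letting it propagate. A minimal sketch of that pattern follows, assuming nothing about the real metrics code: `flush_metrics` and `broken_flush` are hypothetical names used only for illustration.

```python
import logging

logger = logging.getLogger("metrics_sketch")


def flush_metrics(flush_func):
    """Call flush_func; log and swallow any exception.

    Returns True if the flush succeeded, False if an exception was caught.
    """
    try:
        flush_func()
        return True
    except Exception as e:  # Deliberately broad: metrics must not fail the job.
        logger.error("Caught exception while flushing: %s", e)
        return False


def broken_flush():
    # Hypothetical stand-in for a flush that needs a module that is missing.
    raise ImportError("No module named pyasn1.codec.ber")


print(flush_metrics(broken_flush))
print(flush_metrics(lambda: None))
```

With a guard like this, a missing module shows up as an ERROR line in the log but does not change the test result.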
Cc: yunlian@chromium.org manojgupta@chromium.org laszio@chromium.org
This test keeps failing once in a while.
Please fix it or move it to bvt-perbuild.
Test flakes are killing the PFQ.

This just happened here:

https://uberchromegw.corp.google.com/i/chromeos/builders/daisy_skate-chrome-pfq/builds/3551/steps/HWTest%20%5Bbvt-cq%5D/logs/stdio


It has happened in other places too:

https://bugs.chromium.org/p/chromium/issues/list?can=2&q=generic_RebootTest&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids

Cc: ayatane@chromium.org xixuan@chromium.org
Issue 721149 has been merged into this issue.

Comment 4 by ecgh@chromium.org, Dec 15 2017

Another example:
https://luci-milo.appspot.com/buildbot/chromeos/sentry-paladin/1794

12/15 04:47:12.283 DEBUG|      abstract_ssh:0819| Restarting master ssh connection
12/15 04:47:12.284 DEBUG|     ssh_multiplex:0118| Nuking ssh master_job
12/15 04:47:12.284 DEBUG|     ssh_multiplex:0123| Cleaning ssh master_tempdir
12/15 04:47:12.284 INFO |     ssh_multiplex:0092| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_mvottLssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row8-rack9-host4'
12/15 04:47:12.284 DEBUG|             utils:0212| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_mvottLssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row8-rack9-host4'
12/15 04:47:42.441 INFO |     ssh_multiplex:0107| Timed out waiting for master-ssh connection to be established.
12/15 04:48:01.525 WARNI|             utils:0915| run process timeout (19) fired on: /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_zZnkaBssh-master/socket -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos4-row8-rack9-host4 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::is_up|ssh_ping|run] -> ssh_run(true)\";fi; true"
12/15 04:48:03.538 DEBUG|      abstract_ssh:0682| Host chromeos4-row8-rack9-host4 is still down after waiting 312 seconds
12/15 04:48:03.539 INFO |        server_job:0218| 		ABORT	----	reboot.verify	timestamp=1513342083	localtime=Dec 15 04:48:03	Host did not return from reboot
12/15 04:48:03.540 INFO |        server_job:1401| Parsing lines in fast mode
12/15 04:48:03.541 INFO |        server_job:0218| 	END FAIL	----	reboot	timestamp=1513342083	localtime=Dec 15 04:48:03	Host did not return from reboot
  Traceback (most recent call last):
    File "/usr/local/autotest/server/server_job.py", line 1033, in run_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 160, in reboot
      **dargs)
    File "/usr/local/autotest/server/hosts/remote.py", line 229, in wait_for_restart
      self.log_op(self.OP_REBOOT, op_func)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 566, in log_op
      op_func()
    File "/usr/local/autotest/server/hosts/remote.py", line 228, in op_func
      super(RemoteHost, self).wait_for_restart(timeout=timeout, **dargs)
    File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 310, in wait_for_restart
      raise error.AutoservRebootError("Host did not return from reboot")
  AutoservRebootError: Host did not return from reboot
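
Both tracebacks end in `wait_for_restart` raising once the host has not answered an ssh ping within the allowed time. A minimal sketch of that kind of deadline-bounded polling loop follows. It is not autotest's implementation: `RebootError`, `wait_up`, and the `is_up` callback here are hypothetical stand-ins.

```python
import time


class RebootError(Exception):
    """Hypothetical stand-in for autotest's AutoservRebootError."""


def wait_up(is_up, timeout, poll_interval=1.0,
            clock=time.monotonic, sleep=time.sleep):
    """Poll is_up() until it returns True or timeout seconds elapse.

    clock and sleep are injectable so the loop can be exercised
    without real waiting.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if is_up():
            return True
        sleep(poll_interval)
    return False


def wait_for_restart(is_up, timeout):
    """Raise RebootError if the host does not come back within timeout."""
    if not wait_up(is_up, timeout):
        raise RebootError("Host did not return from reboot")
```

In a loop of this shape, a single slow `is_up()` probe (for example an ssh attempt that blocks until its own timeout) can consume a large share of the deadline, so the number of probes actually made may be small.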

Status: Archived (was: Untriaged)
