[guado_moblab-paladin] Flaky errors of moblab_RunSuite FAIL: Unhandled AutoservRunError: command execution error
Issue description [ADD relevant info inline]
There are flaky errors since build 6182. It looks like the DUT (chromeos2-row2-rack8-host11) sometimes can't reboot after provision.

Link to build or pfq page.
https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/6182
https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/6184
https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/6185
https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/6189

build # for that buildbot.
6182 6184 6185 6189

Snippet of log that contains the failure.
------- from moblab_RunSuite.tgz
START moblab_RunSuite moblab_RunSuite timestamp=1496843442 localtime=Jun 07 06:50:42
FAIL moblab_RunSuite moblab_RunSuite timestamp=1496845772 localtime=Jun 07 07:29:32 Unhandled AutoservRunError: command execution error
* Command:
/usr/bin/ssh -a -x -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row2-rack8-host11 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|run_as_moblab|wrapper] -> ssh_run(su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R57-9202.66.0 --suite_name=dummy_server')\";fi; su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R57-9202.66.0 --suite_name=dummy_server'"
Exit status: 3
Duration: 2193.00971985
stdout:
^[[?25h^[[?0c
stderr:
Autotest instance: localhost
06-07-2017 [06:52:04] Submitted create_suite_job rpc
06-07-2017 [06:52:10] Created suite job: http://localhost/afe/#tab_id=view_job&object_id=1
@@@STEP_LINK@Link to suite@http://localhost/afe/#tab_id=view_job&object_id=1@@@
The suite job has another 23:29:59.019311 till timeout.
06-07-2017 [07:28:30] Suite job is finished.
06-07-2017 [07:28:30] Start collecting test results and dump them to json.
Suite job [ PASSED ]
dummy_PassServer [ PASSED ]
provision [ FAILED ]
provision ABORT: Host did not return from reboot
dummy_PassServer [ PASSED ]
Suite timings:
Downloads started at 2017-06-07 06:52:04
Payload downloads ended at 2017-06-07 06:52:09
Suite started at 2017-06-07 06:52:16
Artifact downloads ended (at latest) at 2017-06-07 06:52:17
Testing started at 2017-06-07 06:52:28
Testing ended at 2017-06-07 07:24:52
-----------------------------------------
---------------------------------
From HWTest stdio
chromeos-server22-282: 369b93bde2e76810 1
Autotest instance: cautotest
06-07-2017 [06:33:02] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=122020013
@@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=122020013@@@
The suite job has another 0:59:50.019471 till timeout.
06-07-2017 [07:18:03] printing summary of incomplete jobs (1):
moblab_DummyServerSuite: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=122020038
06-07-2017 [07:31:47] Suite job is finished.
06-07-2017 [07:31:47] Start collecting test results and dump them to json.
Suite job [ PASSED ]
moblab_RunSuite [ FAILED ]
moblab_RunSuite FAIL: Unhandled AutoservRunError: command execution error
Suite timings:
Downloads started at 2017-06-07 06:32:51
Payload downloads ended at 2017-06-07 06:32:59
Suite started at 2017-06-07 06:33:23
Artifact downloads ended (at latest) at 2017-06-07 06:33:27
Testing started at 2017-06-07 06:50:42
Testing ended at 2017-06-07 07:29:32
Links to test logs:
Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/122020013-chromeos-test/
moblab_RunSuite http://cautotest/tko/retrieve_logs.cgi?job=/results/122020038-chromeos-test/
.. ..
************************************************************
** Finished Stage HWTest [moblab_quick] - Wed, 07 Jun 2017 07:32:18 -0700 (PDT)
************************************************************
07:32:18: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.failures_lib.StepFailure'>: Traceback (most recent call last):
  File "/b/cbuild/chromite/lib/parallel.py", line 441, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/chromite/cbuildbot/stages/generic_stages.py", line 649, in Run
    raise failures_lib.StepFailure()
StepFailure
Traceback (most recent call last):
  File "/b/cbuild/chromite/cbuildbot/builders/generic_builders.py", line 119, in _RunParallelStages
    parallel.RunParallelSteps(steps)
  File "/b/cbuild/chromite/lib/parallel.py", line 678, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/cbuild/chromite/lib/parallel.py", line 675, in RunParallelSteps
    pass
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/cbuild/chromite/lib/parallel.py", line 561, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.failures_lib.StepFailure'>: Traceback (most recent call last):
  File "/b/cbuild/chromite/lib/parallel.py", line 441, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/chromite/cbuildbot/stages/generic_stages.py", line 649, in Run
    raise failures_lib.StepFailure()
StepFailure
07:32:18: INFO: Running cidb query on pid 20237, repr(query) starts with <sqlalchemy.sql.expression.Select at 0x7fe1b23ac350; Select object>
07:32:18: INFO: Running cidb query on pid 20237, repr(query) starts with <sqlalchemy.sql.expression.Select at 0x7fe1b23ac490; Select object>
06:26:46: INFO: Created cidb engine bot@130.211.191.11 for pid 20255
06:26:46: INFO: Running cidb query on pid 20255, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7fe1b26ad890>
-----------------------------------------------------------------------------------------------------------------------
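For triage, the failing entries in a run_suite summary like the one above can be pulled out mechanically. The following is only a minimal sketch: it assumes the summary is available one entry per line in the `name [ STATUS ]` layout seen in the log, and the helper name `failed_tests` and the `SAMPLE` text are made up for illustration; they are not part of autotest.

```python
import re

# Matches summary lines such as "provision   [ FAILED ]".
# The layout is inferred from the log snippet in this report (assumption).
STATUS_LINE = re.compile(r"^(?P<name>.+?)\s+\[\s*(?P<status>[A-Z]+)\s*\]\s*$")


def failed_tests(summary_text):
    """Return names of summary entries whose status is not PASSED."""
    failed = []
    for line in summary_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and match.group("status") != "PASSED":
            failed.append(match.group("name"))
    return failed


# Hypothetical input: the inner suite summary from this report, one entry per line.
SAMPLE = """\
Suite job          [ PASSED ]
dummy_PassServer   [ PASSED ]
provision          [ FAILED ]
provision          ABORT: Host did not return from reboot
dummy_PassServer   [ PASSED ]
"""

if __name__ == "__main__":
    print(failed_tests(SAMPLE))
```

Lines without a bracketed status, such as the `ABORT:` reason, are skipped, so the reason text would need to be collected separately if it is wanted.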
,
Jun 9 2017
Builder: guado_moblab-paladin
Builds:
https://chromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/6206
https://chromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin/builds/6205
Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122431308-chromeos-test/chromeos2-row2-rack8-host11/:
@@@STEP_LINK@[Test-Logs]: dummy_PassServer: retry_count: 1, GOOD: completed successfully@http://localhost/tko/retrieve_logs.cgi?job=/results/5-moblab/@@@
@@@STEP_LINK@[Flake-Dashboard]: dummy_PassServer@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=dummy_PassServer@@@
Will return from run_suite with status: WARNING
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 818, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 471, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 348, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 381, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/moblab_RunSuite/moblab_RunSuite.py", line 65, in run_once
    raise e
AutoservRunError: command execution error
* Command:
/usr/bin/ssh -a -x -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row2-rack8-host11 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|run_as_moblab|wrapper] -> ssh_run(su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R59-9460.60.0 --suite_name=dummy_server --retry=True --max_retries=1')\";fi; su - moblab -c '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan --build=cyan-release/R59-9460.60.0 --suite_name=dummy_server --retry=True --max_retries=1'"
Exit status: 2
Duration: 2246.62287402
,
Jun 9 2017
This is the same reason it's marked experimental. My vague memory is that dhaddock@ has been trying to resolve that.
,
Jun 26 2017
Nope, not me. Don't know about this test
,
Jun 27 2017
,
Jun 27 2017
I believe you meant to assign to Keith Haddow. Though this bug may be old news by now.
,
Jan 30 2018
There are various versions of this bug. We have been mostly stable; moblab is experimental at this time, but for reasons unrelated to this issue. Marking as obsolete, since there are newer tracking bugs, mostly related to networking issues.
Comment 1 by mojahsu@chromium.org
, Jun 8 2017