devserver load following lab downtime causes test failures.
Issue description
Nov 17 2016
daisy_skate
chromeos-server22-180: 328b5a3695c9b810 3
Autotest instance: cautotest
Unhandled run_suite exception: Timeout occurred- waited 1800 seconds.
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 1787, in main
    code, output_dict = main_without_exception_handling(options)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1601, in main_without_exception_handling
    options.skip_duts_check)
  File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 306, in check_dut_availability
    multiple_labels=('pool:%s' % pool, 'board:%s' % board))
  File "/usr/local/autotest/server/frontend.py", line 510, in get_hosts
    hosts = self.run('get_hosts', **query_args)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 114, in GenericRetry
    time.sleep(sleep_time)
  File "/usr/local/autotest/site-packages/chromite/lib/timeout_util.py", line 62, in kill_us
    raise TimeoutError(error_message % {'time': max_run_time})
TimeoutError: Timeout occurred- waited 1800 seconds.
Will return from run_suite with status: INFRA_FAILURE
Nov 17 2016
Triggered task: elm-paladin/R56-8998.0.0-rc1-bvt-inline
Waiting for results from the following shards: 0
Waiting for results from the following shards: 0
chromeos-server31-93: 328b873c311b1510 3
Autotest instance: cautotest
Unhandled run_suite exception: Timeout occurred- waited 1800 seconds.
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 1787, in main
    code, output_dict = main_without_exception_handling(options)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1601, in main_without_exception_handling
    options.skip_duts_check)
  File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 306, in check_dut_availability
    multiple_labels=('pool:%s' % pool, 'board:%s' % board))
  File "/usr/local/autotest/server/frontend.py", line 510, in get_hosts
    hosts = self.run('get_hosts', **query_args)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 114, in GenericRetry
    time.sleep(sleep_time)
  File "/usr/local/autotest/site-packages/chromite/lib/timeout_util.py", line 62, in kill_us
    raise TimeoutError(error_message % {'time': max_run_time})
TimeoutError: Timeout occurred- waited 1800 seconds.
Will return from run_suite with status: INFRA_FAILURE
Nov 17 2016
Command /b/cbuild/internal_master/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com ... returns error code 3, which is identified as an INFRA_FAILURE (cbuildbot/swarming_lib.py). Since no detailed error is logged, I don't know what 'returncode=3' means. Is the proxy server down, abnormal, or too busy?
Nov 17 2016
I think the original round of failures (at about 5:02am), caused by this run_suite timeout, has passed. All builders are at least able to execute the tests now. A new round of failures (at about 7:33am) is related to ssp in autotest (could not download tar) and is logged in crbug/666372. There is speculation that the dev_server is overloaded. We need to see whether it recovers in the next round.
Nov 17 2016
What is the run_suite timeout? I thought it was 90 minutes (--timeout_mins 90), but it seems the command returns returncode=3 after about 30 minutes.
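On the 30 vs. 90 minute question: one possible reading of the tracebacks above is that the 1800-second limit belongs to the retry loop around the get_hosts RPC rather than to --timeout_mins, so it can fire well before the suite-level timeout. The sketch below only illustrates that interaction. The names retry_with_deadline and simulate, the 60-second retry delay, and the fake clock are all hypothetical; this is not the chromite or autotest code.

```python
import time


class RetryTimeout(Exception):
    """Raised when the retry loop exceeds its own deadline."""


def retry_with_deadline(call, deadline_secs, delay_secs,
                        clock=time.monotonic, sleep=time.sleep):
    """Retry call() until it succeeds or deadline_secs would be exceeded.

    Hypothetical helper: it mimics the shape of the retry seen in the
    traceback, not the real chromite/autotest implementation.
    """
    start = clock()
    while True:
        try:
            return call()
        except IOError:
            # Give up if one more sleep would overrun the deadline.
            if clock() - start + delay_secs > deadline_secs:
                raise RetryTimeout(
                    'Timeout occurred- waited %d seconds.' % deadline_secs)
            sleep(delay_secs)


def simulate(suite_timeout_mins=90, rpc_deadline_secs=1800):
    """Return (minutes elapsed at give-up, whether that beat the suite timeout)."""
    now = [0.0]  # fake clock so the example runs instantly

    def fake_sleep(secs):
        now[0] += secs

    def failing_rpc():
        # Assumed failure mode: the RPC server never answers.
        raise IOError('RPC server unreachable')

    try:
        retry_with_deadline(failing_rpc, rpc_deadline_secs, 60,
                            clock=lambda: now[0], sleep=fake_sleep)
    except RetryTimeout:
        pass
    elapsed_mins = now[0] / 60
    return elapsed_mins, elapsed_mins < suite_timeout_mins
```

With these assumed numbers, simulate() gives up at the 30-minute mark, long before a 90-minute suite timeout would apply. Whether that is what actually happened here would need to be confirmed against the real retry configuration.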
Nov 17 2016
afaict, the devserver load is just fallout from the lab downtime. We don't handle it gracefully, but there is nothing I can do about it right now. Things seem to have recovered on their own for now.
,
Nov 17 2016
Comment 1 by skau@chromium.org, Nov 17 2016