HWTest stage has incorrect timeout |
|||||||||
Issue descriptionhttps://uberchromegw.corp.google.com/i/chromeos/builders/nyan_big-paladin/builds/1637 appears to be timing out after an hour or less than an hour but should be 90 minutes. From chromite/cbuildbot/config_dump.json, nyan_big-paladin specifies timeout of 5400. From stage logs https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fnyan_big-paladin%2F1637%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout ** Start Stage HWTest [bvt-inline] - Tue, 16 May 2017 08:44:58 -0700 (PDT) Timeout is requested of 90 minutes. For whatever reason timeout is picked up as 60 minutes: The suite job has another 0:59:46.577017 till timeout. Suite times out after an hour [1;31m09:46:50: ERROR: ** Suite timed out before completion **[0m ** Finished Stage HWTest [bvt-inline] - Tue, 16 May 2017 09:46:50 -0700 (PDT) (The actual timeout appears to have been caused by the provision jobs and subsequent repair jobs all failing, so might indicate an issue with the image being tested) https://viceroy.corp.google.com/chromeos/suite_details?build_id=1521950
,
May 16 2017
I can't see any evidence of a builder imposed 60 minute timeout. How come there is the message: The suite job has another 0:59:46.577017 till timeout. Is that actually at 9:35 instead of 8:50? If that's the case, and then the actual aborts just get labeled a timed out incorrectly: Reason: Some test(s) was aborted before running, suite must have timed out.
,
May 24 2017
,
Jun 13 2017
Drew, do you think you could take a look at this one?
,
Jun 29 2017
Some digging into where that specific error message gets set suggests that it's not necessarily a timeout that would cause it (http://shortn/_xAsavlFFKE): > # The case that a job was aborted before it got a chance to run > # usually indicates suite has timed out (unless aborted by user). > # In this case, the abort reason will be None. > # Update the reason with proper information. The logs for the test cases that failed show that there was an unhandled exception: http://shortn/_FxCuuX1TuR I think this happened when creating the server_job, which isn't in a try/catch block except for the outermost global exception handler. The exception itself was a persistent urllib2.URLError (perhaps some intermittent infra or networking issue?). I'm not really familiar with autotest at all, but could exiting non-cleanly in this way result in a misleading "timeout" error message?
,
Jul 5 2017
,
Jun 5 2018
,
Jul 3
The same HWTest stage timeout was observed on veyron_minnie-release build #2350 (https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-release/builds/2350). [1;31m14:00:48: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.failures_lib.SuiteTimedOut'>: ** Suite timed out before completion **
,
Jul 6
I found below exception in lucifer job_reporter. Not sure if it caused timeout.
job_reporter: 2018-07-02 13:54:36,313:DEBUG:eventlib:run_event_command:90:Starting event command with ['/opt/infra-tools/usr/bin/lucifer_run_job', '-autotestdir', '/usr/local/autotest', '-abortsock', '/usr/local/autotest/leases/213661623.sock', '-hosts', u'chromeos4-row9-rack9-host5', '-x-level', 'STARTING', '-resultsdir', '/usr/local/autotest/results/213661623-chromeos-test/chromeos4-row9-rack9-host5', '-x-control-file', '/usr/local/autotest/results/213661623-chromeos-test/chromeos4-row9-rack9-host5/lucifer/control_attach', '-x-execution-tag', None, '-x-job-owner', u'chromeos-test', '-x-job-name', u'veyron_minnie-release/R69-10837.0.0/sanity/dummy_PassServer.sanity', '-x-reboot-after', 'never', '-x-parsing-only', '-x-run-reset', '-x-test-retries', '0', '-x-require-ssp']
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 209, in <module>
sys.exit(main(sys.argv))
File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 46, in main
ret = _main(args)
File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 93, in _main
return _run_autotest_job(args)
File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 106, in _run_autotest_job
ret = _run_lucifer_job(handler, args, job)
File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 148, in _run_lucifer_job
event_handler=event_handler, args=command_args)
File "/usr/local/autotest/venv/lucifer/eventlib.py", line 91, in run_event_command
with subprocess32.Popen(args, stdout=PIPE) as proc:
File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-a234a83456f26d726445fc6c8e6ce271/local/lib/python2.7/site-packages/subprocess32.py", line 825, in __init__
restore_signals, start_new_session)
File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-a234a83456f26d726445fc6c8e6ce271/local/lib/python2.7/site-packages/subprocess32.py", line 1396, in _execute_child
restore_signals, start_new_session, preexec_fn)
TypeError: coercing to Unicode: need string or buffer, NoneType found
Please note the value of '-x-execution-tag' is None.
,
Jul 6
,
Jul 6
The HWTest failed because provision error that DUT cannot come back after reboot. That also might be related to issue 639301 because the DUT board is also cyan. The interesting thing is the misleading message of "Suite timed out before completion". There should be no timeout happened.
,
Jul 7
Issue 860832 has been merged into this issue.
,
Jul 9
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/ec2422b769c5360a01157d45b4c9445dfcc7d257 commit ec2422b769c5360a01157d45b4c9445dfcc7d257 Author: Allen Li <ayatane@chromium.org> Date: Mon Jul 09 16:43:00 2018 autotest: Handle when execution tag is None I don't think this should happen normally (if execution tag is missing from the associated HQE, other things will go wrong too), but may as well handle the case. BUG= chromium:722934 TEST=None Change-Id: Ie43a8dd71c5c37f8629480d4ec51b6dd462c4705 Reviewed-on: https://chromium-review.googlesource.com/1128391 Commit-Ready: Allen Li <ayatane@chromium.org> Tested-by: Allen Li <ayatane@chromium.org> Reviewed-by: Congbin Guo <guocb@chromium.org> [modify] https://crrev.com/ec2422b769c5360a01157d45b4c9445dfcc7d257/venv/lucifer/cmd/job_reporter.py
,
Aug 24
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by dshi@chromium.org
, May 16 2017