New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 722934 link

Starred by 2 users

Issue metadata

Status: Duplicate
Owner:
OOO
Closed: Aug 24
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

HWTest stage has incorrect timeout

Project Member Reported by davidri...@chromium.org, May 16 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/nyan_big-paladin/builds/1637 appears to be timing out after an hour or less than an hour but should be 90 minutes.

From chromite/cbuildbot/config_dump.json, nyan_big-paladin specifies timeout of 5400.

From stage logs https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fnyan_big-paladin%2F1637%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout
** Start Stage HWTest [bvt-inline] - Tue, 16 May 2017 08:44:58 -0700 (PDT)
Timeout is requested of 90 minutes.

For whatever reason timeout is picked up as 60 minutes:
  The suite job has another 0:59:46.577017 till timeout.

Suite times out after an hour
09:46:50: ERROR: ** Suite timed out before completion **
** Finished Stage HWTest [bvt-inline] - Tue, 16 May 2017 09:46:50 -0700 (PDT)

(The actual timeout appears to have been caused by the provision jobs and subsequent repair jobs all failing, so might indicate an issue with the image being tested)
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1521950
 

Comment 1 by dshi@chromium.org, May 16 2017

Owner: chingcodes@chromium.org
The suite job does have time 90min timeout:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=117872072

The 60min timeout is likely enforced by builder.
I can't see any evidence of a builder imposed 60 minute timeout.

How come there is the message:
  The suite job has another 0:59:46.577017 till timeout.
Is that actually at 9:35 instead of 8:50?  If that's the case, and then the actual aborts just get labeled a timed out incorrectly:
Reason: Some test(s) was aborted before running, suite must have timed out.


Labels: Hotlist-Fixit
Owner: sjg@chromium.org

Comment 4 by sjg@chromium.org, Jun 13 2017

Owner: ddavenp...@chromium.org
Drew, do you think you could take a look at this one?
Some digging into where that specific error message gets set suggests that it's not necessarily a timeout that would cause it (http://shortn/_xAsavlFFKE):
> # The case that a job was aborted before it got a chance to run
> # usually indicates suite has timed out (unless aborted by user).
> # In this case, the abort reason will be None.
> # Update the reason with proper information.

The logs for the test cases that failed show that there was an unhandled exception: http://shortn/_FxCuuX1TuR

I think this happened when creating the server_job, which isn't in a try/catch block except for the outermost global exception handler.

The exception itself was a persistent urllib2.URLError (perhaps some intermittent infra or networking issue?).

I'm not really familiar with autotest at all, but could exiting non-cleanly in this way result in a misleading "timeout" error message?

Comment 6 by sjg@google.com, Jul 5 2017

Labels: Team-BLD
Owner: ----
Labels: -Pri-3 Pri-2
Owner: gu...@chromium.org
Status: Assigned (was: Untriaged)
The same HWTest stage timeout was observed on veyron_minnie-release build #2350 (https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-release/builds/2350).

14:00:48: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.failures_lib.SuiteTimedOut'>: ** Suite timed out before completion **
Owner: ayatane@chromium.org
I found below exception in lucifer job_reporter. Not sure if it caused timeout.

job_reporter: 2018-07-02 13:54:36,313:DEBUG:eventlib:run_event_command:90:Starting event command with ['/opt/infra-tools/usr/bin/lucifer_run_job', '-autotestdir', '/usr/local/autotest', '-abortsock', '/usr/local/autotest/leases/213661623.sock', '-hosts', u'chromeos4-row9-rack9-host5', '-x-level', 'STARTING', '-resultsdir', '/usr/local/autotest/results/213661623-chromeos-test/chromeos4-row9-rack9-host5', '-x-control-file', '/usr/local/autotest/results/213661623-chromeos-test/chromeos4-row9-rack9-host5/lucifer/control_attach', '-x-execution-tag', None, '-x-job-owner', u'chromeos-test', '-x-job-name', u'veyron_minnie-release/R69-10837.0.0/sanity/dummy_PassServer.sanity', '-x-reboot-after', 'never', '-x-parsing-only', '-x-run-reset', '-x-test-retries', '0', '-x-require-ssp']
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 209, in <module>
    sys.exit(main(sys.argv))
  File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 46, in main
    ret = _main(args)
  File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 93, in _main
    return _run_autotest_job(args)
  File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 106, in _run_autotest_job
    ret = _run_lucifer_job(handler, args, job)
  File "/usr/local/autotest/venv/lucifer/cmd/job_reporter.py", line 148, in _run_lucifer_job
    event_handler=event_handler, args=command_args)
  File "/usr/local/autotest/venv/lucifer/eventlib.py", line 91, in run_event_command
    with subprocess32.Popen(args, stdout=PIPE) as proc:
  File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-a234a83456f26d726445fc6c8e6ce271/local/lib/python2.7/site-packages/subprocess32.py", line 825, in __init__
    restore_signals, start_new_session)
  File "/usr/local/google/home/chromeos-test/.cache/cros_venv/venv-2.7.6-a234a83456f26d726445fc6c8e6ce271/local/lib/python2.7/site-packages/subprocess32.py", line 1396, in _execute_child
    restore_signals, start_new_session, preexec_fn)
TypeError: coercing to Unicode: need string or buffer, NoneType found

Please note the value of '-x-execution-tag' is None.
Owner: gu...@chromium.org
The HWTest failed because provision error that DUT cannot come back after reboot. That also might be related to issue 639301 because the DUT board is also cyan.

The interesting thing is the misleading message of "Suite timed out before completion". There should be no timeout happened.

 Issue 860832  has been merged into this issue.
Project Member

Comment 13 by bugdroid1@chromium.org, Jul 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/ec2422b769c5360a01157d45b4c9445dfcc7d257

commit ec2422b769c5360a01157d45b4c9445dfcc7d257
Author: Allen Li <ayatane@chromium.org>
Date: Mon Jul 09 16:43:00 2018

autotest: Handle when execution tag is None

I don't think this should happen normally (if execution tag is missing
from the associated HQE, other things will go wrong too), but may as
well handle the case.

BUG= chromium:722934 
TEST=None

Change-Id: Ie43a8dd71c5c37f8629480d4ec51b6dd462c4705
Reviewed-on: https://chromium-review.googlesource.com/1128391
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Congbin Guo <guocb@chromium.org>

[modify] https://crrev.com/ec2422b769c5360a01157d45b4c9445dfcc7d257/venv/lucifer/cmd/job_reporter.py

Mergedinto: 860842
Status: Duplicate (was: Assigned)

Sign in to add a comment