Issue 792258

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




wait for DUTs coming back from reverify before kicking off testing suite

Project Member Reported by dgarr...@chromium.org, Dec 5 2017

Issue description

This looks like a bug in TOT.


test output contains:

INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal
Reverifying all DUTs.
NOTICE:root:ts_mon was set up.
Autotest instance created: localhost
Autotest instance created: localhost
TestLabException: Not enough DUTs for board: gandof, pool: bvt; required: 4, found: 0
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 2020, in _run_task
    return _run_suite(options)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1761, in _run_suite
    options.skip_duts_check)
  File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 330, in check_dut_availability
    hosts=hosts)
NotEnoughDutsError: Not enough DUTs for board: gandof, pool: bvt; required: 4, found: 0
Will return from run_suite with status: INFRA_FAILURE
12-05-2017 [15:35:53] Submitted create_suite_job rpc



Original stack trace of the exception:
[('./site_utils/test_push.py', 486, 'test_suite_wrapper', 'create_and_return, testbed_test)'), ('./site_utils/test_push.py', 386, 'test_suite', 'create_and_return, testbed_test)'), ('./site_utils/test_push.py', 333, 'do_run_suite', "raise TestPushException('Failed to retrieve suite job ID.')")]
Test for pushing to prod failed:

Failed to retrieve suite job ID.
INFO:googleapiclient.discovery:URL being requested: POST https://www.googleapis.com/gmail/v1/users/me/messages/send?alt=json
DEBUG:root:Email sent: 160290937ca07c76
Reverifying all DUTs.
INFO:oauth2client.client:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Waiting for ts_mon flushing process to finish...
Error Message: StageControlFileFailure: Failed to stage git_oc-release/angler-userdebug/3771772#2 on 100.115.245.253: staging artifacts=test_suites files=  for git_oc-release/angler-userdebug/3771772#2 failed;HTTP OK not accompanied by 'Success'.
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_utils.py", line 1148, in replacement
    return func(**kwargs)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1886, in create_suite_job
    test_source_build, hostname=sample_dut)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1764, in _stage_build_artifacts
    "Failed to stage %s on %s: %s" % (build, ds_name, e))
StageControlFileFailure: Failed to stage git_oc-release/angler-userdebug/3771772#2 on 100.115.245.253: staging artifacts=test_suites files=  for git_oc-release/angler-userdebug/3771772#2 failed;HTTP OK not accompanied by 'Success'.
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 1762, in _run_suite
    job_id = create_suite(afe, options)
  File "/usr/local/autotest/client/common_lib/cros/retry.py", line 218, in func_retry
    remaining_time)
  File "/usr/local/autotest/client/common_lib/cros/retry.py", line 123, in timeout
    default_result = func(*args, **kwargs)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1704, in create_suite
    child_dependencies=_make_child_deps_from_options(options),
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 244, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 177, in _Wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 243, in _run
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 108, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 126, in __call__
    raise BuildException(resp['error'])
JSONRPCException: StageControlFileFailure: Failed to stage git_oc-release/angler-userdebug/3771772#2 on 100.115.245.253: staging artifacts=test_suites files=  for git_oc-release/angler-userdebug/3771772#2 failed;HTTP OK not accompanied by 'Success'.
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_utils.py", line 1148, in replacement
    return func(**kwargs)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1886, in create_suite_job
    test_source_build, hostname=sample_dut)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1764, in _stage_build_artifacts
    "Failed to stage %s on %s: %s" % (build, ds_name, e))
StageControlFileFailure: Failed to stage git_oc-release/angler-userdebug/3771772#2 on 100.115.245.253: staging artifacts=test_suites files=  for git_oc-release/angler-userdebug/3771772#2 failed;HTTP OK not accompanied by 'Success'.

Will return from run_suite with status: INFRA_FAILURE
INFO:root:Finished waiting for ts_mon process.
Traceback (most recent call last):
  File "./site_utils/test_push.py", line 674, in <module>
    sys.exit(main())
  File "./site_utils/test_push.py", line 671, in main
    return _main(arguments)
  File "./site_utils/test_push.py", line 613, in _main
    check_queue(queue)
  File "./site_utils/test_push.py", line 504, in check_queue
    raise exc_info[0](exc_info[1])
__main__.TestPushException: Failed to retrieve suite job ID.




chromeos-staging-master2

/var/log/apache2/error.log

[Tue Dec 05 15:36:02.936082 2017] [:error] [pid 6806:tid 139972810274560] INFO:root:Staging artifacts on devserver http://100.115.245.253:8082: build=git_oc-release/angler-userdebug/3771772#2, artifacts=['test_suites'], files=, archive_url=gs://chromeos-image-archive/git_oc-release/angler-userdebug/3771772#2
[Tue Dec 05 15:36:02.936557 2017] [:error] [pid 6806:tid 139972810274560] DEBUG:root:Running 'ssh 100.115.245.253 'curl "http://100.115.245.253:8082/stage?build_id=3771772#2&files=&target=angler-userdebug&archive_url=gs://chromeos-image-archive/git_oc-release/angler-userdebug/3771772#2&artifacts=test_suites&branch=git_oc-release&async=True&os_type=android"''
[Tue Dec 05 15:36:12.172038 2017] [:error] [pid 6806:tid 139972810274560] DEBUG:root:response for RPC: '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\\n"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\\n<html>\\n<head>\\n    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>\\n    <title>500 Internal Server Error</title>\\n    <style type="text/css">\\n    #powered_by {\\n        margin-top: 20px;\\n        border-top: 2px solid black;\\n        font-style: italic;\\n    }\\n\\n    #traceback {\\n        color: red;\\n    }\\n    </style>\\n</head>\\n    <body>\\n        <h2>500 Internal Server Error</h2>\\n        <p>The server encountered an unexpected condition which prevented it from fulfilling the request.</p>\\n        <pre id="traceback">Traceback (most recent call last):\\n  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond\\n    response.body = self.handler()\\n  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__\\n    self.body = self.oldhandler(*args, **kwargs)\\n  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__\\n    return self.callable(*self.args, **self.kwargs)\\n  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 874, in stage\\n    dl, factory = _get_downloader_and_factory(kwargs)\\n  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 306, in _get_downloader_and_factory\\n    artifacts, files = _get_artifacts(kwargs)\\n  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 236, in _get_artifacts\\n    raise DevServerError(\\'No artifacts specified.\\')\\nDevServerError: No artifacts specified.\\n</pre>\\n    <div id="powered_by">\\n    <span>Powered by <a href="http://www.cherrypy.org">CherryPy 3.2.2</a></span>\\n    </div>\\n    </body>\\n</html>\\n'

chromeos2-devservertest

/var/log/devserver/server.log:
::ffff:127.0.0.1 - - [05/Dec/2017:15:36:02] "GET /check_health HTTP/1.1" 200 458 "" "curl/7.35.0"
[05/Dec/2017:15:36:12] HTTP Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 874, in stage
    dl, factory = _get_downloader_and_factory(kwargs)
  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 306, in _get_downloader_and_factory
    artifacts, files = _get_artifacts(kwargs)
  File "/home/chromeos-test/chromiumos/src/platform/dev/devserver.py", line 236, in _get_artifacts
    raise DevServerError('No artifacts specified.')
DevServerError: No artifacts specified.

 
Cc: davidri...@chromium.org xixuan@chromium.org shuqianz@chromium.org
Owner: xixuan@chromium.org
Summary: Lab Test Push Failing - Invalid artifacts specified? (was: Lab Test Push Failing)
My quick provision changes should not be live/enabled.
Cc: -davidri...@chromium.org dgarr...@chromium.org ayatane@chromium.org
Labels: -Pri-3 Pri-1
Owner: shuqianz@chromium.org
Status: Assigned (was: Untriaged)
Summary: gandof tests are not kicked off (was: Lab Test Push Failing - Invalid artifacts specified?)
After some debugging, I believe CL https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/792290 is what (just) started triggering this. But it shouldn't get blamed: the test_push logic here has a bug.

Before the CL, test_push did: 1) reverify all DUTs, 2) check DUT availability, 3) kick off run_suite.py.
After the CL, test_push does: 1) reverify all DUTs, 2) kick off run_suite.py.

Before this CL we were lucky: by the time step 3 ran, step 1 had already finished, so the DUTs were available.
Currently it always fails to schedule gandof tests (from 12.1) due to:

TestLabException: Not enough DUTs for board: gandof, pool: bvt; required: 4, found: 0
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 2020, in _run_task
    return _run_suite(options)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1761, in _run_suite
    options.skip_duts_check)
  File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 334, in check_dut_availability
    hosts=hosts)
NotEnoughDutsError: Not enough DUTs for board: gandof, pool: bvt; required: 4, found: 0
Will return from run_suite with status: INFRA_FAILURE

I will re-assign this to @shuqianz + cc @ayatane.


The devserver artifact issue doesn't happen when I re-run test_push, so I'm holding off on it for now.
Summary: gandof tests are not kicked off in staging lab (was: gandof tests are not kicked off)
Summary: wait for DUTs coming back from reverify before kicking off testing suite (was: gandof tests are not kicked off in staging lab)
Project Member

Comment 6 by bugdroid1@chromium.org, Dec 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f239b313fe3d05955acb51f619cc38197786cacc

commit f239b313fe3d05955acb51f619cc38197786cacc
Author: Shuqian Zhao <shuqianz@chromium.org>
Date: Wed Dec 06 01:12:17 2017

autotest: fix test_push no available duts

Before we kick off testing suite, we reverify the DUTs, but the DUTs
need some time to come back. Add the code which re-checking the DUTs
status until they are back to testing push.

BUG= chromium:792258 
TEST=unittest

Change-Id: Ia24ce3a16dab400aff3459372bcd96ed00643919
Reviewed-on: https://chromium-review.googlesource.com/809352
Tested-by: Shuqian Zhao <shuqianz@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Trybot-Ready: Don Garrett <dgarrett@chromium.org>
Commit-Queue: Shuqian Zhao <shuqianz@chromium.org>

[modify] https://crrev.com/f239b313fe3d05955acb51f619cc38197786cacc/site_utils/test_push.py
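
The commit message above describes re-checking DUT status until the DUTs are back from reverify. The actual change to test_push.py is not reproduced in this thread, so the following is only a minimal sketch of that kind of polling loop, not the committed code. The names `wait_for_duts`, `count_ready` and `DutsNotReadyError`, as well as the timeout and poll interval defaults, are assumptions made for illustration.

```python
import time


class DutsNotReadyError(Exception):
    """Hypothetical error: too few DUTs became ready before the timeout."""


def wait_for_duts(count_ready, required, timeout_s=600, poll_s=10,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until at least `required` DUTs are ready, or give up.

    count_ready is a hypothetical callable returning how many DUTs for
    the board/pool are currently ready. The real check in autotest may
    look quite different; this only shows the wait-then-proceed shape.
    """
    deadline = clock() + timeout_s
    while True:
        found = count_ready()
        if found >= required:
            return found
        if clock() >= deadline:
            raise DutsNotReadyError(
                'required: %d, found: %d' % (required, found))
        sleep(poll_s)


# Simulated lab: DUTs come back from reverify a few at a time.
readings = iter([0, 2, 4])
ready = wait_for_duts(lambda: next(readings), required=4, poll_s=0)
```

With a loop like this between "reverify all DUTs" and "kick off run_suite.py", the suite is only scheduled once enough DUTs have come back, instead of relying on reverify happening to finish in time.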

Status: Fixed (was: Assigned)
