HWTest failure return code -9 / code 247, but all tests pass
Issue description

Observed here: https://uberchromegw.corp.google.com/i/chromeos/builders/auron_yuna-release/builds/1062/steps/HWTest%20%5Bbvt-inline%5D/logs/stdio

INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpKy5JY0/tmp8YsX_Q/temp_summary.json --raw-cmd --task-name auron_yuna-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:auron_yuna-release/R60-9493.0.0' '--tags=task_name:auron_yuna-release/R60-9493.0.0-bvt-inline' '--tags=board:auron_yuna' -- /usr/local/autotest/site_utils/run_suite.py --build auron_yuna-release/R60-9493.0.0 --board auron_yuna --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473059, 'BuildStage', 43274314L)}" -m 113903163
01:55:34: WARNING: Exception is not retriable return code: 247; command: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpKy5JY0/tmp8YsX_Q/temp_summary.json --raw-cmd --task-name auron_yuna-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:auron_yuna-release/R60-9493.0.0' '--tags=task_name:auron_yuna-release/R60-9493.0.0-bvt-inline' '--tags=board:auron_yuna' -- /usr/local/autotest/site_utils/run_suite.py --build auron_yuna-release/R60-9493.0.0 --board auron_yuna --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473059, 'BuildStage', 43274314L)}" -m 113903163
Priority was reset to 100
Triggered task: auron_yuna-release/R60-9493.0.0-bvt-inline
Waiting for results from the following shards: 0
chromeos-server22-135: 35bd0e310ac8ae10 -9

The last line is from swarming.py, indicating a return code of -9 from the single shard it launched. This later leads to:

01:55:38: ERROR: ** HWTest failed (code 247) **

Note that the corresponding AFE job succeeds with all tests passing: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=113903163

Must be some communication issue between swarming.py and AFE?
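For reference, the two numbers in the title are consistent with each other if the shard's return code of -9 was passed through as a process exit status, which only keeps the low 8 bits. This is a minimal sketch of that arithmetic; the helper name is hypothetical, and whether cbuildbot's 247 was actually produced this way is an assumption, not something the logs above confirm.

```python
def as_exit_status(returncode: int) -> int:
    """Truncate a return code to 8 bits, as a POSIX exit status is.

    Hypothetical helper for illustration only; it is not part of
    swarming.py or chromite.
    """
    return returncode & 0xFF


if __name__ == "__main__":
    # The shard reported -9; the HWTest stage reported 247.
    print(as_exit_status(-9))
```

Under that assumption, 247 carries no information beyond the -9 reported by the shard.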
,
Apr 25 2017
Assigning to infra deputy.
,
Apr 25 2017
Found no logs on the swarming proxy website: https://chromeos-proxy.appspot.com/task?id=35bd0e310ac8ae10&refresh=10&request_detail=true&show_raw=1

The swarming logs on chromeos-server22.cbf don't show anything either. From run_isolated.log:

2244 2017-04-25 08:38:05.743 I: run_command(['/usr/bin/python', u'/usr/local/autotest/site_utils/run_suite.py', u'--build', u'auron_yuna-release/R60-9493.0.0', u'--board', u'auron_yuna', u'--suite_name', u'bvt-inline', u'--pool', u'bvt', u'--num', u'6', u'--file_bugs', u'True', u'--priority', u'Build', u'--timeout_mins', u'180', u'--retry', u'True', u'--max_retries', u'10', u'--minimum_duts', u'4', u'--suite_min_duts', u'6', u'--offload_failures_only', u'False', u'--job_keyvals', u"{'datastore_parent_key': ('Build', 1473059, 'BuildStage', 43274314L)}", u'-m', u'113903163'], /usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ir4xvr9a)
2244 2017-04-25 08:55:29.821 I: Waiting for proces exit
2244 2017-04-25 08:55:29.886 I: Profiling: Section RunTest took 1044.143 seconds
2244 2017-04-25 08:55:29.886 I: Command finished with exit code -9 (0xfffffff7)
2244 2017-04-25 08:55:29.887 I: rmtree(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ir4xvr9a)
2244 2017-04-25 08:55:29.889 D: make_tree_deleteable(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ir4xvr9a)
2244 2017-04-25 08:55:30.035 I: rmtree(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ite7TOcC)
2244 2017-04-25 08:55:30.035 D: make_tree_deleteable(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ite7TOcC)
2244 2017-04-25 08:55:30.095 I: Result: {"duration":1044.1435949802399,"exit_code":-9,"had_hard_timeout":false,"internal_failure":null,"outputs_ref":null,"stats":{},"version":5}

So I suspect it's some communication issue between swarming and AFE. Will keep an eye on that.
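As a side note on reading "exit code -9 (0xfffffff7)": run_isolated is a Python program, and Python's subprocess module reports a child terminated by signal N as return code -N on POSIX. The sketch below reproduces that convention on a throwaway child process; the function names are hypothetical, and nothing here establishes what sent the signal to run_suite.py in this build.

```python
import signal
import subprocess
import sys


def kill_child_and_get_returncode() -> int:
    """Start a sleeping child, SIGKILL it, and return its return code (POSIX only)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    child.kill()  # Popen.kill() sends SIGKILL on POSIX
    return child.wait()


def describe(returncode: int) -> str:
    """Format a return code like the log line, plus the signal name if negative."""
    text = "%d (0x%08x)" % (returncode, returncode & 0xFFFFFFFF)
    if returncode < 0:
        text += " " + signal.Signals(-returncode).name
    return text


if __name__ == "__main__":
    print(describe(kill_child_and_get_returncode()))
```

If the same convention applies here, the -9 in the log would mean the run_suite.py process was killed with SIGKILL rather than exiting on its own.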
,
May 15 2017
We've got a number of canary failures where tests are run (and data is in TKO), but we're getting these swarming timeouts. Sometimes the tests succeed, sometimes they fail.

Examples:
https://luci-milo.appspot.com/buildbot/chromeos/banon-release/1123
https://luci-milo.appspot.com/buildbot/chromeos/bob-release/507

From banon-release:1123
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fbanon-release%2F1123%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout

05:57:58: INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpGM7QBv/tmpGf8FXP/temp_summary.json --raw-cmd --task-name banon-release/R60-9554.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:banon-release/R60-9554.0.0' '--tags=task_name:banon-release/R60-9554.0.0-bvt-inline' '--tags=board:banon' -- /usr/local/autotest/site_utils/run_suite.py --build banon-release/R60-9554.0.0 --board banon --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1518600, 'BuildStage', 44854069L)}" -m 117644129
08:22:00: WARNING: Killing tasks: [<_BackgroundTask(_BackgroundTask-7:7:4, started)>]

Tests do run, and run past the time at which the tasks were killed: https://viceroy.corp.google.com/chromeos/suite_details?build_id=1518600

The timeout was supposed to be 4 hours and this is 2.5h in, so it shouldn't be that.
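The timing argument can be checked directly from the two log timestamps and the flags on the command line. A minimal sketch, assuming both timestamps fall on the same day and share a time zone (the log does not state either); the helper name is hypothetical.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Difference between two HH:MM:SS timestamps, assumed to be on the same day."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Timestamps from the banon-release log: RunCommand and "Killing tasks".
run_time = elapsed("05:57:58", "08:22:00")

# Limits taken from the command line in the log.
swarming_timeout = timedelta(seconds=14400)  # --timeout / --io-timeout / --hard-timeout
suite_timeout = timedelta(minutes=180)       # --timeout_mins

if __name__ == "__main__":
    print("elapsed:", run_time)
    print("under swarming timeout:", run_time < swarming_timeout)
    print("under suite timeout:", run_time < suite_timeout)
```

Under those assumptions the task was killed after roughly 2h24m, short of both the 4-hour swarming limits and the 3-hour suite limit.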
,
Jun 23 2017
,
Jun 23 2017
,
Jul 17 2017
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed? -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.
,
Jul 24 2017
Cannot reproduce - reopen if this happens more.
Comment 1 by mnissler@chromium.org
, Apr 25 2017