Project: chromium
Status: WontFix
Owner: ----
Closed: Jul 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

HWTest failure return code -9 / code 247, but all tests pass
Reported by mnissler@chromium.org, Apr 25
Observed here:

https://uberchromegw.corp.google.com/i/chromeos/builders/auron_yuna-release/builds/1062/steps/HWTest%20%5Bbvt-inline%5D/logs/stdio

INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpKy5JY0/tmp8YsX_Q/temp_summary.json --raw-cmd --task-name auron_yuna-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:auron_yuna-release/R60-9493.0.0' '--tags=task_name:auron_yuna-release/R60-9493.0.0-bvt-inline' '--tags=board:auron_yuna' -- /usr/local/autotest/site_utils/run_suite.py --build auron_yuna-release/R60-9493.0.0 --board auron_yuna --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473059, 'BuildStage', 43274314L)}" -m 113903163
01:55:34: WARNING: Exception is not retriable return code: 247; command: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpKy5JY0/tmp8YsX_Q/temp_summary.json --raw-cmd --task-name auron_yuna-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:auron_yuna-release/R60-9493.0.0' '--tags=task_name:auron_yuna-release/R60-9493.0.0-bvt-inline' '--tags=board:auron_yuna' -- /usr/local/autotest/site_utils/run_suite.py --build auron_yuna-release/R60-9493.0.0 --board auron_yuna --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473059, 'BuildStage', 43274314L)}" -m 113903163
Priority was reset to 100
Triggered task: auron_yuna-release/R60-9493.0.0-bvt-inline
Waiting for results from the following shards: 0
chromeos-server22-135: 35bd0e310ac8ae10 -9

The last line is from swarming.py, indicating a return code of -9 from the single shard it launched.

This later leads to: 01:55:38: ERROR: ** HWTest failed (code 247) **
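For reference: a negative exit code from the Python process wrappers here means the child died to that signal, so -9 is SIGKILL, and the "code 247" is the same value truncated to an unsigned byte (-9 & 0xff == 247). A minimal sketch of that mapping, standard library only and unrelated to the builder itself:

import signal
import subprocess

# Kill a child with SIGKILL, the way something appears to have killed
# run_suite.py, and observe how the exit code is reported.
proc = subprocess.Popen(['sleep', '60'])
proc.send_signal(signal.SIGKILL)
proc.wait()

print(proc.returncode)         # -9: subprocess reports death-by-signal as -N
print(proc.returncode & 0xff)  # 247: the same value as an unsigned byte,
                               # i.e. the "code 247" in the error above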

Note that the corresponding AFE job succeeds with all tests passing: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=113903163

Must be some communication issue between swarming.py and AFE?

Components: Infra>Client>ChromeOS
Cc: xixuan@chromium.org
Owner: chingcodes@chromium.org
Assigning to infra deputy.
Found no logs on the swarming proxy website:

https://chromeos-proxy.appspot.com/task?id=35bd0e310ac8ae10&refresh=10&request_detail=true&show_raw=1

and swarming logs on chromeos-server22.cbf don't show anything:

from run_isolated.log:

2244 2017-04-25 08:38:05.743 I: run_command(['/usr/bin/python', u'/usr/local/autotest/site_utils/run_suite.py', u'--build', u'auron_yuna-release/R60-9493.0.0', u'--board', u'auron_yuna', u'--suite_name', u'bvt-inline', u'--pool', u'bvt', u'--num', u'6', u'--file_bugs', u'True', u'--priority', u'Build', u'--timeout_mins', u'180', u'--retry', u'True', u'--max_retries', u'10', u'--minimum_duts', u'4', u'--suite_min_duts', u'6', u'--offload_failures_only', u'False', u'--job_keyvals', u"{'datastore_parent_key': ('Build', 1473059, 'BuildStage', 43274314L)}", u'-m', u'113903163'], /usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ir4xvr9a)
2244 2017-04-25 08:55:29.821 I: Waiting for proces exit
2244 2017-04-25 08:55:29.886 I: Profiling: Section RunTest took 1044.143 seconds
2244 2017-04-25 08:55:29.886 I: Command finished with exit code -9 (0xfffffff7)
2244 2017-04-25 08:55:29.887 I: rmtree(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ir4xvr9a)
2244 2017-04-25 08:55:29.889 D: make_tree_deleteable(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ir4xvr9a)
2244 2017-04-25 08:55:30.035 I: rmtree(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ite7TOcC)
2244 2017-04-25 08:55:30.035 D: make_tree_deleteable(/usr/local/google/home/chromeos-test/swarming_bots/bot_135/w/ite7TOcC)
2244 2017-04-25 08:55:30.095 I: Result:
{"duration":1044.1435949802399,"exit_code":-9,"had_hard_timeout":false,"internal_failure":null,"outputs_ref":null,"stats":{},"version":5}

So I suspect it's some communication issue between swarming & AFE. Will keep an eye on that.
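If this recurs, a quick triage helper could flag signal-killed shards from the result blob. A sketch assuming the {"exit_code": ..., "had_hard_timeout": ...} shape of the run_isolated result quoted above (not a documented schema):

import json
import sys

# Hypothetical check: read a run_isolated-style result JSON and call out
# shards that died to a signal rather than exiting on their own.
with open(sys.argv[1]) as f:
    result = json.load(f)

exit_code = result.get('exit_code')
if exit_code is not None and exit_code < 0:
    print('shard killed by signal %d (had_hard_timeout=%s)'
          % (-exit_code, result.get('had_hard_timeout')))
else:
    print('shard exited with code %s' % exit_code)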

We've got a number of canary failures where tests are run (and data is in TKO), but we're getting these swarming timeouts.  Sometimes the tests succeed, sometimes they fail.

Examples:
https://luci-milo.appspot.com/buildbot/chromeos/banon-release/1123
https://luci-milo.appspot.com/buildbot/chromeos/bob-release/507

From banon-release:1123
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fbanon-release%2F1123%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout
05:57:58: INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpGM7QBv/tmpGf8FXP/temp_summary.json --raw-cmd --task-name banon-release/R60-9554.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:banon-release/R60-9554.0.0' '--tags=task_name:banon-release/R60-9554.0.0-bvt-inline' '--tags=board:banon' -- /usr/local/autotest/site_utils/run_suite.py --build banon-release/R60-9554.0.0 --board banon --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1518600, 'BuildStage', 44854069L)}" -m 117644129
08:22:00: WARNING: Killing tasks: [<_BackgroundTask(_BackgroundTask-7:7:4, started)>]

Tests do run, and keep running past the point at which the tasks were killed:
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1518600

The timeout was supposed to be 4 hours and this was only ~2.5h in, so it shouldn't be that.
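Checking that arithmetic against the timestamps quoted above:

from datetime import datetime

# Elapsed time between the RunCommand log line and the task kill.
start = datetime.strptime('05:57:58', '%H:%M:%S')
killed = datetime.strptime('08:22:00', '%H:%M:%S')
print(killed - start)  # 2:24:02 -- well short of the 14400s (4h)
                       # --timeout/--hard-timeout passed to swarming.py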

Owner: ----
Status: untriaged
Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed?  -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.
Status: WontFix
Cannot reproduce - reopen if this happens again.