I'm seeing a failure mode where a build step fails quickly without any info from the AFE job in the logs, however the AFE job runs successfully (and may or may not pass tests, this seems immaterial to the failure mode).
Examples:
https://uberchromegw.corp.google.com/i/chromeos/builders/wolf-release/builds/2070/steps/HWTest%20%5Bbvt-inline%5D/logs/stdio
https://uberchromegw.corp.google.com/i/chromeos/builders/wolf-release/builds/2070/steps/PaygenTestCanary/logs/stdio
The output looks like this:
@@@BUILD_STEP@HWTest [bvt-inline]@@@
************************************************************
@@@STEP_LINK@stdout-->stdio@https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fwolf-release%2F2070%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout@@@
** Start Stage HWTest [bvt-inline] - Tue, 25 Apr 2017 01:26:31 -0700 (PDT)
**
** Stage that runs tests in the Autotest lab.
************************************************************
01:26:31: INFO: Created cidb engine bot@173.194.81.53 for pid 9331
01:26:31: INFO: Running cidb query on pid 9331, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f7962734690>
01:26:31: INFO: Waiting up to forever for payloads and test artifacts ...
Preconditions for the stage successfully met. Beginning to execute stage...
01:26:31: INFO: Running cidb query on pid 9331, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f7961cd6910>
01:26:32: INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp2O5Nf9/tmpn5aWp2/temp_summary.json --raw-cmd --task-name wolf-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:wolf-release/R60-9493.0.0' '--tags=task_name:wolf-release/R60-9493.0.0-bvt-inline' '--tags=board:wolf' -- /usr/local/autotest/site_utils/run_suite.py --build wolf-release/R60-9493.0.0 --board wolf --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473025, 'BuildStage', 43274757L)}" -c
Autotest instance: cautotest
04-25-2017 [01:37:58] Submitted create_suite_job rpc
04-25-2017 [01:38:03] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=113905413
@@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=113905413@@@
--create_and_return was specified, terminating now.
Will return from run_suite with status: OK
01:38:12: INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp2O5Nf9/tmpT4fV6D/temp_summary.json --raw-cmd --task-name wolf-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:wolf-release/R60-9493.0.0' '--tags=task_name:wolf-release/R60-9493.0.0-bvt-inline' '--tags=board:wolf' -- /usr/local/autotest/site_utils/run_suite.py --build wolf-release/R60-9493.0.0 --board wolf --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473025, 'BuildStage', 43274757L)}" -m 113905413
01:59:05: WARNING: Exception is not retriable return code: 1; command: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp2O5Nf9/tmpT4fV6D/temp_summary.json --raw-cmd --task-name wolf-release/R60-9493.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:wolf-release/R60-9493.0.0' '--tags=task_name:wolf-release/R60-9493.0.0-bvt-inline' '--tags=board:wolf' -- /usr/local/autotest/site_utils/run_suite.py --build wolf-release/R60-9493.0.0 --board wolf --suite_name bvt-inline --pool bvt --num 6 --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 10 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'datastore_parent_key': ('Build', 1473025, 'BuildStage', 43274757L)}" -m 113905413
Priority was reset to 100
Triggered task: wolf-release/R60-9493.0.0-bvt-inline
Waiting for results from the following shards: 0
N/A: 35bd1455fe8f0e10 None
cmd=['/b/cbuild/repository/chromite/third_party/swarming.client/swarming.py', 'run', '--swarming', 'chromeos-proxy.appspot.com', '--task-summary-json', '/tmp/cbuildbot-tmp2O5Nf9/tmpT4fV6D/temp_summary.json', '--raw-cmd', '--task-name', u'wolf-release/R60-9493.0.0-bvt-inline', '--dimension', 'os', 'Ubuntu-14.04', '--dimension', 'pool', 'default', '--print-status-updates', '--timeout', '14400', '--io-timeout', '14400', '--hard-timeout', '14400', '--expiration', '1200', u'--tags=priority:Build', u'--tags=suite:bvt-inline', u'--tags=build:wolf-release/R60-9493.0.0', u'--tags=task_name:wolf-release/R60-9493.0.0-bvt-inline', u'--tags=board:wolf', '--', '/usr/local/autotest/site_utils/run_suite.py', '--build', u'wolf-release/R60-9493.0.0', '--board', u'wolf', '--suite_name', u'bvt-inline', '--pool', u'bvt', '--num', '6', '--file_bugs', 'True', '--priority', u'Build', '--timeout_mins', '180', '--retry', 'True', '--max_retries', '10', '--minimum_duts', '4', '--suite_min_duts', '6', '--offload_failures_only', 'False', '--job_keyvals', "{'datastore_parent_key': ('Build', 1473025, 'BuildStage', 43274757L)}", '-m', '113905413']
01:59:05: INFO: Running cidb query on pid 9331, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f7961cce3d0>
01:59:09: INFO: cidb query succeeded after 1 retries
@@@STEP_FAILURE@@@
01:59:09: ERROR: ** HWTest failed (code 1) **
01:59:09: INFO: Running cidb query on pid 9331, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f7961cc0e90>
01:59:09: INFO: Running cidb query on pid 9331, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f7961cd69d0>
************************************************************
** Finished Stage HWTest [bvt-inline] - Tue, 25 Apr 2017 01:59:10 -0700 (PDT)
************************************************************
Note the lines in the middle:
Waiting for results from the following shards: 0
N/A: 35bd1455fe8f0e10 None
Which suggest that something is going wrong with the build step tracking the AFE job.
Comment 1 by mnissler@chromium.org
, Apr 25 2017