paladin builder took 20 minutes to try (and fail) at aborting previous hwtest suites
Issue description

Build: https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin/builds/2501
Stage: https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin/builds/2501/steps/BuildReexecutionFinished/logs/stdio

17:18:40: INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp3nKLWj/tmpSKmD3x/temp_summary.json --raw-cmd --task-name abort-veyron_minnie-paladin/R60-9536.0.0-rc1-arc-bvt-cq --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --expiration 1200 -- /usr/local/autotest/site_utils/abort_suite.py -i veyron_minnie-paladin/R60-9536.0.0-rc1 -s arc-bvt-cq
Priority was reset to 100
Triggered task: abort-veyron_minnie-paladin/R60-9536.0.0-rc1-arc-bvt-cq
17:18:57: WARNING: HttpsMonitor.send received status 400: {
  "error": {
    "code": 400,
    "message": "Operation was attempted past the valid range.",
    "status": "OUT_OF_RANGE"
  }
}
Waiting for results from the following shards: 0
chromeos-server22-208: 36088a6273decc10 0
2017-05-09 17:34:25,967 INFO | No suites have been aborted. The suite jobs may have already been aborted/completed? Note this script does not support asynchronus suites.
17:34:40: INFO: RunCommand: /b/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp3nKLWj/tmpCfJ5uN/temp_summary.json --raw-cmd --task-name abort-veyron_minnie-paladin/R60-9535.0.0-rc2-arc-bvt-cq --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --expiration 1200 -- /usr/local/autotest/site_utils/abort_suite.py -i veyron_minnie-paladin/R60-9535.0.0-rc2 -s arc-bvt-cq
Priority was reset to 100
May 10 2017
Metrics indicate this has been happening intermittently in the tail of builders: http://shortn/_dOOmQYiYrE

The "normal" case is in the 15-second to 1-minute range (even a minute is kind of slow, though). We should perform this abort operation in a separate background stage to eliminate this problem entirely.
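For illustration, here is a minimal sketch of the "background stage" idea, not chromite's actual stage machinery. The function names `abort_previous_suites` and `run_build_steps` are hypothetical stand-ins, and the abort call is simulated with a short sleep rather than a real swarming invocation.

```python
import concurrent.futures
import time


def abort_previous_suites(build_ids):
    """Hypothetical stand-in for the slow abort call; simulated with a sleep."""
    time.sleep(0.1)
    return ["abort-%s" % build_id for build_id in build_ids]


def run_build_steps():
    """Hypothetical stand-in for the work the builder does in the meantime."""
    return "build-done"


def run_with_background_abort(build_ids, timeout_secs=1200):
    """Start the abort in a worker thread so it no longer blocks the main path."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        abort_future = pool.submit(abort_previous_suites, build_ids)
        build_result = run_build_steps()  # runs while the abort is in flight
        aborted = abort_future.result(timeout=timeout_secs)
    return build_result, aborted
```

The point of the sketch is only that the abort latency overlaps with other work instead of adding to the critical path.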
May 10 2017
+ people who care about speed
May 10 2017
We can make it a separate stage, and the HWTest stage should only start after the abort stage has finished. Also, the abort operation is always executed on the past 2 builds. If the past builds succeeded, can we just skip sending the abort requests? http://shortn/_RHcbXIBaS9
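A rough sketch of the "skip passed builds" check, assuming each previous build is available as a dict with `version` and `status` keys. That record shape, the `"pass"` status string, and the function name are all assumptions made for illustration, not chromite's real data model.

```python
def builds_needing_abort(previous_builds, passed_status="pass"):
    """Return the versions of previous builds that did not pass.

    Only these would get an abort request; builds that already passed
    have no outstanding HWTest suites worth aborting.
    """
    return [
        build["version"]
        for build in previous_builds
        if build.get("status") != passed_status
    ]


# Hypothetical statuses for the two builds named in the log above.
previous = [
    {"version": "R60-9536.0.0-rc1", "status": "pass"},
    {"version": "R60-9535.0.0-rc2", "status": "fail"},
]
to_abort = builds_needing_abort(previous)
```

With these made-up statuses, only the second build would trigger an abort request, so a fully green history costs no abort calls at all.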
May 10 2017
Making it a separate background stage and only doing it if the previous slave didn't succeed both SGTM.
May 19 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/5c8e5fd6a2904b27445b63cd521bbeb2e6b7dad8

commit 5c8e5fd6a2904b27445b63cd521bbeb2e6b7dad8
Author: Ningning Xia <nxia@chromium.org>
Date: Fri May 19 20:57:54 2017

Do not abort HWTests for passed builds.

If the old builds passed successfully, do not trigger AbortHWTests on the old builds in _AbortPreviousHWTestSuites.

BUG= chromium:720212
TEST=run_tests

Change-Id: Ice346a1b13db0081810b9f787de143c92d7c244b
Reviewed-on: https://chromium-review.googlesource.com/505277
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/5c8e5fd6a2904b27445b63cd521bbeb2e6b7dad8/cbuildbot/stages/report_stages.py
Mar 30 2018
Comment 1 by akes...@chromium.org, May 10 2017