swarming failure, HWTest results missing
Issue description

Build: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15209

Note that all of the suites passed successfully, but results were not retrieved. Looks like a swarming failure?

Relevant HWTest logs:

04:14:47: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpDQ_U8z/tmpTFmydB/temp_summary.json --raw-cmd --task-name peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 9000 --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:bvt-inline' '--tags=build:peach_pit-paladin/R61-9696.0.0-rc1' '--tags=task_name:peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline' '--tags=board:peach_pit' -- /usr/local/autotest/site_utils/run_suite.py --build peach_pit-paladin/R61-9696.0.0-rc1 --board peach_pit --suite_name bvt-inline --pool cq --num 6 --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --offload_failures_only True --job_keyvals "{'cidb_build_stage_id': 48609511L, 'cidb_build_id': 1629418, 'datastore_parent_key': ('Build', 1629418, 'BuildStage', 48609511L)}" -c
DEBUG:root:Bug filing disabled.
No module named tlslite.utils
Autotest instance: cautotest
06-29-2017 [04:25:19] Submitted create_suite_job rpc
06-29-2017 [04:25:31] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=125735676
@@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=125735676@@@
--create_and_return was specified, terminating now.
Will return from run_suite with status: OK
04:25:44: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpDQ_U8z/tmpEy7Bb4/temp_summary.json --raw-cmd --task-name peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 9000 --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:bvt-inline' '--tags=build:peach_pit-paladin/R61-9696.0.0-rc1' '--tags=task_name:peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline' '--tags=board:peach_pit' -- /usr/local/autotest/site_utils/run_suite.py --build peach_pit-paladin/R61-9696.0.0-rc1 --board peach_pit --suite_name bvt-inline --pool cq --num 6 --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --offload_failures_only True --job_keyvals "{'cidb_build_stage_id': 48609511L, 'cidb_build_id': 1629418, 'datastore_parent_key': ('Build', 1629418, 'BuildStage', 48609511L)}" -m 125735676
04:35:53: INFO: Refreshing due to a 401 (attempt 1/2)
04:35:53: INFO: Refreshing access_token
04:46:04: WARNING: Exception is not retriable
return code: 1; command: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpDQ_U8z/tmpEy7Bb4/temp_summary.json --raw-cmd --task-name peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 9000 --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:bvt-inline' '--tags=build:peach_pit-paladin/R61-9696.0.0-rc1' '--tags=task_name:peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline' '--tags=board:peach_pit' -- /usr/local/autotest/site_utils/run_suite.py --build peach_pit-paladin/R61-9696.0.0-rc1 --board peach_pit --suite_name bvt-inline --pool cq --num 6 --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --offload_failures_only True --job_keyvals "{'cidb_build_stage_id': 48609511L, 'cidb_build_id': 1629418, 'datastore_parent_key': ('Build', 1629418, 'BuildStage', 48609511L)}" -m 125735676
Priority was reset to 100
Triggered task: peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline
Waiting for results from the following shards: 0
N/A: 370c6b0fc966eb10 None
cmd=['/b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py', 'run', '--swarming', 'chromeos-proxy.appspot.com', '--task-summary-json', '/tmp/cbuildbot-tmpDQ_U8z/tmpEy7Bb4/temp_summary.json', '--raw-cmd', '--task-name', u'peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline', '--dimension', 'os', 'Ubuntu-14.04', '--dimension', 'pool', 'default', '--print-status-updates', '--timeout', '9000', '--io-timeout', '9000', '--hard-timeout', '9000', '--expiration', '1200', u'--tags=priority:CQ', u'--tags=suite:bvt-inline', u'--tags=build:peach_pit-paladin/R61-9696.0.0-rc1', u'--tags=task_name:peach_pit-paladin/R61-9696.0.0-rc1-bvt-inline', u'--tags=board:peach_pit', '--', '/usr/local/autotest/site_utils/run_suite.py', '--build', u'peach_pit-paladin/R61-9696.0.0-rc1', '--board', u'peach_pit', '--suite_name', u'bvt-inline', '--pool', u'cq', '--num', '6', '--file_bugs', 'False', '--priority', u'CQ', '--timeout_mins', '90', '--retry', 'True', '--max_retries', '5', '--minimum_duts', '4', '--offload_failures_only', 'True', '--job_keyvals', "{'cidb_build_stage_id': 48609511L, 'cidb_build_id': 1629418, 'datastore_parent_key': ('Build', 1629418, 'BuildStage', 48609511L)}", '-m', '125735676']
04:46:04: INFO: No json dump found, no HWTest results to report
04:46:04: INFO: Running cidb query on pid 16060, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f869710a750>
@@@STEP_FAILURE@@@
04:46:04: ERROR: ** HWTest failed (code 1) **
04:46:04: INFO: Translating result ** HWTest failed (code 1) ** to fail.
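The key symptom is that swarming.py exited 1 and never wrote the task-summary JSON, so the HWTest stage had nothing to report even though the suite itself passed. For illustration only (this is not chromite's actual wrapper code, and the function name and return strings below are hypothetical), a minimal sketch of how a caller can distinguish "suite failed" from "results never retrieved" by checking for the summary file:

    # Illustrative sketch only, not chromite's real code. run_and_check and
    # its return values are hypothetical names used for this example.
    import json
    import os
    import subprocess

    SWARMING_PY = ('/b/c/cbuild/repository/chromite/third_party/'
                   'swarming.client/swarming.py')

    def run_and_check(summary_json, swarming_args):
        """Run swarming.py and report whether suite results were retrieved."""
        cmd = [SWARMING_PY, 'run',
               '--swarming', 'chromeos-proxy.appspot.com',
               '--task-summary-json', summary_json] + swarming_args
        ret = subprocess.call(cmd)
        if not os.path.exists(summary_json):
            # This matches the log above: "No json dump found, no HWTest
            # results to report". The task likely expired or was lost by the
            # proxy, so this is an infra failure rather than a test failure.
            return 'infra-failure'
        with open(summary_json) as f:
            shards = json.load(f).get('shards', [])
        if ret != 0 or any(shard is None for shard in shards):
            return 'infra-failure'
        return 'ok'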
Comment 1 by xixuan@chromium.org, Jun 29 2017
Maybe it's caused by a large spike in requests (see https://pantheon.corp.google.com/appengine?project=chromeos-proxy&organizationId=433637338589&duration=P30D&graph=AE_REQUESTS_BY_TYPE). There's a clear peak in the number of requests when this expiration happens, which may indicate that we should add more swarming bots.
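If the spike theory is right, one quick sanity check is to ask the proxy how many bots are alive vs. busy while the expirations are happening. A rough sketch, assuming the standard Swarming v1 REST endpoint (/_ah/api/swarming/v1/bots/count) is exposed on chromeos-proxy and that the caller is already authorized; the field names come from the public API and may differ on this deployment:

    # Rough capacity check on the swarming proxy during a request spike.
    # Assumes the standard Swarming v1 REST endpoint; a real call may need
    # OAuth credentials, which this sketch does not send.
    import json
    import urllib2  # Python 2, matching the tooling in the logs above

    SERVER = 'https://chromeos-proxy.appspot.com'

    def bot_counts(pool='default'):
        url = ('%s/_ah/api/swarming/v1/bots/count?dimensions=pool:%s'
               % (SERVER, pool))
        return json.load(urllib2.urlopen(url))

    if __name__ == '__main__':
        counts = bot_counts()
        alive = int(counts.get('count', 0))
        busy = int(counts.get('busy', 0))
        print 'alive=%d busy=%d dead=%s idle=%d' % (
            alive, busy, counts.get('dead'), alive - busy)

If idle bots hover near zero during the request peak, that would support adding capacity rather than blaming the tasks themselves.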
Jun 30 2017
Given that this caused failures, someone should definitively answer the question of whether we need more swarming bots.
Jun 30 2017
P1 - This appears to be causing regular canary failures on at least one board.
Jun 30 2017
Alas, I know very little about swarming, other than the fact that these failures show up as HWTest failures with a given signature.
Jul 5 2017
xixuan is best suited for this and she's the current deputy.
Jul 5 2017
https://viceroy.corp.google.com/chromeos/swarming_proxy is showing problems. The current swarming server (chromeos-server22.cbf) is not in good health:

CPL | avg1 157.57 | avg5 170.76 | | avg15 176.57 | | csw 262645 | intr 218541 | | | numcpu 6 |
MEM | tot 31.4G | free 205.3M | cache 76.8M | dirty 0.3M | buff 11.4M | slab 150.5M | | | | |
SWP | tot 3.8G | free 0.0M | | | | | | | vmcom 36.8G | vmlim 19.5G |
PAG | scan 77059 | | stall 0 | | | | | swin 0 | | swout 0 |
DSK | xvda | busy 97% | read 11257 | write 740 | KiB/r 6 | KiB/w 32 | MBr/s 7.12 | MBw/s 2.34 | avq 13.49 | avio 0.75 ms |
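The two red flags in the atop snapshot above are the load average (157 to 176 on a 6-CPU box) and swap being fully exhausted. A minimal host-health probe that flags exactly those two conditions; the thresholds are illustrative, not tuned values:

    # Minimal host-health probe for a swarming proxy server. Thresholds are
    # illustrative; they only flag the symptoms visible in the snapshot above
    # (load far above the CPU count, swap exhausted).
    import multiprocessing
    import os

    def check_health():
        problems = []
        _, load5, _ = os.getloadavg()
        ncpu = multiprocessing.cpu_count()
        if load5 > 4 * ncpu:  # e.g. avg5 170.76 on a 6-CPU host, as above
            problems.append('load average %.1f on %d CPUs' % (load5, ncpu))
        meminfo = {}
        with open('/proc/meminfo') as f:
            for line in f:
                key, value = line.split(':', 1)
                meminfo[key] = int(value.split()[0])  # values are in kB
        if meminfo.get('SwapTotal', 0) and meminfo.get('SwapFree', 0) == 0:
            problems.append('swap exhausted')
        return problems

    if __name__ == '__main__':
        for problem in check_health():
            print 'UNHEALTHY: ' + problem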
Jul 5 2017
Current outage: https://viceroy.corp.google.com/chromeos/swarming_proxy
Jul 5 2017
TO-DO list:
1. Migrate an existing swarming proxy server (server31) to the current pool for basic use. (in progress)
2. Provision more servers ready for swarming proxy.
3. Figure out a way to monitor each bot's health individually (see the sketch after this list).
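For item 3, one possible approach is to poll the proxy's bot list and report each bot's dead/quarantined state as a per-bot signal. This is a sketch only: it assumes the standard Swarming v1 bots/list endpoint is reachable on chromeos-proxy, and emit_metric is a placeholder for whatever monitoring sink we end up using, not an existing API.

    # Sketch for per-bot health monitoring (item 3). Assumes the standard
    # Swarming v1 bots/list endpoint; emit_metric is a placeholder sink.
    import json
    import urllib2

    SERVER = 'https://chromeos-proxy.appspot.com'

    def list_bots():
        url = SERVER + '/_ah/api/swarming/v1/bots/list'
        return json.load(urllib2.urlopen(url)).get('items', [])

    def report_bot_health(emit_metric):
        for bot in list_bots():
            healthy = not bot.get('is_dead') and not bot.get('quarantined')
            emit_metric('swarming_proxy/bot_healthy',
                        {'bot_id': bot.get('bot_id')},
                        1 if healthy else 0)

    def _print_metric(name, fields, value):
        # Stand-in for a real metrics pipeline: just print each data point.
        print '%s %s -> %s' % (name, fields, value)

    if __name__ == '__main__':
        report_bot_health(_print_metric)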
Jul 5 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/1859d7c7932bbee64ef2724543e0e8160b5f11c6

commit 1859d7c7932bbee64ef2724543e0e8160b5f11c6
Author: xixuan <xixuan@chromium.org>
Date: Wed Jul 05 21:29:56 2017
Jul 6 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/d52600fcfbf703a26b6100b24744ef617d7da120

commit d52600fcfbf703a26b6100b24744ef617d7da120
Author: xixuan <xixuan@chromium.org>
Date: Thu Jul 06 22:15:27 2017
Jul 7 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/524fae9dca6d122fdeea9d31301a9975b1102a2a

commit 524fae9dca6d122fdeea9d31301a9975b1102a2a
Author: xixuan <xixuan@chromium.org>
Date: Fri Jul 07 00:03:59 2017
Jul 7 2017
Update: items 1 & 2 are finished. server31 has been moved to prod, and server51 has been provisioned for suite-scheduler. For item 3, we have a new metric at https://viceroy.corp.google.com/chrome_infra/Jobs/pools?duration=1d&pool=cores%5C%3A6%5C%7Ccpu%5C%3Ax86%5C%7Ccpu%5C%3Ax86%5C-64%5C%7Cgpu%5C%3Anone%5C%7Cmachine%5C_type%5C%3An1%5C-highmem%5C-4%5C%7Cos%5C%3ALinux%5C%7Cos%5C%3AUbuntu%5C%7Cos%5C%3AUbuntu%5C-14%5C.04%5C%7Cpool%5C%3Adefault&refresh=-1&service_name=chromeos-proxy, and it will soon be displayed on https://viceroy.corp.google.com/chromeos/swarming_proxy. So marking this as fixed.
Jul 10 2017
Some HWTests are still failing in swarming. Are they related to this bug?
Jul 10 2017
After checking, swarming is working.