Multiple failures for veyron_jerry-release due to denial of access token refresh
Issue description

The 2 most recent veyron_jerry-release attempts failed with similar issues with what appears to be trouble refreshing the access token.

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8941274790613553424
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8941244364493115600

From logs:

17:27:43: WARNING: HttpsMonitor.send received status 429: { "error": { "code": 429, "message": "Insufficient tokens for quota 'WriteGroup' and limit 'CLIENT_PROJECT-100s' of service 'prodxmon-pa.googleapis.com' for consumer 'project_number:102025095358'.", "status": "RESOURCE_EXHAUSTED", "details": [ { "@type": "type.googleapis.com/google.rpc.Help", "links": [ { "description": "Google developer console API key", "url": "https://console.developers.google.com/project/102025095358/apiui/credential" } ] } ] } }
17:27:51: WARNING: Exception is not retriable return code: 3; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /b/swarming/w/ir/tmp/t/cbuildbot-tmp_g6Tfr/tmpUlo67i/temp_summary.json --print-status-updates --timeout 14400 --raw-cmd --task-name veyron_jerry-release/R69-10867.0.0-sanity --dimension os Ubuntu-14.04 --dimension pool default --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:sanity' '--tags=build:veyron_jerry-release/R69-10867.0.0' '--tags=task_name:veyron_jerry-release/R69-10867.0.0-sanity' '--tags=board:veyron_jerry' -- /usr/local/autotest/site_utils/run_suite.py --build veyron_jerry-release/R69-10867.0.0 --board veyron_jerry --suite_name sanity --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 5 --minimum_duts 1 --suite_min_duts 1 --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 85406435L, 'cidb_build_id': 2740142, 'datastore_parent_key': ('Build', 2740142, 'BuildStage', 85406435L)}" -m 216195220
Triggered task: veyron_jerry-release/R69-10867.0.0-sanity

--- and ---

23:21:38: INFO: Refreshing due to a 401 (attempt 1/2)
23:21:38: INFO: Refreshing access_token
23:31:38: INFO: Refreshing due to a 401 (attempt 1/2)
23:31:38: INFO: Refreshing access_token
23:39:33: INFO: Refreshing due to a 401 (attempt 1/2)
23:39:33: INFO: Refreshing access_token
00:21:43: INFO: Refreshing due to a 401 (attempt 1/2)
00:21:43: INFO: Refreshing access_token
00:31:42: INFO: Refreshing due to a 401 (attempt 1/2)
00:31:42: INFO: Refreshing access_token
00:34:47: INFO: Re-run swarming_cmd to avoid buildbot salency check.
00:34:47: INFO: RunCommand: /b/swarming/w/ir/cache/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /b/swarming/w/ir/tmp/t/cbuildbot-tmpn2j4aq/tmp1wbP0U/temp_summary.json --print-status-updates --timeout 14400 --raw-cmd --task-name veyron_jerry-release/R69-10868.0.0-sanity --dimension os Ubuntu-14.04 --dimension pool default --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:sanity' '--tags=build:veyron_jerry-release/R69-10868.0.0' '--tags=task_name:veyron_jerry-release/R69-10868.0.0-sanity' '--tags=board:veyron_jerry' -- /usr/local/autotest/site_utils/run_suite.py --build veyron_jerry-release/R69-10868.0.0 --board veyron_jerry --suite_name sanity --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 5 --minimum_duts 1 --suite_min_duts 1 --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 85446746L, 'cidb_build_id': 2741629, 'datastore_parent_key': ('Build', 2741629, 'BuildStage', 85446746L)}" -m 216271491
00:35:36: WARNING: Exception is not retriable return code: 3; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /b/swarming/w/ir/tmp/t/cbuildbot-tmpn2j4aq/tmp1wbP0U/temp_summary.json --print-status-updates --timeout 14400 --raw-cmd --task-name veyron_jerry-release/R69-10868.0.0-sanity --dimension os Ubuntu-14.04 --dimension pool default --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:sanity' '--tags=build:veyron_jerry-release/R69-10868.0.0' '--tags=task_name:veyron_jerry-release/R69-10868.0.0-sanity' '--tags=board:veyron_jerry' -- /usr/local/autotest/site_utils/run_suite.py --build veyron_jerry-release/R69-10868.0.0 --board veyron_jerry --suite_name sanity --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 5 --minimum_duts 1 --suite_min_duts 1 --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 85446746L, 'cidb_build_id': 2741629, 'datastore_parent_key': ('Build', 2741629, 'BuildStage', 85446746L)}" -m 216271491
Triggered task: veyron_jerry-release/R69-10868.0.0-sanity
,
Jul 12
Over to the oncall.
,
Jul 12
I saw this same error in a previously failing falco-release (which has since passed):

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8941274861474566800
https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8941274861474566800/+/steps/HWTest__bvt-inline_/0/stdout

--- log snippet ---

17:40:42: INFO: RetriableHttp: attempt 5 receiving status 503, final attempt
17:40:43: WARNING: HttpsMonitor.send received status 503: { "error": { "code": 503, "message": "The service is currently unavailable.", "status": "UNAVAILABLE" } }
17:40:43: WARNING: HttpsMonitor.send received status 503: { "error": { "code": 503, "message": "The service is currently unavailable.", "status": "UNAVAILABLE" } }
17:41:02: WARNING: HttpsMonitor.send received status 429: { "error": { "code": 429, "message": "Insufficient tokens for quota 'WriteGroup' and limit 'CLIENT_PROJECT-100s' of service 'prodxmon-pa.googleapis.com' for consumer 'project_number:102025095358'.", "status": "RESOURCE_EXHAUSTED", "details": [ { "@type": "type.googleapis.com/google.rpc.Help", "links": [ { "description": "Google developer console API key", "url": "https://console.developers.google.com/project/102025095358/apiui/credential" } ] } ] } }
17:41:02: WARNING: Exception is not retriable return code: 3; command: /b/swarming/w/ir/cache/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /b/swarming/w/ir/tmp/t/cbuildbot-tmpV8pj3x/tmpIOl9mm/temp_summary.json --print-status-updates --timeout 14400 --raw-cmd --task-name falco-release/R69-10867.0.0-bvt-inline --dimension os Ubuntu-14.04 --dimension pool default --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:bvt-inline' '--tags=build:falco-release/R69-10867.0.0' '--tags=task_name:falco-release/R69-10867.0.0-bvt-inline' '--tags=board:falco' -- /usr/local/autotest/site_utils/run_suite.py --build falco-release/R69-10867.0.0 --board falco --suite_name bvt-inline --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --max_retries 5 --minimum_duts 4 --suite_min_duts 6 --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 85408934L, 'cidb_build_id': 2740013, 'datastore_parent_key': ('Build', 2740013, 'BuildStage', 85408934L)}" --json_dump -m 216198187
Triggered task: falco-release/R69-10867.0.0-bvt-inline
,
Jul 13
We are spiking over the quota of 150k write requests per 100s. I'm not sure whether I correctly increased the quota to 200k per 100s, or whether that just changed the view of the monitoring graph: https://pantheon.corp.google.com/apis/api/prodxmon-pa.googleapis.com/credentials?organizationId=433637338589&project=google.com:prodx-mon-chrome-infra
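For scale, 150k writes per 100s averages 1,500 writes per second, and 200k per 100s averages 2,000. Below is a minimal sketch of a client-side token bucket that would keep a single sender under such a limit. It is an illustration only: the class name and parameters are hypothetical and not part of chromite or the monitoring client. Because the 429 message shows the limit is applied per consumer project, a per-process limiter like this would not by itself bound the aggregate rate across builders.

```python
import time


class TokenBucket(object):
    """Hypothetical client-side limiter: at most `capacity` sends per window."""

    def __init__(self, capacity, window_seconds, clock=time.monotonic):
        self.capacity = float(capacity)
        # Tokens are refilled continuously at capacity / window per second.
        self.rate = self.capacity / float(window_seconds)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n=1):
        """Return True and consume n tokens if available, else return False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A sender would call try_acquire() before each write and drop or defer the write when it returns False.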
,
Jul 13
Filed crbug.com/863466 with the troopers to investigate. We should handle being unable to log monitoring data without failing the build. I will investigate that here and am lowering the priority of this issue to P2. crbug.com/863466 is the critical bug tracking why the monitoring writes are spiking.
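One way to make a monitoring failure non-fatal is to wrap the send in a best-effort helper that logs the error and carries on. This is a rough sketch under stated assumptions, not the chromite implementation: send_metrics_best_effort and its send argument are hypothetical names, and send stands in for whatever callable actually performs the monitoring write.

```python
import logging


def send_metrics_best_effort(send, payload):
    """Call send(payload); on any failure, log a warning and keep going.

    `send` is a hypothetical callable that performs the monitoring write.
    Returns True if the write succeeded, False if it was skipped.
    """
    try:
        send(payload)
        return True
    except Exception as exc:  # Monitoring must never take down the build.
        logging.warning('Monitoring send failed, continuing: %s', exc)
        return False
```

The caller can ignore the return value, or count the False results to report how much monitoring data was lost.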
,
Jul 16
Copying current sheriffs
,
Jul 17
Verified that the monitoring errors have cleared; I don't think they were causing issues. The second error listed above is still appearing in the logs and the veyron_jerry-release builders are still failing. Also verified that the builds are running on different builders, so this is not a single-builder issue.

23:21:38: INFO: Refreshing due to a 401 (attempt 1/2)
23:21:38: INFO: Refreshing access_token
23:31:38: INFO: Refreshing due to a 401 (attempt 1/2)
23:31:38: INFO: Refreshing access_token

Still trying to figure out what is hanging ...
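For readers unfamiliar with these log lines, "Refreshing due to a 401 (attempt 1/2)" matches the common pattern of refreshing credentials and retrying when a request comes back unauthorized. Below is a rough sketch of that general pattern. All names are hypothetical and this is not the actual client library code.

```python
import logging


def request_with_refresh(do_request, refresh_credentials, max_attempts=2):
    """Issue a request; on HTTP 401, refresh credentials and retry.

    do_request() returns an HTTP status code and refresh_credentials()
    fetches a new access token. Both are hypothetical callables.
    """
    status = do_request()
    for attempt in range(1, max_attempts + 1):
        if status != 401:
            break
        logging.info('Refreshing due to a 401 (attempt %d/%d)',
                     attempt, max_attempts)
        refresh_credentials()
        status = do_request()
    # Still 401 here means every refresh attempt was used without success.
    return status
```

In this sketch, a log that only ever shows "attempt 1/2" would mean each first refresh was followed by a non-401 response.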
,
Jul 23
veyron_jerry-release has had 9 successful builds since 2018-07-20: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=veyron_jerry-release&buildBranch=master The access token failures are no longer happening, so closing.
Comment 1 by athilenius@chromium.org, Jul 12