PFQ informational bots fail to SyncChrome due to RESOURCE_EXHAUSTED
Issue description

Happening on:
- caroline-tot-chrome-pfq-informational
- daisy-tot-chromium-pfq-informational
- eve-tot-chrome-pfq-informational
- peach_pit-tot-chrome-pfq-informational
- tricky-tot-chrome-pfq-informational
- veyron_minnie-tot-chrome-pfq-informational

Sample build: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8945038622751113392

05:22:21: INFO: Refreshing due to a 401 (attempt 1/2)
05:22:21: INFO: Refreshing access_token
05:45:27: INFO: RetriableHttp: attempt 1 receiving status 503, will retry
05:46:33: WARNING: HttpsMonitor.send received status 429: {
  "error": {
    "code": 429,
    "message": "Insufficient tokens for quota 'WriteGroup' and limit 'CLIENT_PROJECT-100s' of service 'prodxmon-pa.googleapis.com' for consumer 'project_number:102025095358'.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Google developer console API key",
            "url": "https://console.developers.google.com/project/102025095358/apiui/credential"
          }
        ]
      }
    ]
  }
}
,
Jun 1 2018
Alec, it looks like the prodmon API quota is exhausted. We need to get help from the current ChOps oncall.
,
Jun 1 2018
Oh, this seems like tsmon data couldn't be sent to prod, but it's not clear why that would block your build. One has to look into what the script being run actually does.
,
Jun 1 2018
Alec, let's check whether we're missing an exception wrapper around the line that killed the build.
,
Jun 5 2018
Thanks for prioritizing this. These builders are very important for gardening.
,
Jun 5 2018
This is uncharted territory for me. Can someone convince me that the 429 warning is the cause of the build being killed? I'm looking through the CBuildBot source, but that's slow going. The very long list of 401 auth errors spanning almost 24 hours seems a more probable reason for the build being killed.
,
Jun 5 2018
This failure was the build hanging; both the 401s and the 429 are unrelated. jclinton and I looked at the swarming machine and don't see any other issues with it. Swarming killed the task after 23 hours 50 min with a timeout error.
,
Jun 5 2018
Actually, this is still happening: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderSummary?buildBranch=&builderGroups=informational&limit=&email=&buildConfig=. We need to figure out what is hanging. We can log into the machines and get process trees.
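As a rough sketch of how a process tree could be captured on a hung builder: the helper names below are hypothetical, and `snapshot()` assumes a `ps` that accepts the POSIX `-e` and `-o` options. `ps -ef --forest` on the machine gives similar information directly.

```python
import subprocess
from collections import defaultdict


def build_tree(rows):
    """Group (pid, ppid, command) rows into {ppid: [(pid, command), ...]}."""
    children = defaultdict(list)
    for pid, ppid, cmd in rows:
        if pid != ppid:  # skip self-parented entries to avoid recursion loops
            children[ppid].append((pid, cmd))
    return children


def render(children, root, depth=0):
    """Return indented 'pid command' lines for every descendant of root."""
    lines = []
    for pid, cmd in sorted(children.get(root, [])):
        lines.append("  " * depth + f"{pid} {cmd}")
        lines.extend(render(children, pid, depth + 1))
    return lines


def snapshot():
    """Read the live process table via ps (headers suppressed with 'name=')."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2:
            rows.append((int(parts[0]), int(parts[1]),
                         parts[2] if len(parts) > 2 else ""))
    return rows
```

Running `print("\n".join(render(build_tree(snapshot()), 0)))` prints the tree under PID 0; the leaf under the stalled build step is the process to inspect.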
,
Jun 6 2018
`tee` is hanging because /build/amd64-generic/etc/portage/package.keywords/chrome doesn't exist. In fact, /build doesn't exist. Still trying to understand why.
,
Jun 7 2018
Issue 849173 has been merged into this issue.
,
Jun 11 2018
This turned out to be an issue with the newly added log streaming support; the original error message (resource exhaustion) was a red herring. Each time, the builder stalled until Swarming killed it after 23 hours 50 minutes. Reverting crrev.com/c/1063329 (revert: crrev.com/c/1091183) appears to have worked. Marking this fixed.
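The fix here was the revert, not a code change to the builder. Separately, a stall like this one surfaces much sooner if each step carries its own deadline instead of relying on the task-level Swarming timeout. A minimal sketch, with `run_with_deadline` as a hypothetical helper and the timeout value left to the caller:

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s.

    subprocess.run kills the child and waits for it when the timeout
    expires, then raises TimeoutExpired, which is mapped to None here.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return proc.returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A step that sleeps far longer than its deadline is cut off early.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(hung, 1))
```

A caller can treat a `None` result as a step failure and dump diagnostics at that point, while the hung state is still fresh.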
Comment 1 by jrbarnette@chromium.org
, Jun 1 2018