
Issue 865855


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




lakitu-gpu-paladin:4079 failed

Project Member Reported by oka@chromium.org, Jul 20

Issue description

lakitu-gpu-paladin:4079 failed

Builders failed on: 
- lakitu-gpu-paladin: 
  https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940524314268301552


lakitu-gpu-paladin has failed twice in a row.

HttpError: <HttpError 403 when requesting https://www.googleapis.com/compute/v1/projects/cros-autotest-bots/global/images?alt=json returned "Quota 'IMAGES' exceeded. Limit: 100.0 globally.">



 
Cc: wonderfly@google.com
Labels: -Pri-1 Pri-0
The issue is similar to crbug.com/765380. 
CC: wonderfly@ (the owner of the bug)
I don't have access to the cros-autotest-bots project, so I can't check the quota or clean it up.

Independent of this, this is concerning if we keep leaking resources and hit quota. 

@wonderfly Have we considered increasing the quota until we come up with a better solution?
Labels: -Pri-0 Pri-1
> this is concerning if we keep leaking resources and hit quota. 

Most of the leaks were caused by builder machines being aborted, so there isn't much we can do from the infrastructure layer. Hopefully things will get better when we move away from the waterfalls, since there will be fewer aborts.

In the meantime, I have cleaned up stale resources and made sure all resource usages were way below quota.
As per our discussion, wonderfly, can you please set up a notification to cloud-image@ (or even page the primary/secondary on-call, if that is doable) when we are near quota limits?

Eventually we need to add code that cleans up cros-autotest-bots while draining the waterfall.
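A cleanup step like this usually comes down to picking which images are stale. The sketch below is one hypothetical way to do that. It assumes image entries shaped like Compute Engine `images.list` items (a `name` plus an RFC 3339 `creationTimestamp`) and treats anything older than a chosen age as leaked. The age threshold and the helper names are illustrative, not taken from the actual builder code. Listing and deleting the images is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp (as used by Compute Engine) into an aware datetime."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def find_stale_images(
    images: Iterable[Mapping[str, str]],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the sorted names of images created more than max_age before now.

    Hypothetical policy: any image older than max_age is assumed to be leaked
    by an aborted builder and safe to delete.
    """
    stale = []
    for image in images:
        created = parse_rfc3339(image["creationTimestamp"])
        if now - created > max_age:
            stale.append(image["name"])
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2018, 7, 20, 12, 0, tzinfo=timezone.utc)
    images = [
        {"name": "lakitu-old", "creationTimestamp": "2018-07-01T00:00:00.000-07:00"},
        {"name": "lakitu-new", "creationTimestamp": "2018-07-20T10:00:00.000Z"},
    ]
    print(find_stale_images(images, now, timedelta(days=2)))
```

A pure age-based filter is easy to reason about. In practice you would probably also restrict it to images created by the test builders, for example by a name prefix, so that images in use by other workflows are never touched.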



Components: Infra>Client>ChromeOS
Owner: wonderfly@google.com
@wonderfly, can you try enabling "Stackdriver error reporting" notifications in Pantheon for the project?
https://pantheon.corp.google.com/user-preferences/communication?_ga=2.158971770.-1537615205.1531806244

I think once notifications are enabled, it would make sense to allow lakitu-developers to access the project.

Labels: -Pri-1 Pri-3
Stackdriver error reporting was already enabled on the project, but unfortunately it's not what we need here. As per its description, it sends a notification when "A logging configuration error has occured in the project", whereas what we want is a notification when a resource quota is close to being hit. I looked at the other notification types and none of them do this. I'll explore the direction of a cron job; it shouldn't be too hard. In the meantime, I've given lakitu-dev all editor access to the project.
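The core check for such a cron job could look like the minimal sketch below. It assumes quota entries shaped like the `quotas` list in the Compute Engine `projects.get` response (each with `metric`, `limit`, and `usage`), which is where the global `IMAGES` quota from the error above is reported. The 80% threshold and the function names are hypothetical. Fetching the project and delivering the alert (email to cloud-image@ or a page) are intentionally left out.

```python
from typing import Iterable, List, Mapping, Tuple, Union

Number = Union[int, float]


def quotas_near_limit(
    quotas: Iterable[Mapping[str, Union[str, Number]]],
    threshold: float = 0.8,
) -> List[Tuple[str, float, float]]:
    """Return (metric, usage, limit) for every quota whose usage is at or above
    `threshold` times its limit. Quotas with a non-positive limit are skipped."""
    flagged = []
    for quota in quotas:
        limit = float(quota["limit"])
        usage = float(quota["usage"])
        if limit > 0 and usage / limit >= threshold:
            flagged.append((str(quota["metric"]), usage, limit))
    return flagged


def format_alert(project: str, flagged: List[Tuple[str, float, float]]) -> str:
    """Build a plain-text alert body; the delivery channel is out of scope here."""
    lines = [f"Quota usage is near the limit in project {project}:"]
    for metric, usage, limit in flagged:
        lines.append(f"  {metric}: {usage:g} / {limit:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    quotas = [
        {"metric": "IMAGES", "limit": 100.0, "usage": 95.0},
        {"metric": "SNAPSHOTS", "limit": 1000.0, "usage": 10.0},
    ]
    flagged = quotas_near_limit(quotas)
    if flagged:
        print(format_alert("cros-autotest-bots", flagged))
```

Keeping the threshold logic separate from the API call makes it easy to test without credentials. The cron wrapper then only needs to fetch the project's quotas and send `format_alert(...)` whenever the list is non-empty.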
Cc: -oka@chromium.org
Cc: -wonderfly@google.com wonderfly@chromium.org
Owner: wonderfly@chromium.org
