lakitu-gpu-paladin:4079 failed
Issue description
lakitu-gpu-paladin:4079 failed
Builders failed on:
- lakitu-gpu-paladin: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8940524314268301552
lakitu-gpu-paladin has failed twice in a row.
HttpError: <HttpError 403 when requesting https://www.googleapis.com/compute/v1/projects/cros-autotest-bots/global/images?alt=json returned "Quota 'IMAGES' exceeded. Limit: 100.0 globally.">
Jul 20
I don't have access to the cros-autotest-bot project, so I can't check the quota or clean up. Independently of this, it is concerning if we keep leaking resources and hitting quota. @wonderfly Have we considered increasing the quota until we come up with a better solution?
Jul 20
Just checked: the builder status seems to be green for now. https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=lakitu-gpu-paladin&buildBranch=master
Jul 20
> this is concerning if we keep leaking resources and hit quota.
Most of the leaks were due to the builder machines getting aborted, so there isn't much we can do from the infrastructure layer. Hopefully things will get better when we move away from the waterfalls, since there will be fewer aborts. In the meantime, I have cleaned up stale resources and made sure all resource usage is well below quota.
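The cleanup of stale resources mentioned above could be automated. A minimal sketch of the selection step, assuming the image list has the shape of the Compute Engine `images.list` response (dicts with `name` and an RFC 3339 `creationTimestamp` carrying a numeric UTC offset). The one-day `MAX_AGE` window, the `prefix` filter, the `stale_images` helper and the image names are all hypothetical; the thread does not say what counted as "stale". Actually deleting the returned names is left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window: images older than this are treated as leaked.
MAX_AGE = timedelta(days=1)


def stale_images(images, now, prefix="", max_age=MAX_AGE):
    """Return names of images whose name starts with `prefix` and that are older than `max_age`.

    `images` is assumed to look like the `items` list of an images.list
    response; `now` must be a timezone-aware datetime.
    """
    stale = []
    for image in images:
        if not image["name"].startswith(prefix):
            continue  # hypothetical filter so only test-builder images are considered
        # fromisoformat accepts timestamps such as "2018-07-18T09:00:00.000-07:00".
        created = datetime.fromisoformat(image["creationTimestamp"])
        if now - created > max_age:
            stale.append(image["name"])
    return stale


if __name__ == "__main__":
    now = datetime(2018, 7, 20, 12, 0, tzinfo=timezone.utc)
    images = [
        {"name": "lakitu-gpu-test-old", "creationTimestamp": "2018-07-18T09:00:00.000-07:00"},
        {"name": "lakitu-gpu-test-new", "creationTimestamp": "2018-07-20T04:00:00.000-07:00"},
    ]
    print(stale_images(images, now))  # ['lakitu-gpu-test-old']
```

Restricting by name prefix (or by label) matters here, because the project may also hold images that are not owned by the test builders and must not be deleted.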
Jul 20
As per our discussion, wonderfly, can you please set up a notification to cloud-image@ (or even page the primary/secondary oncall, if that is doable) when we are near quota limits? Eventually we need to add a piece of code that cleans up cros-autotest-bot while draining the waterfall.
Jul 20
Jul 21
@wonderfly can you try enabling "Stackdriver error reporting" notifications in pantheon for the project? https://pantheon.corp.google.com/user-preferences/communication?_ga=2.158971770.-1537615205.1531806244 I think once notifications are enabled, it would make sense to allow lakitu-developers to access the project.
Jul 23
Stackdriver error reporting was already enabled on the project, but unfortunately it's not what we need here. According to its description, it sends a notification when "A logging configuration error has occured in the project", whereas what we want is a notification when a resource quota is about to be hit. I looked at the other notification types and none of them does this. I'll explore the direction of a cron job, which shouldn't be too hard. In the meantime, I've given lakitu-dev all editor access to the project.
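The core of such a cron job could be a threshold check over the project's quota list. A minimal sketch, assuming `project` is the parsed JSON of a Compute Engine `projects.get` response (for example from `compute.projects().get(project="cros-autotest-bots").execute()` with the Google API Python client), whose `quotas` entries carry `metric`, `limit` and `usage`; IMAGES is one such metric. The 80% `THRESHOLD` and the `near_quota`/`format_alert` helpers are hypothetical, and delivering the message to cloud-image@ or the oncall pager is not shown.

```python
# Hypothetical alert threshold: flag any metric at or above 80% of its limit.
THRESHOLD = 0.8


def near_quota(project, threshold=THRESHOLD):
    """Return (metric, usage, limit) tuples for quotas at or above threshold * limit."""
    alerts = []
    for quota in project.get("quotas", []):
        limit = quota["limit"]
        usage = quota["usage"]
        # Skip zero limits to avoid flagging metrics that are simply unavailable.
        if limit > 0 and usage >= threshold * limit:
            alerts.append((quota["metric"], usage, limit))
    return alerts


def format_alert(project_id, alerts):
    """Build a plain-text notification body for the flagged quotas."""
    lines = [f"Compute Engine quotas near limit in {project_id}:"]
    for metric, usage, limit in alerts:
        lines.append(f"  {metric}: {usage:g} of {limit:g} ({usage / limit:.0%})")
    return "\n".join(lines)


if __name__ == "__main__":
    project = {"quotas": [
        {"metric": "IMAGES", "limit": 100.0, "usage": 97.0},
        {"metric": "SNAPSHOTS", "limit": 1000.0, "usage": 12.0},
    ]}
    alerts = near_quota(project)
    if alerts:
        print(format_alert("cros-autotest-bots", alerts))
```

With the sample data this prints a header line followed by `  IMAGES: 97 of 100 (97%)`. A cron job would only send a notification when `near_quota` returns a non-empty list.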
Jul 24
Aug 2
Comment 1 by oka@chromium.org, Jul 20
Labels: -Pri-1 Pri-0