Alert on Chromium source commit rate being too low
Issue description: We should scope this to MTV working hours only, since we aren't really supporting other times, or maybe to weekdays only, since that's when troopers are on call. I don't think we have a graph for this right now, but I think we could get one fairly easily, right? dsansome@, is this a good idea at all? I was thinking this would give us another way to detect whether CQ is down, for example if CQ gets confused. Maybe redundancy here is not useful.
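A minimal sketch of the scoping idea, assuming a low-commit-rate check that only evaluates during weekday MTV working hours. The hour window, the commits-per-hour floor, and all function names are hypothetical; the issue gives no numbers. The caller is assumed to pass a datetime already converted to MTV local time (for example via `zoneinfo.ZoneInfo("America/Los_Angeles")`), which keeps the sketch free of time-zone data.

```python
from datetime import datetime

# Hypothetical values; the issue does not specify any thresholds.
WORK_START_HOUR = 9        # assumed start of the MTV working day (local time)
WORK_END_HOUR = 18         # assumed end of the MTV working day (exclusive)
MIN_COMMITS_PER_HOUR = 5   # hypothetical floor below which we would alert


def in_mtv_working_hours(now_mtv: datetime) -> bool:
    """True on Monday-Friday between WORK_START_HOUR and WORK_END_HOUR.

    `now_mtv` is assumed to already be in MTV local time.
    """
    is_weekday = now_mtv.weekday() < 5  # Monday == 0 ... Friday == 4
    return is_weekday and WORK_START_HOUR <= now_mtv.hour < WORK_END_HOUR


def should_alert_low_commit_rate(commits_last_hour: int, now_mtv: datetime) -> bool:
    """Alert only when troopers are on call and the commit rate is below the floor."""
    if not in_mtv_working_hours(now_mtv):
        return False
    return commits_last_hour < MIN_COMMITS_PER_HOUR
```

Gating on time of day keeps the alert quiet overnight and on weekends, when a low commit rate is expected anyway. It does not handle holidays or deliberate tree closures.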
Nov 22 2016
Assigning to Dave to get your thoughts.
Nov 22 2016
We have metrics for CQ's commit rate (http://shortn/_NnXrmURCWX), but they won't include people bypassing CQ and landing CLs directly. I don't think the base rate is high enough for you to get a meaningful signal out of the noise. What's the context here? What are you trying to detect?
Nov 23 2016
Nov 23 2016
Assigning back to Stephen as Dave shared his thoughts.
Nov 23 2016
Oh right, the context is https://docs.google.com/document/d/1norVOQ0vMp5dW7PE0m0gKlu2uwSZ_htUYQ7oamsoY0s/edit# of course. I'd say the failed/successful commit ratio would make a better metric. You could threshold it on the total queue length so it doesn't fire when volume is low.
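The suggested metric can be sketched as follows, assuming we get per-window counts of successful and failed CQ commit attempts plus the current queue length. The ratio is expressed here as the failure fraction failed / (failed + successful), which avoids dividing by zero when nothing lands. Both thresholds, the `CQWindow` type, and the function name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds; none are given in the thread.
MIN_QUEUE_LENGTH = 20     # below this volume, skip the check entirely
MAX_FAILURE_RATIO = 0.5   # alert when more than half of attempts fail


@dataclass
class CQWindow:
    """Hypothetical snapshot of CQ activity over one evaluation window."""
    successful_commits: int
    failed_commits: int
    queue_length: int


def should_alert_failure_ratio(window: CQWindow) -> bool:
    """Alert on a high failure fraction, but only when the queue is busy enough."""
    if window.queue_length < MIN_QUEUE_LENGTH:
        return False  # low volume: the ratio is too noisy to act on
    attempts = window.successful_commits + window.failed_commits
    if attempts == 0:
        return False  # no completed attempts, so there is no ratio to compute
    return window.failed_commits / attempts > MAX_FAILURE_RATIO
```

Thresholding on queue length is what keeps this from firing on a handful of failures during quiet periods.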
Nov 23 2016
So, I wanted to have two separate ways to track success. I don't know how often CQ monitoring would fail; I was just thinking that we could have two redundant signals. I agree the failed/successful commit ratio is good to track, but we should also track commit rate. If no one else thinks it's worth tracking, though, we can close this.
Feb 28 2017
We do have this alert: https://cs.corp.google.com/piper///depot/google3/configs/monitoring/chrome_infra/buildbot_alerts.py?q=buildbot_alerts&dr=C&l=257 It might be worth checking whether its threshold is effective.
Mar 29 2017
I haven't worked on this in a while. Should we do this? Katie, you did something similar to this, right? Did we make an alert for CQ commit failure rate?
Mar 29 2017
Removing Infra>Monitoring since this is a CQ-related alert modification. Please reserve Infra>Monitoring for monitoring (ts_mon and event_mon) bugs. Added the Ops-AddMonitoring label to track monitoring-related tasks.
Aug 18 2017
Aug 20
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Aug 20
This is a very old bug for a postmortem that is no longer relevant (CQ no longer needs a local checkout). On top of that, IMHO alerting on a low commit rate may be suboptimal (e.g. it might trigger on holidays or during a valid tree closure). I think it's better to have proper CQ service monitoring, which Foundation most likely has now.
Comment 1 by martiniss@chromium.org, Nov 22 2016