Issue 804667

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Tune BotUpdateSlow to take into account # of data points

Project Member Reported by efoo@chromium.org, Jan 23 2018

Issue description

Issue: BotUpdateSlow is firing a lot after hours. It is more likely to trigger then because the # of builds is low and those builds are more likely to take longer.

Idea: Tune this alert to take into account a threshold for the # of data points. If the # of data points does not exceed that threshold, we do not page, but file a ticketed alert instead.
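
A rough sketch of the routing idea, not the actual alert config. Everything here is hypothetical: the `route_alert` function, the `Action` enum, the parameter names, and the use of a median as the "slow" signal are assumptions for illustration only.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    NONE = "none"
    TICKET = "ticket"
    PAGE = "page"


def route_alert(durations_sec: Sequence[float],
                slow_threshold_sec: float,
                min_points_to_page: int) -> Action:
    """Pick an escalation level from the bot_update durations in the window.

    Assumption: "slow" means the median duration exceeds slow_threshold_sec.
    The real alert may use a different statistic.
    """
    if not durations_sec:
        return Action.NONE

    ordered = sorted(durations_sec)
    median = ordered[len(ordered) // 2]  # upper median when the count is even
    if median <= slow_threshold_sec:
        return Action.NONE

    # Page only when the # of data points exceeds the threshold;
    # otherwise the signal is too thin, so file a ticket instead.
    if len(durations_sec) > min_points_to_page:
        return Action.PAGE
    return Action.TICKET
```

With a threshold of 20 data points, five slow builds after hours would produce a ticket, while 25 slow builds during the day would still page.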

 
Blockedon: 804952

Comment 2 by mar...@chromium.org, Jan 26 2018

Blockedon: -804952

Comment 3 by mar...@chromium.org, Jan 26 2018

Cc: katthomas@chromium.org
Components: Infra>Monitoring
Status: Available (was: Untriaged)
The monitoring data seems spotty at best:
https://screenshot.googleplex.com/8NwvL5SGkEV

Especially when I compare with actual builds at:
https://ci.chromium.org/buildbot/tryserver.chromium.win/win10_chromium_x64_rel_ng/
I spot-checked 10 builds: 9 took <=3 minutes, with one outlier at 10min 37sec.
I suspect a degenerate case that creates outliers, which then trigger this alert. cc'ing Kat in case she has an opinion on this.
Currently, the alert fires only if there have been greater than 20 data points in the last 2 hours. 
http://google3/configs/monitoring/chrome_ops_client_infra/buildbot_alerts.py?l=146&rcl=183393513
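
For reference, a gate like the one described ("greater than 20 data points in the last 2 hours") could look roughly like this. This is only a sketch and not the contents of buildbot_alerts.py; `has_enough_points` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Iterable


def has_enough_points(timestamps: Iterable[datetime],
                      now: datetime,
                      window: timedelta = timedelta(hours=2),
                      min_points: int = 20) -> bool:
    """Return True if more than min_points samples fall in the trailing window."""
    cutoff = now - window
    # Count samples strictly newer than the cutoff and not in the future.
    recent = sum(1 for ts in timestamps if cutoff < ts <= now)
    return recent > min_points
```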

Let me know if I can help!
Components: -Infra>Monitoring
