Tune BotUpdateSlow to take into account the number of data points
Issue description
Issue: BotUpdateSlow fires frequently after hours, when it is more likely to trigger because there are fewer builds and those builds are more likely to take longer.
Idea: Could we tune this alert to take into account a threshold for the number of data points? If the number of data points does not exceed a certain threshold, we do not page, but file a ticketed alert instead.
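A minimal sketch of the proposed page-vs-ticket routing, written in Python because the alert config lives in a .py file. The threshold value, the function name route_bot_update_slow, and the Action enum are all hypothetical; the issue does not say how the real alert config would express this.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"      # condition not met: do nothing
    TICKET = "ticket"  # slow, but too few data points to trust: file a ticket
    PAGE = "page"      # slow with enough data points: page the on-call


# Hypothetical threshold; the issue does not propose a value.
PAGE_THRESHOLD = 20


def route_bot_update_slow(is_slow: bool, num_points: int,
                          page_threshold: int = PAGE_THRESHOLD) -> Action:
    """Decide how a BotUpdateSlow condition should be reported.

    is_slow:    whether the slowness condition is currently met.
    num_points: number of data points (builds) behind that condition.
    """
    if not is_slow:
        return Action.NONE
    if num_points <= page_threshold:
        # Does not exceed the threshold (typical after hours): a few long
        # builds can dominate, so file a ticketed alert instead of paging.
        return Action.TICKET
    return Action.PAGE


if __name__ == "__main__":
    print(route_bot_update_slow(True, 5).value)    # ticket
    print(route_bot_update_slow(True, 50).value)   # page
    print(route_bot_update_slow(False, 50).value)  # none
```

The slowness check itself is left as a boolean input, so the sketch only captures the routing decision, not how "slow" is computed.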
Jan 26 2018
Jan 26 2018
The monitoring data seems spotty at best: https://screenshot.googleplex.com/8NwvL5SGkEV
This is especially apparent when I compare it with the actual builds at: https://ci.chromium.org/buildbot/tryserver.chromium.win/win10_chromium_x64_rel_ng/
I spot-checked 10 builds: 9 took 3 minutes or less, and one outlier took 10 min 37 s. I suspect some degenerate case produces these outliers, which then trigger this alert. cc'ing Kat in case she has an opinion on this.
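To illustrate why a small sample lets a single outlier dominate: the comments do not say which statistic BotUpdateSlow evaluates, so this sketch assumes, purely for illustration, a high percentile (p95, nearest-rank method). The durations are hypothetical, modeled on the spot check above: builds at exactly 3 minutes (180 s) plus one at 10 min 37 s (637 s).

```python
import math
from collections.abc import Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile (0 < pct <= 100), nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical durations in seconds, modeled on the spot check above.
few_builds = [180.0] * 9 + [637.0]    # 10 builds, one 10m37s outlier
many_builds = [180.0] * 99 + [637.0]  # 100 builds, same single outlier

print(nearest_rank_percentile(few_builds, 95))   # 637.0
print(nearest_rank_percentile(many_builds, 95))  # 180.0
```

With 10 samples the p95 is the outlier itself; with 100 samples containing the same single outlier, it is not. A minimum-data-point gate is one way to avoid acting on the first case.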
Jan 26 2018
Currently, the alert fires only if there have been more than 20 data points in the last 2 hours: http://google3/configs/monitoring/chrome_ops_client_infra/buildbot_alerts.py?l=146&rcl=183393513
Let me know if I can help!
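The data-point gate described above can be approximated as follows. This is a hypothetical re-expression, not the code in buildbot_alerts.py (which is not reproduced here); the function names are illustrative, and treating the window as the half-open interval (now - 2h, now] is an assumption.

```python
from collections.abc import Iterable
from datetime import datetime, timedelta

WINDOW = timedelta(hours=2)
MIN_POINTS = 20  # the alert may fire only with MORE than this many points


def points_in_window(timestamps: Iterable[datetime], now: datetime,
                     window: timedelta = WINDOW) -> int:
    """Count data points in the half-open interval (now - window, now]."""
    start = now - window
    return sum(1 for ts in timestamps if start < ts <= now)


def gate_allows_firing(timestamps: Iterable[datetime], now: datetime,
                       window: timedelta = WINDOW,
                       min_points: int = MIN_POINTS) -> bool:
    """True if there are more than min_points data points in the window."""
    return points_in_window(timestamps, now, window) > min_points


if __name__ == "__main__":
    now = datetime(2018, 1, 26, 12, 0)
    busy = [now - timedelta(minutes=5 * i) for i in range(24)]   # 24 points
    quiet = [now - timedelta(minutes=10 * i) for i in range(12)]  # 12 points
    print(gate_allows_firing(busy, now))   # True
    print(gate_allows_firing(quiet, now))  # False
```

With 24 points in the last two hours the gate opens; with 12, as in a quiet after-hours period, it stays closed. The idea in the issue description would turn that closed case into a ticketed alert rather than silence.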
Feb 22 2018
Comment 1 by wangxianzhu@chromium.org, Jan 23 2018