Add specific metrics/alerts for update_scripts step
Issue description: 'update_scripts' is CLEARLY an infra-only step; nothing in the user's task can make it fail, and it only fails when infra changes are made. We should have metrics and alerts for its failure.
,
Dec 17 2016
Since I'm out till January, I'm de-assigning this in case someone else wants to do it next week.
,
Jan 6 2017
No one did this over break. Re-assigning to you, Robbie.
,
Jan 9 2017
just getting back from break, I'll work on this this week.
,
Jan 9 2017
,
Jan 12 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/master-manager.git/+/f1726d83b169ff9852308cf214dbda7c24407758

commit f1726d83b169ff9852308cf214dbda7c24407758
Author: Robert Iannucci <iannucci@google.com>
Date: Thu Jan 12 21:26:26 2017
,
Jan 13 2017
A suggested alert: "update_scripts fails too often": http://shortn/_w00tdSSu6Q (threshold > 0), which basically combines two criteria: fleet-wide update_scripts failures (> 5/hour), or a single bot failing > 50% of its update_scripts steps over a 4h period. The alert description would have a link to a viceroy graph with a detailed breakdown per bot (say, top 10), per master, fleet total, etc., so the trooper can diagnose why it fired and what kind of problem it is. WDYT?
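For illustration only, the two criteria above could be sketched roughly like this in Python. This is not the actual alert definition behind the link; `StepResult`, `should_alert`, and the constant names are hypothetical, and only the thresholds (> 5 failures/hour fleet-wide, > 50% failures per bot over 4h) come from the comment.

```python
from dataclasses import dataclass

# Thresholds taken from the suggested alert; the names are hypothetical.
FLEET_FAILURES_PER_HOUR = 5
BOT_FAILURE_RATIO = 0.5
BOT_WINDOW_HOURS = 4


@dataclass
class StepResult:
    """One update_scripts step outcome (hypothetical record shape)."""
    bot: str
    timestamp: float  # seconds since epoch
    failed: bool


def should_alert(results, now):
    """Return True if either criterion from the suggested alert is met."""
    # Criterion 1: more than 5 failures fleet-wide in the last hour.
    hour_ago = now - 3600
    fleet_failures = sum(
        1 for r in results if r.failed and hour_ago < r.timestamp <= now
    )
    if fleet_failures > FLEET_FAILURES_PER_HOUR:
        return True

    # Criterion 2: any single bot failing more than 50% of its
    # update_scripts steps over the last 4 hours.
    window_start = now - BOT_WINDOW_HOURS * 3600
    per_bot = {}  # bot name -> (total steps, failed steps)
    for r in results:
        if window_start < r.timestamp <= now:
            total, failed = per_bot.get(r.bot, (0, 0))
            per_bot[r.bot] = (total + 1, failed + int(r.failed))
    return any(
        failed / total > BOT_FAILURE_RATIO
        for total, failed in per_bot.values()
    )
```

Note that both comparisons are strict, so a bot that fails exactly half of its steps, or a fleet with exactly 5 failures in the hour, would not fire under this reading of the thresholds.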
,
Jan 25 2017
,
Mar 29 2017
Removing Infra>Monitoring since this is a Buildbot related alert modification. Please reserve Infra>Monitoring for monitoring (ts_mon and event_mon) bugs. Added Ops-AddMonitoring label to track monitoring related tasks.
,
Apr 26 2017
What's the story? This is a bug filed from a postmortem and is still a P1. Please let us know the plan and ETA for a fix.
,
May 10 2017
bulk reassign to my google account
,
May 10 2017
reassign to my chromium.org account! I was holding it wrong, apparently.
,
May 15 2017
Issue 421769 has been merged into this issue.
,
May 24 2017
Robbie, what's the plan for this P1 from a postmortem? Should it be lowered to a lower priority? Thanks!
,
May 25 2017
,
Sep 7 2017
Lowering priority to Pri-2.
,
Sep 7 2017
This is blocking closing cit-pm-10. What's the plan for this?
,
Jan 22 2018
This is still a staging alert. Have we found it to be helpful?
,
Sep 26
I move to close this as wontfix. We specifically don't do this (update_scripts) in LUCI. Please reopen if you disagree.
Comment 1 by katthomas@google.com
, Dec 16 2016