Issue 674759

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Sep 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Feature

Blocking:
issue 673940




Add specific metrics/alerts for update_scripts step

Project Member Reported by iannucci@chromium.org, Dec 15 2016

Issue description

'update_scripts' is a step that is CLEARLY an infra-only step; nothing in the user's task can make this step fail, and it only fails when infra changes are made. We should have metrics and alerts for its failure.
 
Labels: Infra-Failures

Comment 2 by iannu...@google.com, Dec 17 2016

Cc: iannucci@chromium.org
Owner: ----
Status: Available (was: Assigned)
Since I'm out until January, I'm de-assigning this in case someone else wants to pick it up next week.
Owner: iannucci@chromium.org
Status: Assigned (was: Available)
No one did this over break. Re-assigning to you, Robbie.
Just getting back from break; I'll work on this this week.
Blocking: 673940
Project Member

Comment 6 by bugdroid1@chromium.org, Jan 12 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/f1726d83b169ff9852308cf214dbda7c24407758

commit f1726d83b169ff9852308cf214dbda7c24407758
Author: Robert Iannucci <iannucci@google.com>
Date: Thu Jan 12 21:26:26 2017

A suggested alert, "update_scripts fails too often": http://shortn/_w00tdSSu6Q (threshold > 0). It combines two criteria: fleet-wide update_scripts failures (> 5/hour), or a single bot failing > 50% of its update_scripts steps over a 4h period.

The alert description would link to a viceroy graph with a detailed breakdown per bot (say, top 10), per master, and fleet total, so the trooper can diagnose why it fired and what kind of problem it is.

WDYT?
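
For illustration, the two criteria in the proposed alert could be combined roughly as below. This is only a sketch under stated assumptions: `BotStats`, `should_alert`, and `top_failing_bots` are hypothetical names, the counts are assumed to be pre-aggregated (failures in the last hour fleet-wide, runs and failures per bot over the last 4h), and it does not reflect the actual alert expression behind the shortlink.

```python
from dataclasses import dataclass
from typing import List, Mapping

# Thresholds taken from the proposal in this comment.
FLEET_FAILURES_PER_HOUR = 5   # fire if fleet-wide failures exceed this in 1h
BOT_FAILURE_RATIO = 0.5       # fire if one bot fails more than this share in 4h


@dataclass
class BotStats:
    """Hypothetical per-bot update_scripts counts over a 4h window."""
    runs_4h: int
    failures_4h: int

    @property
    def failure_ratio(self) -> float:
        # A bot with no runs in the window cannot exceed the ratio.
        return self.failures_4h / self.runs_4h if self.runs_4h else 0.0


def should_alert(fleet_failures_last_hour: int,
                 bots: Mapping[str, BotStats]) -> bool:
    """Return True if either criterion from the proposal is met."""
    if fleet_failures_last_hour > FLEET_FAILURES_PER_HOUR:
        return True
    return any(b.failure_ratio > BOT_FAILURE_RATIO for b in bots.values())


def top_failing_bots(bots: Mapping[str, BotStats], n: int = 10) -> List[str]:
    """Bot names ordered by failure ratio, for a per-bot breakdown."""
    ranked = sorted(bots, key=lambda name: bots[name].failure_ratio,
                    reverse=True)
    return ranked[:n]
```

Both comparisons are strict, matching the "> 5/hour" and ">50%" wording, so exactly 5 fleet failures in an hour or exactly half of a bot's steps failing would not fire.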
Labels: Hotlist-Infra-Failures

Comment 9 by efoo@chromium.org, Mar 29 2017

Components: -Infra>Monitoring Infra>Platform>Buildbot
Labels: Ops-AddMonitoring
Removing Infra>Monitoring since this is a Buildbot-related alert modification. Please reserve Infra>Monitoring for monitoring (ts_mon and event_mon) bugs. Added the Ops-AddMonitoring label to track monitoring-related tasks.
What's the story? This is a bug filed from a postmortem and is still a P1. Please let us know the plan and ETA for a fix.
Owner: iannu...@google.com
bulk reassign to my google account
Owner: iannucci@chromium.org
reassign to my chromium.org account! I was holding it wrong, apparently.
Issue 421769 has been merged into this issue.

Comment 14 by efoo@chromium.org, May 24 2017

Robbie, what's the plan for this P1? It came out of a postmortem. Should the priority be lowered?
Thanks!
Cc: -katthomas@chromium.org

Comment 16 by efoo@chromium.org, Sep 7 2017

Labels: -Pri-1 Pri-2
Lowering priority to Pri-2.

Comment 17 by efoo@chromium.org, Sep 7 2017

This is blocking closing cit-pm-10. What's the plan for this?
This is still a staging alert. Have we found it to be helpful?

Status: WontFix (was: Assigned)
I move to close this as wontfix. We specifically don't do this (update_scripts) in LUCI.

Please reopen if you disagree.
