Issue 807344

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature



Add metric tracking the number of group alerts in SOM

Project Member Reported by nedngu...@google.com, Jan 30 2018

Issue description

The perf benchmarking team has spent a lot of effort stabilizing our waterfall and making our sheriffs more effective. It would be nice to track how this is going by tracking the number of group alerts in SOM.
There are two group alerts metrics we care about:
1) Number of group alerts of "consistent failures" 
2) Number of group alerts of "new failures" 

Group (1) is the most important to us.
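
To make the request concrete, here is a minimal sketch in Go of counting alert groups per category. It is only an illustration of the idea: the AlertGroup type, its fields, and the category strings "consistent" and "new" are hypothetical and are not taken from the sheriff-o-matic code.

```go
package main

import "fmt"

// AlertGroup is a hypothetical stand-in for a SOM alert group.
// The real sheriff-o-matic types are not shown in this issue.
type AlertGroup struct {
	ID       string
	Category string // assumed values: "consistent" or "new"
}

// CountGroupsByCategory returns how many alert groups fall into each category.
func CountGroupsByCategory(groups []AlertGroup) map[string]int {
	counts := make(map[string]int)
	for _, g := range groups {
		counts[g.Category]++
	}
	return counts
}

func main() {
	// Made-up sample data, for illustration only.
	groups := []AlertGroup{
		{ID: "g1", Category: "consistent"},
		{ID: "g2", Category: "new"},
		{ID: "g3", Category: "consistent"},
	}
	counts := CountGroupsByCategory(groups)
	fmt.Printf("consistent=%d new=%d\n", counts["consistent"], counts["new"])
}
```

Reporting the two counts separately would let us watch the "consistent failures" number on its own, since that is the one we care about most.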

 
Labels: -Type-Bug Milestone-Workflow Type-Feature
Status: Available (was: Untriaged)
Comment 2 by bugdroid1@chromium.org, Feb 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/71ed7ee94abe422575253d7462e60fdce21a45a9

commit 71ed7ee94abe422575253d7462e60fdce21a45a9
Author: Sean McCullough <seanmccullough@chromium.org>
Date: Mon Feb 05 23:59:13 2018

[som] Add monitoring metrics for alert *groups*

Also split alerts by category (new vs. consistent failures)

Bug: 807344
Change-Id: Ic3e19c6fc470274c42bb3dab274259544805d1ea
Reviewed-on: https://chromium-review.googlesource.com/896166
Commit-Queue: Sean McCullough <seanmccullough@chromium.org>
Reviewed-by: Tiffany Zhang <zhangtiff@chromium.org>

[modify] https://crrev.com/71ed7ee94abe422575253d7462e60fdce21a45a9/go/src/infra/appengine/sheriff-o-matic/som/handler/analyze.go
[modify] https://crrev.com/71ed7ee94abe422575253d7462e60fdce21a45a9/go/src/infra/appengine/sheriff-o-matic/som/handler/analyze_test.go

Hi Sean, now that the change has landed, can we view these metrics in a graph?
We don't have a viceroy graph for it yet, but you can see it in pcon:
http://shortn/_X4CzHqBJaZ


Awesome, thanks Sean!
Hey Sean, thanks for working on this!

Does this graph measure the sum of new and consistent failures?
re: #6 It does if you further break the metric down by "category": http://shortn/_YBwV1Ln4Gv

Great! Sounds like http://shortn/_2kzn6u2l88 is probably the graph we want then. Thanks so much for this Sean!
It seems like this data is deleted after ~5d. Is there any way to get it retained for much longer (6 months to a year)? We were hoping to use this to verify improvement over long time horizons for our team.