
Issue 875383


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Monorail 4xx high alerts are noisy

Project Member Reported by zhangtiff@chromium.org, Aug 17

Issue description

Often, a MonorailProd4xxRateHigh alert is the only page I get on my shift. The usual outcome is to look at the logs for a bit, find that a single user did something that caused a high 4xx rate, and then take no action.

I believe we should downgrade this alert from page level to ticket level, because I think keeping alerts we rarely act on as pages encourages page-blindness, where we get used to treating pages as unimportant.

However, downgrading this alert to a ticket has been proposed a few times, and one objection keeps coming up:

* MonorailProd4xxRateHigh is apparently used to find auth issues as well, so in some cases it does represent a page-worthy issue. 

That said, the vast majority of the times I've seen this page, it hasn't represented a serious issue. So I think we should figure out a way to split the alert into separate alerts for the different cases, e.g.:

* Can we have different alerts for a single user causing a lot of 4xx's versus a global increase in 4xx? 
* Suggestion from Shutao: perhaps we could exclude HTTP requests from robots, or move them into a separate alert.
* Maybe we can find more specific ways to catch global auth issues, endpoints accidentally getting deleted, and other such problems.
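
To make the first two ideas concrete, here is a rough sketch of how a 4xx spike might be classified before deciding whether to page. Everything in it is an assumption for illustration: the `(user, status, is_robot)` record shape, the function name, and both thresholds are hypothetical and do not reflect Monorail's actual logs or alert configuration.

```python
from collections import Counter


def classify_4xx_spike(requests, rate_threshold=0.2, dominance_threshold=0.8):
    """Classify a window of requests as "ok", "ticket", or "page".

    `requests` is an iterable of (user, status, is_robot) tuples.
    This record shape and both thresholds are hypothetical.
    """
    # Drop robot traffic so it cannot trigger the alert by itself.
    human = [(user, status) for user, status, is_robot in requests if not is_robot]
    if not human:
        return "ok"

    # Count 4xx responses per user.
    errors = Counter(user for user, status in human if 400 <= status < 500)
    total_errors = sum(errors.values())
    if total_errors / len(human) < rate_threshold:
        return "ok"

    # If one user accounts for most of the 4xx's, file a ticket instead of paging.
    _, top_count = errors.most_common(1)[0]
    if top_count / total_errors >= dominance_threshold:
        return "ticket"

    # Otherwise the errors are spread across users, which looks like a global problem.
    return "page"
```

With this split, one user generating most of the 4xx's in a window would yield "ticket", while the same error rate spread across many users would yield "page".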
 
For endpoints accidentally getting deleted, maybe we could use a probing approach instead?
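
A minimal sketch of what such a probe could look like, assuming a known list of endpoint paths that should always exist. The function names, the base URL, and the paths are hypothetical; a real prober would probably also need auth and scheduling, which are left out here.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


def find_missing_endpoints(base_url, paths, fetch=http_status):
    """Return the paths under `base_url` that respond with 404.

    `fetch` is injectable so the probe can be exercised without network access.
    Connection failures (urllib.error.URLError) are not handled and will propagate.
    """
    return [path for path in paths if fetch(base_url + path) == 404]
```

A probe like this could run on a schedule and page only when an expected endpoint disappears, independent of the overall 4xx rate.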
