Oftentimes, a MonorailProd4xxRateHigh is the only page I get on my shift. The usual outcome is that I look at the logs for a bit, find that a single user did something that caused a high rate of 4xx's, and then take no action.
I believe we should downgrade this alert from page level to ticket level, because I think keeping alerts we rarely act on as pages encourages page-blindness, where we get used to treating pages as unimportant.
However, downgrading this alert to a ticket has been proposed a few times, and one point keeps coming up:
* MonorailProd4xxRateHigh is apparently used to find auth issues as well, so in some cases it does represent a page-worthy issue.
That said, the vast majority of the times I've seen this page, it hasn't represented a serious issue. So I think we should figure out some way to split the alert for the different cases, e.g.:
* Can we have different alerts for a single user causing a lot of 4xx's versus a global increase in 4xx?
* Suggestion from Shutao: Perhaps we could exclude HTTP requests from robots (or maybe move them into a separate alert).
* Maybe we can find more specific ways to catch global auth issues/endpoints accidentally getting deleted/other things.
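As a rough sketch of the first two ideas (not how the alert is defined today): given a window of request records, we could decide between "ticket" and "page" based on whether a single user accounts for most of the 4xx's. The `Request` fields, the `classify_4xx` name, and both thresholds below are made up for illustration and would need tuning against real traffic.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """Hypothetical request record; field names are assumptions."""
    user: str               # requester identifier
    status: int             # HTTP status code
    is_robot: bool = False  # True if the request came from a robot


def classify_4xx(
    requests: Iterable[Request],
    rate_threshold: float = 0.2,       # assumed: 4xx share that counts as "high"
    dominance_threshold: float = 0.5,  # assumed: share of 4xx's from one user
    exclude_robots: bool = True,
) -> str:
    """Return "ok", "ticket" (one user dominates), or "page" (global increase)."""
    reqs = [r for r in requests if not (exclude_robots and r.is_robot)]
    if not reqs:
        return "ok"

    errors = [r for r in reqs if 400 <= r.status < 500]
    if len(errors) / len(reqs) < rate_threshold:
        return "ok"

    # Find the user responsible for the most 4xx responses in this window.
    _, top_count = Counter(r.user for r in errors).most_common(1)[0]
    if top_count / len(errors) >= dominance_threshold:
        return "ticket"  # a single user is causing most of the 4xx's
    return "page"        # 4xx's are spread across many users


if __name__ == "__main__":
    one_user = [Request("a", 404)] * 40 + [Request("b", 200)] * 60
    print(classify_4xx(one_user))
```

This does not cover the third bullet: a deleted endpoint or a broken auth path could still be triggered mostly by one user, so it would probably need its own, more specific check.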
Comment 1 by st...@chromium.org, Aug 17