Alerts for TKO replicas
Issue description
We have alerts around TKO functionality, but no alerts around the RO slaves. The slave tko-rep2 was down for 2 weeks before GE notified us that it was broken.
,
Apr 23 2018
FYI, this is what we use in our team: https://cloud.google.com/sql/docs/mysql/configure-ha#setting_an_alert_for_a_group_of_failover_replicas. You can have this alert set up in about 3 minutes :)
,
Apr 23 2018
I'll add an alert on the Cloud SQL instance.
,
Apr 23 2018
The alert has been created at https://app.google.stackdriver.com/policy-advanced/17533009174001660803?project=google.com:chromeos-lab
,
Apr 24 2018
,
Apr 24 2018
The alert was set to "Violates when: Seconds Behind Master is above a threshold of 600 s for greater than 1 hour". The first alert fired at https://app.google.stackdriver.com/incidents/0.kr8wl9453hg9?project=google.com:chromeos-lab. Going to change it to "Violates when: Seconds Behind Master is above a threshold of 4000 s for greater than 1 hour".
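For anyone reading along, here is a rough sketch of what a "above a threshold for a duration" condition means. This is not how Stackdriver evaluates the policy internally; the function name `violates`, the sample format (timestamp in seconds, lag in seconds), and the choice to treat the duration as "at least" are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def violates(
    samples: Iterable[Tuple[float, float]],
    threshold_s: float,
    duration_s: float,
) -> bool:
    """Return True if lag stays above threshold_s for at least duration_s.

    samples: (timestamp_s, seconds_behind_master) pairs, assumed to be
    sorted by timestamp. This is a hypothetical helper, not a Stackdriver API.
    """
    breach_start = None  # timestamp at which the current breach began
    for ts, lag in samples:
        if lag > threshold_s:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration_s:
                return True
        else:
            # Lag dropped back under the threshold, so the breach resets.
            breach_start = None
    return False


# One sample per minute for an hour, replica steadily 700 s behind.
steady_lag = [(60.0 * i, 700.0) for i in range(61)]
print(violates(steady_lag, threshold_s=600, duration_s=3600))
print(violates(steady_lag, threshold_s=4000, duration_s=3600))
```

With these made-up samples, a replica sitting at 700 s of lag trips the 600 s condition but not the 4000 s one, which is the effect the threshold change is after.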
,
Apr 30 2018
Fixed; needs to be re-examined in light of the new replica name.
,
May 8 2018
Comment 1 by akes...@chromium.org, Apr 23 2018
Owner: nxia@chromium.org
Status: Assigned (was: Untriaged)