
Issue 833938

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Alerts for TKO replicas

Project Member Reported by dgarr...@chromium.org, Apr 17 2018

Issue description

We have alerts around TKO functionality, but no alerts around the RO slaves. The slave tko-rep2 was down for 2 weeks before GE notified us that it was broken.
 
Labels: -Chase-Pending Chase
Owner: nxia@chromium.org
Status: Assigned (was: Untriaged)
Stainless's results pipeline should have a metric and an alert for stale data; then, move that pipeline to use this replica.
FYI, this is what we use in our team: https://cloud.google.com/sql/docs/mysql/configure-ha#setting_an_alert_for_a_group_of_failover_replicas You can have this alert set up in about 3 minutes :)

Comment 3 by nxia@chromium.org, Apr 23 2018

I'll add an alert on the Cloud SQL instance.

Comment 5 by jkop@chromium.org, Apr 24 2018

Cc: jkop@chromium.org

Comment 6 by nxia@chromium.org, Apr 24 2018

The alert was set to "Violates when: Seconds Behind Master is above a threshold of 600 s for greater than 1 hour".

The first alert was fired at https://app.google.stackdriver.com/incidents/0.kr8wl9453hg9?project=google.com:chromeos-lab

Going to change it to "Violates when: Seconds Behind Master is above a threshold of 4000 s for greater than 1 hour".
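
For anyone reading along, here is a rough sketch of what a "above a threshold for greater than a duration" condition means. This is not Stackdriver's actual evaluation logic; the function name `violates` and the `(timestamp_s, lag_s)` sample format are assumptions made for illustration. It assumes samples are sorted by time and treats any sample at or below the threshold as resetting the breach window.

```python
from typing import Iterable, Optional, Tuple


def violates(
    samples: Iterable[Tuple[float, float]],
    threshold_s: float,
    duration_s: float,
) -> bool:
    """Return True if lag stays above threshold_s for more than duration_s.

    samples: (timestamp_s, lag_s) pairs sorted by timestamp (hypothetical format).
    """
    breach_start: Optional[float] = None
    for timestamp_s, lag_s in samples:
        if lag_s > threshold_s:
            # Remember when the current continuous breach began.
            if breach_start is None:
                breach_start = timestamp_s
            if timestamp_s - breach_start > duration_s:
                return True
        else:
            # Any sample at or below the threshold resets the window.
            breach_start = None
    return False


# Replica lag of 700 s reported once a minute for two hours.
steady = [(t, 700.0) for t in range(0, 7201, 60)]
print(violates(steady, threshold_s=600, duration_s=3600))
print(violates(steady, threshold_s=4000, duration_s=3600))
```

With the 600 s threshold, a replica that sits at a steady 700 s of lag trips the condition after an hour; with the 4000 s threshold the same lag does not.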
Labels: -Chase
Fixed; needs to be examined in light of the new replica name.

Comment 8 by nxia@chromium.org, May 8 2018

Status: Fixed (was: Assigned)
