add a pre-cq-launcher tick rate alert
Issue description
Build https://uberchromegw.corp.google.com/i/chromeos/builders/pre-cq-launcher/builds/9415 is full of messages like:
[W 2017-06-19 16:27:27] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:27] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:32] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:37] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:43] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:44] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:49] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:49] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:54] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:27:56] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:28:00] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
[W 2017-06-19 16:28:02] TRANSIENT error publishing messages; retrying... {"error":"context deadline exceeded", "delay":"30s", "pubsub":"pubsub(projects/luci-logdog/topics/logs)"}
Unsurprisingly, these messages aren't in the logdog version of the logs.
Jun 19 2017
Going to restart the pre-cq launcher and see if that fixes things.
Jun 19 2017
^ Got pre-cq working again. Demoting to P1. Outage is over.
Possible preventative measures:
- pre-cq tick rate alerts
- root-cause the pubsub publishing failure (hence the Infra label on this bug)
Jun 19 2017
Chase-Pending. Justification: adding alerts is a well-scoped preventative measure against P0 outages.
Jun 19 2017
+logdog people
Jun 20 2017
Pub/Sub uptime and connectivity are prerequisites. Nothing in the logs suggests anything went wrong on our end, and it was retrying consistently. This suggests either a GCE outage or an acute Pub/Sub service outage, both of which are beyond our control.
Jun 20 2017
Looks like it's happening again?
Jun 20 2017
Never mind, things are fine
Jun 21 2017
Jun 21 2017
pre-cq-launcher is a class 1 service. We need to shorten its outages. Re-upping to P1.
Jun 22 2017
Jun 26 2017
Jun 28 2017
Jun 28 2017
Jul 10 2017
Comment 1 by akes...@chromium.org, Jun 19 2017