Logdog archivist not running since Oct 2nd
Issue description

Comment 1 by estaab@chromium.org, Oct 29
Yes, this should actually be a P0. tandrii suggested increasing the 14-day BigTable log dropoff to stop the bleeding, in which case this could be downgraded to a P1.
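
(For reference, the proposed mitigation is a BigTable garbage-collection change: raising the max-age policy keeps un-archived rows alive longer than 14 days, buying the archivist time to catch up before entries expire. A minimal sketch using the Cloud Bigtable Go client follows; the project, instance, table, and column-family names are placeholders, not LogDog's actual configuration.)

    // Hypothetical sketch: extend the BigTable max-age GC policy so
    // un-archived log entries are not dropped while the backlog drains.
    // All resource names below are assumptions for illustration.
    package main

    import (
        "context"
        "log"
        "time"

        "cloud.google.com/go/bigtable"
    )

    func main() {
        ctx := context.Background()

        // AdminClient manages tables and GC policies (not row data).
        admin, err := bigtable.NewAdminClient(ctx, "example-project", "logdog-bt-instance")
        if err != nil {
            log.Fatalf("NewAdminClient: %v", err)
        }
        defer admin.Close()

        // Raise the max-age policy from 14 to, say, 30 days; Bigtable
        // garbage-collects cells older than this, so a larger value
        // delays the log dropoff.
        policy := bigtable.MaxAgePolicy(30 * 24 * time.Hour)
        if err := admin.SetGCPolicy(ctx, "log-storage", "log", policy); err != nil {
            log.Fatalf("SetGCPolicy: %v", err)
        }
    }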

Oct 29
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/71cc512e061fbccda7f0e5653554a2974aef7a96

commit 71cc512e061fbccda7f0e5653554a2974aef7a96
Author: Ryan Tseng <hinoka@google.com>
Date: Mon Oct 29 18:41:03 2018

[logdog] Update archivist/collector to use go 1.9

To fix deployment.

Bug: 899829
TBR: iannucci
Change-Id: I80771622400130c452bef8ebc4e7a29a4ed3661e
Reviewed-on: https://chromium-review.googlesource.com/c/1305657
Reviewed-by: Ryan Tseng <hinoka@chromium.org>
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>
Commit-Queue: Ryan Tseng <hinoka@chromium.org>

[modify] https://crrev.com/71cc512e061fbccda7f0e5653554a2974aef7a96/logdog/server/cmd/logdog_archivist/Dockerfile
[modify] https://crrev.com/71cc512e061fbccda7f0e5653554a2974aef7a96/logdog/server/cmd/logdog_collector/Dockerfile

Oct 29
After redeploying the archivist, the backlog is churning again: https://screenshot.googleplex.com/KPZEK2tBhzR However, the rate is quite slow. The cluster replica size is being increased to churn through the backlog faster.

Oct 29
After bumping the replica count from 64 to 2048, the burn rate is 2M entries every 10 minutes. With 86M outstanding entries (86M ÷ 2M per 10 minutes ≈ 430 minutes), the backlog should burn through in roughly 8 hours.
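
(The replica bump itself would normally be done with kubectl or the GKE console; purely as an illustration, the equivalent operation via the Kubernetes Go client might look like the sketch below. The deployment name "logdog-archivist" and namespace "logdog" are assumptions.)

    // Hypothetical sketch: scale the archivist Deployment from 64 to 2048
    // replicas using client-go. Names are illustrative, not the real config.
    package main

    import (
        "context"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        ctx := context.Background()

        // Build a client from the local kubeconfig (~/.kube/config).
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatalf("kubeconfig: %v", err)
        }
        cs, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Fatalf("clientset: %v", err)
        }

        deployments := cs.AppsV1().Deployments("logdog")

        // Read the current scale subresource, then raise replicas to 2048.
        scale, err := deployments.GetScale(ctx, "logdog-archivist", metav1.GetOptions{})
        if err != nil {
            log.Fatalf("GetScale: %v", err)
        }
        scale.Spec.Replicas = 2048
        if _, err := deployments.UpdateScale(ctx, "logdog-archivist", scale, metav1.UpdateOptions{}); err != nil {
            log.Fatalf("UpdateScale: %v", err)
        }
    }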

Oct 31
This instance is fixed, but it ended up uncovering a bug that led to another outage: crbug.com/900148