
Issue 899829


Issue metadata

Status: Fixed
Closed: Oct 31
Pri: 0
Type: Bug
Blocking: v8:8322




Logdog archivist not running since Oct 2nd

Reported by hinoka@chromium.org, Oct 29

Issue description

Cc: estaab@chromium.org
Ryan, do you think this should be a Pri-0? Do you want anyone else to work with you on it?
Labels: -Pri-1 Pri-0
Yes, this should actually be a P0.

tandrii suggested increasing the 14-day Bigtable log dropoff to stop the bleeding. If that is done, this could be downgraded to P1.

Comment 3 by bugdroid1@chromium.org, Oct 29

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/71cc512e061fbccda7f0e5653554a2974aef7a96

commit 71cc512e061fbccda7f0e5653554a2974aef7a96
Author: Ryan Tseng <hinoka@google.com>
Date: Mon Oct 29 18:41:03 2018

[logdog] Update archivist/collector to use go 1.9

To fix deployment.

Bug:  899829 
TBR: iannucci
Change-Id: I80771622400130c452bef8ebc4e7a29a4ed3661e
Reviewed-on: https://chromium-review.googlesource.com/c/1305657
Reviewed-by: Ryan Tseng <hinoka@chromium.org>
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>
Commit-Queue: Ryan Tseng <hinoka@chromium.org>

[modify] https://crrev.com/71cc512e061fbccda7f0e5653554a2974aef7a96/logdog/server/cmd/logdog_archivist/Dockerfile
[modify] https://crrev.com/71cc512e061fbccda7f0e5653554a2974aef7a96/logdog/server/cmd/logdog_collector/Dockerfile

After redeploying the archivist, the backlog is churning again:
https://screenshot.googleplex.com/KPZEK2tBhzR

However, the rate is quite slow. The cluster's replica count is being increased to churn through the backlog faster.
After bumping the replica count from 64 to 2048, the burn rate is 2M entries every 10 minutes (about 12M entries per hour). With 86M entries outstanding, the backlog should clear in roughly 7 to 8 hours.
Status: Fixed (was: Assigned)
This instance is fixed, but the fix uncovered a bug that led to another outage: crbug.com/900148
Blocking: v8:8322
