LogDog currently requires a lot of attention to keep running. In SRE terms, it is more like a pet than cattle, and LogDog needs to be more like cattle.
This may include:
* Run LogDog through a Production Readiness Review (PRR), with the eventual goal of handing it off to SRE.
* Reduce the number of states a stream can be in; in short, simplify the pipeline.
Currently, a stream can be in any of the following states, among others:
* Pre-registration (the butler is registering the stream)
* Registration complete, no logs in collector, pessimistic archival scheduled
* Logs in collector pubsub, not in bigtable
* Logs in collector pubsub, also in bigtable
* Stream terminated, all logs in bigtable, archival scheduled
* Stream terminated, all logs in bigtable, archival started
* Stream terminated, all logs in bigtable, archival completed
* Stream terminated, some logs still in collector pubsub, not drained yet, archival scheduled
* Stream terminated, some logs still in collector pubsub, not drained yet, archival started, DelayMax not hit
* Stream terminated, some logs still in collector pubsub, not drained yet, archival started, DelayMax hit
* Stream terminated, some logs still in collector pubsub, not drained yet, archival completed
There are further states depending on which part of the pipeline an archival tumble task is in.
This leads to the next suggestion:
* Remove the dependency on PubSub. PubSub's asynchronous behavior causes much of the complexity in stream state: many operations end up being eventually consistent, and the code has to work around that.
Comment 1 by hinoka@chromium.org, Nov 2