Recent stress tests have revealed a few problems with the LogDog Collector pipeline that need fixing.
1) Collector is hammering gRPC / BigTable with calls. It is probably exceeding quota, and definitely spending most of its CPU and time on these RPCs.
Currently, Collector loads each individual log entry into its own BigTable row. This makes fetching data and ensuring a contiguous log space very easy, but it means that large log streams hammer BigTable with orders of magnitude more RPCs than it is designed to tolerate.
The proposed solution is to change LogDog's intermediate storage BigTable format.
Current: Sparse table, individual row protobufs, row key == {Stream, Index}, one row per index.
Proposed: Sparse table, bulk row protobufs, row key == {Stream, Index_last}, one row per bundled set of contiguous log indexes.
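The proposed read path seeks by row key, so keys must sort by index numerically. BigTable orders rows lexicographically by key bytes, so one way to get numeric ordering is a fixed-width encoding of Index_last. This is a minimal Go sketch under assumptions: `rowKey`, the `~` separator, and the zero-padded hex encoding are hypothetical, not LogDog's actual key format.

```go
package main

import "fmt"

// rowKey builds a hypothetical row key for a bundle in the given stream
// whose last (highest) log index is lastIndex.
//
// Encoding the index as fixed-width (16-digit), zero-padded hex makes
// lexicographic key order match numeric index order, which a forward seek
// relies on. Assumption: "~" never appears in stream names, so all rows of
// one stream stay adjacent in the table.
func rowKey(stream string, lastIndex uint64) string {
	return fmt.Sprintf("%s~%016x", stream, lastIndex)
}

func main() {
	a := rowKey("proj/prefix/+/stdout", 9)
	b := rowKey("proj/prefix/+/stdout", 16)
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println(a < b) // true: 9 sorts before 16
}
```

Without padding, a naive decimal key would place "16" before "9", and the seek would land on the wrong bundle.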
We would assert in Collector that a given bundle entry is composed of contiguous indexes. This becomes a formal ingest requirement rather than a nice implementation detail.
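A minimal Go sketch of that ingest-time assertion. `LogEntry` and `validateBundle` are hypothetical stand-ins (the real entries are protobufs); the check rejects empty bundles and any bundle whose indexes are not strictly consecutive, and returns the first and last index that the row would be keyed on.

```go
package main

import (
	"errors"
	"fmt"
)

// LogEntry is a hypothetical, minimal stand-in for a log entry protobuf;
// only the stream index matters for this check.
type LogEntry struct {
	StreamIndex uint64
	Text        string
}

// validateBundle enforces the contiguity requirement: the bundle must be
// non-empty and each entry's index must be exactly one greater than the
// previous entry's. Gaps, duplicates, and reordering are all rejected.
// On success it returns the first and last index in the bundle.
func validateBundle(entries []LogEntry) (first, last uint64, err error) {
	if len(entries) == 0 {
		return 0, 0, errors.New("empty bundle")
	}
	for i := 1; i < len(entries); i++ {
		want := entries[i-1].StreamIndex + 1
		if got := entries[i].StreamIndex; got != want {
			return 0, 0, fmt.Errorf("non-contiguous bundle: entry %d has index %d, want %d", i, got, want)
		}
	}
	return entries[0].StreamIndex, entries[len(entries)-1].StreamIndex, nil
}

func main() {
	ok := []LogEntry{{9, "a"}, {10, "b"}, {11, "c"}}
	first, last, err := validateBundle(ok)
	fmt.Println(first, last, err)

	bad := []LogEntry{{9, "a"}, {11, "c"}}
	_, _, err = validateBundle(bad)
	fmt.Println(err)
}
```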
Ingesting a bundle would now take a single row write rather than len(bundle) writes, cutting write RPCs by a factor of the bundle size, which is a huge gain.
Reading a log entry would seek forward from the target {Stream, Index} key to the first row whose key is greater than or equal to it, load that row's data, then extract the requested entry from that bundle. For example:
Want: Row 10
Have: {0-8}, {9-15}, {16-20}
We would seek for {10}. Since rows are keyed by the LAST contiguous index in the bundle, the first row at or after {10} is {9-15} (key 15), which we would load, confirm that its first index (9) is <= 10, and pick out entry 10 from the bundle.
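The lookup can be sketched in Go over an in-memory slice of rows sorted by key, with `sort.Search` standing in for the BigTable seek. `Bundle`, `lookup`, and `makeBundle` are hypothetical names for illustration; a real implementation would issue a range read starting at the target key instead of searching a slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Bundle is a hypothetical in-memory stand-in for one proposed row:
// a run of contiguous entries, keyed in the table by Last.
type Bundle struct {
	First, Last uint64
	Entries     []string // Entries[i] holds the entry at index First+i
}

// lookup mimics the proposed read path over rows sorted by Last.
// sort.Search finds the first row whose key (Last) is >= index, like a
// forward seek. That row holds the entry only if its First index is <= index;
// otherwise the index falls in a gap that has not been ingested yet.
func lookup(rows []Bundle, index uint64) (string, bool) {
	i := sort.Search(len(rows), func(i int) bool { return rows[i].Last >= index })
	if i == len(rows) {
		return "", false // past the end of the stream
	}
	b := rows[i]
	if b.First > index {
		return "", false // gap before this bundle
	}
	return b.Entries[index-b.First], true
}

// makeBundle builds a bundle covering [first, last] with placeholder entries.
func makeBundle(first, last uint64) Bundle {
	b := Bundle{First: first, Last: last}
	for i := first; i <= last; i++ {
		b.Entries = append(b.Entries, fmt.Sprintf("entry %d", i))
	}
	return b
}

func main() {
	// The example above: want row 10, have {0-8}, {9-15}, {16-20}.
	rows := []Bundle{makeBundle(0, 8), makeBundle(9, 15), makeBundle(16, 20)}
	e, ok := lookup(rows, 10)
	fmt.Println(e, ok) // entry 10 true
}
```

Keying by the last index rather than the first is what lets a plain forward seek land directly on the containing bundle, with no need to step back one row.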
Archive: We would still scan forwards from 0 and pull the full space.
Tail: We would still do a keys-only scan to the last row and pull that out.
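Both paths can be sketched in Go over the same hypothetical in-memory rows. Stopping Archive at the first gap, so that it only ever returns a contiguous prefix of the stream, is an assumption of this sketch; the real Archive might instead fail or wait. For Tail, the newest entry is simply the last entry of the last row.

```go
package main

import "fmt"

// Bundle is a hypothetical stand-in for one proposed row (see Index_last keying).
type Bundle struct {
	First, Last uint64
	Entries     []string // Entries[i] holds the entry at index First+i
}

// archive walks rows in key order starting from index 0 and concatenates
// their entries. It stops at the first gap, so the result is always a
// contiguous prefix of the stream (an assumption of this sketch).
func archive(rows []Bundle) []string {
	var out []string
	next := uint64(0)
	for _, b := range rows {
		if b.First != next {
			break
		}
		out = append(out, b.Entries...)
		next = b.Last + 1
	}
	return out
}

// tail returns the newest entry: the last entry of the last row. In BigTable
// this corresponds to a keys-only scan to find the last row, then one read.
func tail(rows []Bundle) (string, bool) {
	if len(rows) == 0 {
		return "", false
	}
	b := rows[len(rows)-1]
	return b.Entries[len(b.Entries)-1], true
}

// makeBundle builds a bundle covering [first, last] with placeholder entries.
func makeBundle(first, last uint64) Bundle {
	b := Bundle{First: first, Last: last}
	for i := first; i <= last; i++ {
		b.Entries = append(b.Entries, fmt.Sprintf("entry %d", i))
	}
	return b
}

func main() {
	rows := []Bundle{makeBundle(0, 2), makeBundle(3, 4), makeBundle(6, 7)}
	fmt.Println(archive(rows)) // entries 0..4; index 5 is missing
	fmt.Println(tail(rows))
}
```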
2) Collector currently doesn't refresh a bundle's ACK deadline when processing takes too long. We need to implement this, similar to the way Task Queue does it, so that poison bundles don't stay in Pub/Sub forever.
We can do this for free by updating the Pub/Sub library to use the new SubscriptionHandle API (https://godoc.org/google.golang.org/cloud/pubsub#SubscriptionHandle).
Comment 1 by estaab@google.com, Mar 25 2016