New issue
Advanced search Search tips

Issue 675401 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Logdog coordinator should not use http.Client when constructing PubSub client

Project Member Reported by vadimsh@chromium.org, Dec 17 2016

Issue description

The coordinator log is full of:

Failed to create Pub/Sub client. :: {"error":"constructing pubsub client: unsupported HTTP client specified", "id":"0f182472c96e08c950441bc267de7d0f40b3c137bc582005003bb78883c4aaef", "namespace":"luci.infra"}

Failed to get archival publisher. :: {"error":"failed to create Pub/Sub client", "id":"0f182472c96e08c950441bc267de7d0f40b3c137bc582005003bb78883c4aaef", "namespace":"luci.infra"}

----

This is from https://github.com/luci/luci-go/blob/master/logdog/appengine/coordinator/service.go#L340

Cloud PubSub Go library switched to gRPC-only mode in October and no longer supports http.Client. It requires PerRPCCredentials or TokenSource now and, importantly, the client must be properly closed (otherwise it can leak gRPC connections).

I've fixed some PubSub calls in https://github.com/luci/luci-go/commit/60de90eea5c9a5606f7872d2bc1b7aea63e5c057, but apparently I've missed ArchivalPublisher().
 
Cc: phosek@chromium.org
It is possible this causes datastore quota issues.

This graph shows ArchiveStream(...) stopped being called on Dec 15, 15:50: http://shortn/_cIb1zhexSf

This graph shows that Logdog coordinator started to do increasingly more datastore ops at around the same time: http://shortn/_Xb3GwyDs5o

I think due to the bug archival pipeline is essentially offline, but request to archive still accumulate in datastore (as tumble mutatations?) and logdog retries them, eventually eating all datastore quota.
I believe this CL should fix the bug: https://codereview.chromium.org/2582253002/

But I don't know how to verify it works for real (with real PubSub), or how to deploy to luci-logdog.appspot.com.

Comment 3 by d...@chromium.org, Dec 18 2016

Thanks for looking into this. I do believe the latest version deployed jumped the October boundary. LogDog runs on classic GAE - is switching to gRPC going to be problematic b/c of HTTP2, or does that work with urlfetch?
It works fine, tsmon has been using it for some time already. It uses Remote Sockets API underneath.
Project Member

Comment 6 by bugdroid1@chromium.org, Dec 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/0da66c1a83109bf1973a7646efec7331bb8f5d41

commit 0da66c1a83109bf1973a7646efec7331bb8f5d41
Author: dnj <dnj@chromium.org>
Date: Sun Dec 18 21:46:53 2016

Coordinator: AppEngine context for Pub/Sub.

After switching Pub/Sub to gRPC, we need to pass the AppEngine Context
instead of the luci/gae Context.

BUG= chromium:675401 
TEST=None
R=vadimsh@chromium.org

Review-Url: https://codereview.chromium.org/2584293002

[modify] https://crrev.com/0da66c1a83109bf1973a7646efec7331bb8f5d41/logdog/appengine/coordinator/service.go

Project Member

Comment 7 by bugdroid1@chromium.org, Dec 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/fad4dadaa7d4d99fc00958a4d5281af7f9d85a7c

commit fad4dadaa7d4d99fc00958a4d5281af7f9d85a7c
Author: dnj <dnj@chromium.org>
Date: Mon Dec 19 21:50:31 2016

LogDog: Re-use Pub/Sub gRPC clients.

Since a single Tumble mutation may perform multiple Pub/Sub publishes,
re-use a single Pub/Sub gRPC client. Clients are bound to the projects
that they are created from, and free'd at the end of the handler's
operation via a service-wide close.

Leave the archival publisher Close in (even though it's no-op) since
it's a useful check on archival publisher lifecycle.

BUG= chromium:675401 
TEST=None
R=vadimsh@chromium.org

Review-Url: https://codereview.chromium.org/2583033002

[modify] https://crrev.com/fad4dadaa7d4d99fc00958a4d5281af7f9d85a7c/logdog/appengine/coordinator/archivalPublisher.go
[modify] https://crrev.com/fad4dadaa7d4d99fc00958a4d5281af7f9d85a7c/logdog/appengine/coordinator/service.go

Comment 8 by d...@chromium.org, Dec 20 2016

Owner: d...@chromium.org
Status: Fixed (was: Untriaged)
This is looking good now.

Sign in to add a comment