New issue
Advanced search Search tips

Issue 807844 link

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Cleanup tsmon Go implementation

Project Member Reported by vadimsh@chromium.org, Feb 1 2018

Issue description

It is known to have weird API, race conditions and contentious locks.

Goals in particular:
  * Get rid of New*In(...) functions (e.g. NewIntIn). No one uses them, they are confusing.
  * Get rid of New*Callback and New*CallbackIn functions (e.g. NewCallbackInt). Almost no one uses them, there's no type checks, and nothing bad happens if non-callback metrics are used from within callbacks (in fact most callback users do just that).
  * Get rid of "global callbacks" (but keep "regular" callbacks). They are needed only on GAE and can be implemented in GAE-specific portion of tsmon. Their usage is subtle and confusing (I saw them used outside GAE, where regular callbacks metrics should be used instead).
  * Cleanup usage of context.Context. GetState(context.Background()) should fail, instead of silently falling back to global variables, like it does now. This is just weird.
  * Stop returning 'error' from metric setters. There can be no errors there, and no one actually checks them anyway.
  * Make custom targets easier to use, they are useful on appengine.
  * Stop using metric-global locks, e.g. one hot combination of metric fields must not block updates to other combinations of fields.
  * Distribution-valued metrics are racy.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Feb 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/358800585c05ee4aa35c1113f0fe23424da3e0ba

commit 358800585c05ee4aa35c1113f0fe23424da3e0ba
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Feb 01 18:37:58 2018

[tsmon] Get rid of "callback" metrics.

There weren't actually used and flush callback do not enforce their usage at
all.

R=iannucci@chromium.org
BUG=807844

Change-Id: I3323e30514a254a519986c8430410dfb02c61295
Reviewed-on: https://chromium-review.googlesource.com/896604
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/358800585c05ee4aa35c1113f0fe23424da3e0ba/appengine/tsmon/globalmetrics.go
[modify] https://crrev.com/358800585c05ee4aa35c1113f0fe23424da3e0ba/common/tsmon/metric/metric.go

Project Member

Comment 2 by bugdroid1@chromium.org, Feb 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/02d859a1feca788415765d68bc3c31a630a42082

commit 02d859a1feca788415765d68bc3c31a630a42082
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Feb 01 20:22:29 2018

[tsmon] Make metric registry global and get rid of New*In functions.

All production code defines metrics as global variables during init() time.
There's not much gained in making the metric registry stored in the context.

R=iannucci@chromium.org
BUG=807844

Change-Id: Ia37e8a6262d9c5eb3a955b3c8fdfc210fd6ae8c2
Reviewed-on: https://chromium-review.googlesource.com/896545
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/bq/eventupload.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/bq/eventupload_test.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/context.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/iface.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/metric/metric.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/metric/metric_test.go
[add] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/registry/registry.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/store/fake.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/store/inmemory.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/store/nil.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/store/store.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/common/tsmon/store/storetest/testing.go
[modify] https://crrev.com/02d859a1feca788415765d68bc3c31a630a42082/server/tsmon/middleware_test.go

Project Member

Comment 3 by bugdroid1@chromium.org, Feb 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b

commit 9c5cd53c3145bd8cc9bc3108669b75fc9819f26b
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Feb 01 20:42:18 2018

[tsmon] Remove 'error' from tsmon metric methods.

Almost no one checks these errors, and they are never happening in a well formed
program.

Instead we now panic if the program is malformed (e.g. it tries to pass an
integer for string-typed field or "increment" a string-typed metric), or log
an error if runtime values are wrong (e.g. when counter is being rolled back).

R=iannucci@chromium.org
BUG=807844

Change-Id: I506c18d6f5ec6788e4db341b7e65b429bd9fe1bc
Reviewed-on: https://chromium-review.googlesource.com/896328
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/bq/eventupload.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/metric/http_transport_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/metric/metric.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/metric/metric_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/metric/standard_metrics_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/store/fake.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/store/inmemory.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/store/nil.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/store/store.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/common/tsmon/store/storetest/testing.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/grpc/grpcmon/client_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/grpc/grpcmon/server_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/milo/buildsource/buildbot/pubsub.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/scheduler/appengine/catalog/catalog_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/scheduler/appengine/engine/engine_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/scheduler/appengine/task/gitiles/state_test.go
[modify] https://crrev.com/9c5cd53c3145bd8cc9bc3108669b75fc9819f26b/server/tsmon/middleware_test.go

Project Member

Comment 4 by bugdroid1@chromium.org, Feb 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/7d8342ee374f84bf9fa3261c2a151d0907184b39

commit 7d8342ee374f84bf9fa3261c2a151d0907184b39
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Feb 01 22:38:17 2018

Roll luci-go DEPS.

infra/go/src/go.chromium.org/luci:
9c5cd53c3 [tsmon] Remove 'error' from tsmon metric methods.
02d859a1f [tsmon] Make metric registry global and get rid of New*In functions.
358800585 [tsmon] Get rid of "callback" metrics.
20010f36b [Machine Database] Add state to machines
2fca01285 [Machine Database] Add state to oses, platforms, vlans
2a0fccb69 [isolate] Making tarring isolate deterministic.
b5e7210e1 [Machine Database] Add commands to create, delete, and retrieve nics
04403493b [Machine Database] Support deleting nics
635ebc5e6 fix typo: "got" -> "had".
963d27950 [milo] Add ACL checks for builder page.
fada09218 [scheduler] Increase rate of 'read-project-config' queue processing.
be601b65d [milo] Reduce parallel requests from 8 to 4

R=iannucci@chromium.org
BUG=807844

Change-Id: I5103d776ef721ad68318a72bd7093cee4b2994fd
Reviewed-on: https://chromium-review.googlesource.com/898094
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/7d8342ee374f84bf9fa3261c2a151d0907184b39/go/src/infra/monitoring/sysmon/system/system_test.go
[modify] https://crrev.com/7d8342ee374f84bf9fa3261c2a151d0907184b39/go/src/infra/monitoring/sysmon/puppet/puppet_test.go
[modify] https://crrev.com/7d8342ee374f84bf9fa3261c2a151d0907184b39/go/src/infra/monitoring/sysmon/android/android_test.go
[modify] https://crrev.com/7d8342ee374f84bf9fa3261c2a151d0907184b39/go/src/infra/tools/cloudtail/pipe_test.go
[modify] https://crrev.com/7d8342ee374f84bf9fa3261c2a151d0907184b39/DEPS
[modify] https://crrev.com/7d8342ee374f84bf9fa3261c2a151d0907184b39/go/src/infra/monitoring/sysmon/docker/docker_test.go

Cc: -iannucci@chromium.org iannu...@google.com

Sign in to add a comment