Issue 864724

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 895378

Blocking:
issue 408424
issue 864728
issue 622538
issue 638932
issue 729851



Swarming: define bot bucketing in pools.cfg

Project Member Reported by mar...@chromium.org, Jul 17

Issue description

For bot centric monitoring:

Update pools.cfg to define the dimensions by which we want to group bots when reporting utilization levels, health, time spent in maintenance or quarantined mode, etc.
https://cs.chromium.org/chromium/infra/luci/appengine/swarming/proto/pools.proto

This shall also define a pubsub topic to which the time series can be streamed at one-minute resolution (more discussion is needed about this). The default is streaming through ts_mon.

This is only the luci-config part, not the implementation.
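
To make the idea of bucketing concrete, here is a minimal sketch of grouping bots by a configured list of dimension keys and counting how many bots in each bucket are in each state. It is an illustration only, not the Swarming implementation: the function names, the dictionary layout of a bot, and the state strings are all hypothetical, and the real field names live in pools.proto.

```python
from collections import Counter


def bucket_key(bot_dimensions, dimension_keys):
    """Builds a stable bucket label from the selected dimension keys.

    bot_dimensions maps a dimension key to a list of values (hypothetical
    layout). Keys missing on the bot are reported as "none".
    """
    parts = []
    for key in sorted(dimension_keys):
        values = sorted(bot_dimensions.get(key, []))
        parts.append("%s:%s" % (key, "|".join(values) or "none"))
    return ";".join(parts)


def summarize(bots, dimension_keys):
    """Counts bots per (bucket, state) pair."""
    counts = Counter()
    for bot in bots:
        key = bucket_key(bot["dimensions"], dimension_keys)
        counts[(key, bot["state"])] += 1
    return counts


# Hypothetical sample data; the state names are assumptions.
bots = [
    {"dimensions": {"os": ["Linux"], "pool": ["ci"]}, "state": "busy"},
    {"dimensions": {"os": ["Linux"], "pool": ["ci"]}, "state": "idle"},
    {"dimensions": {"os": ["Mac"], "pool": ["ci"]}, "state": "quarantined"},
]
summary = summarize(bots, ["os", "pool"])
for (bucket, state), count in sorted(summary.items()):
    print(bucket, state, count)
```

Running a summary like this once per minute would yield one data point per bucket and state, which is the kind of time series the description proposes to stream.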
 
Blocking: 864728
Issue 866052 will act as a de facto reason to add pools.cfg everywhere. These configurations will live in pools.cfg.

Comment 3 by bugdroid1@chromium.org, Aug 2

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/3dc6d7117fb611d82ac2b2280675bc3e1bafb46c

commit 3dc6d7117fb611d82ac2b2280675bc3e1bafb46c
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Aug 02 19:57:17 2018

swarming: add discriminant for bot monitoring

R=jbudorick@chromium.org, mikenichols@chromium.org

Bug: 864724
Change-Id: Ib5da3c654819598e9b0388efbeb9c681b2b2c816
Reviewed-on: https://chromium-review.googlesource.com/1145713
Reviewed-by: Mike Nichols <mikenichols@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/3dc6d7117fb611d82ac2b2280675bc3e1bafb46c/appengine/swarming/proto/pools.proto
[modify] https://crrev.com/3dc6d7117fb611d82ac2b2280675bc3e1bafb46c/appengine/swarming/proto/pools_pb2.py

Comment 4 by bugdroid1@chromium.org, Aug 3

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/190753b520911d7be9f9632284f0dbca283069a1

commit 190753b520911d7be9f9632284f0dbca283069a1
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Aug 03 18:58:31 2018

[swarming] tweaks pools BotMonitoring proto.

Follow up on change 1145713.

Bug: 864724
Change-Id: Iceee957150eb9f99e56c63adf894ae73dba4b6f7
Reviewed-on: https://chromium-review.googlesource.com/1161978
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/190753b520911d7be9f9632284f0dbca283069a1/appengine/swarming/proto/pools.proto
[modify] https://crrev.com/190753b520911d7be9f9632284f0dbca283069a1/appengine/swarming/proto/pools_pb2.py

Comment 6 by bugdroid1@chromium.org, Aug 8

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/457cd5f59b7a1b2eee837f48d86e42a677ebc962

commit 457cd5f59b7a1b2eee837f48d86e42a677ebc962
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Aug 08 22:30:08 2018

[components] Migrate stats_framework into a package

This will be refactored a second time afterward to split out the logservice code
into a separate file. Not doing here to reduce the complexity of the CL.

Bug: 864724
Change-Id: I56b6d260d475e1576ae319502983a80b963a588f
Reviewed-on: https://chromium-review.googlesource.com/1167473
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/components/components/README.md
[rename] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/components/components/stats_framework/__init__.py
[rename] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/components/components/stats_framework/stats_framework_test.py
[rename] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/components/components/stats_framework/stats_gviz.py
[add] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/components/components/stats_framework/test_support
[modify] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/components/test_support/stats_framework_mock.py
[modify] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/isolate/handlers_frontend.py
[modify] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/isolate/stats.py
[modify] https://crrev.com/457cd5f59b7a1b2eee837f48d86e42a677ebc962/appengine/isolate/stats_test.py

Comment 7 by bugdroid1@chromium.org, Aug 11

Blockedon: 895378

Comment 9 by bugdroid1@chromium.org, Nov 19

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/b79439f9c66195d8dd161a39a82aace99fbba94a

commit b79439f9c66195d8dd161a39a82aace99fbba94a
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Nov 19 21:02:09 2018

[swarming] expand BotEvent.ALLOWED_EVENTS to be more readable

As more event types will be added, the previous form was hard to
decipher. This will make issues 895378 and 905087 nicer to implement.

R=qyearsley@chromium.org

Bug: 864724
Change-Id: Icedf77516b72bc2e068624db400d893ba95d916a
Reviewed-on: https://chromium-review.googlesource.com/c/1343297
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/b79439f9c66195d8dd161a39a82aace99fbba94a/appengine/swarming/server/bot_management.py

Comment 10 by bugdroid1@chromium.org, Dec 7

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/bd9ae51d3d87f6949fadd7165493a65d221d8cc1

commit bd9ae51d3d87f6949fadd7165493a65d221d8cc1
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Dec 07 12:59:01 2018

[swarming] Add cron job for both bot and task monitoring

They don't do anything, it is simply to ease the deployment by adding
the handlers right away so I can check in cron.yaml update soonish.

Start adding bot events in memcache; this will accelerate monitoring by
an order of magnitude.

Bug: 864722
Bug: 864724
Change-Id: I5702dccb6ee341414db6ebb69e1a207e748f5d29
Reviewed-on: https://chromium-review.googlesource.com/c/1366635
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[add] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/doc/Monitoring.md
[modify] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/server/bot_management_test.py
[add] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/server/stats_bots.py
[add] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/server/stats_bots_test.py
[add] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/server/stats_tasks.py
[add] https://crrev.com/bd9ae51d3d87f6949fadd7165493a65d221d8cc1/appengine/swarming/server/stats_tasks_test.py
