New issue
Advanced search Search tips

Issue 916360 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Dec 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Swarming: add default pubsub topic for new task

Project Member Reported by maruel@google.com, Dec 19

Issue description

Use case:
Normally the client wants to listen for task updates via pubsub, so the client specifies a Cloud Pub/Sub topic to listen to at the moment of task creation, and then listens to the messages.

We found one use case were a service would want to listen to all task updates for a pool, and the client uses a fire-and-forget pattern, thus doesn't need to specify a Pub/Sub topic.

This means we don't need to support a task updates to be sent to 2 different Pub/Sub topics, which simplifies our life.


Goal:
Specify pool level default pubsub topic, that is used by default if the client doesn't specify one.
This has the advantages of:
- being simple in design
- being simple in understanding
- low amount of refactoring


AI:
- Add default_pubsub_notification to Pool in proto/config/pools.proto.
- Add PubSub to PoolConfig in server/pools_config.py.
- Add a function _validate_pubsub(ctx, cfg) to this file.
- Add unit test in server/pools_config_test.py to confirm validation.
- Add a function similar to apply_server_property_defaults() but that applies to the request instead of the properties, called in SwarmingTasksService.new() in handlers_endpoints.py around line 147
- Add unit test in handlers_endpoints_test.py.

https://chromium.googlesource.com/infra/luci/luci-py.git/+/master/appengine/swarming/proto/config/pools.proto
https://cs.chromium.org/chromium/infra/luci/appengine/swarming/server/pools_config.py
https://cs.chromium.org/chromium/infra/luci/appengine/swarming/handlers_endpoints.py

Magically, tomorrow morning a message PubSub shall exists in proto/api/swarming.proto (not tonight because the CQ is broken)
https://chromium.googlesource.com/infra/luci/luci-py.git/+/master/appengine/swarming/proto/api/swarming.proto

Then start using it.
 
Cc: akes...@chromium.org
Why do you want this to be considered a "default" endpoint? Sounds to me like you intend for the default to be overridden if the task specifies a topic.

I propose something different:

 - pools.proto adds a repeated PubsubEndPoint field; all of these endpoints will receive taskupdates for any task in that pool

- refactor _maybe_pubsub_notify_now and its caller so that it accepts a task and a pubsub endpoint separately, rather than reading pubsub endpoint info off of the task

 - instead of munging the task at enqueue time according to whatever is in the pool config at the time (prone to races, adds complexity to task enqueueing, causes "mystery values" to appear on task) just add a separate loop through all endpoints for the tasks's pool to _maybe_pubsub_notify_via_tq
The reason for the design is that it's easy to understand and most importantly; I didn't see a need to stream to two pubsub topics in my discussion with Alex.
Every task in skylab will need to stream to the results-archiver topic (or it will be lost to our results dashboards). That means if there are any other use cases for pubsub topics, they would be incompatible with skylab if we were hard restricted to 1 topic.

I also think it is harder to understand and maintain if the required topic info is replicated over thousands of tasks. For instance, if we need to change topic or change auth token at some point, we would lose all the currently inflight tasks (which in ChromeOS lab could mean many hours worth of pending workload).
I think that the fact that the pubsub topic is locked at task creation is a feature, not a bug, as it enables graceful migration.
Let's VC on this, to me it seems like the opposite.
Status: WontFix (was: Available)
The consensus was to use the BQ table of Skylab results (issue 765435) instead of PubSub for getting results to stainless (b/119272888).

Sign in to add a comment