
Issue 915342


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature

Blocking:
issue 869348
issue 916548




Swarming: create pending_for_dedupe state

Project Member Reported by mar...@chromium.org, Dec 14

Issue description

Historical note:
This is a fairly rare circumstance, but it is starting to happen more and more as binaries become more deterministic/reproducible across OSes, thanks in large part to our migration to clang/LLVM everywhere.

Scenario:
- Multiple Swarming tasks with the exact same internal TaskSlice hash are triggered within a short period of time.

Actual:
- They all run, because there's no TaskResultSummary with state COMPLETED yet.

Expected:
- The duplicates wait for the first one to complete, then are skipped if it succeeded.
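The gap between "Actual" and "Expected" can be sketched with a small in-memory model. This is a hypothetical illustration, not Swarming code: the `Task` class, the `State` enum (its `PENDING_DEDUPE` value is borrowed from the state name proposed in this issue) and both `on_trigger_*` functions are made up, a plain string stands in for the internal TaskSlice hash, and a COMPLETED task is assumed to have succeeded.

```python
import enum
from dataclasses import dataclass
from typing import List, Optional


class State(enum.Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"  # assumed successful in this sketch
    PENDING_DEDUPE = "PENDING_DEDUPE"  # hypothetical new state


@dataclass
class Task:
    task_id: str
    slice_hash: str  # stands in for the internal TaskSlice hash
    state: State = State.PENDING
    dedupe_from: Optional[str] = None


def on_trigger_current(new: Task, existing: List[Task]) -> Task:
    """Actual: only an already COMPLETED task with the same hash dedupes."""
    for t in existing:
        if t.slice_hash == new.slice_hash and t.state == State.COMPLETED:
            new.state, new.dedupe_from = State.COMPLETED, t.task_id
            return new
    new.state = State.PENDING  # nothing completed yet, so the duplicate runs
    return new


def on_trigger_proposed(new: Task, existing: List[Task]) -> Task:
    """Expected: also wait on an in-flight task with the same hash."""
    same = [t for t in existing if t.slice_hash == new.slice_hash]
    done = [t for t in same if t.state == State.COMPLETED]
    if done:
        new.state, new.dedupe_from = State.COMPLETED, done[0].task_id
        return new
    in_flight = [t for t in same if t.state in (State.PENDING, State.RUNNING)]
    if in_flight:
        # Park the duplicate until the in-flight "primary" task finishes.
        new.state, new.dedupe_from = State.PENDING_DEDUPE, in_flight[0].task_id
        return new
    new.state = State.PENDING
    return new


primary = Task("t1", "abc", State.RUNNING)
print(on_trigger_current(Task("t2", "abc"), [primary]).state.value)   # PENDING
print(on_trigger_proposed(Task("t3", "abc"), [primary]).state.value)  # PENDING_DEDUPE
```

Completed results are checked before in-flight ones so that a duplicate never waits when a usable result already exists.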


Pro:
- Saves duplicate work in a scenario that can happen relatively often: if a recent commit is still being tested, try jobs may be running concurrently.

Drawback:
- In the failure case (the first task fails), this increases user-visible latency, since the duplicates only start running once the first task has completed.


Implementation:
- This would require a new state, PENDING_DEDUPE, or a way for the task to have its TaskToRun.queue_number set to None while remaining ready to be enqueued once the dedupe_from task is done. This is tricky: when the "primary" task completes, it then needs to do one additional transaction for each of the N pending tasks that were waiting for its result, either marking it DUPED if the "primary" task succeeded, or setting its TaskToRun.queue_number to enqueue it.
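A rough sketch of the fan-out described above, under stated assumptions: the `WaitingTask` class and `on_primary_completed` function are hypothetical, states are plain strings, each loop iteration stands in for one separate datastore transaction, and queue numbers come from a simple counter rather than Swarming's real TaskToRun.queue_number scheme.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class WaitingTask:
    task_id: str
    dedupe_from: str                    # id of the "primary" task
    state: str = "PENDING_DEDUPE"       # hypothetical new state
    queue_number: Optional[int] = None  # None while waiting on the primary


def on_primary_completed(primary_id: str, succeeded: bool,
                         waiters: List[WaitingTask],
                         queue_numbers: Iterator[int]) -> int:
    """Resolve every task waiting on `primary_id`.

    In a real implementation each update would be its own transaction, so
    the cost grows linearly with the number of waiters. Returns how many
    waiters were updated.
    """
    updated = 0
    for w in waiters:
        if w.dedupe_from != primary_id or w.state != "PENDING_DEDUPE":
            continue
        if succeeded:
            w.state = "DUPED"  # reuse the primary's result
        else:
            # The primary failed: enqueue the waiter so it runs on its own.
            w.state = "PENDING"
            w.queue_number = next(queue_numbers)
        updated += 1
    return updated


waiters = [WaitingTask("t2", "t1"), WaitingTask("t3", "t1"), WaitingTask("t9", "t8")]
print(on_primary_completed("t1", False, waiters, itertools.count(1)))  # 2
print([(w.task_id, w.state, w.queue_number) for w in waiters])
# [('t2', 'PENDING', 1), ('t3', 'PENDING', 2), ('t9', 'PENDING_DEDUPE', None)]
```

The failure branch is where the latency drawback shows up: t2 and t3 only receive a queue number after t1 has finished.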

This creates a new situation where TaskToRun.queue_number is None yet TaskResultSummary.state is PENDING. This complicates expiration handling and would challenge some assumptions in the code base.
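To illustrate the kind of assumption that could break (this is hypothetical and does not describe how Swarming's expiration code is actually written): if an expiration scan only considers tasks that have a queue_number, a task waiting on a dedupe primary (state PENDING, queue_number None) is silently never expired. The `Task` class, its `expiration_ts` field and both scan functions are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    state: str                   # stands in for TaskResultSummary.state
    queue_number: Optional[int]  # stands in for TaskToRun.queue_number
    expiration_ts: float         # hypothetical expiration time, in seconds


def expired_by_queue(tasks: List[Task], now: float) -> List[str]:
    """Hypothetical scan assuming every PENDING task has a queue_number."""
    return [t.task_id for t in tasks
            if t.queue_number is not None and t.expiration_ts <= now]


def expired_by_state(tasks: List[Task], now: float) -> List[str]:
    """Hypothetical scan keyed on the summary state, which still sees waiters."""
    return [t.task_id for t in tasks
            if t.state == "PENDING" and t.expiration_ts <= now]


tasks = [
    Task("queued", "PENDING", 7, expiration_ts=100.0),
    Task("waiting", "PENDING", None, expiration_ts=100.0),  # parked for dedupe
]
print(expired_by_queue(tasks, now=200.0))  # ['queued']
print(expired_by_state(tasks, now=200.0))  # ['queued', 'waiting']
```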
 
