New issue
Advanced search Search tips

Issue 817831 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 781021



Sign in to add a comment

Swarming: use one TaskToRun per task enqueueing

Project Member Reported by mar...@chromium.org, Mar 1 2018

Issue description

TaskToRun are reused for task retries on internal_failure. That's an error in hindsight, as the reuse means we are loosing information about when the task was re-enqueued.

As part of  issue 781021 , using separate TaskToRun per TaskSlice will make the whole thing much more manageable and auditable as task triggering information won't be lost anymore.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Mar 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/31a1c9f25742feb8fc9eac0bd34601777ff44168

commit 31a1c9f25742feb8fc9eac0bd34601777ff44168
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Mar 22 19:06:36 2018

swarming: split bot_update_task Tx into stand-alone function

As I have to do refactoring in a follow up, move the code first to help
reviewability.

bot_update_task() is unreadable, an epitome of my worst programming skills.

Split the transaction code into its own, moving towards a semblance of
maintainability.

Intentionally do not make something like a namedtuple in this CL to keep it as
pure code movement without structural change.

No functional change.

R=vadimsh@chromium.org

Bug:  817831 
Change-Id: I52bb03129c46ca58a085d508782fa74eb9b827aa
Reviewed-on: https://chromium-review.googlesource.com/970851
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/31a1c9f25742feb8fc9eac0bd34601777ff44168/appengine/swarming/server/task_scheduler.py

Project Member

Comment 2 by bugdroid1@chromium.org, Mar 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/c5db5135a63f3db4452308b1535ec47f189f668b

commit c5db5135a63f3db4452308b1535ec47f189f668b
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Sat Mar 24 03:54:44 2018

Swarming: one TaskToRun per attempt.

- Change TaskToRun key id to the try number. It will contain more information
  once  issue 781021  is implemented.
- Do not store TaskToRun for deduped task.
- Do not reuse TaskToRun for try #2.
- Add TaskToRun.created_ts to know when a task was enqueued. It will be needed
  for  issue 781021 .
- Rename a few 'task' local variable to 'to_run' make it things clearer.

This change temporarily breaks task cancelation and dead bot handling for the
tasks that are on-going.

Bug:  817831 , 781021 
Change-Id: If252001ba8c8bcb499134b1d21ab2f10cd06203b
Reviewed-on: https://chromium-review.googlesource.com/969760
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/doc/Schemas.md
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_to_run_test.py

Comment 3 by mar...@chromium.org, Mar 24 2018

Status: Fixed (was: Assigned)
Deployed to prod.
Project Member

Comment 4 by bugdroid1@chromium.org, May 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/7c1f40f87fec05a05bb8285cceb40d52ecea304a

commit 7c1f40f87fec05a05bb8285cceb40d52ecea304a
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu May 03 21:01:18 2018

[swarming] remove negative cache in dead bot handling

This is not necessary anymore since  https://crbug.com/817831  was fixed. It was
removing the entry for try_number 1 but the new TaskToRun entity for try_number
2 has a different negative cache entry.

Bug:  817831 
Change-Id: I7c8e06776b8a1f316d14ea2a36b5aba561d8acac
Reviewed-on: https://chromium-review.googlesource.com/1042886
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/7c1f40f87fec05a05bb8285cceb40d52ecea304a/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/7c1f40f87fec05a05bb8285cceb40d52ecea304a/appengine/swarming/server/task_scheduler_test.py

Sign in to add a comment