New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 781021 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 728124
issue 810008
issue 812886
issue 812936
issue 817831
issue 831348

Blocking:
issue 795745
issue 831252
issue 838060
issue 839173
issue 839464
issue 839467
issue 847602



Sign in to add a comment

Swarming automatic TaskProperties fallback

Project Member Reported by mar...@chromium.org, Nov 2 2017

Issue description

Implement a way to create automatic fallback when a TaskProperties cannot be fulfilled in a pre-determinated amount of time.

This requires updating the DB Schema, the API and the documentation, but requires no bot change.

Related to issue 756295 but not exactly an 'or', in the sense that there is a strong linear progression between the options.

Work in progress doc (internal only): http://go/swarming-automatic-fallback
 

Comment 1 by mar...@chromium.org, Jan 12 2018

Owner: mar...@chromium.org
Status: Assigned (was: Available)

Comment 2 by mar...@chromium.org, Jan 12 2018

 Issue 706586  will be implemented as part of this work.

Comment 3 by mar...@chromium.org, Jan 12 2018

Cc: martiniss@chromium.org jonesmi@google.com dpranke@chromium.org nedngu...@google.com
 Issue 706586  has been merged into this issue.
Project Member

Comment 4 by bugdroid1@chromium.org, Jan 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/521c67ca0b647037004b0e7f3831d47aa9c3771c

commit 521c67ca0b647037004b0e7f3831d47aa9c3771c
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Jan 12 20:54:13 2018

swarming: reduce the use of request.properties

No functional change.

request.properties will change of form. Do refactoring so follow up refactoring
has less copy-paste.

Fix an aliasing of a variable 'request' which made the code confusing.

R=vadimsh@chromium.org

Bug:  781021 
Change-Id: I85edd9fa2a7105d6304c02ab10eeb066ae8a65ac
Reviewed-on: https://chromium-review.googlesource.com/865414
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/521c67ca0b647037004b0e7f3831d47aa9c3771c/appengine/swarming/handlers_bot.py
[modify] https://crrev.com/521c67ca0b647037004b0e7f3831d47aa9c3771c/appengine/swarming/handlers_endpoints.py
[modify] https://crrev.com/521c67ca0b647037004b0e7f3831d47aa9c3771c/appengine/swarming/message_conversion.py

keyboard: load shedding, shed, deny

Comment 6 by mar...@chromium.org, Feb 16 2018

Blockedon: 812936

Comment 7 by mar...@chromium.org, Feb 21 2018

Blocking: 795745

Comment 8 by mar...@chromium.org, Feb 21 2018

Blockedon: 812886 810008 728124
Cc: eyaich@chromium.org
Blockedon: 817831

Comment 11 by efoo@chromium.org, Mar 21 2018

Labels: LUCI-Blocker-Clients LUCI-Clients

Comment 12 by efoo@chromium.org, Mar 21 2018

Labels: LUCI-Blocker-Chromium-CQSets
Project Member

Comment 13 by bugdroid1@chromium.org, Mar 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/821925d0b365357471252beac43c928161d7f506

commit 821925d0b365357471252beac43c928161d7f506
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Mar 22 01:12:17 2018

Swarming: move assert_task() guts into subfunction

There is no functional change, the code is just moved.

Soon a TaskRequest will have multiple TaskProperties, each as part of a
TaskSlice. Move this code without changing the logic, so that follow ups are
easier to read.

In a follow up I'll want to make it async but decided to just move code first.

R=vadimsh@chromium.org

Bug:  781021 
Change-Id: I596d41d76a37baa4e5333e89761d8ad054d7ac83
Reviewed-on: https://chromium-review.googlesource.com/938779
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/821925d0b365357471252beac43c928161d7f506/appengine/swarming/server/task_queues.py

Project Member

Comment 14 by bugdroid1@chromium.org, Mar 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/c5db5135a63f3db4452308b1535ec47f189f668b

commit c5db5135a63f3db4452308b1535ec47f189f668b
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Sat Mar 24 03:54:44 2018

Swarming: one TaskToRun per attempt.

- Change TaskToRun key id to the try number. It will contain more information
  once  issue 781021  is implemented.
- Do not store TaskToRun for deduped task.
- Do not reuse TaskToRun for try #2.
- Add TaskToRun.created_ts to know when a task was enqueued. It will be needed
  for  issue 781021 .
- Rename a few 'task' local variable to 'to_run' make it things clearer.

This change temporarily breaks task cancelation and dead bot handling for the
tasks that are on-going.

Bug:  817831 , 781021 
Change-Id: If252001ba8c8bcb499134b1d21ab2f10cd06203b
Reviewed-on: https://chromium-review.googlesource.com/969760
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/doc/Schemas.md
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/c5db5135a63f3db4452308b1535ec47f189f668b/appengine/swarming/server/task_to_run_test.py

Project Member

Comment 15 by bugdroid1@chromium.org, Mar 26 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/d15fc3980b4541f6f6b384ba167db4af19179625

commit d15fc3980b4541f6f6b384ba167db4af19179625
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Mar 26 22:28:58 2018

[swarming] overhaul handlers unit tests

Will add new tests soon and wanted to clean it up a bit first.
Removed 460 lines from the test code by creating reusable expectations.

No functional change.

Bug:  754390 , 781021 
Change-Id: I1f8a5b84312c21e6d4578cae1beb61d281009a4e
Reviewed-on: https://chromium-review.googlesource.com/979275
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/d15fc3980b4541f6f6b384ba167db4af19179625/appengine/swarming/handlers_bot_test.py
[modify] https://crrev.com/d15fc3980b4541f6f6b384ba167db4af19179625/appengine/swarming/handlers_endpoints_test.py
[modify] https://crrev.com/d15fc3980b4541f6f6b384ba167db4af19179625/appengine/swarming/handlers_test.py
[modify] https://crrev.com/d15fc3980b4541f6f6b384ba167db4af19179625/appengine/swarming/test_env_handlers.py

Project Member

Comment 16 by bugdroid1@chromium.org, Mar 28 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6

commit fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Mar 28 09:41:22 2018

[swarming] Create TaskSlice at the API level.

It is correctly supported so that clients can start updating right away, but
TaskSlice fallback is not implemented.

No behavior change, except making the API return duplicate data, so clients
have time to update.

This permits simplifying the delta for the actual implementation, splitting PAI
changes from the DB schema change, thus reducing the likelihood of breaking the
whole thing and making the change almost reviewable for once.

The client is not updated yet, will do a bit later.

Bug:  781021 
Change-Id: I4941e6869fae9ffaceb5fa1c074ce5699b9c1642
Reviewed-on: https://chromium-review.googlesource.com/979453
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6/appengine/swarming/handlers_endpoints_test.py
[modify] https://crrev.com/fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6/appengine/swarming/local_smoke_test.py
[modify] https://crrev.com/fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6/appengine/swarming/message_conversion.py
[modify] https://crrev.com/fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6/appengine/swarming/swarming_rpcs.py
[modify] https://crrev.com/fc0788b49dfdfdb1d7e2a438e2feb325df39c8d6/appengine/swarming/test_env_handlers.py

Project Member

Comment 17 by bugdroid1@chromium.org, Mar 28 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/023d49b55cce35021d6e050322ec7367c77f9d55

commit 023d49b55cce35021d6e050322ec7367c77f9d55
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Mar 28 20:43:04 2018

[swarming] Compact task_scheduler_test by 80 lines.

No functional change.

Makes properties explicit in each unit test.

The main reason for this change is that it'll make task_slices
testing much easier, so that the properties value will eventually be changed for
task_slices values.

Make task_queue expectations stricter.

Change-Id: I90bdfc967b073a16d24f75967ed8d21aa6f01080
Bug:  781021 
Reviewed-on: https://chromium-review.googlesource.com/982833
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/023d49b55cce35021d6e050322ec7367c77f9d55/appengine/swarming/server/task_scheduler_test.py

Project Member

Comment 19 by bugdroid1@chromium.org, Apr 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18

commit aac2b76b53aa1b2158ac84b9325bf0e3b3265f18
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 04 18:35:41 2018

[swarming] Define current_task_slice in task result, TaskSlice task request

Will soon™ be used.

Doesn't use it yet, to reduce the delta and make the change more reviewable.

TaskSlice will be embedded into TaskRequest as a repeated field as
'task_slices', to replace the current embedded TaskProperties as 'properties'.

Bug:  781021 
Change-Id: I009aa9436f7ed86ab77ec0a43027cf367f933c4a
Reviewed-on: https://chromium-review.googlesource.com/980705
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18/appengine/swarming/message_conversion.py
[modify] https://crrev.com/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18/appengine/swarming/server/task_request.py
[modify] https://crrev.com/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18/appengine/swarming/server/task_result.py
[modify] https://crrev.com/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/aac2b76b53aa1b2158ac84b9325bf0e3b3265f18/appengine/swarming/server/task_scheduler_test.py

Project Member

Comment 20 by bugdroid1@chromium.org, Apr 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/e295ecee83965943a08dea963dd76a3ec446931f

commit e295ecee83965943a08dea963dd76a3ec446931f
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 04 19:34:21 2018

[swarming] Add TaskToRun.task_slice_index

Remove TaskToRun._pre_put_hook, it's not useful. new_task_to_run() does the
verifications.

Use assert (instead of raising an exception) because this is an internal
failure.

Bug:  781021 
Change-Id: Ia2b78f3684f5f6a69e1c39ed17df4290bfbeab33
Reviewed-on: https://chromium-review.googlesource.com/988595
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/e295ecee83965943a08dea963dd76a3ec446931f/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/e295ecee83965943a08dea963dd76a3ec446931f/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/e295ecee83965943a08dea963dd76a3ec446931f/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/e295ecee83965943a08dea963dd76a3ec446931f/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/e295ecee83965943a08dea963dd76a3ec446931f/appengine/swarming/server/task_to_run_test.py

Project Member

Comment 21 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a

commit 945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Apr 10 14:53:37 2018

[swarming] Add TaskRequest.manual_tags and prepare for TaskRequest.task_slices

TaskRequest.manual_tags is to differentiate between tags specified by the user
and tags that are automatically added. Tags on a TaskSlice will have the manual
tags plus the tags relevant for this specific TaskSlice. E.g. if someone
triggers first on 'gpu:foo:bar' then fallbacks on 'gpu:foo:baz', the
TaskResultSummary shall have the automatic tag from the dimension of the
TaskSlice that actually ran. Said in a different way: this means that the tags
for a TaskResultSummary shall have only the relevant tags, hence the manual tags
must be kept in a separate property in TaskRequest.

Stop using TaskRequest.properties and use TaskRequest.task_slice(index) to
prepare for TaskRequest.task_slices. Right now index must be 0, as there's only
one TaskSlice (the one stored as TaskRequest.properties). This makes later CL
much simpler.

Redo task_request_test.py TaskRequest entity creation flow as this was based on
a flow that was used in 2015.

No behavior change from the user's PoV.

Bug:  781021 
Change-Id: If4e49e577ef9086cb1560e0b19c54070174c738e
Reviewed-on: https://chromium-review.googlesource.com/995621
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/event_mon_metrics.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/handlers_bot.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/handlers_endpoints.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/handlers_endpoints_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/message_conversion.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/service_accounts_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_queues.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_queues_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_request.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_request_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_result.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/server/task_to_run_test.py
[modify] https://crrev.com/945d5d7da6b9f32a9523d12e9cfc8eecbb87c79a/appengine/swarming/test_env_handlers.py

Blocking: 831252
Project Member

Comment 23 by bugdroid1@chromium.org, Apr 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/27f3c7ed26c44319dcfefaee5bb51496a1194d6d

commit 27f3c7ed26c44319dcfefaee5bb51496a1194d6d
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 11 14:32:16 2018

[swarming] Fix bug introduced in 945d5d7da6b9f32a952.

This very rarely occurs, only during a very short window where the negative
cache was not caught, and the item was not detected as reapeable even in the
index.

TBR=iannucci@chromium.org

Bug:  781021 
Change-Id: I1f4977fff59ac95ca81c6a2053da77d3d872ff52
Reviewed-on: https://chromium-review.googlesource.com/1007255
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/27f3c7ed26c44319dcfefaee5bb51496a1194d6d/appengine/swarming/server/task_scheduler.py

Project Member

Comment 24 by bugdroid1@chromium.org, Apr 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/afc02e9362a1007e1076db41788d74d4bcf272c6

commit afc02e9362a1007e1076db41788d74d4bcf272c6
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 11 14:50:16 2018

[swarming] Retry Tx more in /internal/cron/update_bot_info

datastore_utils.transaction_async() sets the default number of retries to 1,
which means a lot of
  TransactionFailedError: too much contention on these datastore entities.
  please try again.
are thrown.

This is a problem to the extent that the logs are filed with exceptions due to
this cron job at a rate of ~50%, depending on the new number of dead bots.
'Fix' the problem by retrying more.

TBR=iannucci@chromium.org

Bug:  781021 
Change-Id: I2a93b3380f17ca6c6e975df97a99b2832094c540
Reviewed-on: https://chromium-review.googlesource.com/1007334
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/afc02e9362a1007e1076db41788d74d4bcf272c6/appengine/swarming/server/bot_management.py

Oops the last CL was related to  issue 826421 .
Project Member

Comment 26 by bugdroid1@chromium.org, Apr 12 2018

Project Member

Comment 27 by bugdroid1@chromium.org, Apr 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/4e55114a23538a9a91546f448ef65e4ae69eebb4

commit 4e55114a23538a9a91546f448ef65e4ae69eebb4
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Apr 12 17:54:16 2018

[swarming] Store TaskRequest.task_slices instead of .properties

This is backward incompatible before TaskRequest.task_slices support in
0b45c23341a5d0c2f.

Still only allow a single TaskSlice as the fallback algorithm is not live yet.

Bug:  781021 
Change-Id: I0390123a6fa2570b56de2b80fe38fee650538f45
Reviewed-on: https://chromium-review.googlesource.com/1008667
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/4e55114a23538a9a91546f448ef65e4ae69eebb4/appengine/swarming/handlers_endpoints_test.py
[modify] https://crrev.com/4e55114a23538a9a91546f448ef65e4ae69eebb4/appengine/swarming/message_conversion.py

Project Member

Comment 28 by bugdroid1@chromium.org, Apr 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/b5572d91544ea860c6c9a9ae808acf9728c39459

commit b5572d91544ea860c6c9a9ae808acf9728c39459
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 18 21:50:11 2018

[swarming] refactor task_to_run

No functional change.

The reason for this change is to enable using task_slices in a follow up, and by
doing refactoring up front, this will better highlight the actual changes.

Change-Id: Ib15a1c9fc835ef09fc00222ce92d1fe0fc6d2634
Bug:  781021 
Reviewed-on: https://chromium-review.googlesource.com/1015330
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/b5572d91544ea860c6c9a9ae808acf9728c39459/appengine/swarming/server/task_to_run_test.py

Project Member

Comment 29 by bugdroid1@chromium.org, Apr 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/1fc37f909ef6cb6a87db77bf71b45fe6d78a6e01

commit 1fc37f909ef6cb6a87db77bf71b45fe6d78a6e01
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Apr 20 00:26:35 2018

[client] Use the new API task_slices for task creation

Task result will be done later. In the meantime they can be queried with the
command query.

Do not add support for multiple task slices yet, I still don't know how to
encode this in a user friendly way. But switching to task_slices is a
prerequisite anyway.

Once this tool (and all other clients) are updated, we'll be able to remove the
compatibility layer in the server.

Bug:  781021 
Change-Id: Ic2b5d3002caa2747e456669a721c3ec4207ce45f
Reviewed-on: https://chromium-review.googlesource.com/1019901
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/1fc37f909ef6cb6a87db77bf71b45fe6d78a6e01/client/swarming.py
[modify] https://crrev.com/1fc37f909ef6cb6a87db77bf71b45fe6d78a6e01/client/tests/swarming_test.py

Project Member

Comment 30 by bugdroid1@chromium.org, Apr 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/74f8bd0ed7bcef244a0494fd7c05a2b788b87e3f

commit 74f8bd0ed7bcef244a0494fd7c05a2b788b87e3f
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Apr 20 12:40:36 2018

[client] Fix regression in 1fc37f909ef6cb6a87db

Mainly working around a stupid assert that tests that timeout is a float. I
could have also replaced the assert but it's simpler to fix the call site for
now.

TBR=vadimsh@chromium.org
Bug:  781021 
Bug: skia:7827
Change-Id: Ic82b462c159d9e0929be3532e4c97a9d1b4b891b
Reviewed-on: https://chromium-review.googlesource.com/1021830
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/74f8bd0ed7bcef244a0494fd7c05a2b788b87e3f/client/swarming.py

Project Member

Comment 31 by bugdroid1@chromium.org, Apr 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/4653a4dc87ff3de19cde53fc261754615b40ca6e

commit 4653a4dc87ff3de19cde53fc261754615b40ca6e
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Sat Apr 21 00:47:29 2018

[client] Update default value to be of the right type.

In optparse, if you use type='float', default=0, the default value type will be
int. That's cute.

Bug:  781021 
Change-Id: Iab9a7b16486c311bdf5cf2241c4963c1447aa2b2
Reviewed-on: https://chromium-review.googlesource.com/1022375
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/4653a4dc87ff3de19cde53fc261754615b40ca6e/client/swarming.py

Project Member

Comment 32 by bugdroid1@chromium.org, Apr 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/d9ce22df5318640a761303259c47854d7932d986

commit d9ce22df5318640a761303259c47854d7932d986
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Apr 23 14:48:41 2018

[client] Update swarming.py run with new task_slices

This was broken by me in 1fc37f909ef6cb6a87db. I didn't realize because run is
rarely used.

- Fix swarming.py run to work.
- Add a unit test to confirm that it works now (and didn't before)
- Rename a few poorly named unit tests.

Bug:  781021 
Change-Id: Ic43f5b4021465209a5328086db203cb6cb523868
Reviewed-on: https://chromium-review.googlesource.com/1023712
Reviewed-by: Kevin Lubick <kjlubick@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/d9ce22df5318640a761303259c47854d7932d986/client/swarming.py
[modify] https://crrev.com/d9ce22df5318640a761303259c47854d7932d986/client/tests/swarming_test.py

Project Member

Comment 33 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/bf92440161b8ce8e3c99bc69a10f96a13aab9256

commit bf92440161b8ce8e3c99bc69a10f96a13aab9256
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 25 20:54:49 2018

[client] Add swarming.py post

This command permits arbitrary post RPC. Think of a generic RPC CLI tool, except
that it prepends the swarming API base resource.

While not generally meant be used, it'll be useful to test TaskSlice support
while iterating on the way this should be presented to the user in the CLI tool.

Bug:  781021 
Change-Id: Ie405178e9b4849d69751d67e24eb17f486075edc
Reviewed-on: https://chromium-review.googlesource.com/1028461
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/bf92440161b8ce8e3c99bc69a10f96a13aab9256/client/swarming.py
[modify] https://crrev.com/bf92440161b8ce8e3c99bc69a10f96a13aab9256/client/tests/swarming_test.py

Project Member

Comment 34 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/febdf3b3d8818dfded4dae6c9aecab43de5ceadb

commit febdf3b3d8818dfded4dae6c9aecab43de5ceadb
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 25 21:19:59 2018

[swarming] fix TaskSlice checks and tags creation

- Due to a bug in ndb.Model where LocalStructuredProperty's _pre_put_hook is not
  called, the function needs to be called manually.
- automatic tags depends on expiration_ts as TaskRequest.task_slice() is called,
  which was initialized after. Oops.

Unit test refactoring come in the very next CL that confirms all that. I confirmed
manually that the next CL wouldn't pass without these two fixes.

Bug:  781021 
Change-Id: Ie6ae974d539c372e0fa4c719f8d907b010a6f100
Reviewed-on: https://chromium-review.googlesource.com/1028632
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/febdf3b3d8818dfded4dae6c9aecab43de5ceadb/appengine/swarming/server/task_request.py

Project Member

Comment 35 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/ae88212bc49ee866a650439f216742dd3d84910d

commit ae88212bc49ee866a650439f216742dd3d84910d
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 25 22:24:59 2018

[swarming] Improve tests

No functional changes

After years of DB schema refactoring, the unit tests have started to become
really tricky. This CL changes the unit tests code to be more 'typed', as much
as python can be typed. This is done by changing many dict to the expected
ndb.Model type instead. This makes reading the code saner.

Change the unit tests to use more coherent utility function names across unit
tests.

This is in preparation of removing support for the old style
TaskRequest.properties, as for TaskSlice support, TaskRequest.task_slices is
going to be the only member used going on from (next CL).

Intentionally do this unit test refactoring without touching actual code.

Bug:  781021 
Change-Id: I0da1bcf8d318909516121f7a4b612fa92e1ca786
Reviewed-on: https://chromium-review.googlesource.com/1028185
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/ae88212bc49ee866a650439f216742dd3d84910d/appengine/swarming/server/service_accounts_test.py
[modify] https://crrev.com/ae88212bc49ee866a650439f216742dd3d84910d/appengine/swarming/server/task_queues_test.py
[modify] https://crrev.com/ae88212bc49ee866a650439f216742dd3d84910d/appengine/swarming/server/task_request_test.py
[modify] https://crrev.com/ae88212bc49ee866a650439f216742dd3d84910d/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/ae88212bc49ee866a650439f216742dd3d84910d/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/ae88212bc49ee866a650439f216742dd3d84910d/appengine/swarming/server/task_to_run_test.py

Project Member

Comment 36 by bugdroid1@chromium.org, Apr 26 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/effb9f51728b7b809003dc6f69ce35f60a8c5681

commit effb9f51728b7b809003dc6f69ce35f60a8c5681
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Apr 26 15:38:20 2018

[swarming] Remove support to store old-style TaskRequest

Now that all tests have been updated to TaskRequest.task_slices, the only
remaining use was for terminate pseudo-task. Update this logic to use the new
task_slices mechanism.

All TaskRequest must now use task_slices. Having one way to do things makes
everything much simpler.

Rename TaskRequest.properties to TaskRequest.properties_old to signal that
should not be used anymore.

Bug:  781021 
Change-Id: I5f53da00073ebd787c847cf0d819ff61c117b616
Reviewed-on: https://chromium-review.googlesource.com/1027807
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/effb9f51728b7b809003dc6f69ce35f60a8c5681/appengine/swarming/server/task_request.py
[modify] https://crrev.com/effb9f51728b7b809003dc6f69ce35f60a8c5681/appengine/swarming/server/task_request_test.py

Blockedon: 831348
Cc: -martiniss@chromium.org
Project Member

Comment 39 by bugdroid1@chromium.org, Apr 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/0f2e8630c078cb65ea310d97b0f6d938006ad8f0

commit 0f2e8630c078cb65ea310d97b0f6d938006ad8f0
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Apr 27 19:31:46 2018

[swarming] add more test expectations on bot's valid task queues

- Make assert_bot_async() return the number of updated entities
- Add expectation in unit tests with the number of BotTaskDimensions entities
  updated.

No functional change, as the return value was previously ignored and it is
currently looked only in unit tests.

This is done so when task slices are added, the expectations match.

Bug:  781021 
Change-Id: Ibf9387deb336b6ee6b6e6690826f866e061328b4
Reviewed-on: https://chromium-review.googlesource.com/1032796
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/0f2e8630c078cb65ea310d97b0f6d938006ad8f0/appengine/swarming/server/task_queues.py
[modify] https://crrev.com/0f2e8630c078cb65ea310d97b0f6d938006ad8f0/appengine/swarming/server/task_queues_test.py
[modify] https://crrev.com/0f2e8630c078cb65ea310d97b0f6d938006ad8f0/appengine/swarming/server/task_scheduler_test.py

Project Member

Comment 40 by bugdroid1@chromium.org, Apr 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/883704490ac581b18fd9f259f8fd1c57a428dbda

commit 883704490ac581b18fd9f259f8fd1c57a428dbda
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Apr 27 19:37:36 2018

[swarming] Improve task_to_run

- Inline task_to_run.gen_queue_number() inside new_task_to_run(), it doesn't
  need to be exported anymore. It used to be called in task_scheduler.py but
  this was when TaskToRun entities were reused. This was fixed as part of
   https://crbug.com/817831 . There is now exactly one TaskToRun per try (either
  try_number or task_slice_index) so no queue number mutation is possible now.
  \o/
- Rename task_key to to_run_key to be consistent with the rest of the source
  code.
- Add utility functions task_to_run_key_slice_index() and
  task_to_run_key_try_number(), they will be needed soon to figure out the
  actual task_slice used for the TaskToRun, when only having access to its
  ndb.Key.
- This is also in preparation for a bug fix with memcache-based negative cache
  that will be singled out as a separate CL.
- Add unit tests that are ready for task slice fallback.

Bug:  781021 
Change-Id: Ic2c436b9de7de7d7f386408f2bd3585da2f02e54
Reviewed-on: https://chromium-review.googlesource.com/1032636
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/883704490ac581b18fd9f259f8fd1c57a428dbda/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/883704490ac581b18fd9f259f8fd1c57a428dbda/appengine/swarming/server/task_to_run_test.py

Project Member

Comment 41 by bugdroid1@chromium.org, Apr 30 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/7c74da00be7dc2a080e8c89df08b4af4d648d140

commit 7c74da00be7dc2a080e8c89df08b4af4d648d140
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Apr 30 15:13:37 2018

[swarming] Fix memcache-based TaskToRun negative cache

It used to be only the TaskRequest task id. This meant that the try number 1 and
2 shared the same negative cache, which was fine. Adding TaskSlice fallback in
the mix, this completely breaks down, as TaskToRun entities are not reused since
 https://crbug.com/817831  was fixed.

When this CL is made live, this will cause a small hickup as the negative cache
is being repopulated, as the keys changed. This won't be a big deal in practice,
we cleared the whole memcache frequently and the service survived just fine.

Add a unit test that test interference between try_number 1 and 2. It will be
expanded once task fallback is added, i.e. task_slice_index > 0 is supported.

Bug:  781021 
Change-Id: I944ee067ed96ca3410e45a9c20cffadc68047a23
Reviewed-on: https://chromium-review.googlesource.com/1033292
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/7c74da00be7dc2a080e8c89df08b4af4d648d140/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/7c74da00be7dc2a080e8c89df08b4af4d648d140/appengine/swarming/server/task_to_run_test.py

Project Member

Comment 42 by bugdroid1@chromium.org, Apr 30 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/fec529e9d03b95a6f58fe8c0597f238cd6994c40

commit fec529e9d03b95a6f58fe8c0597f238cd6994c40
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Apr 30 20:07:12 2018

[swarming] correctly initialize TaskRunResult.current_task_slice

It went undetected because, well, current_task_slice != 0 was not supported. :)

The current unit test only test for current_task_slice == 0 because it's the
only supported value for now.

Bug:  781021 
Change-Id: I723cd7748433883c2490bc5641312ab1726d5918
Reviewed-on: https://chromium-review.googlesource.com/1033423
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/fec529e9d03b95a6f58fe8c0597f238cd6994c40/appengine/swarming/server/task_result.py
[modify] https://crrev.com/fec529e9d03b95a6f58fe8c0597f238cd6994c40/appengine/swarming/server/task_result_test.py
[modify] https://crrev.com/fec529e9d03b95a6f58fe8c0597f238cd6994c40/appengine/swarming/server/task_scheduler.py

Project Member

Comment 43 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/cdcee247aa8b224788b89b287b3ff911a1e6050c

commit cdcee247aa8b224788b89b287b3ff911a1e6050c
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue May 01 15:50:37 2018

[swarming] dedupe non-zero TaskSlice

- When looking for task deduplication, also look for task slice fallback. This
  should be fairly fast as we don't expect task requests to have many idempotent
  task fallbacks.
- Improve task_scheduler_test to be more specific when looking at TaskToRun
  entities than "the one that happens to be fetched by the query".
- Remove spurious log statements.
- Change the cron job to reenqueue task fallback to return the task ids
  affected, this will be leveraged in the unit test that confirms this to work.

This change is currently a no-op as only a single TaskSlice is allowed. A unit
test is already written for this specific use case but I can't commit it right
away. :/

I still prefer to fix these bugs before, so once task fallback is enabled, the
code is actually known to be good.

Bug:  781021 
Change-Id: Ie5943924a782994c717c4cb995ee2a9e07748b72
Reviewed-on: https://chromium-review.googlesource.com/1033443
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/cdcee247aa8b224788b89b287b3ff911a1e6050c/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/cdcee247aa8b224788b89b287b3ff911a1e6050c/appengine/swarming/server/task_scheduler_test.py

Blocking: 838060
Project Member

Comment 45 by bugdroid1@chromium.org, May 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/f6e46dd8b9cb9130cc37f627334e04549c8a9530

commit f6e46dd8b9cb9130cc37f627334e04549c8a9530
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed May 02 21:07:20 2018

[swarming] last changes before enabling task fallback

Most of the changes are test changes, to reduce the delta for the commit
enabling the task fallback support.

The exception is to add the check for duplicate task right away.

Bug:  781021 
Change-Id: I5ac377ea4058a96ea32e43c6f2c5d7ebe958f7cb
Reviewed-on: https://chromium-review.googlesource.com/1039945
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/f6e46dd8b9cb9130cc37f627334e04549c8a9530/appengine/swarming/handlers_endpoints_test.py
[modify] https://crrev.com/f6e46dd8b9cb9130cc37f627334e04549c8a9530/appengine/swarming/local_smoke_test.py
[modify] https://crrev.com/f6e46dd8b9cb9130cc37f627334e04549c8a9530/appengine/swarming/server/task_request.py
[modify] https://crrev.com/f6e46dd8b9cb9130cc37f627334e04549c8a9530/appengine/swarming/server/task_request_test.py

Blocking: 839173
Blocking: 839464
Blocking: 839467
Status: Fixed (was: Assigned)
This is committed and released to prod.

 Issue 839173 ,  issue 839464  and  issue 839467  are the follow up items.
Project Member

Comment 51 by bugdroid1@chromium.org, May 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/d4649a781cb62cd7b06801f57d7a916cfd889225

commit d4649a781cb62cd7b06801f57d7a916cfd889225
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu May 24 00:29:31 2018

[swarming] fix expiration on the fly

The 'task slice expiration on the fly' used the whole TaskRequest expiration
instead of the individual TaskSlice expiration. Oops.

Add test case to confirm all code paths work.

Extract the local transaction function from _expire_task() to make the whole
thing more readable, otherwise it's hard to know which variable is aliased or
not.

Fix incorrect docstring and rename variables to be clearer.

Bug:  781021 
Change-Id: Ia2d318c6414476c1abb0d368a9a8838a4a031ac1
Reviewed-on: https://chromium-review.googlesource.com/1069612
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/d4649a781cb62cd7b06801f57d7a916cfd889225/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/d4649a781cb62cd7b06801f57d7a916cfd889225/appengine/swarming/server/task_scheduler_test.py

Blocking: 847602

Sign in to add a comment