Remove inequality on last_seen_ts from BotInfos queries
Issue description
Part of issue 825843 was caused by using an inequality filter combined with a repeated field. We can at least remove the inequality by adding a "dead" value to composite. Keeping that value up to date can easily be done by a cron job. See BotInfo._calc_composite(): https://cs.chromium.org/chromium/infra/luci/appengine/swarming/server/bot_management.py?l=184
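A minimal sketch of the idea in plain Python, without the datastore. The constant values, the 10 minute timeout and the helper names are assumptions made for illustration; they are not taken from bot_management.py.

```python
import datetime

# Hypothetical state values; the real constants in bot_management.py may differ.
ALIVE, DEAD = 1, 2

# Assumed cutoff after which a silent bot is considered dead.
BOT_DEATH_TIMEOUT = datetime.timedelta(minutes=10)


def calc_composite(last_seen_ts, now):
    """Returns the precomputed state values to store on the entity."""
    is_dead = (now - last_seen_ts) >= BOT_DEATH_TIMEOUT
    return [DEAD if is_dead else ALIVE]


def dead_bots(bots):
    """Selects dead bots by equality on composite, never comparing last_seen_ts."""
    return [b for b in bots if DEAD in b["composite"]]
```

Once the state is stored, selecting dead bots needs only an equality match on composite, so the query no longer mixes an inequality on last_seen_ts with the repeated field.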
Apr 3 2018
Apr 3 2018
Should be doable in two commits.
Apr 4 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/c14663096bbb106dd7f0a6ae65cb5439afb45b68

commit c14663096bbb106dd7f0a6ae65cb5439afb45b68
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Apr 04 20:48:51 2018

[swarming] Redo bot_management_test.py in preparation of refactor

Rename BotInfo.composite constants in preparation to add more. Cleanup the tests a bit in preparation as more states are being added.

Bug: 826421, 817976
Change-Id: I84d4397713f52a5fec4dd392ce5f1f0f7747c70e
Reviewed-on: https://chromium-review.googlesource.com/993034
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/c14663096bbb106dd7f0a6ae65cb5439afb45b68/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/c14663096bbb106dd7f0a6ae65cb5439afb45b68/appengine/swarming/server/bot_management_test.py
Apr 5 2018
Apr 5 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/21ce635d37263d0b95c7659371624214134271ba

commit 21ce635d37263d0b95c7659371624214134271ba
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Apr 05 20:31:13 2018

[swarming] precompute dead and alive bots

The end goal is to remove the inequality filter on last_seen_ts from the BotInfo queries, which is believed to make it harder to count entities.

Switch BotInfo.composite from ComputedProperty to IntegerProperty, because otherwise the property is recomputed on *load*, which makes it impossible to know which entities to refresh in the cron job. Duh. :(

Rename tidy_stale to cron_tidy_stale to make the naming coherent with the other cron functions.

Follow up CLs:
1. Enable the cron job and switch the query filter to stop using last_seen_ts.
2. Remove old indexes and manually vacuum them. This will remove a fair number of indexes, which will help with overall health.

Bug: 826421
Change-Id: I6328256f0a4e64c3901abc70db9fec89696f1a86
Reviewed-on: https://chromium-review.googlesource.com/994092
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/cron.yaml
[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/index.yaml
[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/server/bot_management_test.py
[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/server/task_queues.py
[modify] https://crrev.com/21ce635d37263d0b95c7659371624214134271ba/appengine/swarming/server/task_queues_test.py
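The commit above stores the state explicitly so that a cron job can tell which entities still carry an outdated value. A rough illustration of that refresh step in plain Python, assuming a hypothetical Bot class, made-up state constants and a 10 minute timeout (none of these come from the Swarming code):

```python
import datetime

ALIVE, DEAD = 1, 2                        # hypothetical state values
TIMEOUT = datetime.timedelta(minutes=10)  # assumed death timeout


class Bot:
    """Stand-in for a stored entity; composite is written, not derived on load."""

    def __init__(self, bot_id, last_seen_ts):
        self.bot_id = bot_id
        self.last_seen_ts = last_seen_ts
        self.composite = ALIVE  # value as of the last write


def cron_update(bots, now):
    """Flips the stored state of silent bots to DEAD; returns the ids updated."""
    updated = []
    for bot in bots:
        # The stored value is what identifies entities that need a refresh.
        if bot.composite == ALIVE and now - bot.last_seen_ts >= TIMEOUT:
            bot.composite = DEAD
            updated.append(bot.bot_id)
    return updated
```

Because the stored value only changes on a write, a second run over the same data finds nothing left to update. If the value were recomputed every time the entity is read, the job could not tell which stored entries are out of date.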
Apr 5 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/83bad5ae88ec21eb4076b266b3a393f542f27278

commit 83bad5ae88ec21eb4076b266b3a393f542f27278
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Apr 05 22:40:14 2018

[swarming] enable dead bot cron job

I'll commit only once 21ce635d37263 has been deployed everywhere.

Bug: 826421
Change-Id: I545d9e8d388bf9ed25f644f7b871cc2385efbb80
Reviewed-on: https://chromium-review.googlesource.com/998757
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/83bad5ae88ec21eb4076b266b3a393f542f27278/appengine/swarming/cron.yaml
Apr 10 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/f31363e439280afd6b0f297d27471d5e3277a0e3

commit f31363e439280afd6b0f297d27471d5e3277a0e3
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Apr 10 10:48:25 2018

[swarming] switch BotInfo to use new simpler index

Stop doing inequality query for dead bots. This should fix the datastore index problem.

Bug: 826421
Change-Id: I27760ea81bf69e0880fc22f866379a9334765fcc
Reviewed-on: https://chromium-review.googlesource.com/998934
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/handlers_endpoints.py
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/handlers_endpoints_test.py
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/index.yaml
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/message_conversion.py
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/server/bot_management_test.py
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/server/lease_management.py
[modify] https://crrev.com/f31363e439280afd6b0f297d27471d5e3277a0e3/appengine/swarming/ts_mon_metrics.py
Apr 10 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/96694430f61b58977ad6e262efb143581b1c9a80

commit 96694430f61b58977ad6e262efb143581b1c9a80
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Apr 10 21:52:24 2018

swarming: fix stale index causing exception in tx

Between the time the BotInfo query yields entities and the transaction runs, the BotInfo entity could be deleted. This happens for Machine Provider managed machines, as BotInfo is deleted once the lease is released and the bot is deleted.

This causes exceptions in cron_update_bot_info(), which is annoying but not a big deal.

R=iannucci@chromium.org
Bug: 826421
Change-Id: I04791b34f9a8f4080cb91fa27927b19ae1d4d7b6
Reviewed-on: https://chromium-review.googlesource.com/1005281
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/96694430f61b58977ad6e262efb143581b1c9a80/appengine/swarming/server/bot_management.py
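The race described in this commit can be illustrated with a dict standing in for the datastore. The function name and the data layout are hypothetical; the real transaction body in bot_management.py is not shown in this thread.

```python
def mark_dead_in_tx(store, bot_id):
    """Stand-in for a transaction body that tolerates a deleted entity.

    Re-reads the entity by key and returns False instead of raising when the
    entity disappeared after the query produced its key.
    """
    bot = store.get(bot_id)  # fresh read, as a transaction would do
    if bot is None:
        return False  # deleted since the query ran; nothing to update
    bot["composite"] = "dead"
    return True
```

For example, if a query returned the keys "a" and "b" and "b" was deleted before its transaction ran, the call for "b" returns False rather than failing on a missing entity.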
Apr 10 2018
Mostly done! Need to deploy, and vacuum indexes.
Apr 10 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/90da805a13a5115b68ccb17f89d9eeb4a67aa099

commit 90da805a13a5115b68ccb17f89d9eeb4a67aa099
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Apr 10 22:28:44 2018

[swarming] reduce parallelism in cron_update_bot_info

A lot of CommitError are being raised due to contention on the production server. Lower from 25 parallel transactions down to 5 to try to help reduce the failure rate of this cron job.

Put the code inside a try/finally so it can log how many items it processed even when failing.

Bug: 826421
Change-Id: I647dbe3bcaffb9c704ab6a26f464315717b24329
Reviewed-on: https://chromium-review.googlesource.com/1006095
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/90da805a13a5115b68ccb17f89d9eeb4a67aa099/appengine/swarming/server/bot_management.py
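The two changes in this commit, a lower concurrency cap and a try/finally around the loop, can be sketched as follows. This uses a thread pool as a stand-in; the actual cron job presumably relies on the datastore's own async machinery, and the names run_updates, update_one and report are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # the commit lowers the limit from 25 to 5


def run_updates(bot_ids, update_one, report=logging.info):
    """Runs update_one over bot_ids with at most MAX_PARALLEL in flight.

    The processed count is reported even when an update raises.
    """
    processed = 0
    try:
        with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
            # map() yields results in input order and re-raises the first error.
            for _ in pool.map(update_one, bot_ids):
                processed += 1
    finally:
        report("processed %d bots", processed)
    return processed
```

The finally clause runs before the exception propagates, so the log still shows how far the job got on a failing run.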
Apr 11 2018
and CL https://chromium-review.googlesource.com/1007334
Deployed everywhere. Indexes vacuumed. \o/
There are still some ongoing exceptions, so I may do a follow-up CL to try to tame them a bit more, but it's working well.
Apr 13 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/d83708119e4b9805fb65f7735ac5b1bbd611b954

commit d83708119e4b9805fb65f7735ac5b1bbd611b954
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri Apr 13 19:40:15 2018

[swarming] variable aliasing strikes again

'lambda: foo(bar.biz)' will not work correctly when inside a loop. This resulted in the same bot being run in the transaction. :(

It took me an unreasonable amount of time to figure this out.

Log the bots that are updated as dead. Reduce log level otherwise to debug.

TBR=iannucci@chromium.org
Bug: 826421
Change-Id: Iada8e800e7006742bf074bc140a513202a12d831
Reviewed-on: https://chromium-review.googlesource.com/1012742
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/d83708119e4b9805fb65f7735ac5b1bbd611b954/appengine/swarming/server/bot_management.py
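The pitfall named in this commit is Python's late binding of closure variables: a lambda created in a loop looks up the loop variable when it is called, not when it is defined. A small standalone reproduction, with hypothetical function names, and one common fix (binding the current value as a default argument), which may or may not be the fix used in the CL:

```python
def collect_buggy(bot_ids):
    """Every callback sees the loop variable's final value."""
    callbacks = []
    for bot_id in bot_ids:
        callbacks.append(lambda: bot_id)  # bot_id is looked up at call time
    return [cb() for cb in callbacks]


def collect_fixed(bot_ids):
    """Each callback captures its own value through a default argument."""
    callbacks = []
    for bot_id in bot_ids:
        callbacks.append(lambda bot_id=bot_id: bot_id)  # bound at definition
    return [cb() for cb in callbacks]


print(collect_buggy(["a", "b", "c"]))  # ['c', 'c', 'c']
print(collect_fixed(["a", "b", "c"]))  # ['a', 'b', 'c']
```

In the buggy variant every deferred call operates on the last item of the loop, which matches the symptom of the same bot being processed repeatedly.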
Comment 1 by mar...@chromium.org, Apr 3 2018