Swarming: delete old entities.
Issue description
Let's use:
- BotEvent: 2 years
- TaskOutputChunk: 2 years
- TaskRequest: 3 years
Sep 17
I'll start with:
- TaskRequest: 4 years

and will probably decide to trim more aggressively afterwards.

We'll want to trim the breadcrumbs for deleted bots too, which is not done right now; this includes BotRoot, BotDimensions, BotSettings, etc.
Sep 18
Also found:
- Bot
- File
- FileChunk
- MachineWhitelist
Sep 18
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/028b98df7254d0bd5d4ca58e83d23613e8aae617

commit 028b98df7254d0bd5d4ca58e83d23613e8aae617
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Sep 18 17:27:10 2018

[swarming] Trim BotEvent older than 2 years

Used to be 3 years. Now that the backlog is done, it's fine to trim more. We don't expect history older than 2 years to be of much value. This will permit deleting hundreds of millions of entities.

Improve unit test to confirm that the right instance was deleted, instead of just counting the number of entities.

Reorganize the cron code a little bit to keep less code inside the try/finally.

Stop using BotEvents as plural in the comments.

Bug: 884579
Change-Id: I9107696127e9623c222d0b3fe58b436e70ea9d3c
Reviewed-on: https://chromium-review.googlesource.com/1227034
Reviewed-by: Jao-ke Chin-Lee <jchinlee@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/028b98df7254d0bd5d4ca58e83d23613e8aae617/appengine/swarming/cron.yaml
[modify] https://crrev.com/028b98df7254d0bd5d4ca58e83d23613e8aae617/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/028b98df7254d0bd5d4ca58e83d23613e8aae617/appengine/swarming/server/bot_management_test.py
[modify] https://crrev.com/028b98df7254d0bd5d4ca58e83d23613e8aae617/appengine/swarming/server/task_queues.py
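For readers who want the retention rule spelled out, here is a minimal sketch of the idea in plain Python. It is not the code from the commit: `select_expired`, the dict-based events and the `BOT_EVENT_RETENTION` constant are hypothetical names used for illustration only. It also shows the kind of check the commit message describes, asserting which entity is selected rather than only counting.

```python
import datetime

# Hypothetical constant; the commit above describes a 2 year window for BotEvent.
BOT_EVENT_RETENTION = datetime.timedelta(days=2 * 365)


def select_expired(events, now, retention=BOT_EVENT_RETENTION):
    """Returns the events whose timestamp is strictly older than now - retention.

    `events` is a list of dicts with an "id" and a "ts" (datetime) entry. This
    stands in for whatever datastore query the real cron job uses.
    """
    cutoff = now - retention
    return [e for e in events if e["ts"] < cutoff]


if __name__ == "__main__":
    now = datetime.datetime(2018, 9, 18)
    events = [
        {"id": "old", "ts": datetime.datetime(2016, 1, 1)},
        {"id": "new", "ts": datetime.datetime(2018, 1, 1)},
    ]
    # Check the identity of what would be deleted, not just how many.
    print([e["id"] for e in select_expired(events, now)])
```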
Sep 18
These are done:
- chromium-swarm-dev
- omnibot-legion-swarming-server
- omnibot-swarming-server
Sep 19
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/c5c026e79addda6ef5576480fc8e7772f109a9e6

commit c5c026e79addda6ef5576480fc8e7772f109a9e6
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Sep 19 00:18:28 2018

[swarming] reduce error->warning for rebuild-task-cache failure

This task queue can fail when Cloud DB decides that there's too much contention. This is not a hard failure; the task will be retried.

Make the task queue return a 503 instead of 500; since the service never returns 503 by itself, it permits knowing that it was not a real failure, and the only reason this is retried is so that the Task engine retries the task.

Bug: 884579
Change-Id: If2fa4af760d7427695ef78700891411ddb4acbc9
Reviewed-on: https://chromium-review.googlesource.com/1227035
Reviewed-by: Jao-ke Chin-Lee <jchinlee@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/c5c026e79addda6ef5576480fc8e7772f109a9e6/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/c5c026e79addda6ef5576480fc8e7772f109a9e6/appengine/swarming/server/task_queues.py
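The 503-versus-500 distinction can be sketched roughly as follows. This is only an illustration of the idea in the commit message, not the handler code itself: `ContentionError` and `status_for` are hypothetical names, and the real handler lives in the files listed above.

```python
class ContentionError(Exception):
    """Hypothetical stand-in for a datastore contention failure."""


def status_for(task):
    """Runs `task` and maps its outcome to an HTTP status code.

    503 marks an expected, retryable failure; 500 stays reserved for real
    errors, so the two can be told apart in logs and monitoring.
    """
    try:
        task()
    except ContentionError:
        return 503  # Expected under contention; the queue should retry.
    except Exception:
        return 500  # Genuine failure.
    return 200
```

Both non-2xx codes cause a retry; the point of the split is only that a 503 can be filtered out when looking for real failures.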
Sep 19
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/332fcc9ee863e24410e4ff25e699c319496c792d

commit 332fcc9ee863e24410e4ff25e699c319496c792d
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Sep 19 19:15:26 2018

[swarming] Add code to trim old tasks and old TaskOutputChunk

It's starting to be in tens of terabytes; it's worth deleting old junk.

This code is not yet enabled, as cron.yaml is not touched. This will be done in a follow-up.

Includes unit test that ensures the right entity is deleted.

Bug: 884579
Change-Id: I2fbc906231cd97df0277fce256247c8a8e06fe90
Reviewed-on: https://chromium-review.googlesource.com/1227036
Reviewed-by: Jao-ke Chin-Lee <jchinlee@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/332fcc9ee863e24410e4ff25e699c319496c792d/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/332fcc9ee863e24410e4ff25e699c319496c792d/appengine/swarming/server/task_request.py
[modify] https://crrev.com/332fcc9ee863e24410e4ff25e699c319496c792d/appengine/swarming/server/task_request_test.py
[modify] https://crrev.com/332fcc9ee863e24410e4ff25e699c319496c792d/appengine/swarming/server/task_result.py
[modify] https://crrev.com/332fcc9ee863e24410e4ff25e699c319496c792d/appengine/swarming/server/task_result_test.py
Nov 24
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/ca6c1941d80eabbd9aa12f671ced3bf7d13370f6

commit ca6c1941d80eabbd9aa12f671ced3bf7d13370f6
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Sat Nov 24 19:01:32 2018

[swarming] Properly delete old tasks.

The previous code deleted *recent* tasks, and it was testing for this incorrect behavior. Oops.

Use TaskRequest.created_ts instead of TaskRequest.key, as the key ordering is not what we want.

R=qyearsley@chromium.org
Bug: 884579
Change-Id: I4212e414e6a681acd6ccd16be73630ce1a6e5a4b
Reviewed-on: https://chromium-review.googlesource.com/c/1349375
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/ca6c1941d80eabbd9aa12f671ced3bf7d13370f6/appengine/swarming/server/task_request.py
[modify] https://crrev.com/ca6c1941d80eabbd9aa12f671ced3bf7d13370f6/appengine/swarming/server/task_request_test.py
Nov 28
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/88bedbacf1d85879deb91584323efab33aadc881

commit 88bedbacf1d85879deb91584323efab33aadc881
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Nov 28 18:38:23 2018

[swarming] enable cron jobs to trim TaskRequest and TaskOutputChunk

Respectively:
/internal/cron/delete_old_tasks
/internal/cron/delete_old_task_output_chunks

This is now safe, as all instances had their versions prior to 3880-ca6c194 deleted.

Bug: 884579
Change-Id: I083dab9d84fb3ff550fe748a51dd54be38effbe0
Reviewed-on: https://chromium-review.googlesource.com/c/1351431
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/88bedbacf1d85879deb91584323efab33aadc881/appengine/swarming/cron.yaml
Nov 29
Lowering to:
- BotEvent: 1 year
- TaskRequest entity group: 18 months

This removes the need for a TaskOutput specific cron job.
Dec 3
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/43f034525fad4492b85245b05acd9d68671abe4e

commit 43f034525fad4492b85245b05acd9d68671abe4e
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Dec 03 14:21:43 2018

[swarming] Remove TaskOutput removal cron job

It is not needed, as TaskRequest entity group will be kept for only 18 months.

It is safe to remove the handler in the same CL as cron.yaml, as uploading a version with the trimmed cron.yaml takes effect immediately, even if the default version is not bumped.

Bug: 884579
Change-Id: Iebb2094d299ccce202e32b3778de782486c639aa
Reviewed-on: https://chromium-review.googlesource.com/c/1354365
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/43f034525fad4492b85245b05acd9d68671abe4e/appengine/swarming/cron.yaml
[modify] https://crrev.com/43f034525fad4492b85245b05acd9d68671abe4e/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/43f034525fad4492b85245b05acd9d68671abe4e/appengine/swarming/server/task_result.py
[modify] https://crrev.com/43f034525fad4492b85245b05acd9d68671abe4e/appengine/swarming/server/task_result_test.py
Dec 3
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/d18f6439bb30f910083396acc220412f40ae13d8

commit d18f6439bb30f910083396acc220412f40ae13d8
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Mon Dec 03 22:23:52 2018

[swarming] Lower task and BotEvent retention

Lower BotEvent retention to 12 months. In practice we don't even need this much.

Lower TaskRequest entity groups retention to 18 months. This subsumes the TaskOutput retention policy; no need to trim stdout on a shorter period, at least for now.

Bug: 884579
Change-Id: I67e1fea2f4356c38e79511ae7998308b5aeb4d78
Reviewed-on: https://chromium-review.googlesource.com/c/1354364
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/d18f6439bb30f910083396acc220412f40ae13d8/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/d18f6439bb30f910083396acc220412f40ae13d8/appengine/swarming/server/task_request.py
Dec 5
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/d2a77d1006ed33d803e2292e15738b66e8944b24

commit d2a77d1006ed33d803e2292e15738b66e8944b24
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Dec 05 18:06:52 2018

[swarming] Delete old bot; Log timestamp for entities being deleted

The logging will help to know if the cron jobs are keeping up with the creation rate so that all stale entities are deleted within the expected time frame. This is done for both TaskRequest and for BotEvent.

Deleting old bots is important with Machine Provider created VMs; otherwise cruft just accumulates. We want to keep a clean and tidy DB. :)

The cron job will be added once this code is deployed everywhere.

Bug: 884579
Change-Id: I6fd0d7526446e7439273c9f706d14ead86817d7a
Reviewed-on: https://chromium-review.googlesource.com/c/1363252
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/d2a77d1006ed33d803e2292e15738b66e8944b24/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/d2a77d1006ed33d803e2292e15738b66e8944b24/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/d2a77d1006ed33d803e2292e15738b66e8944b24/appengine/swarming/server/bot_management_test.py
[modify] https://crrev.com/d2a77d1006ed33d803e2292e15738b66e8944b24/appengine/swarming/server/task_request.py
[modify] https://crrev.com/d2a77d1006ed33d803e2292e15738b66e8944b24/appengine/swarming/server/task_request_test.py
Dec 5
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/52977ce45e492a3681b368d202466aa840dbc016

commit 52977ce45e492a3681b368d202466aa840dbc016
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Dec 05 22:52:02 2018

[swarming] Be tolerant to inconsistent index

This is to handle new logging code in d2a77d1006ed33d803e2292e15738b66e8944b24.

Bug: 884579
Change-Id: I7b688e3d489f0fe3c96eba6b7a90c438d7e5473a
Reviewed-on: https://chromium-review.googlesource.com/c/1363534
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/52977ce45e492a3681b368d202466aa840dbc016/appengine/swarming/server/bot_management.py
Dec 6
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/d00338cf557c4c79f5665d0fca3763b4cfe5d9dc

commit d00338cf557c4c79f5665d0fca3763b4cfe5d9dc
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Dec 06 17:12:10 2018

[swarming] Fine tune the logging added in d2a77d1006ed33d803e22

Log the delta, so we can see how far behind the cron job is on production without having to calculate it manually.

Bug: 884579
Change-Id: Id0f82fee778f0260cbce17aeeb76d299a777ff52
Reviewed-on: https://chromium-review.googlesource.com/c/1365354
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/d00338cf557c4c79f5665d0fca3763b4cfe5d9dc/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/d00338cf557c4c79f5665d0fca3763b4cfe5d9dc/appengine/swarming/server/task_request.py
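A rough sketch of what "logging the delta" could mean, under the assumption that the delta is the gap between the retention cutoff and the timestamp of the entity currently being deleted. `trim_lag` is a hypothetical helper, not a function from the commit.

```python
import datetime


def trim_lag(entity_ts, now, retention):
    """Returns how much older than the retention cutoff the entity is.

    A value near zero suggests the cron job has caught up; a large value
    suggests it is still working through a backlog.
    """
    cutoff = now - retention
    return cutoff - entity_ts


if __name__ == "__main__":
    now = datetime.datetime(2018, 12, 6)
    retention = datetime.timedelta(days=365)
    lag = trim_lag(datetime.datetime(2017, 12, 1), now, retention)
    print("Deleting entity %s past the cutoff" % lag)
```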
Dec 11
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/943fd59b5f2cab15bb7153a7a52c86f70a017675

commit 943fd59b5f2cab15bb7153a7a52c86f70a017675
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Dec 11 15:05:51 2018

[swarming] enable cron job to delete stale bot

Bug: 884579
Change-Id: I009f1f755de80996e00b5f7605d75eee3420867a
Reviewed-on: https://chromium-review.googlesource.com/c/1365353
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/943fd59b5f2cab15bb7153a7a52c86f70a017675/appengine/swarming/cron.yaml
Comment 1 by mar...@chromium.org, Sep 17