Issue 826343

Issue metadata

Status: Fixed
Closed: Jun 2018
Pri: 2
Type: Bug



Delete old BotEvent entities

Project Member Reported by mar...@chromium.org, Mar 27 2018

Issue description

As part of issue 826331, I realized that the BotEvent index is fairly large on a few instances. We should add a cron job that deletes all entities older than N months.
 
Project Member

Comment 1 by bugdroid1@chromium.org, May 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/5b22a756f9eb6f4add2aa3c7dc99087d87651abb

commit 5b22a756f9eb6f4add2aa3c7dc99087d87651abb
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue May 15 20:46:15 2018

[swarming] delete old BotEvent entities

On prod we have 1.5 billion BotEvent entities so the index is starting to be
large.

I don't expect the cron job to keep up much, we'll see. :) It's written so that
it does not crash even if it cannot keep up.

Start with a 3-year cutoff to begin "slowly".

Do not enable the cron job yet, because of the way cron.yaml gets activated
immediately on upload. :/

Bug:  826343 
Change-Id: Id3baf8754784d08fa84a2327f4f4144d48f85b23
Reviewed-on: https://chromium-review.googlesource.com/1060256
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/5b22a756f9eb6f4add2aa3c7dc99087d87651abb/appengine/swarming/cron.yaml
[modify] https://crrev.com/5b22a756f9eb6f4add2aa3c7dc99087d87651abb/appengine/swarming/handlers_backend.py
[modify] https://crrev.com/5b22a756f9eb6f4add2aa3c7dc99087d87651abb/appengine/swarming/server/bot_management.py
[modify] https://crrev.com/5b22a756f9eb6f4add2aa3c7dc99087d87651abb/appengine/swarming/server/bot_management_test.py
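
The "does not crash even if it cannot keep up" idea in the commit above can be illustrated with a time-budgeted batch loop. This is only a minimal sketch, not the code from bot_management.py: the function name `delete_old_events`, its parameters, and the in-memory `store` dict are all hypothetical stand-ins for the datastore query and delete calls.

```python
import datetime
import time


def delete_old_events(store, now, cutoff_days=3 * 365, budget_secs=270.0,
                      batch_size=100, clock=time.monotonic):
    """Deletes events older than the cutoff until done or out of time.

    `store` is a hypothetical dict mapping key -> event timestamp. A real
    handler would instead query the datastore and delete in batches.
    Returns the number of events deleted.
    """
    cutoff = now - datetime.timedelta(days=cutoff_days)
    deadline = clock() + budget_secs
    deleted = 0
    while clock() < deadline:
        # Take one small batch of expired keys, oldest first, so the amount
        # of work held in memory at once stays bounded.
        expired = sorted(
            (k for k, ts in store.items() if ts < cutoff), key=store.get)
        batch = expired[:batch_size]
        if not batch:
            break  # Caught up: nothing older than the cutoff remains.
        for key in batch:
            del store[key]
        deleted += len(batch)
    # Running out of time is not an error; the next cron run continues.
    return deleted
```

The loop stops cleanly either when nothing older than the cutoff remains or when the time budget is spent, so a backlog only means the next run picks up where this one left off.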

Project Member

Comment 2 by bugdroid1@chromium.org, May 16 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/50d148584dd806b263fe29954006c7998f2a3896

commit 50d148584dd806b263fe29954006c7998f2a3896
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed May 16 15:20:26 2018

[swarming] enable cron job to delete old BotEvent.

It will increase the cost per day a bit but it should be fine.

Bug:  826343 
Change-Id: Id09497e49b4af72ad22620ec5c5a7e5e5627ecc8
Reviewed-on: https://chromium-review.googlesource.com/1061613
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/50d148584dd806b263fe29954006c7998f2a3896/appengine/swarming/cron.yaml

Project Member

Comment 3 by bugdroid1@chromium.org, May 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/9abe9a30922cf741a8ff1591da06b5ec471a486e

commit 9abe9a30922cf741a8ff1591da06b5ec471a486e
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri May 18 00:27:32 2018

[swarming] Change delete_old_bot_events to run every 5 min for 4.5 min

It was configured to run for 9.5 minutes every 10 minutes, unsynchronized.
The 9.5 minute run caused 'Exceeded soft private memory limit of 512 MB with
512 MB' by deleting ~30k entities per run, even though the handler is designed
to use as little memory as possible.

The cron job seems to be able to delete around 60 events/second, which is
likely not enough to keep up but is still better than just accumulating.
Make it synchronized so it runs for more of the allocated time. A follow-up will
be to pipeline the deletions more, but the immediate need is to not pollute the
logs with error messages.

Bug:  826343 
Change-Id: I71cc2acf5c56e3d6b4861dc375240c4e1fc6f3d7
Reviewed-on: https://chromium-review.googlesource.com/1064723
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/9abe9a30922cf741a8ff1591da06b5ec471a486e/appengine/swarming/cron.yaml
[modify] https://crrev.com/9abe9a30922cf741a8ff1591da06b5ec471a486e/appengine/swarming/server/bot_management.py
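
As a back-of-the-envelope check on the numbers quoted in the commits above (about 60 deletions per second, 4.5 minute runs every 5 minutes, 1.5 billion entities on prod), the throughput can be worked out as follows. The helper names are hypothetical, and the final figure is only a rough upper bound: it assumes every entity had to be deleted rather than only those past the cutoff, and it ignores newly created events.

```python
RATE_PER_SEC = 60       # approximate deletion rate quoted in the commit
TOTAL_ENTITIES = 1.5e9  # BotEvent entities on prod, per the first commit


def deleted_per_run(run_minutes, rate_per_sec=RATE_PER_SEC):
    """Entities deleted in one cron run of the given length."""
    return run_minutes * 60 * rate_per_sec


def deleted_per_day(run_minutes, period_minutes, rate_per_sec=RATE_PER_SEC):
    """Entities deleted per day with back-to-back scheduled runs."""
    runs_per_day = 24 * 60 // period_minutes
    return runs_per_day * deleted_per_run(run_minutes, rate_per_sec)


old_run = deleted_per_run(9.5)   # 34200.0, in line with "~30k per run"
new_run = deleted_per_run(4.5)   # 16200.0 per 4.5 minute run
per_day = deleted_per_day(4.5, 5)  # 288 runs/day -> 4665600.0
days_for_everything = TOTAL_ENTITIES / per_day  # roughly 321.5 days
```

Halving the run length roughly halves the work held per request, which is consistent with the stated goal of avoiding the memory-limit errors, while the daily total stays about the same.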

Comment 4 by mar...@chromium.org, Jun 12 2018

Status: Fixed (was: Available)
It's running well. It's only evicting entities older than 3 years, but the important point is to have something running continuously.

I'll file a separate bug for other kinds of entities, so that the datastore stays clean and nice.
