New issue
Advanced search Search tips

Issue 844151 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 801870

Blocking:
issue 813413



Sign in to add a comment

Swarming: Refactor cache handling; get_state() is miserably slow

Project Member Reported by mar...@chromium.org, May 17 2018

Issue description

It materially impacts throughput. It needs to be optimized.
 
Project Member

Comment 1 by bugdroid1@chromium.org, May 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827

commit e52e972ad8b9ad7d1ca6a863528e5fca41b1a827
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Fri May 18 18:06:02 2018

[client] move DiskCache into local_caching

Decided to follow through with the refactoring, as I want it to be accessible
via higher level tools without shelling into run_isolated.

Intentionally do not refactor too much yet:
- NoMoreSpace is now raised in _remove_lru_file() instead of Error.
- Renamed DiskCache and MemoryCache to the full name with ContentAddressed.

Bug:  844151 
Change-Id: Ie39743ef88a079bb629216446c3083d24684313f
Reviewed-on: https://chromium-review.googlesource.com/1064912
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/cipd.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/isolateserver.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/local_caching.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/run_isolated.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/swarming.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/tests/isolateserver_test.py
[add] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/tests/local_caching_test.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/tests/run_isolated_test.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/tests/swarming_test.py
[modify] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/tools/isolateserver_load_test.py

Project Member

Comment 2 by bugdroid1@chromium.org, May 18 2018

Comment 3 by mar...@chromium.org, May 22 2018

Blocking: 813413
Summary: Swarming: Refactor cache handling; get_state() is miserably slow (was: Swarming: get_state() is miserably slow)
Project Member

Comment 4 by bugdroid1@chromium.org, May 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/a3c4858261e5708f885cb582cae73d443bc50d28

commit a3c4858261e5708f885cb582cae73d443bc50d28
Author: smut <smut@google.com>
Date: Tue May 22 23:43:29 2018

[Swarming] Remove non-existent named_cache.py reference from the bot code

Removed in 98e957c3aa3d12352bdbb0fdbd8305537b5d034c.

Bug:  844151 
Change-Id: I0538d50efc4de08f2a255ac94d5eb6ce59680736
Reviewed-on: https://chromium-review.googlesource.com/1069716
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: smut <smut@google.com>

[modify] https://crrev.com/a3c4858261e5708f885cb582cae73d443bc50d28/appengine/swarming/server/bot_archive.py
[modify] https://crrev.com/a3c4858261e5708f885cb582cae73d443bc50d28/appengine/swarming/swarming_bot/bot_code/task_runner_test.py

Project Member

Comment 5 by bugdroid1@chromium.org, May 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/8dbbfb53164ea8a395288bd310c729235617474f

commit 8dbbfb53164ea8a395288bd310c729235617474f
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed May 23 00:32:09 2018

[client] First local_caching improvement

- Start converging a common API for named cache and content addressed cache.
- Rename CacheManager to NamedCache, remove open(), change the locking
expectations.
- Locking is more consistent.

Bug:  844151 
Change-Id: I7dddaa2752622f6f3bc3928605a142d75ca6da14
Reviewed-on: https://chromium-review.googlesource.com/1066301
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/local_caching.py
[modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/run_isolated.py
[modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/tests/local_caching_test.py
[modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/tests/run_isolated_test.py

Project Member

Comment 6 by bugdroid1@chromium.org, May 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/2f831b868a05d0b501d34fb5774c6455ae5baede

commit 2f831b868a05d0b501d34fb5774c6455ae5baede
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed May 23 01:04:39 2018

Revert "[client] First local_caching improvement"

This reverts commit 8dbbfb53164ea8a395288bd310c729235617474f.

Reason for revert: It broke task_runner_test.py and it's taking more than 3 minutes to figure out why, so reverting to unblock other CLs.

Original change's description:
> [client] First local_caching improvement
> 
> - Start converging a common API for named cache and content addressed cache.
> - Rename CacheManager to NamedCache, remove open(), change the locking
> expectations.
> - Locking is more consistent.
> 
> Bug:  844151 
> Change-Id: I7dddaa2752622f6f3bc3928605a142d75ca6da14
> Reviewed-on: https://chromium-review.googlesource.com/1066301
> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

TBR=maruel@chromium.org,qyearsley@chromium.org

Change-Id: Ie88725d831d8d7605e308ed12517963e3ae1bca1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  844151 
Reviewed-on: https://chromium-review.googlesource.com/1069001
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/local_caching.py
[modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/run_isolated.py
[modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/tests/local_caching_test.py
[modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/tests/run_isolated_test.py

Project Member

Comment 7 by bugdroid1@chromium.org, May 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8

commit 5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed May 23 18:39:35 2018

[swarming] stop reporting caches in state

This takes too much time on macOS.

R=qyearsley@chromium.org

Bug:  844151 
Change-Id: I9f1b255ddecb42461c238e1da0079924e8e68794
Reviewed-on: https://chromium-review.googlesource.com/1069175
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8/appengine/swarming/swarming_bot/api/os_utilities.py
[modify] https://crrev.com/5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8/appengine/swarming/swarming_bot/api/os_utilities_test.py

Comment 8 by mar...@chromium.org, May 23 2018

Labels: -Pri-1 Pri-3
Workaround is live, so the priority can be lowered. There's follow up work to publish the named cache sizes so I'll keep the bug open for now.
Project Member

Comment 10 by bugdroid1@chromium.org, Jun 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/b4c7e829abfe3ac265d1e229d82857b67c687499

commit b4c7e829abfe3ac265d1e229d82857b67c687499
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Jun 13 18:33:07 2018

[client] Refactor the caches to stop using context manager

Move from using with <cache>: context statement to an explicit .trim() call.

The rationale is that the cache cleanup will be done in a *separate process*,
after the task completed to reduce overhead.

Bug:  844151 
Change-Id: I7e9b9020a45186027815b51da73e3815960646cf
Reviewed-on: https://chromium-review.googlesource.com/1066351
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/cipd.py
[modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/isolateserver.py
[modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/local_caching.py
[modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/run_isolated.py
[modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/tests/local_caching_test.py
[modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/tests/run_isolated_test.py

Project Member

Comment 11 by bugdroid1@chromium.org, Jun 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/615e5f50799d23981765196ae8f2b298e2b781be

commit 615e5f50799d23981765196ae8f2b298e2b781be
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Wed Jun 13 19:54:58 2018

[swarming] fix test only regression from b4c7e829abfe3ac26

TBR=qyearsley@chromium.org
Bug:  844151 
Change-Id: I7f215f4edefcf71ab72f2d43a8e1012ab13aabe6
Reviewed-on: https://chromium-review.googlesource.com/1099644
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/615e5f50799d23981765196ae8f2b298e2b781be/appengine/swarming/swarming_bot/bot_code/task_runner_test.py

Project Member

Comment 12 by bugdroid1@chromium.org, Jun 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/5f9772d426660cb15c576267dd60fca480ec6b21

commit 5f9772d426660cb15c576267dd60fca480ec6b21
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Jun 14 19:02:00 2018

[client] Update lru.py and test

Add iteritems() and transform().
Remove keys_set(), wasn't necessary. set(lru_dict) works as well.
Rewrite the unit test to be more extensive.

This is in preparation for named cache refactoring. transform() will be used to
upgrade the named cache format.

iteritems() is not strictly needed but is helpful for the tests.

get_timestamp() will be removed later on but keeping this CL simpler for now.

Bug:  844151 
Change-Id: I3ff28d0c19551913fa5327cc7a8a26e806214142
Reviewed-on: https://chromium-review.googlesource.com/1100341
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/5f9772d426660cb15c576267dd60fca480ec6b21/client/local_caching.py
[modify] https://crrev.com/5f9772d426660cb15c576267dd60fca480ec6b21/client/tests/lru_test.py
[modify] https://crrev.com/5f9772d426660cb15c576267dd60fca480ec6b21/client/utils/lru.py

Project Member

Comment 13 by bugdroid1@chromium.org, Jun 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/b182fa505b6e0e842761059dd074fee18a7c3586

commit b182fa505b6e0e842761059dd074fee18a7c3586
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Jun 14 19:08:01 2018

[client] cache the named cache size; stop keeping empty caches

Stop keeping empty named cache entries. An empty cache entry doesn't contain
anything, so we do not want this to be a signal (for dimensions) that any
incremental value is present, there isn't.

Named cache enumeration can be excruciably slow, especially on OSX but it's not
super fast on other OSes either.

Instead, when unmapping a named cache, calculate its size and store it in the
LRU data. This way when trying to trim, the size of each entry is already
readily available!

Remove get_recursive_size() for os_utilities. Confirmed it wasn't used. This
function can be surprisingly (Hello OSX!) low so we do not want to encourage
people to do that.

Add back named_cache to the bot state.

Bug:  844151 
Change-Id: If33077e72390ad2e06436886f9958efe538b05dc
Reviewed-on: https://chromium-review.googlesource.com/1100904
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/appengine/swarming/swarming_bot/api/os_utilities.py
[modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/appengine/swarming/swarming_bot/api/os_utilities_test.py
[modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/local_caching.py
[modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/tests/local_caching_test.py
[modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/tests/run_isolated_smoke_test.py
[modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/tests/run_isolated_test.py

Status: Fixed (was: Assigned)
Everything should be amazing now (as soon as it's pushed to prod).

Sign in to add a comment