Swarming: Refactor cache handling; get_state() is miserably slow |
||||
Issue descriptionIt materially impacts throughput. It needs to be optimized.
,
May 18 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/98e957c3aa3d12352bdbb0fdbd8305537b5d034c commit 98e957c3aa3d12352bdbb0fdbd8305537b5d034c Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Fri May 18 19:12:43 2018 [client] Move named_cache into local_caching Now all the bot local caches are in a single python file. I'll refactor in a follow up, not refactoring to not make the CL more complex. Bug: 844151 Change-Id: I82b6fc17d903d9f816091a75d28f863fae8f405e Reviewed-on: https://chromium-review.googlesource.com/1065527 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/98e957c3aa3d12352bdbb0fdbd8305537b5d034c/client/local_caching.py [delete] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/named_cache.py [modify] https://crrev.com/98e957c3aa3d12352bdbb0fdbd8305537b5d034c/client/run_isolated.py [modify] https://crrev.com/98e957c3aa3d12352bdbb0fdbd8305537b5d034c/client/tests/local_caching_test.py [delete] https://crrev.com/e52e972ad8b9ad7d1ca6a863528e5fca41b1a827/client/tests/named_cache_test.py [modify] https://crrev.com/98e957c3aa3d12352bdbb0fdbd8305537b5d034c/client/tests/run_isolated_test.py
,
May 22 2018
,
May 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/a3c4858261e5708f885cb582cae73d443bc50d28 commit a3c4858261e5708f885cb582cae73d443bc50d28 Author: smut <smut@google.com> Date: Tue May 22 23:43:29 2018 [Swarming] Remove non-existent named_cache.py reference from the bot code Removed in 98e957c3aa3d12352bdbb0fdbd8305537b5d034c. Bug: 844151 Change-Id: I0538d50efc4de08f2a255ac94d5eb6ce59680736 Reviewed-on: https://chromium-review.googlesource.com/1069716 Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> Commit-Queue: smut <smut@google.com> [modify] https://crrev.com/a3c4858261e5708f885cb582cae73d443bc50d28/appengine/swarming/server/bot_archive.py [modify] https://crrev.com/a3c4858261e5708f885cb582cae73d443bc50d28/appengine/swarming/swarming_bot/bot_code/task_runner_test.py
,
May 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/8dbbfb53164ea8a395288bd310c729235617474f commit 8dbbfb53164ea8a395288bd310c729235617474f Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed May 23 00:32:09 2018 [client] First local_caching improvement - Start converging a common API for named cache and content addressed cache. - Rename CacheManager to NamedCache, remove open(), change the locking expectations. - Locking is more consistent. Bug: 844151 Change-Id: I7dddaa2752622f6f3bc3928605a142d75ca6da14 Reviewed-on: https://chromium-review.googlesource.com/1066301 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/local_caching.py [modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/run_isolated.py [modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/tests/local_caching_test.py [modify] https://crrev.com/8dbbfb53164ea8a395288bd310c729235617474f/client/tests/run_isolated_test.py
,
May 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/2f831b868a05d0b501d34fb5774c6455ae5baede commit 2f831b868a05d0b501d34fb5774c6455ae5baede Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed May 23 01:04:39 2018 Revert "[client] First local_caching improvement" This reverts commit 8dbbfb53164ea8a395288bd310c729235617474f. Reason for revert: It broke task_runner_test.py and it's taking more than 3 minutes to figure out why, so reverting to unblock other CLs. Original change's description: > [client] First local_caching improvement > > - Start converging a common API for named cache and content addressed cache. > - Rename CacheManager to NamedCache, remove open(), change the locking > expectations. > - Locking is more consistent. > > Bug: 844151 > Change-Id: I7dddaa2752622f6f3bc3928605a142d75ca6da14 > Reviewed-on: https://chromium-review.googlesource.com/1066301 > Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> > Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> TBR=maruel@chromium.org,qyearsley@chromium.org Change-Id: Ie88725d831d8d7605e308ed12517963e3ae1bca1 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 844151 Reviewed-on: https://chromium-review.googlesource.com/1069001 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/local_caching.py [modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/run_isolated.py [modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/tests/local_caching_test.py [modify] https://crrev.com/2f831b868a05d0b501d34fb5774c6455ae5baede/client/tests/run_isolated_test.py
,
May 23 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8 commit 5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed May 23 18:39:35 2018 [swarming] stop reporting caches in state This takes too much time on macOS. R=qyearsley@chromium.org Bug: 844151 Change-Id: I9f1b255ddecb42461c238e1da0079924e8e68794 Reviewed-on: https://chromium-review.googlesource.com/1069175 Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8/appengine/swarming/swarming_bot/api/os_utilities.py [modify] https://crrev.com/5e4be64da4f1ef9701ee9b6ed2143a1ebe339ba8/appengine/swarming/swarming_bot/api/os_utilities_test.py
,
May 23 2018
Workaround is live, so the priority can be lowered. There's follow up work to publish the named cache sizes so I'll keep the bug open for now.
,
May 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/6e39527c5e25e1eb72b2ffea473039ebd2e81d01 commit 6e39527c5e25e1eb72b2ffea473039ebd2e81d01 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Fri May 25 16:36:42 2018 Reapply "[client] First local_caching improvement" This reverts https://chromium-review.googlesource.com/1069001 Bug: 844151 Change-Id: Ia6b7fa8e53416d082e73207fee105f9149c89309 Reviewed-on: https://chromium-review.googlesource.com/1072547 Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/6e39527c5e25e1eb72b2ffea473039ebd2e81d01/appengine/swarming/swarming_bot/bot_code/task_runner_test.py [modify] https://crrev.com/6e39527c5e25e1eb72b2ffea473039ebd2e81d01/client/local_caching.py [modify] https://crrev.com/6e39527c5e25e1eb72b2ffea473039ebd2e81d01/client/run_isolated.py [modify] https://crrev.com/6e39527c5e25e1eb72b2ffea473039ebd2e81d01/client/tests/local_caching_test.py [modify] https://crrev.com/6e39527c5e25e1eb72b2ffea473039ebd2e81d01/client/tests/run_isolated_test.py
,
Jun 13 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/b4c7e829abfe3ac265d1e229d82857b67c687499 commit b4c7e829abfe3ac265d1e229d82857b67c687499 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed Jun 13 18:33:07 2018 [client] Refactor the caches to stop using context manager Move from using with <cache>: context statement to an explicit .trim() call. The rationale is that the cache cleanup will be done in a *separate process*, after the task completed to reduce overhead. Bug: 844151 Change-Id: I7e9b9020a45186027815b51da73e3815960646cf Reviewed-on: https://chromium-review.googlesource.com/1066351 Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/cipd.py [modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/isolateserver.py [modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/local_caching.py [modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/run_isolated.py [modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/tests/local_caching_test.py [modify] https://crrev.com/b4c7e829abfe3ac265d1e229d82857b67c687499/client/tests/run_isolated_test.py
,
Jun 13 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/615e5f50799d23981765196ae8f2b298e2b781be commit 615e5f50799d23981765196ae8f2b298e2b781be Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed Jun 13 19:54:58 2018 [swarming] fix test only regression from b4c7e829abfe3ac26 TBR=qyearsley@chromium.org Bug: 844151 Change-Id: I7f215f4edefcf71ab72f2d43a8e1012ab13aabe6 Reviewed-on: https://chromium-review.googlesource.com/1099644 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/615e5f50799d23981765196ae8f2b298e2b781be/appengine/swarming/swarming_bot/bot_code/task_runner_test.py
,
Jun 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/5f9772d426660cb15c576267dd60fca480ec6b21 commit 5f9772d426660cb15c576267dd60fca480ec6b21 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Thu Jun 14 19:02:00 2018 [client] Update lru.py and test Add iteritems() and transform(). Remove keys_set(), wasn't necessary. set(lru_dict) works as well. Rewrite the unit test to be more extensive. This is in preparation for named cache refactoring. transform() will be used to upgrade the named cache format. iteritems() is not strictly needed but is helpful for the tests. get_timestamp() will be removed later on but keeping this CL simpler for now. Bug: 844151 Change-Id: I3ff28d0c19551913fa5327cc7a8a26e806214142 Reviewed-on: https://chromium-review.googlesource.com/1100341 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/5f9772d426660cb15c576267dd60fca480ec6b21/client/local_caching.py [modify] https://crrev.com/5f9772d426660cb15c576267dd60fca480ec6b21/client/tests/lru_test.py [modify] https://crrev.com/5f9772d426660cb15c576267dd60fca480ec6b21/client/utils/lru.py
,
Jun 14 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/b182fa505b6e0e842761059dd074fee18a7c3586 commit b182fa505b6e0e842761059dd074fee18a7c3586 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Thu Jun 14 19:08:01 2018 [client] cache the named cache size; stop keeping empty caches Stop keeping empty named cache entries. An empty cache entry doesn't contain anything, so we do not want this to be a signal (for dimensions) that any incremental value is present, there isn't. Named cache enumeration can be excruciably slow, especially on OSX but it's not super fast on other OSes either. Instead, when unmapping a named cache, calculate its size and store it in the LRU data. This way when trying to trim, the size of each entry is already readily available! Remove get_recursive_size() for os_utilities. Confirmed it wasn't used. This function can be surprisingly (Hello OSX!) low so we do not want to encourage people to do that. Add back named_cache to the bot state. Bug: 844151 Change-Id: If33077e72390ad2e06436886f9958efe538b05dc Reviewed-on: https://chromium-review.googlesource.com/1100904 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/appengine/swarming/swarming_bot/api/os_utilities.py [modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/appengine/swarming/swarming_bot/api/os_utilities_test.py [modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/local_caching.py [modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/tests/local_caching_test.py [modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/tests/run_isolated_smoke_test.py [modify] https://crrev.com/b182fa505b6e0e842761059dd074fee18a7c3586/client/tests/run_isolated_test.py
,
Jun 14 2018
Everything should be amazing now (as soon as it's pushed to prod). |
||||
►
Sign in to add a comment |
||||
Comment 1 by bugdroid1@chromium.org
, May 18 2018