Swarming: implement tarfile support for isolated archive python code |
|||||
Issue descriptionhttps://ci.chromium.org/buildbot/tryserver.chromium.android/android_n5x_swarming_rel/380001 is an example, but I've seen other examples as well. I'm specifically looking at telemetry_perf_unittests in this build. The longest task takes 25 minutes, which is the execution timeout for each task. The collect step takes 6 minutes though; I'm not exactly sure why? I did a logdog cat of the log, and it's about 48 MB. Which is large for a log, but I don't know why it'd take *that* long. None of the tasks really create many output files, so I don't know why it'd be hard to collect them. I can take a look more. Not sure how impactful "fixing" this would be as well.
,
Apr 9 2018
Ok, after looking at this a bit, I think this is because layout tests are creating tons of output files. https://chromium-swarm.appspot.com/task?id=3cb07b6feb0aed10&refresh=10&show_raw=1 is an example task (page takes a while to load). It generates 10,000 new isolate uploads, and has an 11 minute overhead; more than the actual test runtime. This doesn't seem widespread thankfully; looks like the test itself just broke a bunch of stuff and generated a lot of logs. It's a pathological case that we should probably guard against, but not majorly impacting the CQ.
,
Apr 10 2018
The bot uses the python isolate implementation (instead of the Go one) and it is known to scale much poorly compared to the Go one. We could probably implement the accelerations (tar files mainly) which would give a good improvement to be able to, at least, keep it under relative control.
,
Jun 15 2018
I can do a reasonably performing one in python without spending too much time on that so at least we are not blocked on rewriting the world.
,
Nov 20
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/20f418d883556f6089d4f280cf38febfabea066f commit 20f418d883556f6089d4f280cf38febfabea066f Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Tue Nov 20 19:48:53 2018 [client] internal refactoring adding ServerRef - Create isolate_storage.ServerRef instead of passing the isolate server base URL and its namespace, then make assumptions based on that. - This causes a fair amount of churn in the unit tests due to mocks but there shouldn't be externally visible impact. - Reduce usage of isolated_format.SUPPORTED_ALGOS_REVERSE. - Touch appengine/swarming/README.md to force the unit tests there to run. - Fix macOS failure in isolate_test.py and swarming_test.py due to python symlink (unrelated with the rest). Change-Id: Ice201231aa1959043dd816256f88b16ba323ec2d Bug: 825418 Reviewed-on: https://chromium-review.googlesource.com/c/1343298 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/appengine/swarming/README.md [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/isolate.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/isolate_storage.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/isolated_format.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/isolateserver.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/run_isolated.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/swarming.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/tests/isolate_storage_test.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/tests/isolate_test.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/tests/isolated_format_test.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/tests/isolateserver_test.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/tests/run_isolated_test.py [modify] https://crrev.com/20f418d883556f6089d4f280cf38febfabea066f/client/tests/swarming_test.py
,
Nov 20
,
Nov 28
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/55064a029241876f089b3f7b813c5899e95981c6 commit 55064a029241876f089b3f7b813c5899e95981c6 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed Nov 28 20:47:29 2018 [client] reduce the duplication of logic in isolateserver.py Splitting this as a separate CL because this is fairly tricky code: - To be able to add automatic tarring, I need to first reduce the duplicate code inside the uploading paths. - To accelerate the processing, we need to stop serializing the list and instead switch the inputs to be processed as a generator. Changes: - upload_tree() was removed, relevant parts pushed to isolate.py as _process_infiles(). - archive_files_to_storage() is now the base function. - Make archive_files_to_storage() accept an iterator and work with duplicates. - Extract directory logic from archive_files_to_storage() into _enqueue_dir(), that's likely where the tarring will be done. - inline archive(). Bug: 825418 Change-Id: I462a7f433ca013428fe377b4ca4d23cf8ad5d85b Reviewed-on: https://chromium-review.googlesource.com/c/1345190 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/appengine/swarming/README.md [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/isolate.py [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/isolateserver.py [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/run_isolated.py [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/tests/httpserver_mock.py [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/tests/isolate_test.py [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/tests/isolateserver_mock.py [modify] https://crrev.com/55064a029241876f089b3f7b813c5899e95981c6/client/tests/isolateserver_test.py
,
Nov 28
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/e61b23263a39b569cd655ff2a915d85d2b0d4bed commit e61b23263a39b569cd655ff2a915d85d2b0d4bed Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed Nov 28 20:50:40 2018 [client] Convert the upload to be pipelined; various performance tuning Originally the full 'items' list had to be materialized before uploading. upload_items() now starts the lookup as items are processed, which will reduce the overall latency. Comment out logging and asserts in deep functions to help with performance. Intentionally leave the logging lines in, as this could be useful for follow up debugging. I do expect a significant performance boost by this CL, but not on the order of magnitude that tarring will do. But both will combine, so this is for the best. Mark many functions as private. After this the tarring will be much more efficient. Bug: 825418 Change-Id: I7a470ae96b2b93c46459cf96e919b814ba264baf Reviewed-on: https://chromium-review.googlesource.com/c/1348810 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/e61b23263a39b569cd655ff2a915d85d2b0d4bed/client/isolate.py [modify] https://crrev.com/e61b23263a39b569cd655ff2a915d85d2b0d4bed/client/isolate_storage.py [modify] https://crrev.com/e61b23263a39b569cd655ff2a915d85d2b0d4bed/client/isolateserver.py [modify] https://crrev.com/e61b23263a39b569cd655ff2a915d85d2b0d4bed/client/tests/isolateserver_test.py [modify] https://crrev.com/e61b23263a39b569cd655ff2a915d85d2b0d4bed/client/utils/file_path.py
,
Nov 28
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/df4bf3d52e3911f5415bf9613735b4811ef94fff commit df4bf3d52e3911f5415bf9613735b4811ef94fff Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed Nov 28 21:00:41 2018 [client] Make TaskChannel usable as a generator For a class to be usable as a python generator, it must have: - __iter__() - next() I added __iter__() and renamed pull() to next(). This will be useful as a multithreaded python generator. Splitting as a separate CL to make it easier to review. Includes relevant unit test. Bug: 825418 Change-Id: Ic2e8a288a991dc214dea59e9533010679d0fde21 Reviewed-on: https://chromium-review.googlesource.com/c/1349377 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/df4bf3d52e3911f5415bf9613735b4811ef94fff/client/isolateserver.py [modify] https://crrev.com/df4bf3d52e3911f5415bf9613735b4811ef94fff/client/swarming.py [modify] https://crrev.com/df4bf3d52e3911f5415bf9613735b4811ef94fff/client/tests/isolateserver_test.py [modify] https://crrev.com/df4bf3d52e3911f5415bf9613735b4811ef94fff/client/tests/threading_utils_test.py [modify] https://crrev.com/df4bf3d52e3911f5415bf9613735b4811ef94fff/client/utils/threading_utils.py
,
Nov 28
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/637736ab96e7bc28d3f8bcce4e96a291869a03fc commit 637736ab96e7bc28d3f8bcce4e96a291869a03fc Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Wed Nov 28 21:05:01 2018 [client] Further use generator during enumeration Leverage that TaskChannel can now be used as a multithreaded python generator. This permits to send inputs to /contains RPC before the full isolated is rasterized, which helps with reducing overall latency. This change, coupled with the 3 previous ones results in 10 seconds saving on a Windows 7 VM on a 2658 files/639MiB dataset. The tarring should live between expand_directories_and_symlinks() and _directory_to_metadata(). Bug: 825418 Change-Id: I9321250ade2a3838a7009351fc06295ad1e23399 Reviewed-on: https://chromium-review.googlesource.com/c/1349376 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/637736ab96e7bc28d3f8bcce4e96a291869a03fc/client/isolate.py [modify] https://crrev.com/637736ab96e7bc28d3f8bcce4e96a291869a03fc/client/isolated_format.py [modify] https://crrev.com/637736ab96e7bc28d3f8bcce4e96a291869a03fc/client/isolateserver.py [modify] https://crrev.com/637736ab96e7bc28d3f8bcce4e96a291869a03fc/client/tests/isolate_test.py [modify] https://crrev.com/637736ab96e7bc28d3f8bcce4e96a291869a03fc/client/tests/isolateserver_test.py [modify] https://crrev.com/637736ab96e7bc28d3f8bcce4e96a291869a03fc/client/tests/run_isolated_test.py
,
Nov 29
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/1b757e2e26f97ea01ae893588c0f0ae84baff11f commit 1b757e2e26f97ea01ae893588c0f0ae84baff11f Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Thu Nov 29 23:47:34 2018 [client] Fix symlink bug That happens only on task output with a symlink. Bug: 825418 Change-Id: I0412d7c2438b3a14a19712709c1f98ccab5109a5 Reviewed-on: https://chromium-review.googlesource.com/c/1355723 Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/1b757e2e26f97ea01ae893588c0f0ae84baff11f/client/isolateserver.py
,
Nov 30
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/0f56bf605063abd99898e162cdf3db5f08b1be9b commit 0f56bf605063abd99898e162cdf3db5f08b1be9b Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Fri Nov 30 14:05:45 2018 [client] Fix a crash when no entry was processed Previously upload_items() would early exit if there is no item to upload. Now that items is a python generator instead of a list, there's no way to easily determine if the generator is going to be empty (it could be done but it's not worth the trouble). When there's no item, it causes a divide by zero. Oops. Include unit test to confirm the fix. TBR'ing because the breakage is live on production. TBR=qyearsley@chromium.org Bug: 825418 Change-Id: If35bb30bd4eed14c95005bb965e60e365ca408bf Reviewed-on: https://chromium-review.googlesource.com/c/1355725 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/0f56bf605063abd99898e162cdf3db5f08b1be9b/client/isolateserver.py [modify] https://crrev.com/0f56bf605063abd99898e162cdf3db5f08b1be9b/client/tests/isolateserver_test.py
,
Nov 30
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/1c28ee319deefabdc27364b5223198e492647782 commit 1c28ee319deefabdc27364b5223198e492647782 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Fri Nov 30 18:53:21 2018 [client] Add unit test for symlink bug A fix was added to 1b757e2e26f97ea01ae893588c0f0ae84baff11f but no unit test was added. Add one so this doesn't regress. Test only change. R=qyearsley@chromium.org Bug: 825418 Change-Id: Ib5571c57d297f211e29509ab0b351996b7aeba19 Reviewed-on: https://chromium-review.googlesource.com/c/1357019 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/1c28ee319deefabdc27364b5223198e492647782/client/tests/isolateserver_test.py
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0 commit ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Tue Dec 04 16:35:18 2018 [client] Small improvements No real functional change. - save_isolated() doesn't return anything. This was done this way because it was thought that eventually it would split automatically in isolated layers as a performance optimization. In practice we used tarring instead. - Make expand_symlinks() private. - Make create_symlinks() and get_zip_compression_level() private. - Change some tests to use package 'fs' instead of 'os'. - Prepare the replacement of FakeItem with BufferItem in isolateserver_test.py. Bug: 825418 Change-Id: I095ede8ae43a550848c85c3f7ea2b3a48b11503b Reviewed-on: https://chromium-review.googlesource.com/c/1358571 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0/client/isolate.py [modify] https://crrev.com/ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0/client/isolated_format.py [modify] https://crrev.com/ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0/client/isolateserver.py [modify] https://crrev.com/ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0/client/tests/isolated_format_test.py [modify] https://crrev.com/ebc22e5559fe0fe0f6268baf851ad5bf5aeb7fb0/client/tests/isolateserver_test.py
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/36c824fc898f61fbe127b54897df1d329060e0ef commit 36c824fc898f61fbe127b54897df1d329060e0ef Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Tue Dec 04 17:39:22 2018 [client] Move expand_directories_and_symlinks() Move it from isolated_format.py to isolate.py as it's the only place it's used. No functional change, splitting in a separate CL to ease reviewing. Bug: 825418 Change-Id: I446ba5deed84a4ef1ee2b2d96e15ae76f66ac5ab Reviewed-on: https://chromium-review.googlesource.com/c/1358671 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/36c824fc898f61fbe127b54897df1d329060e0ef/client/isolate.py [modify] https://crrev.com/36c824fc898f61fbe127b54897df1d329060e0ef/client/isolated_format.py [modify] https://crrev.com/36c824fc898f61fbe127b54897df1d329060e0ef/client/tests/isolate_test.py [modify] https://crrev.com/36c824fc898f61fbe127b54897df1d329060e0ef/client/tests/isolated_format_test.py
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700 commit 07a3f1459c75ecc31c978ab8a8db7ecff6cb1700 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Tue Dec 04 18:01:05 2018 [client] Remove timestamp from file metadata Remove the 'prev_dict' from isolated_format.py file_to_metadata(). This existed to accelerate isolate.py archival. The problem is that it was done at the wrong layer! The performance optimization should have been done at the isolated layer, not isolate (which is too high). In practice almost no client use isolate.py archive, and this tool is going away. Since isolated doesn't have this performance optimization, this is not a regression. The other reason for this to be done is that hashing will be done lazily in a follow up CL, as when the file is included in a tarball, it is completely useless to hash individual files. This CL is split to simplify reviewability. Bug: 825418 Change-Id: Ic033dace3244b470f722b119eefec6088ea16a19 Reviewed-on: https://chromium-review.googlesource.com/c/1358672 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700/client/isolate.py [modify] https://crrev.com/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700/client/isolated_format.py [modify] https://crrev.com/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700/client/isolateserver.py [modify] https://crrev.com/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700/client/tests/isolate_smoke_test.py [modify] https://crrev.com/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700/client/tests/isolate_test.py [modify] https://crrev.com/07a3f1459c75ecc31c978ab8a8db7ecff6cb1700/client/tests/isolated_format_test.py
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/7560565e15538cfca0387172aeb4123110cc20a1 commit 7560565e15538cfca0387172aeb4123110cc20a1 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Tue Dec 04 18:30:33 2018 [client] lazy hash and improve symlink processing - Stop hashing in file_to_metadata(). The hashing must be delayed for tarballs, as only the hash of the whole tarball is needed, not the hash of each of its individual files. - Small performance improvement for symlinks, to not call stat() twice on them. This is still the case for other files. :/ - Make isolate_storage.Item more 'immutable'. This is to help with data flow. - Make isolate_storage.Item know how to calculate its own digest. This is needed for tarfiles, as they won't be written to disk, nor be fully kept in memory. This removes the need for prepare(). - As a small performance improvement, do not saves isolated files created for directory to disk anymore. This should be fine even on 32-bit process, as the size and number of isolated files is generally small. This code is the final (!) preparation for adding tarball support to python isolated upload. Bug: 825418 Change-Id: I9e68cb2b2b9cf7bd9c3d099c32168f0414b3c14a Reviewed-on: https://chromium-review.googlesource.com/c/1358673 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/7560565e15538cfca0387172aeb4123110cc20a1/client/isolate.py [modify] https://crrev.com/7560565e15538cfca0387172aeb4123110cc20a1/client/isolate_storage.py [modify] https://crrev.com/7560565e15538cfca0387172aeb4123110cc20a1/client/isolated_format.py [modify] https://crrev.com/7560565e15538cfca0387172aeb4123110cc20a1/client/isolateserver.py [modify] https://crrev.com/7560565e15538cfca0387172aeb4123110cc20a1/client/tests/isolated_format_test.py [modify] https://crrev.com/7560565e15538cfca0387172aeb4123110cc20a1/client/tests/isolateserver_test.py
,
Dec 4
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/61640467dea4aff6609f5aa121f7b92c62c527e1 commit 61640467dea4aff6609f5aa121f7b92c62c527e1 Author: Marc-Antoine Ruel <maruel@chromium.org> Date: Tue Dec 04 22:37:05 2018 [client] Add tarring code but keep it disabled Add new code to generate tar balls but do not enable the code. This is because the Go client can upload tarballs but cannot download (!!) them. Sigh. - Only do it for directory upload, not for individual files. This is similar to how the Go code does. - Only batch if a minimum of 5 files can be batched. - Batch up to 100,000 of data. - Use threaded tar file content generator, because there's no other way to do in python. - Add a unit test. Bug: 825418 Change-Id: Ic8cf089d0c3c65610aa67d0893eab087ffe158d8 Reviewed-on: https://chromium-review.googlesource.com/c/1357020 Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Robbie Iannucci <iannucci@chromium.org> [modify] https://crrev.com/61640467dea4aff6609f5aa121f7b92c62c527e1/client/isolateserver.py [modify] https://crrev.com/61640467dea4aff6609f5aa121f7b92c62c527e1/client/tests/isolateserver_test.py |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by martiniss@chromium.org
, Apr 9 2018