isolate: the tar file archiver should bucket more
Issue description

The tar archiver currently tars all files below 100kb. Bucketing is done per input entry listed in the isolate file: if "test/data/" is listed as an input in the isolate file, only files under this directory are grouped together. If there were a second entry "out/Release/", its files would be tarred independently.

This generally works great, unless "test/data/" is several GiB in size and a single file in this directory is updated often. In that case the whole thing is likely to be tarred and uploaded again, which is suboptimal.

Goal: Improve the efficiency of incremental uploads of large input trees with a low (but non-zero) churn rate. This matters especially for layout tests.

AIs:
- Make the bucketing algorithm use a trade-off that favors making more tar files, grouped by subdirectory in a relatively deterministic way.
- Detect single-item tar files and do not tar them. The current code may degenerate in this situation. In fact, groups of fewer than 4 (?) items should probably never be tarred; it's likely not worth it.

References:
Selection of files to tar: https://chromium.googlesource.com/infra/luci/luci-go/+/44ec31d1076c4f57e848afd199a8cf9f0ab30af5/client/archiver/tarring_archiver.go#140
Bucketing algorithm: https://chromium.googlesource.com/infra/luci/luci-go/+/44ec31d1076c4f57e848afd199a8cf9f0ab30af5/client/archiver/tar_archiver.go#41
Uploader of the tarred files: https://chromium.googlesource.com/infra/luci/luci-go/+/44ec31d1076c4f57e848afd199a8cf9f0ab30af5/client/archiver/upload_tracker.go#137
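A rough sketch of what the two action items could look like together, for discussion only. This is not the existing luci-go code: the `file` type, the `bucketByDir` function, and both constants are hypothetical names made up for illustration. It assumes that "100kb" means 100 * 1024 bytes, that grouping by immediate parent directory is an acceptable first cut at "grouped by subdirectories", and that 4 is the minimum group size floated above.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// file is a hypothetical stand-in for one input file; it is not a luci-go type.
type file struct {
	Path string // slash-separated path relative to the isolate input entry
	Size int64  // size in bytes
}

const (
	// Assumption: "100kb" in the issue is read as 100 * 1024 bytes.
	maxTarredSize = 100 * 1024
	// Assumption: the "4 (?)" minimum group size suggested in the issue.
	minGroupItems = 4
)

// bucketByDir groups small files by their immediate parent directory.
// Groups with fewer than minGroupItems files are not tarred; their files
// are returned in loose, along with every file at or above maxTarredSize.
// Both results are sorted so the same input tree always yields the same buckets.
func bucketByDir(files []file) (tars map[string][]string, loose []string) {
	groups := map[string][]string{}
	for _, f := range files {
		if f.Size >= maxTarredSize {
			loose = append(loose, f.Path)
			continue
		}
		dir := path.Dir(f.Path)
		groups[dir] = append(groups[dir], f.Path)
	}

	tars = map[string][]string{}
	for dir, paths := range groups {
		if len(paths) < minGroupItems {
			// Too few items: upload them individually instead of tarring.
			loose = append(loose, paths...)
			continue
		}
		sort.Strings(paths)
		tars[dir] = paths
	}
	sort.Strings(loose)
	return tars, loose
}

func main() {
	files := []file{
		{"test/data/a/1.html", 512},
		{"test/data/a/2.html", 512},
		{"test/data/a/3.html", 512},
		{"test/data/a/4.html", 512},
		{"test/data/b/only.html", 512},
		{"test/data/a/big.bin", 5 * 1024 * 1024},
	}
	tars, loose := bucketByDir(files)
	fmt.Println("tar buckets:", tars)
	fmt.Println("uploaded individually:", loose)
}
```

With one bucket per directory, touching a single file should only invalidate that directory's tar rather than the tar for the whole input entry. A real implementation would probably also need to cap bucket size and merge very small sibling directories, which this sketch does not attempt.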