
Issue 912902


Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Chrome, Mac
Pri: 3
Type: Bug

Blocking:
issue 877044




Investigate zstd for JS source string compression

Project Member Reported by lizeb@chromium.org, Dec 7

Issue description

zstd is faster and/or smaller than zlib, per online benchmarks (see https://quixdb.github.io/squash-benchmark/unstable/ for instance).

Investigate it for JS source string compression.
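
For reference, a minimal sketch of what a round trip through zstd's one-shot C API could look like for a single source string. The helper names (ZstdCompress/ZstdDecompress) are hypothetical, and Chromium integration would go through its own wrapper; this is just the upstream <zstd.h> API:

// Sketch: one-shot zstd round trip for a single JS source string.
// ZstdCompress/ZstdDecompress are hypothetical helpers, not Chromium code.
#include <zstd.h>

#include <stdexcept>
#include <string>

std::string ZstdCompress(const std::string& src, int level) {
  std::string dst(ZSTD_compressBound(src.size()), '\0');
  size_t n = ZSTD_compress(&dst[0], dst.size(), src.data(), src.size(), level);
  if (ZSTD_isError(n)) throw std::runtime_error(ZSTD_getErrorName(n));
  dst.resize(n);
  return dst;
}

std::string ZstdDecompress(const std::string& compressed) {
  // One-shot frames record the decompressed size in the frame header.
  unsigned long long size =
      ZSTD_getFrameContentSize(compressed.data(), compressed.size());
  if (size == ZSTD_CONTENTSIZE_ERROR || size == ZSTD_CONTENTSIZE_UNKNOWN)
    throw std::runtime_error("bad zstd frame");
  std::string dst(size, '\0');
  size_t n = ZSTD_decompress(&dst[0], dst.size(),
                             compressed.data(), compressed.size());
  if (ZSTD_isError(n)) throw std::runtime_error(ZSTD_getErrorName(n));
  dst.resize(n);
  return dst;
}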
 
Cc: pasko@chromium.org
Results from a local test on Linux (Xeon "Broadwell" @ 3.2GHz).
Below:
- js-data.txt is a 113MB file containing JS data obtained from the web, see https://bugs.chromium.org/p/chromium/issues/detail?id=907489 for details (same corpus)
- zstd is compiled with the default options, straight from https://github.com/facebook/zstd


tl;dr: zstd is faster than zlib, both for compression and decompression. It can also give better compression ratios while still being faster than zlib.
For compression:
- zstd level 3: 0.818s, 30.01%
- zlib level 6: 3.933s, 31.22%

For decompression:
- zstd level 3: 0.227s
- zlib level 6: 0.503s



# Compression
## zstd

$ for i in `seq 1 9`; do printf "\n\nCompression Setting: $i\nCompression Ratio = "; rm -f ~/js-data.txt.zstd && time (./zstd  ~/js-data.txt -o ~/js-data.txt.zstd -$i) 2>&1 | egrep '%|real' | sed -e 's/.* : //;s/%.*/%/'; done


Compression Setting: 1
Compression Ratio = 34.88%

real    0m0.533s
user    0m0.536s
sys     0m0.134s


Compression Setting: 2
Compression Ratio = 32.82%

real    0m0.681s
user    0m0.655s
sys     0m0.158s


Compression Setting: 3
Compression Ratio = 30.01%

real    0m0.812s
user    0m0.818s
sys     0m0.126s


Compression Setting: 4
Compression Ratio = 29.85%

real    0m0.879s
user    0m0.907s
sys     0m0.106s


Compression Setting: 5
Compression Ratio = 28.90%

real    0m1.461s
user    0m1.480s
sys     0m0.114s


Compression Setting: 6
Compression Ratio = 25.54%

real    0m1.654s
user    0m1.651s
sys     0m0.142s


Compression Setting: 7
Compression Ratio = 24.89%

real    0m2.358s
user    0m2.375s
sys     0m0.118s


Compression Setting: 8
Compression Ratio = 24.78%

real    0m2.458s
user    0m2.474s
sys     0m0.117s


Compression Setting: 9
Compression Ratio = 24.56%

real    0m3.102s
user    0m3.140s
sys     0m0.099s


## zlib

$ for i in `seq 1 9`; do printf "\n\nCompression Setting: $i\nCompression Ratio = "; rm -f ~/js-data.txt.zstd && time (./zstd  ~/js-data.txt -o ~/js-data.txt.zstd -$i --format=gzip) 2>&1 | egrep '%|real' | sed -e 's/.* : //;s/%.*/%/'; done


Compression Setting: 1
Compression Ratio = 36.61%

real    0m1.603s
user    0m1.531s
sys     0m0.080s


Compression Setting: 2
Compression Ratio = 35.33%

real    0m1.690s
user    0m1.607s
sys     0m0.090s


Compression Setting: 3
Compression Ratio = 34.45%

real    0m2.184s
user    0m2.131s
sys     0m0.060s


Compression Setting: 4
Compression Ratio = 32.51%

real    0m2.420s
user    0m2.348s
sys     0m0.076s


Compression Setting: 5
Compression Ratio = 31.52%

real    0m3.049s
user    0m2.985s
sys     0m0.071s


Compression Setting: 6
Compression Ratio = 31.22%

real    0m3.963s
user    0m3.933s
sys     0m0.038s


Compression Setting: 7
Compression Ratio = 31.15%

real    0m4.531s
user    0m4.456s
sys     0m0.079s


Compression Setting: 8
Compression Ratio = 31.11%

real    0m5.574s
user    0m5.517s
sys     0m0.065s


Compression Setting: 9
Compression Ratio = 31.11%

real    0m5.661s
user    0m5.599s
sys     0m0.070s


# Decompression

## zlib
$ rm -f ~/js-data.txt.zstd && time ./zstd  ~/js-data.txt -o ~/js-data.txt.zstd -6 --format=gzip
/usr/local/google/home/lizeb/js-data.txt : 31.22%   (118276532 => 36928923 bytes, /usr/local/google/home/lizeb/js-data.txt.zstd)

real    0m3.918s
user    0m3.841s
sys     0m0.077s
14:27:15 [lizeb:/code/zstd] $
$ time ./zstd -d ~/js-data.txt.zstd -o /dev/null
/usr/local/google/home/lizeb/js-data.txt.zstd: 118276532 bytes

real    0m0.544s
user    0m0.503s
sys     0m0.047s
14:27:26 [lizeb:/code/zstd] $

## zstd
$ rm -f ~/js-data.txt.zstd && time ./zstd  ~/js-data.txt -o ~/js-data.txt.zstd -3
/usr/local/google/home/lizeb/js-data.txt : 30.01%   (118276532 => 35500171 bytes, /usr/local/google/home/lizeb/js-data.txt.zstd) 

real    0m0.841s
user    0m0.843s
sys     0m0.127s
14:28:00 [lizeb:/code/zstd] $ 
$ time ./zstd -d ~/js-data.txt.zstd -o /dev/null
/usr/local/google/home/lizeb/js-data.txt.zstd: 118276532 bytes                 

real    0m0.269s
user    0m0.227s
sys     0m0.047s
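
For scale, converting the wall-clock times above into decompression throughput (decompressed bytes / real time): 118276532 B / 0.269 s ≈ 440 MB/s for zstd vs 118276532 B / 0.544 s ≈ 217 MB/s for the gzip-format stream, i.e. roughly a 2x difference on this machine.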

Cc: cavalcantii@chromium.org
zstd is pretty awesome and should have outstanding performance.

My 2 cents to the discussion:
a) Just ensure that you are validating the results using Chrome's zlib (as that has extensive optimization work on both ARM & x86).

b) It may be interesting to *also* validate the results on ARM devices (especially as those are the devices where memory pressure may be more significant).

Yes, absolutely :)

Before making any decision, we will likely reproduce the brotli evaluation with zstd, that is, using the same parameters we use in Chrome, and testing on both ARM and x86.

See https://bugs.chromium.org/p/chromium/issues/detail?id=907489 for details regarding brotli.
Nice!

Just double checking: when you build zstd 'with the default options', which zlib does it link against? The system's zlib, or Chromium's?

Another consideration: contributing to zstd requires signing the Facebook CLA (https://github.com/facebook/zstd/blob/dev/CONTRIBUTING.md).

There is a patent clause in the CLA* that is a major blocker for us at ARM to be able to contribute to it (unlike zlib or brotli).

Maybe that is not a concern for Google or the Chromium project, but I thought it would be interesting to point it out.


*The patent clause:
"You hereby grant to Facebook and to recipients of software distributed by Facebook a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import..."

Comment 6 by lizeb@chromium.org, Jan 17 (5 days ago)

Ran the same benchmarking previously done for zlib and brotli with zstd as well. See the full data and results attached. For Googlers, link to the full data:
https://colab.corp.google.com/drive/177uyTNTlpx3H0MoLuqTmC0tsbp_kQ-01#scrollTo=q_RUrVD0AuH3

For others, the same data is attached to this update.


tl;dr: At roughly the same compression ratio as zlib, zstd is:
- Faster for compression and decompression on Linux x86_64
- Faster for compression but slower for decompression on Android (Pixel 3XL).

The compression speed difference is large on both Android and Linux, but zstd's decompression is significantly slower than zlib's on Android (283MB/s for zlib vs 198MB/s for zstd at a 128kB chunk size, for instance). This is likely due to the optimization work done for zlib on ARM, as we used Chromium's defaults for all libraries.

Together with the binary size cost of zstd (which needs to be evaluated in more detail, but is non-trivial), zstd is not necessarily attractive on Android, but may be on desktop for JS string compression.
Attachment: zlib_zstd.html (579 KB)
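
The 128kB chunk figure above suggests a scheme where each chunk is compressed as an independent frame, so that individual strings remain randomly accessible. A sketch of such a scheme (this is an assumption about the benchmark setup, not the actual harness; CompressChunked and kChunkSize are made-up names):

// Sketch: compress input in independent 128kB frames, so any chunk can be
// decompressed without touching its neighbors. This is an assumed setup,
// not the actual benchmark harness.
#include <zstd.h>

#include <algorithm>
#include <string>
#include <vector>

constexpr size_t kChunkSize = 128 * 1024;  // 128kB, as in the measurements.

std::vector<std::string> CompressChunked(const std::string& src, int level) {
  std::vector<std::string> frames;
  for (size_t off = 0; off < src.size(); off += kChunkSize) {
    size_t len = std::min(kChunkSize, src.size() - off);
    std::string dst(ZSTD_compressBound(len), '\0');
    size_t n = ZSTD_compress(&dst[0], dst.size(), src.data() + off, len, level);
    if (ZSTD_isError(n)) return {};  // Error handling elided in this sketch.
    dst.resize(n);
    frames.push_back(std::move(dst));
  }
  return frames;
}

Independent frames trade some compression ratio for random access; reusing a single ZSTD_CCtx across chunks (via ZSTD_compressCCtx) would also avoid per-chunk context setup cost.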

Comment 7 by cavalcantii@chromium.org, Jan 17 (5 days ago)

Fascinating data, thanks for sharing it.

I got one question: for zlib compression, which compression level are you using? I noticed in my tests that the sweet spot seems to be level 3, when trading off compression speed against compression ratio.

Comment 8 by cavalcantii@chromium.org, Jan 17 (5 days ago)

Never mind, I just read it in the report: "Zlib: level 6 (default)".

Would it be possible to repeat the experiment with level 3?

Comment 9 by cavalcantii@chromium.org, Jan 17 (5 days ago)

> This is likely due to the optimization work done for zlib on ARM, as we used Chromium's defaults for all libraries.

Yes, the patches contributed by ARM pretty much doubled zlib's decompression performance.

For further details: 
a) Zlib only: https://goo.gl/vaZA9o

b) PNG decoding: https://docs.google.com/presentation/d/1vX3Ue2RRLM4Wuopx4QuFJyvSDuxCNHJ-7ZxJpqwdeHQ/edit#slide=id.g36f45ded64_0_0

Comment 10 by cavalcantii@chromium.org, Jan 17 (5 days ago)

For compression, I only worked on it for 3 weeks, but was able to improve it by 36% on average compared to vanilla zlib. Data:
https://goo.gl/qLVdvh 

Comment 11 by lizeb@chromium.org, Jan 18 (4 days ago)

re: #8

Attached are results with various compression levels. Indeed, level 3 compresses much faster than level 6, but at the cost of a lower compression ratio. We may use 3 on low-end Android then, thanks for the tip!
Attachment: zlib_zstd-Copy1.html (483 KB)
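
For what it's worth, with zlib's one-shot API the level is just the last argument to compress2, so switching to level 3 is a one-line change. A minimal sketch against plain <zlib.h> (ZlibCompress is a hypothetical helper; Chromium's own zlib wrapper may differ):

// Sketch: one-shot zlib compression at an explicit level (e.g. 3).
// ZlibCompress is a hypothetical helper, not Chromium's wrapper.
#include <zlib.h>

#include <string>

std::string ZlibCompress(const std::string& src, int level /* e.g. 3 */) {
  uLongf dst_len = compressBound(src.size());
  std::string dst(dst_len, '\0');
  if (compress2(reinterpret_cast<Bytef*>(&dst[0]), &dst_len,
                reinterpret_cast<const Bytef*>(src.data()), src.size(),
                level) != Z_OK)
    return {};  // Error handling elided.
  dst.resize(dst_len);
  return dst;
}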

Comment 12 by cavalcantii@chromium.org, Jan 18 (4 days ago)

@lizeb: thanks for repeating the experiment and sharing the data, I really appreciate it.

The reason the suggested compression level is so much faster is that some of the optimizations I've implemented for ARM (e.g. insert_string_arm()) are in the fast path (deflate_fast), which is used for the lower compression levels.

I haven't looked in depth at the slow path; my feeling is that it could potentially be improved.

To be quite honest, I've asked myself whether it was worth the effort, given that most of the time the browser uses zlib the other way around (i.e. decompressing gzipped webpages and decoding PNGs), which is where I dedicated more time.

It is quite interesting to learn that compression may be a relevant use case for Chromium's usage of zlib.

A few questions:
a) May I ask how much RAM you are able to save by compressing JS source strings?
b) Does it allow more tabs to be kept open?
c) How does it interact with the use of zram on CrOS and Android?


Comment 13 by cavalcantii@chromium.org, Jan 18 (4 days ago)

Another (obvious) reason for level 3 being much faster is that it simply does less work during compression than, say, level 6.

Comment 14 by cavalcantii@chromium.org, Today (9 hours ago)

Another question: while running inside the browser, what is the priority of the compression task?

I'm not familiar with how background tasks are scheduled, but if they are low priority, the kernel may schedule them on a LITTLE core (on a big.LITTLE system).

If that is the case, the real-world compression speeds will be lower (e.g. around 1.5x to 1.9x slower) than the numbers you observed.
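
For a standalone benchmark, one way to control for this would be to pin the process to the big cluster before timing. A sketch using sched_setaffinity follows; note that core numbering is SoC-specific (cores 4-7 being the big cluster matches e.g. the Pixel 3's Snapdragon 845, but should be verified per device):

// Sketch: pin the current process to (assumed) big cores before a
// compression benchmark, so the scheduler cannot migrate it to a LITTLE core.
// Core IDs 4-7 are an assumption matching the Snapdragon 845 big cluster;
// verify per device (e.g. via cpu capacity values in sysfs).
// With glibc, CPU_SET/sched_setaffinity need _GNU_SOURCE (g++ defines it).
#include <sched.h>

#include <cstdio>

bool PinToBigCores() {
  cpu_set_t set;
  CPU_ZERO(&set);
  for (int cpu = 4; cpu <= 7; ++cpu)
    CPU_SET(cpu, &set);
  if (sched_setaffinity(0 /* current process */, sizeof(set), &set) != 0) {
    perror("sched_setaffinity");
    return false;
  }
  return true;
}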
