[zlib][arm] Ensure zlib on Chrome OS/ARM uses crc instructions if available
Issue description
As of http://crrev.com/537179, zlib has support for using crc instructions on ARM (and has bits to do feature detection in case crc isn't supported). Looks like this is disabled for Chrome OS, since, quoting a comment: "ChromeOS has wrapper scripts that are borking the compiler flags."

...This was apparently enough of an improvement on Android for us to go out of our way to write and enable it, so it's probably worth having on CrOS.

Please note that this breaks ThinLTO when using an optimization level above O0. The workaround I have for Android is pretty ugly, but available here for reference: https://chromium-review.googlesource.com/c/chromium/src/+/1026879

Assigning to yunlian at llozano's request.
,
Jun 7 2018
We have these CFLAGS for the kevin build:
CFLAGS = -pipe -march=armv8-a+crc -mtune=cortex-a57.cortex-a53 -mfpu=crypto-neon-fp-armv8
Would that be good enough for this one?
,
Jun 7 2018
If our default march is armv8-a+crc, then yeah. We may not see the ThinLTO complaints if we turn this on (Android sees them because its default march is armv7a or similar). There's custom code in crc32_simd.c that makes use of these, though, and it doesn't appear that we're pulling that in on CrOS on ARM (but we probably are on x86?).
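Whether a given ARM Linux device advertises the crc extension can be checked from userspace by looking for the crc32 flag on the Features line of /proc/cpuinfo. The sketch below only illustrates that check; it is not the detection code zlib itself uses, and the helper name has_crc32 and the sample text are made up for the example.

```python
def has_crc32(cpuinfo_text):
    """Return True if any 'Features' line in the cpuinfo text lists crc32."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            if "crc32" in value.split():
                return True
    return False


# Hypothetical sample, shaped like one ARMv8 entry in /proc/cpuinfo.
SAMPLE = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("crc32 available:", has_crc32(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

On a non-ARM machine the Features line is absent, so the function simply returns False.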
,
Jun 12 2018
Did ThinLTO complain at build time or at runtime?
,
Jun 12 2018
Build-time; the complaints I get look like: FAILED: libchrome.so libchrome.so.TOC libchrome.so.whitelist lib.unstripped/libchrome.so lib.unstripped/libchrome.so.map.gz /path/to/chromium/out/third_party/llvm-build/Release+Asserts/bin/ld.lld: error: Linking two modules of different target triples: obj/third_party/android_tools/cpu_features/cpu-features.o' is 'thumbv7--linux-android' whereas 'obj/third_party/zlib/zlib_arm_crc32/arm_features.o' is 'thumbv8--linux-android' /path/to/chromium/third_party/llvm-build/Release+Asserts/bin/ld.lld: error: Linking two modules of different target triples: obj/third_party/zlib/zlib_arm_crc32/arm_features.o' is 'thumbv8--linux-android' whereas 'obj/third_party/zlib/libchrome_zlib.azlib/crc32.o2236' is 'thumbv7--linux-android'
,
Dec 5
@cavalcanti I have tested that Chrome OS is able to enable CRC instructions now. I have a question: which of the benchmark(s) you tested benefit from the change (or which do you think should)? I can run those benchmarks to see if there are any problems and also to see how much improvement we can get.
,
Dec 6
Hello there, Tiancong

Basically, enabling the crc32 ARM instruction in zlib will boost the speed of gzip content decompression. It will help performance in quite a few cases. As an example, lots of webpages are served using 'content-encoding: gzip', where page content decompression will benefit. Another example (but not as dramatic) was in decoding PNGs.

The best way to benchmark zlib is using zlib_bench (https://cs.chromium.org/chromium/src/third_party/zlib/contrib/bench/). You can build it using:
$ ninja -C out/mybuild zlib_bench -j64

It will generate a standalone binary that lets you test the speed of data compression/decompression.

I would expect an average boost of 20 to 30% in decompression once the crc32 optimization is enabled on ARM. Note that while testing you have to account for the presence of big.LITTLE cores in your ARM device (use 'taskset' to bind a process to a specific core).

I would expect numbers similar to these (running an 'elm' device in 'crouton'):
(xenial)adenilson@localhost:~$ ./zlib_bench -h
usage: ./gzlib_bench -wrapper gzip|zlib|raw -compression [0:9] files...
(xenial)adenilson@localhost:~$ ./zlib_bench -wrapper gzip testdata/html*
testdata/html : GZIP: [b 1M] bytes 102400 -> 13707 13.4% comp 27.9 ( 28.5) MB/s uncomp 443.1 (444.7) MB/s
testdata/html_x_4 : GZIP: [b 1M] bytes 409600 -> 53285 13.0% comp 26.6 ( 26.6) MB/s uncomp 456.4 (458.8) MB/s
(xenial)adenilson@localhost:~$ sudo taskset -c 0 ./zlib_bench -wrapper gzip testdata/html*
testdata/html : GZIP: [b 1M] bytes 102400 -> 13707 13.4% comp 14.1 ( 14.3) MB/s uncomp 310.3 (311.5) MB/s
testdata/html_x_4 : GZIP: [b 1M] bytes 409600 -> 53285 13.0% comp 13.4 ( 13.6) MB/s uncomp 299.5 (304.3) MB/s

Generally I use the snappy data corpus for testing (https://github.com/google/snappy/tree/master/testdata), as it has files with varying levels of entropy.
One of the reasons this optimization was disabled for CrOS was the need to pass some special compiler flags to enable the use of the crc32 crypto extension. A recent commit changed that to use compiler builtins instead (so there is no need to pass special flags to the compiler). Therefore, I would say that for CrOS it would be a matter of simply enabling the optimization in the BUILD.gn file.
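A faster CRC helps gzip content in particular because the gzip wrapper (RFC 1952) ends with a CRC-32 of the uncompressed data, which the decompressor recomputes and compares. A minimal sketch using Python's standard library (not Chromium's zlib build) that makes the trailer visible; the payload is arbitrary sample data:

```python
import gzip
import struct
import zlib

# Arbitrary sample payload standing in for page content.
data = b"<html><body>hello</body></html>" * 100

stream = gzip.compress(data)

# The last 8 bytes of a gzip member are CRC-32 then ISIZE, both little-endian.
crc_stored, isize = struct.unpack("<II", stream[-8:])

# CRC-32 computed independently over the uncompressed bytes.
crc_computed = zlib.crc32(data) & 0xFFFFFFFF

print(hex(crc_stored), hex(crc_computed), isize)
```

The two CRC values match, and ISIZE equals the uncompressed length; every byte of decompressed output passes through the CRC computation, which is why its speed shows up in the 'uncomp' figures.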
,
Dec 7
This should provide some guidance (just care about 'uncomp' speed for this case):

a) Vanilla zlib:
(xenial)adenilson@localhost:~$ ./zlib_bench -wrapper gzip testdata/html
testdata/html : GZIP: [b 1M] bytes 102400 -> 13711 13.4% comp 26.4 ( 26.7) MB/s uncomp 218.7 (219.0) MB/s

b) Chunk copy (i.e. what is enabled today for CrOS):
(xenial)adenilson@localhost:~$ ./chunky_compress_zlib_bench -wrapper gzip testdata/html
testdata/html : GZIP: [b 1M] bytes 102400 -> 13701 13.4% comp 31.5 ( 31.6) MB/s uncomp 281.9 (282.4) MB/s

c) Chunky copy + crc32:
(xenial)adenilson@localhost:~$ ./gzlib_bench -wrapper gzip testdata/html
testdata/html : GZIP: [b 1M] bytes 102400 -> 13707 13.4% comp 27.9 ( 28.7) MB/s uncomp 443.9 (444.5) MB/s
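To compare the three builds, the 'uncomp' figure can be pulled out of each result line and expressed relative to vanilla zlib. The sketch below is a hypothetical helper, not part of zlib_bench; it takes the first of the two figures on each line, since the thread does not say what the parenthesised value represents.

```python
import re

# Result lines copied from the comment above.
RESULTS = {
    "vanilla": "testdata/html : GZIP: [b 1M] bytes 102400 -> 13711 13.4% "
               "comp 26.4 ( 26.7) MB/s uncomp 218.7 (219.0) MB/s",
    "chunk_copy": "testdata/html : GZIP: [b 1M] bytes 102400 -> 13701 13.4% "
                  "comp 31.5 ( 31.6) MB/s uncomp 281.9 (282.4) MB/s",
    "chunk_copy_crc32": "testdata/html : GZIP: [b 1M] bytes 102400 -> 13707 13.4% "
                        "comp 27.9 ( 28.7) MB/s uncomp 443.9 (444.5) MB/s",
}

UNCOMP = re.compile(r"uncomp\s+([\d.]+)\s*\(\s*([\d.]+)\)\s*MB/s")


def uncomp_speed(line):
    """Return the first 'uncomp' figure (MB/s) from a zlib_bench result line."""
    match = UNCOMP.search(line)
    if match is None:
        raise ValueError("no uncomp field in line")
    return float(match.group(1))


speeds = {name: uncomp_speed(line) for name, line in RESULTS.items()}
baseline = speeds["vanilla"]
for name, speed in speeds.items():
    print(f"{name}: {speed:.1f} MB/s, {speed / baseline:.2f}x vanilla")
```

With these numbers, chunk copy alone comes out at about 1.29x vanilla and chunk copy plus crc32 at about 2.03x.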
,
Dec 11
I have got similar results with my test after enabling zlib:
GZIP: [b 1M] bytes 102400 -> 13707 13.4% comp 23.1 ( 23.2) MB/s uncomp 438.6 (440.0) MB/s

Chunk copy: (default flags to build CrOS)
GZIP: [b 1M] bytes 102400 -> 13711 13.4% comp 18.6 ( 18.6) MB/s uncomp 293.7 (294.4) MB/s
,
Dec 11
Pretty awesome! Just send the patch and I will review it.
,
Dec 13
With binding zlib_bench to core 0: (still similar improvement, was ~49%, now 58%)

Vanilla test:
testdata/html : GZIP: [b 1M] bytes 102400 -> 13711 13.4% comp 11.0 ( 11.4) MB/s uncomp 175.3 (176.2) MB/s
testdata/html_x_4 : GZIP: [b 1M] bytes 409600 -> 53299 13.0% comp 10.5 ( 10.6) MB/s uncomp 174.6 (174.9) MB/s

crc enabled:
testdata/html : GZIP: [b 1M] bytes 102400 -> 13707 13.4% comp 11.6 ( 12.0) MB/s uncomp 279.3 (279.6) MB/s
testdata/html_x_4 : GZIP: [b 1M] bytes 409600 -> 53285 13.0% comp 11.1 ( 11.6) MB/s uncomp 270.3 (283.5) MB/s
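The percentages quoted here can be reproduced from the testdata/html 'uncomp' figures posted in this thread. The sketch below is plain arithmetic; percent_gain is a name chosen for the example, and each pair is (baseline, crc enabled) as reported in the respective comments.

```python
def percent_gain(before, after):
    """Throughput improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# (baseline, crc enabled) uncomp MB/s for testdata/html, from the comments above.
unpinned = (293.7, 438.6)      # not bound to a core
pinned_core0 = (175.3, 279.3)  # bound to core 0

print(f"unpinned: {percent_gain(*unpinned):.1f}%")
print(f"core 0:   {percent_gain(*pinned_core0):.1f}%")

# For scale: a doubling of throughput is a 100% gain, i.e. 2x.
print(f"doubling: {percent_gain(100.0, 200.0):.0f}%")
```

This gives roughly 49% unpinned and 59% on core 0 for testdata/html, in line with the figures quoted above.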
,
Dec 13
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/9b3f32ad27fdd569e39289f241043f7111da78fa

commit 9b3f32ad27fdd569e39289f241043f7111da78fa
Author: Tiancong Wang <tcwang@google.com>
Date: Thu Dec 13 04:20:08 2018

Enabling crc32 ARM instruction in zlib for ChromeOS.

Using the ARM crc32 instruction can boost the speed of gzip content decompression, which can improve performances in cases where lots of webpages are served using 'content-encoding: gzip'.

Expected to have ~50% decompression speed improvement on elm.

Bug: 848897 , 810125
Change-Id: I9f090950209e6a68271c6926700b7335e14c7cbf
Reviewed-on: https://chromium-review.googlesource.com/c/1372548
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616210}

[modify] https://crrev.com/9b3f32ad27fdd569e39289f241043f7111da78fa/third_party/zlib/BUILD.gn
,
Dec 13
Nice to see #18 land. Not sure it's a 50% improvement, though. If a thing that output 100 Mb/s now outputs at 200 Mb/s, that is 2x faster ...
,
Dec 13
Also re PNG. On the web, 30% of images decoded are PNG. On Chrome OS, 70% of images decoded are PNG.
Comment 1 by g...@chromium.org
, Jun 1 2018