Issue 688601

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Task

Blocked on:
issue 762468

Blocking:
issue 709707
issue 764085



Optimize Adler-32 checksum

Project Member Reported by cavalcantii@chromium.org, Feb 4 2017

Issue description

Loading PNGs requires verifying the uncompressed data with an Adler-32 checksum (in zlib), and this step could be made much faster using SIMD instructions.
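For context, here is a minimal Python sketch of what the checksum computes. `adler32_scalar` follows the byte-at-a-time definition, and `adler32_blocked` shows one common way to restructure the loop into per-block sums, which is the kind of shape that maps onto SIMD lanes. Both function names and the block size of 16 are illustrative assumptions; this issue does not state whether the actual patches use exactly this scheme.

```python
import zlib

MOD = 65521  # largest prime below 2**16, the Adler-32 modulus


def adler32_scalar(data: bytes, value: int = 1) -> int:
    """Byte-at-a-time Adler-32. `value` is the running checksum (1 to start)."""
    s1 = value & 0xFFFF
    s2 = (value >> 16) & 0xFFFF
    for byte in data:
        s1 = (s1 + byte) % MOD
        s2 = (s2 + s1) % MOD
    return (s2 << 16) | s1


def adler32_blocked(data: bytes, value: int = 1, block: int = 16) -> int:
    """Same checksum, computed from two sums per block (illustrative only)."""
    s1 = value & 0xFFFF
    s2 = (value >> 16) & 0xFFFF
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        n = len(chunk)
        # After n bytes, s2 has grown by n * (old s1) plus each byte
        # weighted by how many of the n partial sums it appears in.
        weighted = sum((n - i) * b for i, b in enumerate(chunk))
        s2 = (s2 + n * s1 + weighted) % MOD
        s1 = (s1 + sum(chunk)) % MOD
    return (s2 << 16) | s1


sample = bytes(range(256)) * 40  # stand-in data, not a real PNG stream
assert adler32_scalar(sample) == zlib.adler32(sample)
assert adler32_blocked(sample) == zlib.adler32(sample)
```

The plain sum and the weighted sum inside each block are independent multiply-accumulate operations, which is why a vector unit can handle many bytes per instruction instead of one.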
 
Labels: Arch-ARM Arch-ARM64
Owner: cavalcantii@chromium.org
Status: Started (was: Untriaged)
Cc: msarett@chromium.org fmalita@chromium.org
Blocking: 687631
I'm using the test case available at: http://codepen.io/Savago/pen/VPeQaX

Below is a screenshot of a trace collected on a Nexus 4.
adler32_277ms_191ms.png (260 KB)
The trace files:
trace_vanilla_798ms.json.gz (187 KB)
trace_zlib_888ms.json.gz (204 KB)
Disclaimer: this is only one data point (on one device) for one large PNG file (850 KB).

YMMV.

Comparing the CPU self time:
>>> 1 - (188.463/232.320)
0.18877840909090904

That is roughly a 19% improvement.

Comparing the 'Wall Duration' we have: 
>>> 1 - (191.821/277.552)
0.30888265982590657

That is roughly a 31% improvement.

The gain will also vary with how the image was encoded (i.e. how many times the checksum is called and the length of each byte array being checked).

Not to mention whether the code is running on a big or little core, the CPU frequency, the thermal state, the behavior of EAS (Energy Aware Scheduler), etc.
Ideally I would like to repeat the test on an ARMv8 device (e.g. Pixel).
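To illustrate the point about call count and buffer length: zlib's Adler-32 is incremental, so the same data can be checksummed in many small calls or a few large ones. The sketch below uses Python's standard `zlib.adler32`; the helper name `adler32_in_chunks`, the payload, and the chunk sizes are made up for illustration and are not taken from the PNG decoder.

```python
import zlib


def adler32_in_chunks(data: bytes, chunk_size: int):
    """Feed `data` to zlib.adler32 piece by piece.

    Returns (checksum, number_of_calls).
    """
    checksum = 1  # Adler-32 starting value
    calls = 0
    for start in range(0, len(data), chunk_size):
        checksum = zlib.adler32(data[start:start + chunk_size], checksum)
        calls += 1
    return checksum, calls


payload = bytes(range(256)) * 400  # 102400 bytes of stand-in data
small = adler32_in_chunks(payload, 64)
large = adler32_in_chunks(payload, 32768)

# Same checksum either way; only the number and size of calls differ.
assert small[0] == large[0] == zlib.adler32(payload)
print(small[1], large[1])
```

With these assumed sizes the small-chunk run makes 1600 calls and the large-chunk run makes 4, so any fixed per-call overhead and the fraction of bytes that fall into a vectorized main loop can differ a lot between images.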
Cc: simon.ho...@arm.com amaury.l...@arm.com
Blockedon: 687631
Blocking: -687631
Labels: -Type-Bug Type-Task
For reference, applying all three patches linked to this issue yields a performance boost of around 40% in PNG image decoding speed (on a Nexus 6).

The image shows the traces for a test page (http://codepen.io/Savago/pen/VPeQaX) comparing vanilla Chromium m59 against a patched build (with the three optimizations resulting from the PNG investigation: Adler32, inflate_fast, palette).

It is interesting to see that the time spent decoding the image dropped from 116.187 ms to 73.844 ms (a reduction of about 36%, in the neighborhood of the 40% quoted above).
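Applying the same arithmetic as in the earlier Adler-32 comparison to these two decode times gives the following. The helper names are only for illustration; the two numbers are the ones quoted above.

```python
def time_reduction(before_ms: float, after_ms: float) -> float:
    """Fraction of the original time that was saved."""
    return 1 - after_ms / before_ms


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the new run is."""
    return before_ms / after_ms


print(round(time_reduction(116.187, 73.844), 3))  # 0.364
print(round(speedup(116.187, 73.844), 2))         # 1.57
```

So the decode takes about 36% less time, which is the same thing as running about 1.57x faster; which of the two figures gets quoted depends on the baseline chosen.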

It is also interesting that GPUImageDecodeCache::DecodeImage() now takes longer to execute (94.025 ms) than the actual image decoding in ImageFrameGenerator::decode().

Is the image cache compressed? I wonder if we could make it faster?

The Adler32 optimization was submitted to zlib-ng and is waiting for review/merge.


all_patches_vanilla.png (287 KB)
Blockedon: -687631
Blocking: 709707
Update: the optimization was submitted to canonical zlib at https://github.com/madler/zlib/pull/251
zlib-ng has merged the optimization into their development branch:
https://github.com/Dead2/zlib-ng/commit/ec02ecf104e1d3f1836a908a359f20aa93494df5
Cc: -simon.ho...@arm.com -msarett@chromium.org
Labels: OS-Android
Traces collected on a Pixel (Snapdragon 821).
adler_traces.png (268 KB)
All patches combined (Pixel, Snapdragon 821).
all_patches_trace.png (182 KB)
Blocking: 764085
Blockedon: 762468
Cc: -amaury.l...@arm.com cblume@chromium.org
