New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 772870 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 769880
issue 796470

Blocking:
issue 764085



Sign in to add a comment

Improve zlib inflate_fast speed by using SSE

Project Member Reported by noel@chromium.org, Oct 9 2017

Issue description

zlib inflate_fast() speed on ARM devices was improved in  issue 697280  by performing large chunk copies via NEON SIMD code.

Similar improvement could be expected by using SSE SIMD on Intel devices.  The code from  issue 697280  could be modified to add support SSE, and also changed from being an arch-specific contrib to a more general inflate SIMD contrib.



 

Comment 1 by noel@chromium.org, Oct 9 2017

Blocking: 764085
Cc: cavalcantii@chromium.org cblume@chromium.org nigeltao@chromium.org sho...@google.com mtklein@chromium.org
Owner: noel@chromium.org

Comment 2 by noel@chromium.org, Oct 9 2017

Related issue for this "chunk-copy" code is the infback case,  issue 769880 . 

Comment 3 by noel@chromium.org, Oct 24 2017

Cc: scroggo@chromium.org
https://bugs.chromium.org/p/chromium/issues/detail?id=697280#c46 reported the decode perf for the ARM chunk copy case.  I repeated that experiment for Intel using SSE chunk copy on my macbook pro:

MAC BOOK PRO 13" 2016 (TOUCH BAR) LAPTOP
 * Dual-core Intel Core i7-6567U, 3.3GHz Skylake-U (14nm)
png mac-book-pro-sse-chunk-adler
https://docs.google.com/spreadsheets/d/1ZGSdjmPKbdvsBlXvUPRaIG4uJ2jyEOljUVf78e-BibQ/edit#gid=595603569

Chunk copy (the blue-line on the graph, label sse chunk + no alder) is an average 17% improvement in PNG decoding rate for the PNG 140 corpus.

Adding the adler32 SSE change (the read-line, label sse chunk + ssse3 alder), the PNG decoding improvement is average 37% for the PNG 140 corpus.

Comment 4 by noel@chromium.org, Oct 25 2017

Patch uploaded https://chromium-review.googlesource.com/c/chromium/src/+/708834 and we can begin review.

 - solves  issue 769880  by using inffastloose.c that was provided by cavalcantii@ in https://gist.github.com/Adenilson/0525bb60f79fcc90b4ccedb778a198d1 which I renamed to inffastchunk.c

 - defines the int splats needed by SIMD code for both NEON and SSE2 as macros

Be advised I will be OOO for the next two weeks, so you won't be hearing from me :)
Noel

That is pretty cool, I really like the spreadsheets with the the output of image decoding benchmark (do you have a script to import the data into Google docs?).

17% is in line with the NEON results for inffast (i.e. 16% as in https://docs.google.com/spreadsheets/d/1WOWEuAMXrxj608WMuHj-9DKv1wN08eY5hiSMufMrbJw/edit#gid=595603569).

I recall that Nigel had a patch optimizing inflate fast for Intel (https://chromium-review.googlesource.com/c/chromium/src/+/601694) that was pending on investigating the reported performance regression on media_perftests* on Intel related to the Adler-32 patch (https://bugs.chromium.org/p/chromium/issues/detail?id=772950).

*By the way, did you look into the regression? It is being 16 days already since the report.

You also reported that Nigel's patch yielded an average gain of 2% (https://bugs.chromium.org/p/chromium/issues/detail?id=760853#c21).

So what is the plan here so we can coordinate the work: land Nigel's patch and then the SIMD inffast for Intel? Something else?

In any case, I was planning to refactor the inffast NEON code to allow to easily implement the same optimization for other architectures. 

I will do that together with the infback bug fix before we implement further optimizations.

Then we can land the Intel specific code.

Blockedon: 769880
Status: Started (was: Untriaged)

Comment 8 by noel@chromium.org, Nov 21 2017

#5 > That is pretty cool, I really like the spreadsheets with the the output of image decoding benchmark (do you have a script to import the data into Google docs?).

No script, sorry.  I compute the results locally, storing them in a .cvs file, and then upload them to google spreadsheets.

> 17% is in line with the NEON results for inffast (i.e. 16% as in https://docs.google.com/spreadsheets/d/1WOWEuAMXrxj608WMuHj-9DKv1wN08eY5hiSMufMrbJw/edit#gid=595603569).

Yeap, about the same for Intel or NEON chunk copy.

> I recall that Nigel had a patch optimizing inflate fast for Intel (https://chromium-review.googlesource.com/c/chromium/src/+/601694) that was pending on investigating the reported performance regression on media_perftests* on Intel related to the Adler-32 patch (https://bugs.chromium.org/p/chromium/issues/detail?id=772950).

> *By the way, did you look into the regression? It is being 16 days already since the report.

I don't tend to deal with regressions while I'm away on vacation :)  Anyhow, the particular media test is about audio channel passthrough mixing, and I don't see how PNG encoding or decoding is involved with that at all.

> You also reported that Nigel's patch yielded an average gain of 2% (https://bugs.chromium.org/p/chromium/issues/detail?id=760853#c21).

> So what is the plan here so we can coordinate the work: land Nigel's patch and then the SIMD inffast for Intel? Something else?

I would land the SIMD inffast for Intel first, it has measurements (#3).  Nigel's patch could be added afterward, if measurements of it show that it's a progression on the chunk copy and SIMD adler32 changes.

Comment 9 Deleted

Comment 10 Deleted

Comment 11 Deleted

Comment 12 Deleted

Comment 13 Deleted

Comment 14 by noel@chromium.org, Dec 8 2017

The regression for  issue 771209  was examined and the adler32 is not even used by the test involved. (Removing comments above about that side issue).
Project Member

Comment 15 by bugdroid1@chromium.org, Dec 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c293a3255eb27dee8879f85f2c45dedff58e2452

commit c293a3255eb27dee8879f85f2c45dedff58e2452
Author: Noel Gordon <noel@chromium.org>
Date: Fri Dec 08 11:39:34 2017

Improve zlib inflate speed by using SSE2 chunk copy

Using SSE2 chunk copies improves the decoding rate of the PNG 140
corpus by an average 17%, giving a total 37% performance increase
when combined with SIMD adler32 code ( https://crbug.com/772870#c3 
for details).

Move the arm-specific code back into the main chunk copy code and
generalize the SIMD parts of chunkset_core() with inline function
calls for ARM, and Intel SSE2 devices. This removes the TODO from
arm/chunkcopy_arm.h, and that file can be deleted as a result.

Add SSE2 vector load / store SSE helpers for chunkset_core(). The
existing NEON load code had alignment issues, as noted in review.
Fix that: use unaligned loads in the ARM helper code.

Change chunkcopy.h to use __builtin_memcpy if it's available, use
zmemcpy otherwise such as on MSVC. Also call x86_check_features()
in inflateInit2_() to keep the adler32 SIMD code path enabled.

Update BUILD.gn to conditionally compile the SIMD chunk copy code
on Intel SSE2 and ARM NEON devices. Update names.h to add the new
symbol defined by the inflate chunk copy code path.

Code had various comment styles; pick one and use it consistently
everywhere. Add inffast_chunk.h TODO(cblume).

Bug:  772870 
Change-Id: I47004c68ee675acf418825fb0e1f8fa8018d4342
Reviewed-on: https://chromium-review.googlesource.com/708834
Commit-Queue: Noel Gordon <noel@chromium.org>
Reviewed-by: Chris Blume <cblume@chromium.org>
Cr-Commit-Position: refs/heads/master@{#522764}
[modify] https://crrev.com/c293a3255eb27dee8879f85f2c45dedff58e2452/third_party/zlib/BUILD.gn
[delete] https://crrev.com/14b857f982900cb6e61cb350342f35a46644f54a/third_party/zlib/contrib/optimizations/arm/chunkcopy_arm.h
[modify] https://crrev.com/c293a3255eb27dee8879f85f2c45dedff58e2452/third_party/zlib/contrib/optimizations/chunkcopy.h
[rename] https://crrev.com/c293a3255eb27dee8879f85f2c45dedff58e2452/third_party/zlib/contrib/optimizations/inffast_chunk.c
[rename] https://crrev.com/c293a3255eb27dee8879f85f2c45dedff58e2452/third_party/zlib/contrib/optimizations/inffast_chunk.h
[modify] https://crrev.com/c293a3255eb27dee8879f85f2c45dedff58e2452/third_party/zlib/contrib/optimizations/inflate.c
[modify] https://crrev.com/c293a3255eb27dee8879f85f2c45dedff58e2452/third_party/zlib/names.h

Comment 16 Deleted

Comment 17 by noel@chromium.org, Dec 13 2017

Per http://bit.ly/2AfaxhG, "This has shown some lovely performance improvements on all the perf bots EXCEPT Android One, where it shows a clear regression [1]. Any ideas why?"

We think it's the -O3 here:  It gave us wins on all Android bots _except_ the Android One.  Backing out the -O3 part for Android of #15 for now to see if the Android One perf bot recovers.
Project Member

Comment 18 by bugdroid1@chromium.org, Dec 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/53edf03e13a1569f146f136c9e43bc6e476bf33e

commit 53edf03e13a1569f146f136c9e43bc6e476bf33e
Author: Noel Gordon <noel@chromium.org>
Date: Wed Dec 13 03:51:05 2017

Use default optimization on Android for zlib inflate chunk copy

Speculative fix: back off from using -O3 on Android for the inflate
chunk copy code to see if -O3 caused the perf bot regression on the
Android One bot, http://bit.ly/2AfaxhG

Bug:  772870 
Change-Id: I06bd941224dacbc5d7024a65934faaa53e4a4ce5
Reviewed-on: https://chromium-review.googlesource.com/823503
Commit-Queue: Noel Gordon <noel@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Chris Blume <cblume@chromium.org>
Reviewed-by: Noel Gordon <noel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#523679}
[modify] https://crrev.com/53edf03e13a1569f146f136c9e43bc6e476bf33e/third_party/zlib/BUILD.gn

Comment 19 by noel@chromium.org, Dec 15 2017

Cc: noel@chromium.org kraynov@chromium.org
 Issue 795003  has been merged into this issue.

Comment 20 by noel@chromium.org, Dec 20 2017

Blockedon: 796470

Comment 21 by noel@chromium.org, Feb 5 2018

@cblume, I believe there was regression nearby at the time,  issue 795180  by ajwong@ and I followed the trail to your discussion with him about it

https://chromium-review.googlesource.com/c/chromium/src/+/826255#message-c71b9219715adfc98132d6fc6bc2bc15977d0f00

Reading it, suggest we should try re-enabling -O3 on ARM again?  Excepting Android One, every ARM bot really liked the -O3, so let me know.

Comment 22 by noel@chromium.org, Feb 5 2018

And what I read from that linked discussion was

> cblume@
> So I'm sending this email to:
> 1.) Let everyone know there is a small regression, but
> 2.) it is probably fine, and
> 2.) Noel, it might be worth trying -O3 again.

I never got mail about it though :/
IIRC, the enable frame pointers (EFP) regression happened *after* the fix for Android one regression (i.e. reverting the -O3).

Just re-enabling the flag would produce the same results observed before: a nice improvement on higher end devices (with A72 cores) and a regression on lower end devices (probably with A53 cores and some ARMv7 processors).

Some issues of disabling the compiler flag *only* for Android One are: 
a) Android One is something relatively new (even though announced in 2014, it seems the devices are reaching mass market starting in 2017).

b) As a result, you have millions of cheap devices mostly in Asian/African/South American markets that still run some version of Android (!= Android One).

So a question that should be asked is: should we have a good gain in higher end devices at the cost of a considerable cost for lower end devices?

We still have some performance work to complete (e.g. using ARMv8-a crc32 will equally help both A72 as also A53).

What would be *ideal* would be if the Google App store distributed *target specific* builds of Chrome.

I suggested this in chromium-dev (i.e. have ARMv8 and ARMv7 specific builds) and heard that it would require considerable(?) changes in the app store backend.

I wouldn't ask to support this feature (i.e. target specific apks) for everyone, but maybe an exception could be made for Chrome?

Just an idea.

Maybe similar to Fuchsia and the ARM cpu-features syscall, you guys could ask around and see if that would be feasible (i.e. target specific apks in the app store *only* for chrome)?

I have the gut feeling that we could have a nice global gain if the *whole* Chromium was specially built to better use the ARM device hardware resources.

Just confirmed what I mentioned before, the EFP regression happened *after* the fix in Android one.

The first peak in Android One was thanks to the -O3 and the second was thanks to the EFP:
https://chromeperf.appspot.com/report?sid=f77857a3ac817e41f64f9d596aa1bc19b8dfe0f57dfb5610cb5e788c0563a355&start_rev=519183&end_rev=526297

So they are unrelated and we should expect to observe the same behavior as before.

As I understand it, separate ARMv7 and ARMv8 builds have been possible in the app store since ARMv8 was introduced, but Chromium just doesn't want to do that.  Chrome problem, not Android.

I think at one point Chromium did ship a A64 build, or maybe just one phone shipped with it, but later upgraded back to an ARMv7 compatible A32 build.
@Mike: so maybe is an issue of just getting buy in from the people involved provided that there are such performance gains?

So basically the idea would be to still keep building 32bits binaries (as 64bits binaries are bigger and use more memory), but using proper optimizations flags (e.g. -O3 || -O0), target (e.g. -march=armv7 || -march=armv8), etc and distribute the best match apk for the device.

Like I said, just an idea (and would require some hard data to justify the approach).

Comment 28 by noel@chromium.org, Feb 15 2018

The app store stuff is interesting, but I wasn't asking about that in particular.  I was considering, with the zlib optimizations we've added, how we have already given much better performance to Android One, and indeed to all other Android devices we test.

So can we take some of the gain we've provided back from Android One (5% loss with -O3) to provide additional gain to all those other Android devices?

Comment 29 by noel@chromium.org, Mar 19 2018

^^^ polite ping :)

Comment 30 by noel@chromium.org, Mar 19 2018

Components: Internals Blink>Internals
> So can we take some of the gain we've provided back from Android One (5% loss
> with -O3) to provide additional gain to all those other Android devices?

Sgtm. #22 sounds like cblume@ supports it.

Comment 32 by noel@chromium.org, Mar 21 2018

Owner: ----
Status: Available (was: Started)
Not actively working on this bug mind: FTR, required work would be to reapply
  https://chromium-review.googlesource.com/c/chromium/src/+/823503

Comment 33 by noel@chromium.org, Mar 21 2018

Ahem, undo that change :) to be precise. 
I can help on this one.

The first thing would be to assess what would be the impact on modern and affordable ARMv8 SoCs (e.g. A53, which are in-order), as those are commonly used in Android One cheap devices.

If it is not a regression in there, that would justify the possible penalty on older/cheaper ARMv7 devices.

We now have zlib_bench for the task (which also runs standalone on Android), which will make things much easier.

Ok, I'm still running the benchmarks but the initial data seems to point that:
a) It is *not* a perf regression on A53 (actually it may be a small perf gain in most cases, 1 to 2%).

b) It is a perf regression in some cases (up to 5%) on ARMv7, but at least it won't be worst than vanilla zlib.

Status: Started (was: Available)

Comment 37 by noel@chromium.org, Mar 22 2018

Plan sgtm, initial results promising and thank you for helping here, Adenlison.

Comment 38 by noel@chromium.org, Mar 22 2018

Owner: cavalcantii@chromium.org
Quite interesting that it seems to hurt perf on Intel (I guess we better enable the flag only for ARM).
a) ToT
adenilson@chrome-monster:~/chromium/src/out/Release$ ./zlib_bench gzip ~/temp/snappy/testdata/*
/home/adenilson/temp/snappy/testdata/alice29.txt :
GZIP: [b 1M] bytes 152089 ->  54397 35.8% comp  18.4 ( 18.5) MB/s uncomp 497.5 (499.2) MB/s
/home/adenilson/temp/snappy/testdata/asyoulik.txt :
GZIP: [b 1M] bytes 125179 ->  48927 39.1% comp  17.5 ( 17.8) MB/s uncomp 418.7 (428.1) MB/s
/home/adenilson/temp/snappy/testdata/baddata1.snappy :
GZIP: [b 1M] bytes  27512 ->  22920 83.3% comp  44.2 ( 44.8) MB/s uncomp 215.6 (217.5) MB/s
/home/adenilson/temp/snappy/testdata/baddata2.snappy :
GZIP: [b 1M] bytes  27483 ->  23000 83.7% comp  44.7 ( 44.9) MB/s uncomp 214.5 (216.9) MB/s
/home/adenilson/temp/snappy/testdata/baddata3.snappy :
GZIP: [b 1M] bytes  28384 ->  23705 83.5% comp  44.7 ( 44.7) MB/s uncomp 212.4 (212.9) MB/s
/home/adenilson/temp/snappy/testdata/fireworks.jpeg :
GZIP: [b 1M] bytes 123093 -> 122927 99.9% comp  47.0 ( 48.0) MB/s uncomp 1243.4 (1245.1) MB/s
/home/adenilson/temp/snappy/testdata/geo.protodata :
GZIP: [b 1M] bytes 118588 ->  15124 12.8% comp 101.4 (103.2) MB/s uncomp 1036.2 (1122.2) MB/s
/home/adenilson/temp/snappy/testdata/html :
GZIP: [b 1M] bytes 102400 ->  13707 13.4% comp  80.1 ( 80.2) MB/s uncomp 930.5 (978.9) MB/s
/home/adenilson/temp/snappy/testdata/html_x_4 :
GZIP: [b 1M] bytes 409600 ->  53285 13.0% comp  73.0 ( 73.1) MB/s uncomp 952.3 (977.3) MB/s
/home/adenilson/temp/snappy/testdata/kppkn.gtb :
GZIP: [b 1M] bytes 184320 ->  38727 21.0% comp  11.9 ( 11.9) MB/s uncomp 659.9 (661.7) MB/s
/home/adenilson/temp/snappy/testdata/lcet10.txt :
GZIP: [b 1M] bytes 426754 -> 144867 33.9% comp  19.9 ( 19.9) MB/s uncomp 496.8 (511.4) MB/s
/home/adenilson/temp/snappy/testdata/paper-100k.pdf :
GZIP: [b 1M] bytes 102400 ->  81276 79.4% comp  48.5 ( 48.6) MB/s uncomp 420.1 (429.8) MB/s
/home/adenilson/temp/snappy/testdata/plrabn12.txt :
GZIP: [b 1M] bytes 481861 -> 195102 40.5% comp  14.9 ( 14.9) MB/s uncomp 424.8 (444.2) MB/s
/home/adenilson/temp/snappy/testdata/urls.10K :
GZIP: [b 1M] bytes 702087 -> 222358 31.7% comp  40.6 ( 40.7) MB/s uncomp 462.4 (478.5) MB/s

b) Compiler flag enabled
adenilson@chrome-monster:~/chromium/src/out/Release$ time ./zlib_bench gzip ~/temp/snappy/testdata/*
/home/adenilson/temp/snappy/testdata/alice29.txt :
GZIP: [b 1M] bytes 152089 ->  54397 35.8% comp  19.2 ( 19.3) MB/s uncomp 483.7 (490.1) MB/s
/home/adenilson/temp/snappy/testdata/asyoulik.txt :
GZIP: [b 1M] bytes 125179 ->  48927 39.1% comp  17.5 ( 17.5) MB/s uncomp 417.0 (433.1) MB/s
/home/adenilson/temp/snappy/testdata/baddata1.snappy :
GZIP: [b 1M] bytes  27512 ->  22920 83.3% comp  44.0 ( 44.4) MB/s uncomp 210.8 (212.9) MB/s
/home/adenilson/temp/snappy/testdata/baddata2.snappy :
GZIP: [b 1M] bytes  27483 ->  23000 83.7% comp  44.1 ( 44.2) MB/s uncomp 213.4 (215.2) MB/s
/home/adenilson/temp/snappy/testdata/baddata3.snappy :
GZIP: [b 1M] bytes  28384 ->  23705 83.5% comp  44.0 ( 44.0) MB/s uncomp 210.5 (212.1) MB/s
/home/adenilson/temp/snappy/testdata/fireworks.jpeg :
GZIP: [b 1M] bytes 123093 -> 122927 99.9% comp  46.2 ( 46.8) MB/s uncomp 1196.4 (1196.8) MB/s
/home/adenilson/temp/snappy/testdata/geo.protodata :
GZIP: [b 1M] bytes 118588 ->  15124 12.8% comp  99.0 (100.8) MB/s uncomp 1044.0 (1056.2) MB/s
/home/adenilson/temp/snappy/testdata/html :
GZIP: [b 1M] bytes 102400 ->  13707 13.4% comp  75.4 ( 77.7) MB/s uncomp 925.3 (927.0) MB/s
/home/adenilson/temp/snappy/testdata/html_x_4 :
GZIP: [b 1M] bytes 409600 ->  53285 13.0% comp  71.9 ( 73.0) MB/s uncomp 940.1 (944.2) MB/s
/home/adenilson/temp/snappy/testdata/kppkn.gtb :
GZIP: [b 1M] bytes 184320 ->  38727 21.0% comp  11.6 ( 11.6) MB/s uncomp 633.8 (658.2) MB/s
/home/adenilson/temp/snappy/testdata/lcet10.txt :
GZIP: [b 1M] bytes 426754 -> 144867 33.9% comp  18.7 ( 19.3) MB/s uncomp 484.4 (505.8) MB/s
/home/adenilson/temp/snappy/testdata/paper-100k.pdf :
GZIP: [b 1M] bytes 102400 ->  81276 79.4% comp  48.9 ( 48.9) MB/s uncomp 354.6 (390.0) MB/s
/home/adenilson/temp/snappy/testdata/plrabn12.txt :
GZIP: [b 1M] bytes 481861 -> 195102 40.5% comp  14.5 ( 14.6) MB/s uncomp 427.9 (444.9) MB/s
/home/adenilson/temp/snappy/testdata/urls.10K :
GZIP: [b 1M] bytes 702087 -> 222358 31.7% comp  39.2 ( 40.1) MB/s uncomp 452.1 (472.7) MB/s


Experiment results are available here:
https://docs.google.com/spreadsheets/d/1kLp0UfvCbnAvJUnSQU3b9wMopjqQsfKR0izFMu1Zd0Q/edit?usp=sharing

Comments: for little ARMv8 cores seems to be a general small win (the 2 perf regressions are too small and could be within the noise), while for ARMv7 is a loss in 6/14 cases (on Nexus4 it actually pushed the result below vanilla zlib in 1 case).

That being said, those losses are in bad compression cases (i.e. baddata.snappy, jpeg, pdf), while we observed a win for html content (1 to 2% improvement on ARMv7).

The averages are:
a) rk3328 (ARMv8): 1.021337833
b) nexus4 (ARMv7): 0.9986796919
c) nexus6 (ARMv7): 1.007643199

I think is safe to proceed given that we already know that it will be a nice perf gain on big ARMv8 cores (around 8-10%) and the regression on ARMv7 is manageable given the NEON optimizations implemented in inflate_fast.

Comment 41 by noel@chromium.org, Mar 23 2018

"Quite interesting that it seems to hurt perf on Intel", when measuring on Intel you need to add the official build optimization rule inside every contribution in zlib/BUILD.gn file:

  source_set("zlib_some_contribution") {
    visibility = [ ":*" ]

    if (!is_ios && (current_cpu == "x86" || current_cpu == "x64")) {
      ...
      configs -= [ "//build/config/compiler:default_optimization" ]
      configs += [ "//build/config/compiler:optimize_speed" ]
    }
  }

to measure perf that reflects how our chrome official releases are built (for those builds, the default_optimization is optimize_speed).

Note: "those losses are in bad compression cases (i.e. baddata.snappy, jpeg, pdf)" they these cases are surprisingly common on the web [1].

[1] https://discuss.httparchive.org/t/sites-that-deliver-images-using-gzip-deflate-encoding/220

If we are concerned about ARMv7 regressions, maybe we can control it with a gn arg (I left code review comments about it).
> If we are concerned about ARMv7 regressions, maybe we can control it with a gn arg (I left code review comments about it).

Pretty sure we still only ship ARMv7 builds to both ARMv7 and ARMv8 devices.  If we disable for ARMv7, it'll turn it off on all ARM, right?

Comment 43 by noel@chromium.org, Mar 26 2018

Right, seems using a gn arg won't help here.  
Project Member

Comment 44 by bugdroid1@chromium.org, Mar 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ffc960a39d26fda610e0023667b2465118b8580a

commit ffc960a39d26fda610e0023667b2465118b8580a
Author: Adenilson Cavalcanti <cavalcantii@chromium.org>
Date: Tue Mar 27 04:50:58 2018

Enable compiler optimization in inflate_fast on ARM

Generate optimized code that will help ARMv8 chips (between 2
to 10% perf gain on ARMv8 little/big cores) at the expense of
a small perf regression on older chips (i.e. ARMv7).

Bug:  772870 
Change-Id: Ifc47870724da9790c944f76fedcc225d0a896caf
Reviewed-on: https://chromium-review.googlesource.com/976521
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Reviewed-by: Chris Blume <cblume@chromium.org>
Reviewed-by: Noel Gordon <noel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#546016}
[modify] https://crrev.com/ffc960a39d26fda610e0023667b2465118b8580a/third_party/zlib/BUILD.gn

Status: Fixed (was: Started)

Comment 46 by noel@chromium.org, Mar 28 2018

Status: Started (was: Fixed)
Now waiting on the Android One perf PNG test bot to complain (as it did before #17).

Comment 47 by noel@chromium.org, Jun 30 2018

Status: Fixed (was: Started)
No Android One perf PNG test bot complained.

Sign in to add a comment