Issue metadata
Sign in to add a comment
|
46% regression in rasterize_and_record_micro.top_25 at 477411:488973 |
||||||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Aug 14 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971241590839729952
,
Aug 14 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971241373628528560
,
Aug 14 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971241342505559328
,
Aug 14 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971241300691123712
,
Aug 14 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971241266833600464
,
Aug 14 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971241223197161888
,
Aug 14 2017
rasterize_and_record_micro.top_25 has been disabled on most platforms except Linux. On Linux there was a large increase in raster times between over a long range between 2017-06-07 and 2017-07-24, which got dismissed. Looking at individual pages it looks like there may be two or three individual regressions, so I've kicked off bisects (above). Graphics: https://chromeperf.appspot.com/report?sid=c0c9ce5e0718dd45cdbb663bfb5479a29da98c60dfd9f65993beff46853614d8&start_rev=469899&end_rev=492454
,
Aug 14 2017
,
Aug 15 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8971240127581427680
,
Aug 15 2017
This most likely is the cause of the UMA regression on both Windows and Linux. Marking RBS for 61.
,
Aug 15 2017
,
Aug 15 2017
=== BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Florin Malita Commit : 4d3622e28d891b3a445275d132a6817bb83973ed Date : Wed Jul 12 20:02:25 2017 Subject: Remove flag to stop relying on Skia's bilerp-hack Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/ESPN Change : 104.85% | 3.69316666667 -> 7.56533333333 Revision Result N chromium@486045 3.69317 +- 0.0995933 6 good chromium@486057 3.68017 +- 0.0193089 6 good chromium@486063 3.67917 +- 0.0271815 6 good chromium@486066 3.67433 +- 0.0414166 6 good chromium@486067 3.67083 +- 0.0381685 6 good chromium@486068 7.546 +- 0.0580689 6 bad <-- chromium@486090 7.58 +- 0.0276767 6 bad chromium@486135 7.56533 +- 0.0458621 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=ESPN rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971241373628528560 For feedback, file a bug with component Speed>Bisection
,
Aug 15 2017
Keep issue open for the other regression ranges.
,
Aug 15 2017
=== BISECT JOB RESULTS === Bisect was unable to run to completion The bisect was able to narrow the range, you can try running with: good_revision: 401b447c0077c8d41035e7f81368b8554e1a1e06 bad_revision : 5cb9983f12f300422c16ba812c2d5cdeb278cb74 If failures persist contact the team (see below) and report the error. Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/Blogger Change : 18.83% | 1.37 -> 1.628 Revision Result N chromium@480053 1.37 +- 0.021166 6 good chromium@480116 1.38383 +- 0.0247958 6 good chromium@480132 1.38483 +- 0.0149276 6 good chromium@480140 1.381 +- 0.0164317 6 good chromium@480142 1.38933 +- 0.0141185 6 good chromium@480144 1.64733 +- 0.0251263 6 bad chromium@480148 1.6525 +- 0.0349213 6 bad chromium@480179 1.628 +- 0.0245357 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=Blogger rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971241342505559328 For feedback, file a bug with component Speed>Bisection
,
Aug 15 2017
pcc@, there seems to have been 15~20% raster time regression on Linux associated with the ThinLTO change at https://chromium.googlesource.com/chromium/src/+/5cb9983f12f300422c16ba812c2d5cdeb278cb74.
,
Aug 15 2017
Florin, it seems to me the #13 and #16 cover the two main regressions here. I think I see the subsequent fix you landed partly recover the regression on ESPN, however most of the other pages in this set did not recover: https://chromeperf.appspot.com/report?sid=c0c9ce5e0718dd45cdbb663bfb5479a29da98c60dfd9f65993beff46853614d8&start_rev=469899&end_rev=492454
,
Aug 15 2017
=== Auto-CCing suspected CL author pcc@chromium.org === Hi pcc@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : pcc Commit : 5cb9983f12f300422c16ba812c2d5cdeb278cb74 Date : Fri Jun 16 20:13:24 2017 Subject: Enable ThinLTO for POSIX LTO by default on Linux. Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/LinkedIn Change : 18.23% | 1.62316666667 -> 1.919 Revision Result N chromium@480053 1.62317 +- 0.0226458 6 good chromium@480116 1.6235 +- 0.0227486 6 good chromium@480132 1.62467 +- 0.0292119 6 good chromium@480140 1.61717 +- 0.0199708 6 good chromium@480142 1.62033 +- 0.0257682 9 good chromium@480143 1.69633 +- 0.568322 9 good chromium@480144 1.95422 +- 0.0438356 9 bad <-- chromium@480148 1.94533 +- 0.0303205 6 bad chromium@480179 1.919 +- 0.0226274 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=LinkedIn rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971241266833600464 For feedback, file a bug with component Speed>Bisection
,
Aug 15 2017
=== BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : pcc Commit : 5cb9983f12f300422c16ba812c2d5cdeb278cb74 Date : Fri Jun 16 20:13:24 2017 Subject: Enable ThinLTO for POSIX LTO by default on Linux. Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/Pinterest Change : 29.96% | 0.7265 -> 0.944166666667 Revision Result N chromium@480053 0.7265 +- 0.0144049 6 good chromium@480116 0.728833 +- 0.0135953 6 good chromium@480132 0.733833 +- 0.0191529 6 good chromium@480140 0.7275 +- 0.0120623 6 good chromium@480142 0.733167 +- 0.0154542 6 good chromium@480143 0.730667 +- 0.0162891 6 good chromium@480144 0.962333 +- 0.0113725 6 bad <-- chromium@480148 0.967833 +- 0.0149944 6 bad chromium@480179 0.944167 +- 0.00518009 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=Pinterest rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971241223197161888 For feedback, file a bug with component Speed>Bisection
,
Aug 15 2017
Victor, thanks for bisecting. Of the regressions in #17, only LinkedIn and ESPN seem related to the Skia image change (486068). I can reopen issue 744674 to investigate, and leave this for the ThinLTO regression.
,
Aug 15 2017
[Bulk Edit] URGENT - PTAL. Your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and get it merged into the release branch ASAP. Thank you! Know that this issue shouldn't block the release? Remove the ReleaseBlock-Stable label or move to M62.
,
Aug 15 2017
Re #20: yes you're right, only LinkedIn and ESPN clearly relate to the Skia image change. Facebook & Docs look like were missing data and their regression range covers both the Skia change and ThinLTO change.
,
Aug 15 2017
This may be related to issue 738085 . Is there any code in Skia that we are expecting to be autovectorized?
,
Aug 15 2017
@pcc it looks like the ThinLTO affects 2 of the most compute-intensive and performance-critical operations in chrome (media and raster). Should we consider reverting on the branch while we investigate?
,
Aug 15 2017
,
Aug 15 2017
That seems risky to me because after the ThinLTO change landed we switched a number of other bots to test only the ThinLTO configuration, so we'd effectively be switching to a relatively untested configuration. I'll see if I can figure out which part of the code regressed by looking at the profile.
,
Aug 15 2017
+ Skia folks to help answer #23.
,
Aug 15 2017
Yes, we rely on autovectorization for our sk_memset{16,32,64} routines (see SkUtils_opts.h, with SSE2 and AVX versions compiled into SkOpts.cpp and SkOpts_avx.cpp respectively on x86 builds). sk_memset32() is probably the single hottest function in Skia. avx::memset32() isn't much faster than sse2::memset32(), but they're both zillions of times faster vectorized than not.
There's probably lots of other float-mathy code in Skia that benefits from autovectorization, all over the place. 2-float points, 4-float rects, and 9-float matrices are our bread and butter. A few of their routines are explicitly vectorized (any time you see Sk4f, Sk2s, things that look that that) but I've got to imagine we benefit a lot from autovectorization in addition.
,
Aug 15 2017
@#26: I would definitely want to entertain the possibility though. This is a big regression, affecting rendering performance for all pages / all users on Linux. It's really unfortunate that we burned the ships allowing us to go back.
,
Aug 15 2017
mtklein: Thanks for the explanation. What I don't understand is why the skia benchmarks are unmonitored on Linux if they are this performance critical. As I wrote on the other bug, I'd expect monitoring on any benchmarks where regressions may turn into release blockers.
,
Aug 15 2017
I honestly haven't read through this bug. Isn't it an alert about some regression? Is that not monitoring?
,
Aug 15 2017
I understand "monitoring" of a benchmark to mean that any regressions appear on the revision's perf dashboard (https://chromeperf.appspot.com/group_report?rev=480144 for my change). Am I mistaken in my understanding?
,
Aug 15 2017
I don't really know how any of that works. I only ever fix perf bugs when someone from Chromium assigns them to me. Want to see if this helps any? https://skia-review.googlesource.com/c/34746
,
Aug 15 2017
pcc: It looks like a bug that the alert didn't show up on that page, filed bug 755661 on it. vmiura: it seems like this alert didn't get triaged by the perf sheriff rotation either? If you think there might be a bug causing the rotation not to see it, please file a bug on that, too.
,
Aug 15 2017
> vmiura: it seems like this alert didn't get triaged by the perf sheriff > rotation either? If you think there might be a bug causing the rotation not > to see it, please file a bug on that, too. When I revieded it, there had been an alert which was closed/ignored by the perf sheriff without opening an issue. I don't know the trail there, but I re-opened and created this bug. Part of the issue here is that a number of sub-stories were failing between 477411 to 488973 (May 6 to July 24). Thus the alert on the top level metric triggered over 1 month after ThinLTO landed. Perhaps there is some value here in computing a partial metric based on available results, and alerting on that.
,
Aug 15 2017
mtklein: Thanks, I will see how much that helps. Unfortunately I am still waiting for my FullLTO build of chrome to link, so I can't do any perf measurements yet.
,
Aug 15 2017
@#30: I agree in theory. We should have performance benchmarks that catch the regressions as early as possible. But in practice some things fall through the cracks. Insufficient coverage, tests breaking / being disabled (what happened here IIUC) etc. This is akin to unit tests - ideally we have 100% coverage, but in practice we don't, and CLs land that break things. We still have to fix or revert, including late in the game (agree, it's not ideal), to make sure we ship a high quality browser to our users. I think for changes that can have a significant impact all across the codebase, I don't think it's unreasonable to keep a way to go back until it lands safely on stable.
,
Aug 15 2017
Re #35: Thanks, Victor! The alerts were configured to report on the summary. I changed them to the individual pages. This should ensure we catch issues like this in the future, and make the bisect faster and more accurate.
,
Aug 15 2017
Re 36: Cool beans. I'll wait to land it until I hear from you.
,
Aug 15 2017
According to my benchmarking mtklein's patch improved performance on raster time beyond the regression caused by ThinLTO on 23 out of 25 benchmarks. I just did one run, but the results seem conclusive enough for now IMHO; we can let the perf bots provide the final answer. In the attached results.html file official_fulllto is an official build of r492215 with full LTO, official_thinlto.bak is the same with ThinLTO and official_thinlto is the same with mtklein's patch applied to skia.
,
Aug 15 2017
Nice. I'll get it going in the CQ.
,
Aug 15 2017
The following revision refers to this bug: https://skia.googlesource.com/skia/+/25954b64c066b143819bc720b20a9b4287042ecc commit 25954b64c066b143819bc720b20a9b4287042ecc Author: Mike Klein <mtklein@chromium.org> Date: Tue Aug 15 22:28:17 2017 explicitly vectorize sk_memset{16,32,64} This ought to help clients who don't enable autovectorization. With autovectorization enabled, this new version is like, hyper-vectorized compared to the old autovectorization. Instead of handling 128 bytes max per loop, it now handles up to 512 bytes per loop. Pretty exciting. Locally perf effects are a mix, but we'd expect this to help Chrome unambiguously if they've turned off autovectorization. $ out/ok bench:samples=100 sw filter:match=memset32_\\d\* serial Before: [memset32_100000] 16ms @0 20.1ms @99 20.2ms @100 [memset32_10000] 1.07ms @0 1.26ms @99 1.31ms @100 [memset32_1000] 73.9µs @0 89.4µs @99 90.1µs @100 [memset32_100] 8.59µs @0 9.74µs @99 9.96µs @100 [memset32_10] 7.45µs @0 8.96µs @99 8.99µs @100 [memset32_1] 2.29µs @0 2.81µs @99 2.92µs @100 After: [memset32_100000] 16.2ms @0 17.3ms @99 17.3ms @100 [memset32_10000] 1.06ms @0 1.18ms @99 1.23ms @100 [memset32_1000] 72µs @0 75.6µs @99 84.7µs @100 [memset32_100] 9.14µs @0 10.6µs @99 10.7µs @100 [memset32_10] 5.43µs @0 5.88µs @99 5.99µs @100 [memset32_1] 3.43µs @0 3.65µs @99 3.83µs @100 BUG= chromium:755391 Change-Id: If9059a30ca7a345f1f7c37bd51473c29e8bb8922 Reviewed-on: https://skia-review.googlesource.com/34746 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> [modify] https://crrev.com/25954b64c066b143819bc720b20a9b4287042ecc/src/opts/SkUtils_opts.h
,
Aug 15 2017
=== BISECT JOB RESULTS === Bisect failed for unknown reasons Please contact the team (see below) and report the error. Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/rasterize_time To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971241590839729952 For feedback, file a bug with component Speed>Bisection
,
Aug 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e2b929d9e09c8701f0c07d740f41356a56d75d17 commit e2b929d9e09c8701f0c07d740f41356a56d75d17 Author: skia-deps-roller@chromium.org <skia-deps-roller@chromium.org> Date: Wed Aug 16 00:13:46 2017 Roll src/third_party/skia/ 15bb26ec7..25954b64c (8 commits) https://skia.googlesource.com/skia.git/+log/15bb26ec70c9..25954b64c066 $ git log 15bb26ec7..25954b64c --date=short --no-merges --format='%ad %ae %s' 2017-08-15 mtklein explicitly vectorize sk_memset{16,32,64} 2017-08-15 mtklein upstream cr/165303354 2017-08-15 herb Fix bogus math in object allocation. 2017-08-15 fmalita Revert "Fix bogus math in object allocation." 2017-08-15 bungeman Remove SkTypeface::style(). 2017-08-15 herb Fix bogus math in object allocation. 2017-08-15 fmalita Skip bilerp for integral-translate-only matrices (!clamp-clamp case) 2017-08-15 brianosman Speed up convexpaths GM in debug builds Created with: roll-dep src/third_party/skia BUG= 755391 ,744109,744109, 744674 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, see: http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=bsalomon@chromium.org Change-Id: I14e7599bb7b37734099f0984baf5e6488e5b3e10 Reviewed-on: https://chromium-review.googlesource.com/616164 Reviewed-by: Skia Deps Roller <skia-deps-roller@chromium.org> Commit-Queue: Skia Deps Roller <skia-deps-roller@chromium.org> Cr-Commit-Position: refs/heads/master@{#494622} [modify] https://crrev.com/e2b929d9e09c8701f0c07d740f41356a56d75d17/DEPS
,
Aug 16 2017
=== BISECT JOB RESULTS === Bisect failed for unknown reasons Please contact the team (see below) and report the error. Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/Docs (1 open document tab) To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=Docs...1.open.document.tab. rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971241300691123712 For feedback, file a bug with component Speed>Bisection
,
Aug 16 2017
=== BISECT JOB RESULTS === Bisect failed for unknown reasons Please contact the team (see below) and report the error. Bisect Details Configuration: linux_perf_bisect Benchmark : rasterize_and_record_micro.top_25 Metric : rasterize_time/Twitter To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=Twitter rasterize_and_record_micro.top_25 More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8971240127581427680 For feedback, file a bug with component Speed>Bisection
,
Aug 16 2017
,
Aug 16 2017
The first data point is in after the Skia change in #44. https://chromeperf.appspot.com/report?sid=c0c9ce5e0718dd45cdbb663bfb5479a29da98c60dfd9f65993beff46853614d8&start_rev=469899 On the tests that seemed most affected by the ThinLTO change (Pinterest, Blogger) it looks like near 100% recovery. On LinkedIn & ESPN it looks like looks like somewhat of an improvement over the auto-vectorized code before ThinLTO.
,
Aug 16 2017
Rejecting merge to M61 as this is not a stable blocker per comment #47 and we're only taking critical merges in at this point. Please let us know if there is any concern here.
,
Aug 17 2017
This is a blocker, I've made a mistake.
,
Aug 17 2017
Applying back "Merge-Request-61" per comment #50. Per offline chat, this seems to be significant regression and Stable blocker. Please answer following before we approve the merge: * Is change listed at #44 well baked/verified in Canary, having enough automation tests coverage and safe to merge to M61?
,
Aug 17 2017
This bug requires manual review: DEPS changes referenced in bugdroid comments. Please contact the milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), ketakid@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 17 2017
The change in 42+44 is to a very fundamental Skia routine, covered by pretty much every automated test Skia and Blink have. I'm not quite sure where it is re Canary---we could give it another day or two if you like---but if it weren't working correctly basically every web page would draw wrong. I don't personally have much of an opinion on whether this is critical to land to M61 or not, but I'd be happy to do so.
,
Aug 17 2017
Re #51: There is significant coverage in layout tests. This just landed today, so let's leave it to bake on Canary.
,
Aug 17 2017
Sure, pls update once it looks good in Canary.
,
Aug 17 2017
Sorry, missed the Cl at #42. We will need to merge Cl listed at #42 and #44 after canary coverage once approved, correct?
,
Aug 17 2017
No, it's the same CL. 42 is the real CL, 44 is just us rolling Chromium DEPS to pick it up on master.
,
Aug 17 2017
Ok, got it. Thank you.
,
Aug 18 2017
The NextAction date has arrived: 2017-08-18
,
Aug 18 2017
Assigning to govind@ to answer question in c#51.
,
Aug 18 2017
+vimura@, how is the change looking in Canary per comment #54?
,
Aug 18 2017
govind@: Stability looks good as expected. I think we should go ahead and merge. Looking at the graphs, all improved. Pinterest seems to have regressed at a later point which shouldn't be related to this change. I think that Pinterest started showing 404 due to some other Chrome or Telemetry issue. https://chromeperf.appspot.com/report?sid=c0c9ce5e0718dd45cdbb663bfb5479a29da98c60dfd9f65993beff46853614d8&start_rev=469899
,
Aug 18 2017
Approving merge to M61 branch 3163 based on comment #62. Please merge. Thank you.
,
Aug 18 2017
Assigning to mtklein@ to merge.
,
Aug 21 2017
mtklein: there was a small regression in CPU time with the explicit unrolling change on Android, Nexus 5. Issue 757247 . Test case http://output.jsbin.com/beqojupo/1/quiet?JS_FULL_SCREEN_INVALIDATION with software raster. I think we shouldn't block this merge on that since Android uses Ganesh a majority of the time, so just FYI.
,
Aug 21 2017
[Bulk Edit] URGENT - PTAL. M61 Stable promotion is coming soon and your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and get it merged into the release branch ASAP. Know that this issue shouldn't block the release? Remove the ReleaseBlock-Stable label or move to M62. Thank you! Note: We will only have 2 beta releases before Stable promotion. Plan is to cut M61 Stable RC on 08/31/17.
,
Aug 22 2017
> I think we shouldn't block this merge on that since Android uses Ganesh a majority of the time, so just FYI. Completely agreed.
,
Aug 22 2017
The following revision refers to this bug: https://skia.googlesource.com/skia/+/0eefc0552cfb5ac077560b7c2630c5bd475ea585 commit 0eefc0552cfb5ac077560b7c2630c5bd475ea585 Author: Mike Klein <mtklein@chromium.org> Date: Tue Aug 22 14:42:03 2017 explicitly vectorize sk_memset{16,32,64} This ought to help clients who don't enable autovectorization. With autovectorization enabled, this new version is like, hyper-vectorized compared to the old autovectorization. Instead of handling 128 bytes max per loop, it now handles up to 512 bytes per loop. Pretty exciting. Locally perf effects are a mix, but we'd expect this to help Chrome unambiguously if they've turned off autovectorization. $ out/ok bench:samples=100 sw filter:match=memset32_\\d\* serial Before: [memset32_100000] 16ms @0 20.1ms @99 20.2ms @100 [memset32_10000] 1.07ms @0 1.26ms @99 1.31ms @100 [memset32_1000] 73.9µs @0 89.4µs @99 90.1µs @100 [memset32_100] 8.59µs @0 9.74µs @99 9.96µs @100 [memset32_10] 7.45µs @0 8.96µs @99 8.99µs @100 [memset32_1] 2.29µs @0 2.81µs @99 2.92µs @100 After: [memset32_100000] 16.2ms @0 17.3ms @99 17.3ms @100 [memset32_10000] 1.06ms @0 1.18ms @99 1.23ms @100 [memset32_1000] 72µs @0 75.6µs @99 84.7µs @100 [memset32_100] 9.14µs @0 10.6µs @99 10.7µs @100 [memset32_10] 5.43µs @0 5.88µs @99 5.99µs @100 [memset32_1] 3.43µs @0 3.65µs @99 3.83µs @100 BUG= chromium:755391 Change-Id: If9059a30ca7a345f1f7c37bd51473c29e8bb8922 Reviewed-on: https://skia-review.googlesource.com/34746 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-on: https://skia-review.googlesource.com/37000 Reviewed-by: Mike Klein <mtklein@chromium.org> [modify] https://crrev.com/0eefc0552cfb5ac077560b7c2630c5bd475ea585/src/opts/SkUtils_opts.h
,
Aug 22 2017
Per comment #68, this is already merged to M61. So removing "Merge-Approved-61" label. Please let me know if there is anything is still pending for M61. Thank you.
,
Aug 22 2017
As far as I know that's all. |
|||||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Aug 14 2017