Issue metadata
Sign in to add a comment
|
10.2%-15.9% regression in blink_perf.paint at 488948:489012 |
||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Jul 27 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8972959276066489328
,
Jul 27 2017
=== Auto-CCing suspected CL author jschuh@chromium.org === Hi jschuh@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Justin Schuh Commit : 4a572da99ce704e632221bd05e6d894d49a73dcc Date : Mon Jul 24 14:37:24 2017 Subject: Replace Saturated*() calls with Clamp*() templates Bisect Details Configuration: winx64_10_perf_bisect Benchmark : blink_perf.layout Metric : floats_20_100/floats_20_100 Change : 15.00% | 233.461833333 -> 268.483666667 Revision Result N chromium@488947 233.462 +- 0.468311 6 good chromium@488964 233.982 +- 0.972761 6 good chromium@488965 238.568 +- 1.5997 6 good chromium@488966 262.346 +- 0.894873 6 bad <-- chromium@488968 269.182 +- 0.970667 6 bad chromium@488972 261.88 +- 1.72175 6 bad chromium@488980 261.825 +- 0.994219 6 bad chromium@489012 268.484 +- 4.77013 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests blink_perf.layout More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8972959276066489328 For feedback, file a bug with component Speed>Bisection
,
Jul 27 2017
Interesting... It certainly seems possible that my CL was the cause here. The only functional difference on MSVC after template expansion should be that the new code is less aggressive about the inlining (the old code used __forceinline). I'll have to look at some disassembly to verify, but the hot functions here almost certainly involve the switch from SaturatedAddition/SaturatedSubtraction to ClampAdd/ClampSub -- both sets of which appear to compile down to identical code on MSVC. However, they also weigh in at ~14 instructions on Intel, which is large enough that the optimizer is unlikely to inline the new code unless it's forced. I'll see about forcing inlines more aggressively in the new code, and if it doesn't significantly bloat binary size (because the new code is used more broadly) then I'll do it. That stated, this regression is likely to be entirely moot. We're switching to Clang on Windows in the next few days, so this regression shouldn't actually ship to Windows users on stable channel. Rather, this CL should result in a performance improvement on Windows, because we'll start using the Clang overflow builtins. That will replace the current ~14 instructions on Intel with ~5 instructions, which are almost certain to be inlined. Although, that might also be a very good argument to just aggressively force inlining on all compilers, since it won't negatively affect code size on Clang, which we'll soon be using everywhere. @eae for his thoughts.
,
Jul 27 2017
Let's wait and see what the effects are once we switch to clang.
,
Jul 27 2017
,
Jul 28 2017
,
Jul 29 2017
,
Jul 31 2017
,
Jul 31 2017
Issue 750672 has been merged into this issue.
,
Jul 31 2017
It looks like this is now lost in the noise from the switch to clang. Some baselines got better, and some got worse, but since it's not MSVC anymore I'm inclined to close it.
,
Aug 17 2017
WontFix-ing per #11 |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Jul 27 2017