Issue metadata
Sign in to add a comment
|
10.2%-11.1% regression in blink_perf.layout at 489038:489212 |
||||||||||||||||||||||||
Issue descriptionRegression is limited to win7 bots but interestingly it is a regression in layout and paint metrics.
,
Jul 28 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8972811693163430192
,
Jul 28 2017
=== Auto-CCing suspected CL author bsheedy@chromium.org === Hi bsheedy@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : bsheedy Commit : df7a4c59b34a2075f3ddb61042b07ab1704747c6 Date : Mon Jul 24 22:51:54 2017 Subject: Make VR perf tests compatible with bots Bisect Details Configuration: winx64nvidia_perf_bisect Benchmark : blink_perf.layout Metric : floats_20_100_nested/floats_20_100_nested Change : 5.87% | 300.641 -> 318.291666667 Revision Result N chromium@489098 300.641 +- 2.91093 6 good chromium@489113 301.826 +- 3.99107 6 good chromium@489114 308.214 +- 5.80372 6 bad <-- chromium@489115 309.398 +- 4.42409 6 bad chromium@489117 309.105 +- 2.82922 6 bad chromium@489120 309.398 +- 9.30412 6 bad chromium@489127 309.685 +- 3.99573 6 bad chromium@489155 310.671 +- 8.19883 6 bad chromium@489212 318.292 +- 5.018 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests blink_perf.layout More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8972811693163430192 For feedback, file a bug with component Speed>Bisection
,
Jul 29 2017
This CL shouldn't be touching anything other than a couple of non-Telemetry VR perf tests that are only run on our FYI bot. Starting another bisect.
,
Jul 29 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8972773867908602736
,
Jul 29 2017
=== Auto-CCing suspected CL author jschuh@chromium.org === Hi jschuh@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Justin Schuh Commit : 576f51b8db142fbf8515443ff413ebe37668b929 Date : Tue Jul 25 00:26:20 2017 Subject: Reduce branches in ClampedNumeric Bisect Details Configuration: winx64nvidia_perf_bisect Benchmark : blink_perf.layout Metric : floats_20_100_nested/floats_20_100_nested Change : 5.88% | 301.8355 -> 319.594166667 Revision Result N chromium@489098 301.836 +- 1.73802 6 good chromium@489155 310.548 +- 1.56554 6 good chromium@489157 310.465 +- 4.57951 6 good chromium@489158 309.737 +- 5.66977 6 good chromium@489159 322.383 +- 4.81887 6 bad <-- chromium@489163 320.651 +- 9.48778 6 bad chromium@489170 321.704 +- 3.99937 6 bad chromium@489184 319.062 +- 2.75816 6 bad chromium@489212 319.594 +- 6.69139 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests blink_perf.layout More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8972773867908602736 For feedback, file a bug with component Speed>Bisection
,
Jul 29 2017
,
Jul 29 2017
Merged prematurely.
,
Jul 29 2017
This certainly could be me. It's a bit moot since we switched Windows to Clang yesterday, where the new code isn't showing a performance difference (and shouldn't in fact be any different based on my examination of the emitted assembler). That stated, this is a very interesting case. The new code removes a branch on Intel when addition or subtraction saturate. So, the performance regression implies that saturation in those cases is very uncommon, and thus the previous branching code (which executed three fewer instructions in the likely path) was highly predictable. That would mean that it might actually be noticeably better for Clang to branch on computing the saturation value rather than precomputing it and using a CMOV (as it has consistently done). So, we could see a performance bump on clang if I use __builtin_expect to force branching code on Intel. And looking at the assembly, doing that wouldn't change the Arm code at all, so it should be cross-platform safe. Circling back to MSVC, I'm inclined to just abandon it given the switch to Clang. It has no equivalent to __builtin_expect, and if you PGO the build you should get the old behavior anyway. (We PGO official release builds, but it takes too long to do so for perf bot builds.) eae@ - What do you think about using __builtin_expect aggressively on the saturation/overflow cases? I can do it in the same instruction count as the current CMOV case, and if it's highly predictable it should be a perf win.
,
Jul 31 2017
> This certainly could be me. It's a bit moot since we switched Windows to Clang yesterday, where the new code isn't showing a performance difference (and shouldn't in fact be any different based on my examination of the emitted assembler). Actually the metrics appear to have recovered a bit as well but not all the way to the previous value. It is about a %1 regression now. I suspect this is also another side effect of switching to clang. > eae@ - What do you think about using __builtin_expect aggressively on the saturation/overflow cases? I can do it in the same instruction count as the current CMOV case, and if it's highly predictable it should be a perf win. I will leave it up to eae@ but it seems the suggestioned improvement is an easy win which may hopefully help us recover from the remaining lost %1.
,
Jul 31 2017
> Actually the metrics appear to have recovered a bit as well but not all the way to the previous value. It is about a %1 regression now. I suspect this is also another side effect of switching to clang. Just to clarify, my CL didn't regress the performance here on Clang binaries. However, given that we just switched compilers on Windows, all the performance baselines are expected to shift a bit in either direction.
,
Jul 31 2017
,
Aug 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda commit 5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda Author: Justin Schuh <jschuh@chromium.org> Date: Tue Aug 01 04:58:07 2017 Improve checked_cast and saturated_cast performance This is a small performance and code size improvement for the general case of integer range checks and saturated conversions. NOTRY=True Bug: 750184 Change-Id: Ibef8419ec53ecb88a23c81b64d11421c9f74b724 Reviewed-on: https://chromium-review.googlesource.com/591052 Commit-Queue: Justin Schuh <jschuh@chromium.org> Reviewed-by: Emil A Eklund <eae@chromium.org> Cr-Commit-Position: refs/heads/master@{#490852} [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/numerics/clamped_math_impl.h [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/numerics/safe_conversions.h [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/numerics/safe_conversions_arm_impl.h [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/numerics/safe_conversions_impl.h [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/numerics/safe_math_arm_impl.h [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/numerics/safe_math_clang_gcc_impl.h [modify] https://crrev.com/5cc2abf4c5a12d1a7a79f86106fa98515ba4bbda/base/safe_numerics_unittest.cc
,
Aug 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71 commit 4ad909a931bd1f3e2b1a9861b251096c7b8f9b71 Author: Justin Schuh <jschuh@chromium.org> Date: Thu Aug 03 13:40:42 2017 Improve branching in CheckedNumeric and ClampedNumeric * Applied BASE_NUMERICS_LIKELY to all predictable clamped and checked branches. (I skipped things like abs, where the branch is not reliably predictable). * Simplified CheckedAddOp and CheckedSubOp, and optimized the generic code for smaller types (although Clang will always use the builtins). * Expanded the cases where CheckedMulOp will use the Clang builtin (this branch is compile-time determined, and should be an improvement on 64-bit). * Restructured CheckedDivOp to allow the compiler remove more of the overflow checking based on compile-time type checks. * Simplified CheckedModOp by putting all the logic in one function. * Simplified branching and eliminated multiplication in CheckedLshOp and ClampedLshOp (much more compact code on Arm, but still unavoidably a bit branchy). * Removed branch in CheckMathOp (variadic template wrapper). Note: Arm code should remain mostly non-branching, unless Clang's optimizer behavior changes in the future to assume better branch prediction (at which point it should branch on the exceptional conditions, like Intel currently does). Bug: 750184 Change-Id: I9948b94722d64e9a5bf363690acf9086dae0c65d Reviewed-on: https://chromium-review.googlesource.com/592852 Commit-Queue: Justin Schuh <jschuh@chromium.org> Reviewed-by: Emil A Eklund <eae@chromium.org> Cr-Commit-Position: refs/heads/master@{#491725} [modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/numerics/checked_math.h [modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/numerics/checked_math_impl.h [modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/numerics/clamped_math_impl.h [modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/safe_numerics_unittest.cc
,
Aug 21 2017
jschuh: anything else to do on this bug?
,
Aug 21 2017
Nah. Closing it since it will be invalidated when we switch back to Clang.
,
Aug 29 2017
Adding Performance-Noise label due to bisect that landed on wrong revision. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Jul 28 2017