New issue
Advanced search Search tips

Issue 750184 link

Starred by 1 user

Issue metadata

Status: WontFix
Merged: issue 749359
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

10.2%-11.1% regression in blink_perf.layout at 489038:489212

Project Member Reported by majidvp@google.com, Jul 28 2017

Issue description

Regression is limited to win7 bots but interestingly it is a 
regression in layout and paint metrics.


 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 28 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=750184

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=e165d01deb474a8099bebfbf69dff847d169690840b736c1e80793b06622a3d9


Bot(s) for this bug's original alert(s):

chromium-rel-win7-gpu-ati
chromium-rel-win7-gpu-intel
chromium-rel-win7-gpu-nvidia
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jul 28 2017

Cc: bsheedy@chromium.org
Owner: bsheedy@chromium.org

=== Auto-CCing suspected CL author bsheedy@chromium.org ===

Hi bsheedy@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : bsheedy
  Commit : df7a4c59b34a2075f3ddb61042b07ab1704747c6
  Date   : Mon Jul 24 22:51:54 2017
  Subject: Make VR perf tests compatible with bots

Bisect Details
  Configuration: winx64nvidia_perf_bisect
  Benchmark    : blink_perf.layout
  Metric       : floats_20_100_nested/floats_20_100_nested
  Change       : 5.87% | 300.641 -> 318.291666667

Revision             Result                  N
chromium@489098      300.641 +- 2.91093      6      good
chromium@489113      301.826 +- 3.99107      6      good
chromium@489114      308.214 +- 5.80372      6      bad       <--
chromium@489115      309.398 +- 4.42409      6      bad
chromium@489117      309.105 +- 2.82922      6      bad
chromium@489120      309.398 +- 9.30412      6      bad
chromium@489127      309.685 +- 3.99573      6      bad
chromium@489155      310.671 +- 8.19883      6      bad
chromium@489212      318.292 +- 5.018        6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests blink_perf.layout

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8972811693163430192


For feedback, file a bug with component Speed>Bisection
Owner: ----
This CL shouldn't be touching anything other than a couple of non-Telemetry VR perf tests that are only run on our FYI bot. Starting another bisect.
Project Member

Comment 6 by 42576172...@developer.gserviceaccount.com, Jul 29 2017

Cc: jsc...@chromium.org
Owner: jsc...@chromium.org

=== Auto-CCing suspected CL author jschuh@chromium.org ===

Hi jschuh@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Justin Schuh
  Commit : 576f51b8db142fbf8515443ff413ebe37668b929
  Date   : Tue Jul 25 00:26:20 2017
  Subject: Reduce branches in ClampedNumeric

Bisect Details
  Configuration: winx64nvidia_perf_bisect
  Benchmark    : blink_perf.layout
  Metric       : floats_20_100_nested/floats_20_100_nested
  Change       : 5.88% | 301.8355 -> 319.594166667

Revision             Result                  N
chromium@489098      301.836 +- 1.73802      6      good
chromium@489155      310.548 +- 1.56554      6      good
chromium@489157      310.465 +- 4.57951      6      good
chromium@489158      309.737 +- 5.66977      6      good
chromium@489159      322.383 +- 4.81887      6      bad       <--
chromium@489163      320.651 +- 9.48778      6      bad
chromium@489170      321.704 +- 3.99937      6      bad
chromium@489184      319.062 +- 2.75816      6      bad
chromium@489212      319.594 +- 6.69139      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests blink_perf.layout

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8972773867908602736


For feedback, file a bug with component Speed>Bisection

Comment 7 by jsc...@chromium.org, Jul 29 2017

Mergedinto: 749359
Status: Duplicate (was: Untriaged)

Comment 8 by jsc...@chromium.org, Jul 29 2017

Status: Assigned (was: Duplicate)
Merged prematurely.

Comment 9 by jsc...@chromium.org, Jul 29 2017

This certainly could be me. It's a bit moot since we switched Windows to Clang yesterday, where the new code isn't showing a performance difference (and shouldn't in fact be any different based on my examination of the emitted assembler).

That stated, this is a very interesting case. The new code removes a branch on Intel when addition or subtraction saturate. So, the performance regression implies that saturation in those cases is very uncommon, and thus the previous branching code (which executed three fewer instructions in the likely path) was highly predictable. That would mean that it might actually be noticeably better for Clang to branch on computing the saturation value rather than precomputing it and using a CMOV (as it has consistently done).

So, we could see a performance bump on clang if I use __builtin_expect to force branching code on Intel. And looking at the assembly, doing that wouldn't change the Arm code at all, so it should be cross-platform safe.

Circling back to MSVC, I'm inclined to just abandon it given the switch to Clang. It has no equivalent to __builtin_expect, and if you PGO the build you should get the old behavior anyway. (We PGO official release builds, but it takes too long to do so for perf bot builds.)

eae@ - What do you think about using __builtin_expect aggressively on the saturation/overflow cases? I can do it in the same instruction count as the current CMOV case, and if it's highly predictable it should be a perf win.

> This certainly could be me. It's a bit moot since we switched Windows to Clang yesterday, where the new code isn't showing a performance difference (and shouldn't in fact be any different based on my examination of the emitted assembler).

Actually the metrics appear to have recovered a bit as well but not all the way to the previous value. It is about a %1 regression now. I suspect this is also another side effect of switching to clang. 

> eae@ - What do you think about using __builtin_expect aggressively on the saturation/overflow cases? I can do it in the same instruction count as the current CMOV case, and if it's highly predictable it should be a perf win.

I will leave it up to eae@ but it seems the suggestioned improvement
is an easy win which may hopefully help us recover from the remaining lost %1.


> Actually the metrics appear to have recovered a bit as well but not all the way to the previous value. It is about a %1 regression now. I suspect this is also another side effect of switching to clang. 

Just to clarify, my CL didn't regress the performance here on Clang binaries. However, given that we just switched compilers on Windows, all the performance baselines are expected to shift a bit in either direction.
Components: Internals>BaseNumerics
Project Member

Comment 13 by bugdroid1@chromium.org, Aug 1 2017

Project Member

Comment 14 by bugdroid1@chromium.org, Aug 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71

commit 4ad909a931bd1f3e2b1a9861b251096c7b8f9b71
Author: Justin Schuh <jschuh@chromium.org>
Date: Thu Aug 03 13:40:42 2017

Improve branching in CheckedNumeric and ClampedNumeric

* Applied BASE_NUMERICS_LIKELY to all predictable clamped and checked
  branches. (I skipped things like abs, where the branch is not
  reliably predictable).
* Simplified CheckedAddOp and CheckedSubOp, and optimized the generic
  code for smaller types (although Clang will always use the builtins).
* Expanded the cases where CheckedMulOp will use the Clang builtin
  (this branch is compile-time determined, and should be an
  improvement on 64-bit).
* Restructured CheckedDivOp to allow the compiler remove more of the
  overflow checking based on compile-time type checks.
* Simplified CheckedModOp by putting all the logic in one function.
* Simplified branching and eliminated multiplication in CheckedLshOp
  and ClampedLshOp (much more compact code on Arm, but still
  unavoidably a bit branchy).
* Removed branch in CheckMathOp (variadic template wrapper).

Note: Arm code should remain mostly non-branching, unless Clang's
optimizer behavior changes in the future to assume better branch
prediction (at which point it should branch on the exceptional
conditions, like Intel currently does).

Bug:  750184 
Change-Id: I9948b94722d64e9a5bf363690acf9086dae0c65d
Reviewed-on: https://chromium-review.googlesource.com/592852
Commit-Queue: Justin Schuh <jschuh@chromium.org>
Reviewed-by: Emil A Eklund <eae@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491725}
[modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/numerics/checked_math.h
[modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/numerics/checked_math_impl.h
[modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/numerics/clamped_math_impl.h
[modify] https://crrev.com/4ad909a931bd1f3e2b1a9861b251096c7b8f9b71/base/safe_numerics_unittest.cc

jschuh: anything else to do on this bug?
Status: WontFix (was: Assigned)
Nah. Closing it since it will be invalidated when we switch back to Clang.
Labels: Performance-Noise
Adding Performance-Noise label due to bisect that landed on wrong revision.

Sign in to add a comment