New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 750923 link

Starred by 8 users

54.9% regression in media_perftests at 490448:490515

Project Member Reported by sande...@chromium.org, Jul 31 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 31 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=750923

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=f1ba06349eb79b27e5f729fb332e6d9eb01141fbb8c84306fec896d9b0043a9f


Bot(s) for this bug's original alert(s):

chromium-rel-win8-dual
Cc: h...@chromium.org
Owner: h...@chromium.org

=== Auto-CCing suspected CL author hans@chromium.org ===

Hi hans@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Hans Wennborg
  Commit : d2c91228a51bdf37ae3b2e501fb53c0528f1629c
  Date   : Fri Jul 28 20:11:05 2017
  Subject: win: Set is_clang=true by default

Bisect Details
  Configuration: win_8_perf_bisect
  Benchmark    : media_perftests
  Metric       : audio_converter/convert_fifo_only
  Change       : 53.55% | 1369035.03985 -> 635869.919234

Revision             Result                  N
chromium@490447      1369035 +- 18625.6      6      good
chromium@490481      1372256 +- 12191.3      6      good
chromium@490490      1361809 +- 27941.8      6      good
chromium@490492      1371743 +- 38374.6      6      good
chromium@490493      1367999 +- 11075.7      6      good
chromium@490494      636643 +- 7339.42       6      bad       <--
chromium@490498      637718 +- 6185.32       6      bad
chromium@490515      635870 +- 5850.33       6      bad

To Run This Test
  .\src\out\Release\media_perftests.exe --single-process-tests

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8972511192980415776


For feedback, file a bug with component Speed>Bisection
Cc: habl...@google.com hablich@chromium.org bmeu...@chromium.org
 Issue 750672  has been merged into this issue.
Cc: yukishiino@chromium.org haraken@chromium.org r...@opera.com robho...@gmail.com wangxianzhu@chromium.org majidvp@chromium.org hayato@chromium.org pdfium-deps-roller@chromium.org jbroman@chromium.org
Issue 750703 has been merged into this issue.

Comment 6 by h...@chromium.org, Aug 1 2017

Status: Started (was: Untriaged)
Started looking.
Issue 750703 has been merged into this issue.
Issue 750703 has been merged into this issue.

Comment 9 by h...@chromium.org, Aug 1 2017

The perf drop is due to FMAC_FUNC in media/base/vector_math.cc chosing the non-intrinsic version with win/clang. Clang is supposed to be able to autovectorize that to get better performance, but for some reason that's not happening here.

(Kind of related: https://chromium-review.googlesource.com/557059)

Comment 10 by h...@chromium.org, Aug 1 2017

(The audio converter code is valling it as vector_math::FMAC)

Comment 11 by h...@chromium.org, Aug 1 2017

*calling it

Comment 12 by h...@chromium.org, Aug 1 2017

In my build, FMAC_C looks like this:

?FMAC_C@vector_math@media@@YAXQEBMMHQEAM@Z (void __cdecl media::vector_math::FMAC_C(float const * const,float,int,float * const)):
  0000000000000000: 45 85 C0           test        r8d,r8d
  0000000000000003: 7E 22              jle         0000000000000027
  0000000000000005: 44 89 C0           mov         eax,r8d
  0000000000000008: F3 0F 10 01        movss       xmm0,dword ptr [rcx]
  000000000000000C: F3 0F 59 C1        mulss       xmm0,xmm1
  0000000000000010: F3 41 0F 58 01     addss       xmm0,dword ptr [r9]
  0000000000000015: F3 41 0F 11 01     movss       dword ptr [r9],xmm0
  000000000000001A: 48 83 C1 04        add         rcx,4
  000000000000001E: 49 83 C1 04        add         r9,4
  0000000000000022: 48 FF C8           dec         rax
  0000000000000025: 75 E1              jne         0000000000000008
  0000000000000027: C3                 ret


Clang is getting invoked with /O1 for this code, which we interpret as "optimize for size" (it becomes -Os in the cc1 invocation).

If I pass -O2 instead, we get fancier code:

?FMAC_C@vector_math@media@@YAXQEBMMHQEAM@Z (void __cdecl media::vector_math::FMAC_C(float const * const,float,int,float * const)):
  0000000000000000: 56                 push        rsi
  0000000000000001: 45 85 C0           test        r8d,r8d
  0000000000000004: 0F 8E 40 01 00 00  jle         000000000000014A
  000000000000000A: 45 89 C2           mov         r10d,r8d
  000000000000000D: 41 83 F8 07        cmp         r8d,7
  0000000000000011: 76 1A              jbe         000000000000002D
  0000000000000013: 4A 8D 04 91        lea         rax,[rcx+r10*4]
  0000000000000017: 4C 39 C8           cmp         rax,r9
  000000000000001A: 0F 86 C9 00 00 00  jbe         00000000000000E9
  0000000000000020: 4B 8D 04 91        lea         rax,[r9+r10*4]
  0000000000000024: 48 39 C8           cmp         rax,rcx
  0000000000000027: 0F 86 BC 00 00 00  jbe         00000000000000E9
  000000000000002D: 31 D2              xor         edx,edx
  000000000000002F: 44 89 D0           mov         eax,r10d
  0000000000000032: 29 D0              sub         eax,edx
  0000000000000034: 4D 8D 42 FF        lea         r8,[r10-1]
  0000000000000038: 49 29 D0           sub         r8,rdx
  000000000000003B: 48 83 E0 03        and         rax,3
  000000000000003F: 74 2C              je          000000000000006D
  0000000000000041: 48 F7 D8           neg         rax
  0000000000000044: 66 66 66 2E 0F 1F  nop         word ptr cs:[rax+rax+0000000000000000h]
                    84 00 00 00 00 00
  0000000000000050: F3 0F 10 04 91     movss       xmm0,dword ptr [rcx+rdx*4]
  0000000000000055: F3 0F 59 C1        mulss       xmm0,xmm1
  0000000000000059: F3 41 0F 58 04 91  addss       xmm0,dword ptr [r9+rdx*4]
  000000000000005F: F3 41 0F 11 04 91  movss       dword ptr [r9+rdx*4],xmm0
  0000000000000065: 48 FF C2           inc         rdx
  0000000000000068: 48 FF C0           inc         rax
  000000000000006B: 75 E3              jne         0000000000000050
  000000000000006D: 49 83 F8 03        cmp         r8,3
  0000000000000071: 0F 82 D3 00 00 00  jb          000000000000014A
  0000000000000077: 49 29 D2           sub         r10,rdx
  000000000000007A: 49 8D 44 91 0C     lea         rax,[r9+rdx*4+0Ch]
  000000000000007F: 48 8D 4C 91 0C     lea         rcx,[rcx+rdx*4+0Ch]
  0000000000000084: 66 66 66 2E 0F 1F  nop         word ptr cs:[rax+rax+0000000000000000h]
                    84 00 00 00 00 00
  0000000000000090: F3 0F 10 41 F4     movss       xmm0,dword ptr [rcx-0Ch]
  0000000000000095: F3 0F 59 C1        mulss       xmm0,xmm1
  0000000000000099: F3 0F 58 40 F4     addss       xmm0,dword ptr [rax-0Ch]
  000000000000009E: F3 0F 11 40 F4     movss       dword ptr [rax-0Ch],xmm0
  00000000000000A3: F3 0F 10 41 F8     movss       xmm0,dword ptr [rcx-8]
  00000000000000A8: F3 0F 59 C1        mulss       xmm0,xmm1
  00000000000000AC: F3 0F 58 40 F8     addss       xmm0,dword ptr [rax-8]
  00000000000000B1: F3 0F 11 40 F8     movss       dword ptr [rax-8],xmm0
  00000000000000B6: F3 0F 10 41 FC     movss       xmm0,dword ptr [rcx-4]
  00000000000000BB: F3 0F 59 C1        mulss       xmm0,xmm1
  00000000000000BF: F3 0F 58 40 FC     addss       xmm0,dword ptr [rax-4]
  00000000000000C4: F3 0F 11 40 FC     movss       dword ptr [rax-4],xmm0
  00000000000000C9: F3 0F 10 01        movss       xmm0,dword ptr [rcx]
  00000000000000CD: F3 0F 59 C1        mulss       xmm0,xmm1
  00000000000000D1: F3 0F 58 00        addss       xmm0,dword ptr [rax]
  00000000000000D5: F3 0F 11 00        movss       dword ptr [rax],xmm0
  00000000000000D9: 48 83 C0 10        add         rax,10h
  00000000000000DD: 48 83 C1 10        add         rcx,10h
  00000000000000E1: 49 83 C2 FC        add         r10,0FFFFFFFFFFFFFFFCh
  00000000000000E5: 75 A9              jne         0000000000000090
  00000000000000E7: EB 61              jmp         000000000000014A
  00000000000000E9: 41 83 E0 07        and         r8d,7
  00000000000000ED: 4C 89 D2           mov         rdx,r10
  00000000000000F0: 4C 29 C2           sub         rdx,r8
  00000000000000F3: 0F 28 C1           movaps      xmm0,xmm1
  00000000000000F6: 0F C6 C0 00        shufps      xmm0,xmm0,0
  00000000000000FA: 4C 8D 59 10        lea         r11,[rcx+10h]
  00000000000000FE: 49 8D 41 10        lea         rax,[r9+10h]
  0000000000000102: 48 89 D6           mov         rsi,rdx
  0000000000000105: 66 66 2E 0F 1F 84  nop         word ptr cs:[rax+rax+0000000000000000h]
                    00 00 00 00 00
  0000000000000110: 41 0F 10 53 F0     movups      xmm2,xmmword ptr [r11-10h]
  0000000000000115: 41 0F 10 1B        movups      xmm3,xmmword ptr [r11]
  0000000000000119: 0F 59 D0           mulps       xmm2,xmm0
  000000000000011C: 0F 59 D8           mulps       xmm3,xmm0
  000000000000011F: 0F 10 60 F0        movups      xmm4,xmmword ptr [rax-10h]
  0000000000000123: 0F 58 E2           addps       xmm4,xmm2
  0000000000000126: 0F 10 10           movups      xmm2,xmmword ptr [rax]
  0000000000000129: 0F 58 D3           addps       xmm2,xmm3
  000000000000012C: 0F 11 60 F0        movups      xmmword ptr [rax-10h],xmm4
  0000000000000130: 0F 11 10           movups      xmmword ptr [rax],xmm2
  0000000000000133: 49 83 C3 20        add         r11,20h
  0000000000000137: 48 83 C0 20        add         rax,20h
  000000000000013B: 48 83 C6 F8        add         rsi,0FFFFFFFFFFFFFFF8h
  000000000000013F: 75 CF              jne         0000000000000110
  0000000000000141: 45 85 C0           test        r8d,r8d
  0000000000000144: 0F 85 E5 FE FF FF  jne         000000000000002F
  000000000000014A: 5E                 pop         rsi
  000000000000014B: C3                 ret
Project Member

Comment 14 by bugdroid1@chromium.org, Aug 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7c7dcc2872d7edd2ca4b9121eadf9ed5a6a664bf

commit 7c7dcc2872d7edd2ca4b9121eadf9ed5a6a664bf
Author: Hans Wennborg <hans@chromium.org>
Date: Wed Aug 02 15:00:08 2017

media: Build vector_math.cc with max optimization

It was currently built at "optimize for size", which means important
function such as FMAC_C weren't getting autovectorized and unrolled.
Since this file is performance sensitive, using max optimization seems
reasonable.

BUG= 750923 

Change-Id: Id38e66365ee1b8b636e391ba07b83a804142eba1
Reviewed-on: https://chromium-review.googlesource.com/597017
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491372}
[modify] https://crrev.com/7c7dcc2872d7edd2ca4b9121eadf9ed5a6a664bf/media/BUILD.gn

Cc: sullivan@chromium.org
 Issue 751634  has been merged into this issue.

Comment 16 by h...@chromium.org, Aug 2 2017

Status: Fixed (was: Started)
The media_perftests issue is fixed. Please don't dupe other issues against this.
Cc: tebbi@chromium.org
 Issue 752057  has been merged into this issue.
 Issue 752089  has been merged into this issue.
> The media_perftests issue is fixed. Please don't dupe other issues against this.

hans@ dupe happens automatically by bisect bots once they figure out the regression is caused by the same CL. In this case, I expect your CL which
enables clang on window to have caused regression on a few different metrics.
So I suggest you just assign each duped issue to yourself when needs a
different fix. I cannot think of any better way to do this.

 

Comment 20 by h...@chromium.org, Aug 4 2017

I see, thanks for clarifying.
Cc: bsalo...@google.com kouhei@google.com
 Issue 753655  has been merged into this issue.
Project Member

Comment 22 by 42576172...@developer.gserviceaccount.com, Aug 21 2017

 Issue 751639  has been merged into this issue.
Project Member

Comment 23 by 42576172...@developer.gserviceaccount.com, Aug 25 2017

Cc: zmo@chromium.org vmi...@chromium.org agrieve@chromium.org kbr@chromium.org jmad...@chromium.org ericrk@chromium.org
 Issue 750822  has been merged into this issue.

Sign in to add a comment