Issue metadata
Sign in to add a comment
|
54.9% regression in media_perftests at 490448:490515 |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Jul 31 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8972511192980415776
,
Aug 1 2017
=== Auto-CCing suspected CL author hans@chromium.org === Hi hans@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Hans Wennborg Commit : d2c91228a51bdf37ae3b2e501fb53c0528f1629c Date : Fri Jul 28 20:11:05 2017 Subject: win: Set is_clang=true by default Bisect Details Configuration: win_8_perf_bisect Benchmark : media_perftests Metric : audio_converter/convert_fifo_only Change : 53.55% | 1369035.03985 -> 635869.919234 Revision Result N chromium@490447 1369035 +- 18625.6 6 good chromium@490481 1372256 +- 12191.3 6 good chromium@490490 1361809 +- 27941.8 6 good chromium@490492 1371743 +- 38374.6 6 good chromium@490493 1367999 +- 11075.7 6 good chromium@490494 636643 +- 7339.42 6 bad <-- chromium@490498 637718 +- 6185.32 6 bad chromium@490515 635870 +- 5850.33 6 bad To Run This Test .\src\out\Release\media_perftests.exe --single-process-tests More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8972511192980415776 For feedback, file a bug with component Speed>Bisection
,
Aug 1 2017
Issue 750672 has been merged into this issue.
,
Aug 1 2017
Issue 750703 has been merged into this issue.
,
Aug 1 2017
Started looking.
,
Aug 1 2017
Issue 750703 has been merged into this issue.
,
Aug 1 2017
Issue 750703 has been merged into this issue.
,
Aug 1 2017
The perf drop is due to FMAC_FUNC in media/base/vector_math.cc chosing the non-intrinsic version with win/clang. Clang is supposed to be able to autovectorize that to get better performance, but for some reason that's not happening here. (Kind of related: https://chromium-review.googlesource.com/557059)
,
Aug 1 2017
(The audio converter code is valling it as vector_math::FMAC)
,
Aug 1 2017
*calling it
,
Aug 1 2017
In my build, FMAC_C looks like this:
?FMAC_C@vector_math@media@@YAXQEBMMHQEAM@Z (void __cdecl media::vector_math::FMAC_C(float const * const,float,int,float * const)):
0000000000000000: 45 85 C0 test r8d,r8d
0000000000000003: 7E 22 jle 0000000000000027
0000000000000005: 44 89 C0 mov eax,r8d
0000000000000008: F3 0F 10 01 movss xmm0,dword ptr [rcx]
000000000000000C: F3 0F 59 C1 mulss xmm0,xmm1
0000000000000010: F3 41 0F 58 01 addss xmm0,dword ptr [r9]
0000000000000015: F3 41 0F 11 01 movss dword ptr [r9],xmm0
000000000000001A: 48 83 C1 04 add rcx,4
000000000000001E: 49 83 C1 04 add r9,4
0000000000000022: 48 FF C8 dec rax
0000000000000025: 75 E1 jne 0000000000000008
0000000000000027: C3 ret
Clang is getting invoked with /O1 for this code, which we interpret as "optimize for size" (it becomes -Os in the cc1 invocation).
If I pass -O2 instead, we get fancier code:
?FMAC_C@vector_math@media@@YAXQEBMMHQEAM@Z (void __cdecl media::vector_math::FMAC_C(float const * const,float,int,float * const)):
0000000000000000: 56 push rsi
0000000000000001: 45 85 C0 test r8d,r8d
0000000000000004: 0F 8E 40 01 00 00 jle 000000000000014A
000000000000000A: 45 89 C2 mov r10d,r8d
000000000000000D: 41 83 F8 07 cmp r8d,7
0000000000000011: 76 1A jbe 000000000000002D
0000000000000013: 4A 8D 04 91 lea rax,[rcx+r10*4]
0000000000000017: 4C 39 C8 cmp rax,r9
000000000000001A: 0F 86 C9 00 00 00 jbe 00000000000000E9
0000000000000020: 4B 8D 04 91 lea rax,[r9+r10*4]
0000000000000024: 48 39 C8 cmp rax,rcx
0000000000000027: 0F 86 BC 00 00 00 jbe 00000000000000E9
000000000000002D: 31 D2 xor edx,edx
000000000000002F: 44 89 D0 mov eax,r10d
0000000000000032: 29 D0 sub eax,edx
0000000000000034: 4D 8D 42 FF lea r8,[r10-1]
0000000000000038: 49 29 D0 sub r8,rdx
000000000000003B: 48 83 E0 03 and rax,3
000000000000003F: 74 2C je 000000000000006D
0000000000000041: 48 F7 D8 neg rax
0000000000000044: 66 66 66 2E 0F 1F nop word ptr cs:[rax+rax+0000000000000000h]
84 00 00 00 00 00
0000000000000050: F3 0F 10 04 91 movss xmm0,dword ptr [rcx+rdx*4]
0000000000000055: F3 0F 59 C1 mulss xmm0,xmm1
0000000000000059: F3 41 0F 58 04 91 addss xmm0,dword ptr [r9+rdx*4]
000000000000005F: F3 41 0F 11 04 91 movss dword ptr [r9+rdx*4],xmm0
0000000000000065: 48 FF C2 inc rdx
0000000000000068: 48 FF C0 inc rax
000000000000006B: 75 E3 jne 0000000000000050
000000000000006D: 49 83 F8 03 cmp r8,3
0000000000000071: 0F 82 D3 00 00 00 jb 000000000000014A
0000000000000077: 49 29 D2 sub r10,rdx
000000000000007A: 49 8D 44 91 0C lea rax,[r9+rdx*4+0Ch]
000000000000007F: 48 8D 4C 91 0C lea rcx,[rcx+rdx*4+0Ch]
0000000000000084: 66 66 66 2E 0F 1F nop word ptr cs:[rax+rax+0000000000000000h]
84 00 00 00 00 00
0000000000000090: F3 0F 10 41 F4 movss xmm0,dword ptr [rcx-0Ch]
0000000000000095: F3 0F 59 C1 mulss xmm0,xmm1
0000000000000099: F3 0F 58 40 F4 addss xmm0,dword ptr [rax-0Ch]
000000000000009E: F3 0F 11 40 F4 movss dword ptr [rax-0Ch],xmm0
00000000000000A3: F3 0F 10 41 F8 movss xmm0,dword ptr [rcx-8]
00000000000000A8: F3 0F 59 C1 mulss xmm0,xmm1
00000000000000AC: F3 0F 58 40 F8 addss xmm0,dword ptr [rax-8]
00000000000000B1: F3 0F 11 40 F8 movss dword ptr [rax-8],xmm0
00000000000000B6: F3 0F 10 41 FC movss xmm0,dword ptr [rcx-4]
00000000000000BB: F3 0F 59 C1 mulss xmm0,xmm1
00000000000000BF: F3 0F 58 40 FC addss xmm0,dword ptr [rax-4]
00000000000000C4: F3 0F 11 40 FC movss dword ptr [rax-4],xmm0
00000000000000C9: F3 0F 10 01 movss xmm0,dword ptr [rcx]
00000000000000CD: F3 0F 59 C1 mulss xmm0,xmm1
00000000000000D1: F3 0F 58 00 addss xmm0,dword ptr [rax]
00000000000000D5: F3 0F 11 00 movss dword ptr [rax],xmm0
00000000000000D9: 48 83 C0 10 add rax,10h
00000000000000DD: 48 83 C1 10 add rcx,10h
00000000000000E1: 49 83 C2 FC add r10,0FFFFFFFFFFFFFFFCh
00000000000000E5: 75 A9 jne 0000000000000090
00000000000000E7: EB 61 jmp 000000000000014A
00000000000000E9: 41 83 E0 07 and r8d,7
00000000000000ED: 4C 89 D2 mov rdx,r10
00000000000000F0: 4C 29 C2 sub rdx,r8
00000000000000F3: 0F 28 C1 movaps xmm0,xmm1
00000000000000F6: 0F C6 C0 00 shufps xmm0,xmm0,0
00000000000000FA: 4C 8D 59 10 lea r11,[rcx+10h]
00000000000000FE: 49 8D 41 10 lea rax,[r9+10h]
0000000000000102: 48 89 D6 mov rsi,rdx
0000000000000105: 66 66 2E 0F 1F 84 nop word ptr cs:[rax+rax+0000000000000000h]
00 00 00 00 00
0000000000000110: 41 0F 10 53 F0 movups xmm2,xmmword ptr [r11-10h]
0000000000000115: 41 0F 10 1B movups xmm3,xmmword ptr [r11]
0000000000000119: 0F 59 D0 mulps xmm2,xmm0
000000000000011C: 0F 59 D8 mulps xmm3,xmm0
000000000000011F: 0F 10 60 F0 movups xmm4,xmmword ptr [rax-10h]
0000000000000123: 0F 58 E2 addps xmm4,xmm2
0000000000000126: 0F 10 10 movups xmm2,xmmword ptr [rax]
0000000000000129: 0F 58 D3 addps xmm2,xmm3
000000000000012C: 0F 11 60 F0 movups xmmword ptr [rax-10h],xmm4
0000000000000130: 0F 11 10 movups xmmword ptr [rax],xmm2
0000000000000133: 49 83 C3 20 add r11,20h
0000000000000137: 48 83 C0 20 add rax,20h
000000000000013B: 48 83 C6 F8 add rsi,0FFFFFFFFFFFFFFF8h
000000000000013F: 75 CF jne 0000000000000110
0000000000000141: 45 85 C0 test r8d,r8d
0000000000000144: 0F 85 E5 FE FF FF jne 000000000000002F
000000000000014A: 5E pop rsi
000000000000014B: C3 ret
,
Aug 1 2017
,
Aug 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7c7dcc2872d7edd2ca4b9121eadf9ed5a6a664bf commit 7c7dcc2872d7edd2ca4b9121eadf9ed5a6a664bf Author: Hans Wennborg <hans@chromium.org> Date: Wed Aug 02 15:00:08 2017 media: Build vector_math.cc with max optimization It was currently built at "optimize for size", which means important function such as FMAC_C weren't getting autovectorized and unrolled. Since this file is performance sensitive, using max optimization seems reasonable. BUG= 750923 Change-Id: Id38e66365ee1b8b636e391ba07b83a804142eba1 Reviewed-on: https://chromium-review.googlesource.com/597017 Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#491372} [modify] https://crrev.com/7c7dcc2872d7edd2ca4b9121eadf9ed5a6a664bf/media/BUILD.gn
,
Aug 2 2017
,
Aug 2 2017
The media_perftests issue is fixed. Please don't dupe other issues against this.
,
Aug 3 2017
,
Aug 3 2017
Issue 752089 has been merged into this issue.
,
Aug 4 2017
> The media_perftests issue is fixed. Please don't dupe other issues against this. hans@ dupe happens automatically by bisect bots once they figure out the regression is caused by the same CL. In this case, I expect your CL which enables clang on window to have caused regression on a few different metrics. So I suggest you just assign each duped issue to yourself when needs a different fix. I cannot think of any better way to do this.
,
Aug 4 2017
I see, thanks for clarifying.
,
Aug 9 2017
,
Aug 21 2017
Issue 751639 has been merged into this issue.
,
Aug 25 2017
Issue 750822 has been merged into this issue. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Jul 31 2017