Starred by 24 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 4
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug

Blocked on:
issue 681103
issue 583166
issue 596934
issue 610772

Blocking:
issue 82385




chrome.dll is larger with clang than with cl

Project Member Reported by thakis@chromium.org, Feb 10 2015

Issue description

(mostly filing this so we don't forget)

In a static release build without debug information (at chrome trunk, `set GYP_DEFINES=fastbuild=1 clang=1`), a clang-built chrome.dll is 33% larger than a cl-built one (47 MB vs 35 MB).

We should figure out why, and make the clang-produced binary 30% smaller than the cl-produced one instead.
 

Comment 1 by thakis@chromium.org, Feb 10 2015

Blocking: chromium:82385

Comment 2 by thakis@chromium.org, Apr 28 2015

In official builds, it's 30,400 KB (cl) vs 40,386 KB (clang), which is even worse. (chrome_child.dll is slightly larger than that with both compilers, but again quite a bit larger with clang-cl.)

I wonder if -Oz instead of -Os would help clang?

Comment 3 by thakis@chromium.org, Apr 28 2015

I hacked up gyp to pass -Oz instead of -Os for FavorSizeOrSpeed:2. With that, chrome.dll (in official build mode) is 35,772 KB.

Comment 4 by thakis@chromium.org, Apr 28 2015

I also tried hacking up build/common.gypi so that FavorSizeOrSpeed never favors speed and always favors size. This made my Windows box have a hardware problem and now it won't turn on, so no data for that yet :-/

Comment 5 by thakis@chromium.org, Oct 27 2015

Owner: h...@chromium.org
Let's use this as tracking bug for size stuff.

So far, Hans landed http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151019/308031.html which reduced chrome_child.dll by 120 kB or so.

Comment 6 by h...@chromium.org, Oct 27 2015

Status: Started (was: NULL)
Summary: chrome.dll is larger with clang than with cl (was: chrome.dll is 33% larger with clang than with cl)
I think we've improved a lot from the 33% when this was filed though. Let's make the title a little less specific :-)

David got our /O flags straightened out, and the external commit for "mov to push conversion" helped a lot.


Below are my numbers from the other week. Note that the MSVC build does LTCG, and we don't, so we have a handicap. I don't expect we can close the gap completely without LTO, but let's see how close we can get it -- and then kill it with LTO :-)


32-bit MSVC [0]:
chrome.dll= 33,030,656 bytes
chrome_child.dll= 41,761,280 bytes
chrome.exe= 735,232 bytes
total: 75,527,168 bytes

32-bit Clang [1]:
chrome.dll= 39,260,160 bytes (+ 19%)
chrome_child.dll= 47,836,160 bytes (+ 15%)
chrome.exe= 778,752 bytes (+ 6%)
total: 87,875,072 bytes (+ 16%)

 [0]. http://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/2534/steps/sizes/logs/stdio
 [1]. http://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/3659/steps/sizes/logs/stdio
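
The percentages in this comparison can be reproduced with a little integer arithmetic. The following is only a sketch: the helper name `overhead_bp` is hypothetical (it is not part of any Chromium tooling), and it reports the overhead in basis points (hundredths of a percent), rounded to nearest.

```c
#include <stdint.h>

/* Relative size overhead of clang_bytes over msvc_bytes, in basis points
   (1 bp = 0.01%), rounded to nearest. Hypothetical helper for illustration;
   assumes msvc_bytes > 0 and clang_bytes >= msvc_bytes. */
int64_t overhead_bp(int64_t msvc_bytes, int64_t clang_bytes) {
    int64_t diff = clang_bytes - msvc_bytes;
    return (diff * 10000 + msvc_bytes / 2) / msvc_bytes;
}
```

With the totals above, `overhead_bp(75527168, 87875072)` evaluates to 1635, i.e. about 16.35%, and the chrome.dll pair gives 1886, i.e. roughly 19%.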

Comment 7 by h...@chromium.org, Nov 3 2015

I'm currently looking at more efficient legalization of 64-bit compares when targeting 32-bit.

For equality comparisons, Clang generates "xor, xor, or, jcc". MSVC generates "cmp, jcc, cmp, jcc". Both sequences have the same size in the ideal case, but Clang's causes higher register pressure, which means the code often gets larger in practice. MSVC's lowering looks more straight-forward IMHO, though it has the downside of using one more branch.

For example:

void foo();
void bar(long long);
void f(long long a, long long b) {
  if (a == b) {
    foo();
  } else {
    bar(b);
  }
}



Clang:
?f@@YAX_J0@Z (void __cdecl f(__int64,__int64)):
  00000000: 56                 push        esi
  00000001: 83 EC 08           sub         esp,8
  00000004: 8B 4C 24 1C        mov         ecx,dword ptr [esp+1Ch]
  00000008: 8B 44 24 18        mov         eax,dword ptr [esp+18h]
  0000000C: 8B 54 24 14        mov         edx,dword ptr [esp+14h]
  00000010: 31 CA              xor         edx,ecx
  00000012: 8B 74 24 10        mov         esi,dword ptr [esp+10h]
  00000016: 31 C6              xor         esi,eax
  00000018: 09 D6              or          esi,edx
  0000001A: 75 09              jne         00000025
  0000001C: 83 C4 08           add         esp,8
  0000001F: 5E                 pop         esi
  00000020: E9 00 00 00 00     jmp         ?foo@@YAXXZ
  00000025: 89 4C 24 04        mov         dword ptr [esp+4],ecx
  00000029: 89 04 24           mov         dword ptr [esp],eax
  0000002C: E8 00 00 00 00     call        ?bar@@YAX_J@Z
  00000031: 83 C4 08           add         esp,8
  00000034: 5E                 pop         esi
  00000035: C3                 ret

MSVC:

?f@@YAX_J0@Z (void __cdecl f(__int64,__int64)):
  00000000: 8B 4C 24 0C        mov         ecx,dword ptr [esp+0Ch]
  00000004: 8B 44 24 10        mov         eax,dword ptr [esp+10h]
  00000008: 39 4C 24 04        cmp         dword ptr [esp+4],ecx
  0000000C: 75 0A              jne         00000018
  0000000E: 39 44 24 08        cmp         dword ptr [esp+8],eax
  00000012: 0F 84 00 00 00 00  je          ?foo@@YAXXZ  <--- Bonus points for folding the tail call.
  00000018: 50                 push        eax
  00000019: 51                 push        ecx
  0000001A: E8 00 00 00 00     call        ?bar@@YAX_J@Z
  0000001F: 83 C4 08           add         esp,8
  00000022: C3                 ret



void g(long long a, long long b) {
  if (a < b) {
    foo();
  } else {
    bar(b);
  }
}



For non-equality relational operators, it gets worse because Clang can't use the xor trick, but uses cmp and setcc for the two comparisons. There's even a comment about this in DAGTypeLegalizer::IntegerExpandSetCCOperands: "FIXME: This generated code sucks". That goes back to at least 2007 so maybe it's time to fix :-)

Clang:

?g@@YAX_J0@Z (void __cdecl g(__int64,__int64)):
  00000040: 83 EC 08           sub         esp,8
  00000043: 8B 4C 24 18        mov         ecx,dword ptr [esp+18h]
  00000047: 8B 44 24 14        mov         eax,dword ptr [esp+14h]
  0000004B: 39 44 24 0C        cmp         dword ptr [esp+0Ch],eax
  0000004F: 0F 93 C2           setae       dl
  00000052: 39 4C 24 10        cmp         dword ptr [esp+10h],ecx
  00000056: 0F 9D C6           setge       dh
  00000059: 74 02              je          0000005D
  0000005B: 88 F2              mov         dl,dh
  0000005D: 84 D2              test        dl,dl
  0000005F: 75 08              jne         00000069
  00000061: 83 C4 08           add         esp,8
  00000064: E9 00 00 00 00     jmp         ?foo@@YAXXZ
  00000069: 89 4C 24 04        mov         dword ptr [esp+4],ecx
  0000006D: 89 04 24           mov         dword ptr [esp],eax
  00000070: E8 00 00 00 00     call        ?bar@@YAX_J@Z
  00000075: 83 C4 08           add         esp,8
  00000078: C3                 ret

MSVC:

?g@@YAX_J0@Z (void __cdecl g(__int64,__int64)):
  00000030: 8B 4C 24 10        mov         ecx,dword ptr [esp+10h]
  00000034: 8B 44 24 0C        mov         eax,dword ptr [esp+0Ch]
  00000038: 39 4C 24 08        cmp         dword ptr [esp+8],ecx
  0000003C: 7F 0D              jg          0000004B
  0000003E: 7C 06              jl          00000046
  00000040: 39 44 24 04        cmp         dword ptr [esp+4],eax
  00000044: 73 05              jae         0000004B
  00000046: E9 00 00 00 00     jmp         ?foo@@YAXXZ
  0000004B: 51                 push        ecx
  0000004C: 50                 push        eax
  0000004D: E8 00 00 00 00     call        ?bar@@YAX_J@Z
  00000052: 83 C4 08           add         esp,8
  00000055: C3                 ret



The MSVC code is always a little shorter for this, and it has less register pressure.
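
To make the different lowerings easier to compare, here is a rough C model of them operating on explicit 32-bit halves. This is a sketch for illustration only: the function names are made up, and the code mirrors the shape of the disassembly above rather than anything in LLVM or MSVC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clang-style equality: xor the matching halves, or the results together,
   and test the combined value for zero ("xor, xor, or, jcc"). */
bool eq64_xor(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) == 0;
}

/* MSVC-style equality: compare one pair of halves, branch out early on a
   mismatch, then compare the other pair ("cmp, jcc, cmp, jcc"). */
bool eq64_branchy(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    if (a_lo != b_lo)
        return false;
    return a_hi == b_hi;
}

/* MSVC-style signed less-than: a three-way branch on the high halves
   (signed compare), falling back to an unsigned compare of the low halves
   only when the high halves are equal. */
bool lt64_three_way(uint32_t a_lo, int32_t a_hi, uint32_t b_lo, int32_t b_hi) {
    if (a_hi > b_hi)
        return false;
    if (a_hi < b_hi)
        return true;
    return a_lo < b_lo;
}
```

The branchy forms never need both halves of a value in registers at the same time, which is one way to see where the lower register pressure comes from.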


In a Chrome build, we expand about 50k 64-bit comparisons, 13k of which are for equality, so I'd expect a couple hundred KB savings.

(It turns out to be harder to fix than I thought though, hence this post instead of a quick patch.)

Comment 8 by h...@chromium.org, Nov 3 2015

For comparison, GCC also does the "xor xor or" dance for equality comparisons, but does do the "three way branch" for non-equality relational comparison.

Instead of doing

  1. Identify problem
  2. Fix problem
  3. Go to 1

does it make sense to instead do

  1. Identify a list of 10-20 problems
  2. Roughly estimate impact and difficulty of fix of each
  3. Do them in priority order?

"Several hundred KB savings" sounds great, but there are hopefully more opportunities of that size, and maybe some of them don't turn out to be harder than expected :-)

Comment 10 by h...@chromium.org, Nov 16 2015

Re #9: This makes perfect sense. I'll try to do another comparison run and identify some more problems.

Last week I was working on the 64-bit comparisons (http://reviews.llvm.org/D14496). Hopefully I can get that done this week. The patch now only touches non-equality comparisons, but generates nicer code (faster, and 4 bytes shorter) than MSVC and GCC. Impact will probably be smaller than I first thought though, I'm guessing maybe 40 KB.


I also noticed a few other things (small impact probably, but would be nice to fix):


We should zero-extend via register zeroing, at least for setcc:
define i32 @test(i32 %x, i32 %y) {
  %cmp = icmp slt i32 %x, %y
  %res = zext i1 %cmp to i32
  ret i32 %res
}
We generate:
00000000 <test>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   3b 44 24 08             cmp    0x8(%esp),%eax
   8:   0f 9c c0                setl   %al
   b:   0f b6 c0                movzbl %al,%eax
   e:   c3                      ret
But we could do:
00000000 <test>:
   0:   8b 4c 24 04             mov    0x4(%esp),%ecx
   4:   31 c0                   xor    %eax,%eax
   6:   3b 4c 24 08             cmp    0x8(%esp),%ecx
   a:   0f 9c c0                setl   %al
   d:   c3                      ret
This saves a byte and breaks the sub-register %eax dependency into the setcc. Note that the xor has to come before the cmp, with the compared value held in a different register, because xor clobbers EFLAGS. MSVC does this as well. I think GCC does sometimes, but not in this example. We have some LLVM bugs for this already: PR8785, PR17113, PR22532 (see also https://bugzilla.mozilla.org/show_bug.cgi?id=869525#c6)
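
For anyone who wants to look at this from C rather than IR, the function below should be a close source-level counterpart of @test. It is a hypothetical repro, not taken from the LLVM bugs mentioned above; compiling it with something like `clang -m32 -Os -S` should show a setcc followed by a zero-extension, though the exact output depends on the compiler version.

```c
/* Returns 1 if x < y and 0 otherwise. The comparison yields a 1-bit result
   that must be zero-extended to a full int, which is where the
   setcc + movzbl (or xor + setcc) sequence comes from. */
int less_than_as_int(int x, int y) {
    return x < y;
}
```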



We could fold tail calls into conditional branches:

declare void @foo()
declare void @bar()
define void @f(i1 %x) {
entry:
  br i1 %x, label %bb1, label %bb2
bb1:
  tail call void @foo()
  ret void
bb2:
  tail call void @bar()
  ret void
}
We generate:
	testb	$1, 4(%esp)
	je	.LBB0_2
	jmp	foo
.LBB0_2:
	jmp	bar
But MSVC would generate:
	testb	$1, 4(%esp)
	je	bar
	jmp	foo
Which looks nice.

Comment 11 by h...@chromium.org, Nov 18 2015

I ran the numbers on the 64-bit comparisons patch: we save 68.5 KB -- so better than my last guess :-) I'm hoping to land this any time now.


Putting down another random thing I noticed, but it's probably not very profitable:

  unsigned f(unsigned x, unsigned y, unsigned *p) {
    *p = x + y;
    return *p < x;
  }

  addl	%esi, %edi
  movl	%edi, (%rdx)
  sbbl	%eax, %eax  <-- Unnecessary.
  andl	$1, %eax
  retq

The SBBL is cute, but I think just using "SETC %al" would be better.

It seems we're already detecting this sometimes. For code like:

  sum = x + y;
  if (sum < x) {

We will branch based on the carry bit, which is nice.

For extra bonus points, if the carry value gets fed into the addition of two other values, we could try folding to ADC.
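
As a sketch of the pattern where ADC folding would apply, consider a 64-bit addition built from 32-bit halves, where the carry out of the low halves feeds the addition of the high halves. The function names here are made up for illustration.

```c
#include <stdint.h>

/* Adds x and y, stores the (wrapping) sum, and returns the carry-out
   (0 or 1). Same idiom as f() above: the sum is smaller than an operand
   exactly when the unsigned addition wrapped around. */
unsigned add_carry_out(uint32_t x, uint32_t y, uint32_t *sum) {
    *sum = x + y;
    return *sum < x;
}

/* 64-bit addition from 32-bit halves. The carry from the low halves is
   added into the high halves, which is the shape an ADD + ADC pair
   implements directly. */
void add64_halves(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
                  uint32_t *r_lo, uint32_t *r_hi) {
    unsigned carry = add_carry_out(a_lo, b_lo, r_lo);
    *r_hi = a_hi + b_hi + carry;
}
```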

Comment 12 by h...@chromium.org, Nov 18 2015

About zero-extensions, I stumbled across this in libosmesa.so:

  115022:       41 8a 0b                mov    (%r11),%cl
  115025:       0f b6 c1                movzbl %cl,%eax

We should be folding this to MOVZBL (%r11), %eax. I'm not sure why that's not happening here, it does happen in many other places around it. %cl doesn't have any other uses that I could see.

Comment 13 by h...@chromium.org, Nov 18 2015

Another one:

  1139a5: c1 ee 0a              shr    $0xa,%esi
  1139a8: 66 83 e6 07           and    $0x7,%si
  1139ac: 44 0f b7 fe           movzwl %si,%r15d

If the AND was widened to "andl $0x7, %esi", there would be no need to zero-extend.

Comment 14 by h...@chromium.org, Nov 19 2015

Repro for the above:
#define MAX(A, B) ((A)>(B) ? (A) : (B))
struct s {
  unsigned a : 7;
  unsigned e : 3;
};
unsigned f(struct s *s) {
  return MAX(1, s->e);
}
It seems we're being clever and narrowing the AND, but it turns out to be counter-productive since we then have to zero-extend the result.

Comment 15 by h...@chromium.org, Nov 21 2015

Moving 1 and -1 into registers. Clang just does 5-byte moves.

For -1, GCC and MSVC OR the register with -1 (3 bytes). ICC does "push -1, pop" (3 bytes, no dependency). We should check with IACA, but ICC probably has this right.

For 1, MSVC does "xor, inc". ICC again does the push-pop.

I suspect this will fire a lot, so should probably do this before the zero-extend thing.

Comment 16 by h...@chromium.org, Nov 23 2015

> I suspect this will fire a lot, so should probably do this before the zero-extend thing.

The push-pop trick works for all immediates that can be expressed as a sign-extended 8-bit value. In the Chrome binaries this seems to occur about 40k times:

$ objdump -d /tmp/chrome_child.dll /tmp/chrome.dll /tmp/chrome.exe | grep -P "\tb[8-9a-f] (ff ff ff [8-9a-f][0-9a-f]|00 00 00 [0-7][0-9a-f])" | wc -l
42372

So the potential is about 80 KB savings.

(The push-pop trick also saves 1 byte for 16-bit immediates, but that probably doesn't fire as much.)
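
The estimate above can be sketched as follows, taking the match count at face value. The names are hypothetical; the instruction lengths are the 32-bit mode encodings (5 bytes for a MOV of a 32-bit immediate into a register, 2 bytes for PUSH with an 8-bit immediate, 1 byte for POP into a register).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if imm can be encoded as a sign-extended 8-bit immediate, which is
   the condition for the push-pop trick to apply. */
bool fits_simm8(int32_t imm) {
    return imm >= -128 && imm <= 127;
}

/* Encoded lengths in bytes, 32-bit mode. */
enum {
    MOV_IMM32_LEN = 5,     /* opcode byte + 4 immediate bytes */
    PUSH_IMM8_POP_LEN = 3  /* push imm8 (2 bytes) + pop reg (1 byte) */
};

/* Estimated bytes saved if every one of `sites` MOVs is replaced. */
long push_pop_savings(long sites) {
    return sites * (MOV_IMM32_LEN - PUSH_IMM8_POP_LEN);
}
```

`push_pop_savings(42372)` gives 84744 bytes, in line with the rough 80 KB figure.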

Comment 17 by h...@chromium.org, Nov 23 2015

What does IACA say about performance of a regular MOV vs. PUSH-POP?

Throughput: 0.25 vs 1.0 cycles. Latency: 1 vs 6 cycles.


The throughput is because the MOV can dispatch to either of the ALU ports 0, 1, 5 or 6, whereas the PUSH-POP pair always needs to use Port 4 (Store Data), so that's the bottleneck.



Throughput
==========

$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis THROUGHPUT -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  movl $-1, %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Throughput Analysis Report
--------------------------
Block Throughput: 0.25 Cycles       Throughput Bottleneck: FrontEnd, Port0, Port1, Port5, Port6

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 0.2    0.0  | 0.2  | 0.0    0.0  | 0.0    0.0  | 0.0  | 0.2  | 0.2  | 0.0  |
---------------------------------------------------------------------------------------


| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     | CP | mov eax, 0xffffffff
Total Num Of Uops: 1




$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis THROUGHPUT -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  pushl $-1
  popl %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Throughput Analysis Report
--------------------------
Block Throughput: 1.00 Cycles       Throughput Bottleneck: Port4

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 0.0    0.0  | 0.0  | 0.6    0.5  | 0.7    0.5  | 1.0  | 0.0  | 0.0  | 0.6  |
---------------------------------------------------------------------------------------


| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   2^   |           |     | 0.2       | 0.2       | 1.0 |     |     | 0.6 | CP | push 0xffffffff
|   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |     |     |    | pop eax
Total Num Of Uops: 3



Latency
=======
$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis LATENCY -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  movl $-1, %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Latency Analysis Report
---------------------------
Latency: 1 Cycles

The Resource delay is counted since all the sources of the instructions are ready
and until the needed resource becomes available

| Inst |                 Resource Delay In Cycles                  |    |
| Num  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  | FE |    |
-------------------------------------------------------------------------
|  0   |         |    |         |         |    |    |    |    |    | CP | mov eax, 0xffffffff




$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis LATENCY -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  pushl $-1
  popl %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Latency Analysis Report
---------------------------
Latency: 6 Cycles

The Resource delay is counted since all the sources of the instructions are ready
and until the needed resource becomes available

| Inst |                 Resource Delay In Cycles                  |    |
| Num  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  | FE |    |
-------------------------------------------------------------------------
|  0   |         |    |         |         |    |    |    |    |    |    | push 0xffffffff
|  1   |         |    |         |         |    |    |    |    |    | CP | pop eax

Comment 18 by h...@chromium.org, Nov 23 2015

What about other strategies for -1?

orl $-1, %eax                throughput: 1 cycle (*), latency: 1 cycle
xorl %eax, %eax, decl %eax   throughput: 0.5 cycles,  latency: 2 cycles

(We get the same for xorl %eax, %eax, incl %eax.)




(*) Due to the data dependency on %eax from the previous iteration.



$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis THROUGHPUT -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  orl $-1, %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Throughput Analysis Report
--------------------------
Block Throughput: 1.00 Cycles       Throughput Bottleneck: InterIteration

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 0.2    0.0  | 0.2  | 0.0    0.0  | 0.0    0.0  | 0.0  | 0.2  | 0.2  | 0.0  |
---------------------------------------------------------------------------------------


| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     | CP | or eax, 0xffffffff


$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis LATENCY -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  orl $-1, %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Latency Analysis Report
---------------------------
Latency: 1 Cycles

The Resource delay is counted since all the sources of the instructions are ready
and until the needed resource becomes available

| Inst |                 Resource Delay In Cycles                  |    |
| Num  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  | FE |    |
-------------------------------------------------------------------------
|  0   |         |    |         |         |    |    |    |    |    | CP | or eax, 0xffffffff







$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis THROUGHPUT -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  xorl %eax, %eax
  decl %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Throughput Analysis Report
--------------------------
Block Throughput: 0.50 Cycles       Throughput Bottleneck: FrontEnd

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 0.2    0.0  | 0.2  | 0.0    0.0  | 0.0    0.0  | 0.0  | 0.2  | 0.2  | 0.0  |
---------------------------------------------------------------------------------------


| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | xor eax, eax
|   1    | 0.2       | 0.2 |           |           |     | 0.2 | 0.2 |     |    | dec eax
Total Num Of Uops: 1


$ cat | as --32 -o /tmp/a.o - && LD_LIBRARY_PATH=/work/iaca-lin64/lib /work/iaca-lin64/bin/iaca -32 -arch HSW -analysis LATENCY -reduceout /tmp/a.o
  f:
  .byte 0xbb, 0x6f, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  xorl %eax, %eax
  decl %eax
  .byte 0xbb, 0xde, 0x00, 0x00, 0x00, 0x64, 0x67, 0x90
  ret

Latency Analysis Report
---------------------------
Latency: 2 Cycles

The Resource delay is counted since all the sources of the instructions are ready
and until the needed resource becomes available

| Inst |                 Resource Delay In Cycles                  |    |
| Num  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  | FE |    |
-------------------------------------------------------------------------
|  0   |         |    |         |         |    |    |    |    |    | CP | xor eax, eax
|  1   |         |    |         |         |    |    |    |    |    | CP | dec eax

Comment 19 by h...@chromium.org, Nov 23 2015

Looking at which constants are the most common:

$ objdump -d /tmp/chrome_child.dll /tmp/chrome.dll /tmp/chrome.exe | grep -P "\tb[8-9a-f] (ff ff ff [8-9a-f][0-9a-f]|00 00 00 [0-7][0-9a-f])" | cut -f2 | cut -d' ' -f5 | awk '{ a[$1]++ } END { for (n in a) print n, a[n] }' | sort
00 17749
01 39
02 137
03 12
04 44
05 4
06 9
07 36
08 10
0a 1
0c 8
0d 2
0e 5
0f 907
10 22
11 2
13 2
14 5
15 2
16 2
17 4
18 18
1a 2
1b 2
1e 2
20 22
24 1
28 3
2a 2
2b 2
30 1
34 2
35 2
36 2
37 2
38 17
3f 14
40 90
41 47
42 16
46 1
48 1
50 2
56 2
64 2
70 24
78 10
80 4
87 6
8f 3
bf 4
c7 4
c8 2
cf 3
df 8
ef 4
f1 2
f3 5
f7 4
f9 2
fb 6
fc 5
fd 1
fe 4
ff 23014

It seems we're emitting "movl $0x0, %reg" 17749 times (unless my grepping is wrong). That's really bad, but on the positive side it should save us 52 KB easy :-)

After 0, -1 completely dominates, so we should focus on that.

Comment 20 by h...@chromium.org, Nov 24 2015

> It seems we're emitting "movl $0x0, %reg" 17749 times (unless my grepping is wrong). That's really bad, but on the positive side it should save us 52 KB easy :-)

I think this was a red herring. I suspect these are instructions where a relocation will fill in a final address at load-time:

$ echo 'int a;void f(void*); void g() { f(&a); }' | /work/llvm/build.release/bin/clang -c -Os -xc - -o a.o && objdump -rd a.o
0000000000000000 <g>:
   0:	bf 00 00 00 00       	mov    $0x0,%edi
			1: R_X86_64_32	a

Comment 21 by h...@chromium.org, Nov 24 2015

Re #20, *all* of it might not be a red herring actually. From X86InstrInfo::reMaterialize:

 // MOV32r0 is implemented with a xor which clobbers condition code.
 // Re-materialize it as movri instructions to avoid side effects.
 unsigned Opc = Orig->getOpcode();
 if (Opc == X86::MOV32r0 && !isSafeToClobberEFLAGS(MBB, I)) {
   DebugLoc DL = Orig->getDebugLoc();
   BuildMI(MBB, I, DL, get(X86::MOV32ri)).addOperand(Orig->getOperand(0))
     .addImm(0);

But that's a very minor point, I think.

Comment 22 by h...@chromium.org, Nov 26 2015

Some notes:

The wide integer compare patch landed in r253572 last week; worth about 70 KB.

Got a patch out for 8-bit constant materialization: http://reviews.llvm.org/D14971 Expecting ~40 KB for that, but haven't done full builds to measure yet.


Random thing I noticed:
For long long f() { return -2; } in 32-bit mode, we could use CDQ to fill %edx.


nbjoerg pointed to some PRs; my patch would fix PR8784. I should look at PR9784 at some point. Should ask if he has more issues filed for code size.

Comment 23 by r...@chromium.org, Nov 30 2015

Cc: r...@chromium.org

Comment 24 by p...@chromium.org, Nov 30 2015

Issue 563748 covers the same thing but for Android. I'd imagine there could be a bit of overlap here.

Comment 25 by h...@chromium.org, Dec 2 2015

While debugging something else, I noticed that many of our constructors mostly fill objects with zeros, and in a pretty size-inefficient way.

Example:

  struct s {
    int a, b, c, d, e;
  };
  void f(struct s *s) {
    s->a = s->b = s->c = s->d = s->e = 0;
  }

Clang: (GCC does this too)

00000000 <f>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   c7 40 04 00 00 00 00    movl   $0x0,0x4(%eax)
   b:   c7 00 00 00 00 00       movl   $0x0,(%eax)
  11:   c7 40 0c 00 00 00 00    movl   $0x0,0xc(%eax)
  18:   c7 40 08 00 00 00 00    movl   $0x0,0x8(%eax)
  1f:   c7 40 10 00 00 00 00    movl   $0x0,0x10(%eax)
  26:   c3

ICC:

00000000 <f>:
   0:   31 c0                   xor    %eax,%eax
   2:   8b 54 24 04             mov    0x4(%esp),%edx
   6:   89 42 10                mov    %eax,0x10(%edx)
   9:   89 42 0c                mov    %eax,0xc(%edx)
   c:   89 42 08                mov    %eax,0x8(%edx)
   f:   89 42 04                mov    %eax,0x4(%edx)
  12:   89 02                   mov    %eax,(%edx)
  14:   c3                      ret

This potentially saves 4 bytes per move, minus the one-time overhead of clearing %eax, and probably occurs quite a bit. It does require an extra register, but in simple initialization functions that shouldn't be an issue.
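
The size arithmetic for this can be sketched as below. The names are made up, and the lengths are read off the two listings above (the store to offset 0 is one byte shorter in both forms, so the 4-byte difference per store still holds).

```c
/* Encoded lengths in bytes, taken from the Clang and ICC listings above. */
enum {
    STORE_IMM32_DISP8_LEN = 7, /* movl $0x0, disp8(%eax) */
    STORE_REG_DISP8_LEN = 3,   /* mov  %eax, disp8(%edx) */
    XOR_REG_REG_LEN = 2        /* xor  %eax, %eax        */
};

/* Net bytes saved by zeroing one register up front and storing it
   `stores` times, instead of storing an immediate zero each time. */
int zero_store_savings(int stores) {
    return stores * (STORE_IMM32_DISP8_LEN - STORE_REG_DISP8_LEN)
           - XOR_REG_REG_LEN;
}
```

For the five stores in the example this gives 18 bytes, which matches the difference between the 0x27-byte Clang function and the 0x15-byte ICC one. Even a single store comes out 2 bytes ahead, provided a free register is available.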

Comment 26 by h...@chromium.org, Dec 2 2015

MSVC also does this:

_f:
  00000000: 8B 44 24 04        mov         eax,dword ptr [esp+4]
  00000004: 33 C9              xor         ecx,ecx
  00000006: 89 48 10           mov         dword ptr [eax+10h],ecx
  00000009: 89 48 0C           mov         dword ptr [eax+0Ch],ecx
  0000000C: 89 48 08           mov         dword ptr [eax+8],ecx
  0000000F: 89 48 04           mov         dword ptr [eax+4],ecx
  00000012: 89 08              mov         dword ptr [eax],ecx
  00000014: C3                 ret

Doesn't the current code pipeline better? Or do CPUs special-case data deps on self-xored registers?

Comment 28 by h...@chromium.org, Dec 2 2015

IIUC, the self-xor is basically telling the register renamer to give you a zero, so the instruction doesn't even execute and it doesn't block the following instructions.

IACA output for the Clang code:

Latency Analysis Report
---------------------------
Latency: 11 Cycles

The Resource delay is counted since all the sources of the instructions are ready
and until the needed resource becomes available

| Inst |                 Resource Delay In Cycles                  |    |
| Num  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  | FE |    |
-------------------------------------------------------------------------
|  0   |         |    |         |         |    |    |    |    |    | CP | mov eax, dword ptr [esp+0x4]
|  1   |         |    |         |         |    |    |    |    |    |    | mov dword ptr [eax+0x4], 0x0
|  2   |         |    |         |         | 1  |    |    |    |    |    | mov dword ptr [eax], 0x0
|  3   |         |    |         |         | 1  |    |    |    | 1  |    | mov dword ptr [eax+0xc], 0x0
|  4   |         |    |         |         | 2  |    |    | 1  | 1  | CP | mov dword ptr [eax+0x8], 0x0
|  5   |         |    |         | 1       | 2  |    |    |    | 2  | CP | mov dword ptr [eax+0x10], 0x0

Resource Conflict on Critical Paths: 
-----------------------------------------------------------------
|  Port  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  |
-----------------------------------------------------------------
| Cycles | 0    0  | 0  | 0    0  | 1    0  | 0  | 0  | 0  | 1  |
-----------------------------------------------------------------

List Of Delays On Critical Paths
-------------------------------
1 --> 4 1 Cycles Delay On Port7
2 --> 5 1 Cycles Delay On PORT3_AGU



For the ICC code:

Latency Analysis Report
---------------------------
Latency: 11 Cycles

The Resource delay is counted since all the sources of the instructions are ready
and until the needed resource becomes available

| Inst |                 Resource Delay In Cycles                  |    |
| Num  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  | FE |    |
-------------------------------------------------------------------------
|  0   |         |    |         |         |    |    |    |    |    |    | xor eax, eax
|  1   |         |    |         |         |    |    |    |    |    | CP | mov edx, dword ptr [esp+0x4]
|  2   |         |    |         |         |    |    |    |    |    |    | mov dword ptr [edx+0x10], eax
|  3   |         |    |         |         | 1  |    |    |    |    |    | mov dword ptr [edx+0xc], eax
|  4   |         |    |         |         |    |    |    |    |    |    | mov dword ptr [edx+0x8], eax
|  5   |         |    |         |         | 2  |    |    | 1  |    | CP | mov dword ptr [edx+0x4], eax
|  6   |         |    |         | 1       | 3  |    |    |    |    | CP | mov dword ptr [edx], eax

Resource Conflict on Critical Paths: 
-----------------------------------------------------------------
|  Port  | 0  - DV | 1  | 2  - D  | 3  - D  | 4  | 5  | 6  | 7  |
-----------------------------------------------------------------
| Cycles | 0    0  | 0  | 0    0  | 1    0  | 0  | 0  | 0  | 1  |
-----------------------------------------------------------------

List Of Delays On Critical Paths
-------------------------------
2 --> 5 1 Cycles Delay On Port7
3 --> 6 1 Cycles Delay On PORT3_AGU

Comment 29 by h...@chromium.org, Dec 3 2015

Hmm, I lied - Clang does seem to do the self-xor for the example in #25 *at trunk*, but not at 3.7. Progress :-)

I wonder why I didn't see this in the code. Am I confusing the flags again?

Comment 30 by h...@chromium.org, Dec 3 2015

Yeah, we're not using -Os in non-official builds. Why do we do that?
https://code.google.com/p/chromium/codesearch#chromium/src/build/common.gypi&l=5550 

              # In official builds, targets can self-select an optimization
              # level by defining a variable named 'optimize', and setting it
              # to one of
              # - "size", optimizes for minimal code size - the default.
              # - "speed", optimizes for speed over code size.
              # - "max", whole program optimization and link-time code
              #   generation. This is very expensive and should be used
              #   sparingly.

Probably because of that last sentence.

Comment 32 by h...@chromium.org, Dec 11 2015

Doing a new comparison like the one in #6:

MSVC [0]:
chrome.dll: chrome.dll= 34242048 bytes
chrome_child.dll: chrome_child.dll= 42960896 bytes
chrome.exe: chrome.exe= 867840 bytes
Total: 78070784

Clang [1]:
chrome.dll: chrome.dll= 40206848 bytes (+ 17%)
chrome_child.dll: chrome_child.dll= 49154048 bytes (+ 14%)
chrome.exe: chrome.exe= 920064 bytes (+ 6%)
Total: 90280960 (+ 15.64%)

In #6, the total difference was 16.35%, so at least we haven't regressed, but moved slightly in the right direction.


 0. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/3531
 1. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/5294
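
For reference, the percentages in this comparison can be reproduced with a few lines of Python. This is only a minimal sketch: the byte counts are copied from the lists above, while the dictionary and function names are illustrative and not taken from any Chromium script.

```python
# Byte counts copied from the MSVC and Clang lists in this comment.
MSVC = {"chrome.dll": 34242048, "chrome_child.dll": 42960896, "chrome.exe": 867840}
CLANG = {"chrome.dll": 40206848, "chrome_child.dll": 49154048, "chrome.exe": 920064}


def overhead_percent(baseline: int, candidate: int) -> float:
    """How much larger `candidate` is than `baseline`, in percent."""
    return 100.0 * (candidate - baseline) / baseline


def report(msvc: dict, clang: dict) -> dict:
    """Per-binary overhead plus the overhead of the summed sizes."""
    rows = {name: overhead_percent(msvc[name], clang[name]) for name in msvc}
    rows["Total"] = overhead_percent(sum(msvc.values()), sum(clang.values()))
    return rows


for name, pct in report(MSVC, CLANG).items():
    print(f"{name}: {pct:+.2f} %")
```

Note that the total is computed from the summed byte counts rather than by averaging the per-binary percentages, so the larger DLLs dominate it.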

Comment 33 by h...@chromium.org, Dec 15 2015

Smaller code for materializing -1 just landed in r255656, saves 76.5 KB.

Comment 34 by h...@chromium.org, Jan 23 2016

Another random note on tail calls. MSVC will tail call here:

  int g(int);

  int f(int x) {
    return g(42);
  }

re-using the stack slot for x. Clang doesn't do this.

Comment 35 by h...@chromium.org, Jan 25 2016

David's r258506: "[MSVC Compat] Don't provide /volatile:ms semantics to types > pointer"
should save a few bytes.

For example, for PersistentMemoryAllocator::Id() we would generate:

00000000: 53                 push        ebx
00000001: 56                 push        esi
00000002: 8B 71 04           mov         esi,dword ptr [ecx+4]
00000005: 31 C0              xor         eax,eax
00000007: 31 D2              xor         edx,edx
00000009: 31 DB              xor         ebx,ebx
0000000B: 31 C9              xor         ecx,ecx
0000000D: F0 0F C7 4E 18     lock cmpxchg8b qword ptr [esi+18h]
00000012: 5E                 pop         esi
00000013: 5B                 pop         ebx
00000014: C3                 ret

And MSVC will generate:

00000000: 8B 51 04           mov         edx,dword ptr [ecx+4]
00000003: 8B 42 18           mov         eax,dword ptr [edx+18h]
00000006: 8B 52 1C           mov         edx,dword ptr [edx+1Ch]
00000009: C3                 ret

David's patch should make us generate the latter too.

Comment 36 by h...@chromium.org, Jan 25 2016

David's r258447 "[MSVC Compat] Don't omit frame pointers if /Oy- is specified before /O2"
probably made us emit frame pointers more often. That's what Chromium wants though.

But there are lots of cases where MSVC omits frame pointers and we don't. For example, from logging::LogEventProvider::OnEventsDisabled():

Clang:

00000000: 55                 push        ebp
00000001: 89 E5              mov         ebp,esp
00000003: 50                 push        eax
00000004: 8B 41 30           mov         eax,dword ptr [ecx+30h]
00000007: 89 04 24           mov         dword ptr [esp],eax
0000000A: E8 00 00 00 00     call        ?SetMinLogLevel@logging@@YAXH@Z
0000000F: 83 C4 04           add         esp,4
00000012: 5D                 pop         ebp
00000013: C3                 ret

MSVC:

00000000: FF 71 30           push        dword ptr [ecx+30h]
00000003: E8 00 00 00 00     call        ?SetMinLogLevel@logging@@YAXH@Z
00000008: 59                 pop         ecx
00000009: C3                 ret
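
To put a number on the difference, the encoded bytes in the two listings can simply be counted. The following is a minimal Python sketch, assuming dumpbin-style lines of the form "offset: hex bytes, then mnemonic"; the function and variable names are made up for illustration and this is not the scraping script mentioned elsewhere in this bug.

```python
import re

HEX_BYTE = re.compile(r"^[0-9A-Fa-f]{2}$")

# Listings copied from the two disassemblies above.
CLANG_LISTING = """
00000000: 55                 push        ebp
00000001: 89 E5              mov         ebp,esp
00000003: 50                 push        eax
00000004: 8B 41 30           mov         eax,dword ptr [ecx+30h]
00000007: 89 04 24           mov         dword ptr [esp],eax
0000000A: E8 00 00 00 00     call        ?SetMinLogLevel@logging@@YAXH@Z
0000000F: 83 C4 04           add         esp,4
00000012: 5D                 pop         ebp
00000013: C3                 ret
"""

MSVC_LISTING = """
00000000: FF 71 30           push        dword ptr [ecx+30h]
00000003: E8 00 00 00 00     call        ?SetMinLogLevel@logging@@YAXH@Z
00000008: 59                 pop         ecx
00000009: C3                 ret
"""


def encoded_size(listing: str) -> int:
    """Count the instruction bytes in a dumpbin-style listing."""
    total = 0
    for line in listing.strip().splitlines():
        # Drop the offset column, then count leading two-digit hex tokens;
        # the first token that is not a hex byte is the mnemonic.
        _, _, rest = line.partition(":")
        for token in rest.split():
            if not HEX_BYTE.match(token):
                break
            total += 1
    return total


print("Clang:", encoded_size(CLANG_LISTING), "bytes")  # Clang: 20 bytes
print("MSVC:", encoded_size(MSVC_LISTING), "bytes")    # MSVC: 10 bytes
```

For this function the Clang code is twice the size of the MSVC code, with the frame pointer setup and the mov-based argument passing accounting for the extra bytes.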

Comment 37 by h...@chromium.org, Jan 26 2016

I started to build a list of optimization opportunities here: http://llvm.org/PR26299

I used a horrible Perl script (https://codereview.chromium.org/1628613002/) to scrape dumpbin output from a Clang and MSVC build.

Comment 38 by h...@chromium.org, Jan 29 2016

New data (produced by the attached script):

MSVC (Chromium #372408) [0]
=============================================
chrome.exe: 864768 b
chrome.dll: 34661888 b
chrome_child.dll: 44251136 b
Total: 79777792 b

Clang (Chromium #372400, Clang r259233) [1]
=============================================
chrome.exe: 920576 b (+55808 b, +6.45 %)
chrome.dll: 41016320 b (+6354432 b, +18.33 %)
chrome_child.dll: 51042304 b (+6791168 b, +15.35 %)
Total: 92979200 b (+13201408 b, +16.55 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/4385
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/5944



Looks like the size has regressed since #32. Maybe related to the frame pointer change.
Attachment: are-we-small-yet.pl (2.3 KB)

Comment 39 Deleted

Comment 40 Deleted

Comment 41 Deleted

Comment 42 by h...@chromium.org, Feb 5 2016

http://reviews.llvm.org/D16907 should save 42 KB on chrome_child.dll

Painstakingly bisected a 150 KB regression in chrome_child.dll size to r252595. I'll look into that.

The Clang build grew more than the MSVC one when Oilpan landed in #371208. I'll double check if there were Clang changes in the same timespan.

Comment 43 by h...@chromium.org, Feb 5 2016

> The Clang build grew more than the MSVC one when Oilpan landed in #371208. I'll double check if there were Clang changes in the same timespan.

Confirmed that a fixed version of Chromium doesn't see any size change between Clang r258659 and r259161 (the revisions around the Oilpan change). We should figure out why we grew more than MSVC, though.

Comment 45 by h...@chromium.org, Feb 22 2016

Blockedon: 583166

Comment 46 by h...@chromium.org, Feb 25 2016

There have been some commits lately that dropped our size a lot:

MSVC (Chromium #377584) [0]
=============================================
chrome.exe: 871424 B
chrome.dll: 34943488 B
chrome_child.dll: 44559872 B
Total: 80374784 B

Clang (Chromium #377581, Clang r261875) [1]
=============================================
chrome.exe: 909824 B (+38400 B, +4.41 %)
chrome.dll: 40284160 B (+5340672 B, +15.28 %)
chrome_child.dll: 50555392 B (+5995520 B, +13.45 %)
Total: 91749376 B (+11374592 B, +14.15 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/4815
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/6646



Notably:
r259915 - fix inlining regression
r260133 - don't extend i1/i8/i16 returns to 32 bits
r260917 - stack reorder (this is the big one!)
r261429 - lea optimization


The attached plot shows the size of chrome_child.dll built with MSVC (red), Clang (green), and the ratio between the two (blue).
Attachment: Screenshot from 2016-02-25 11:19:52.png (199 KB)
Cool visualization!

Now we only need 12 more jumps like that and we're even :-P

Comment 48 by h...@chromium.org, Mar 15 2016

There were a number of significant movements on binary size last week:

Chromium: #379870--#379997
Clang: r262948--r262974
Ratio change: 0.003049
MSVC size change:   -111616 bytes (-0.249 %)
Clang size change:     9216 bytes (0.018 %)
http://test-results.appspot.com/revision_range?start=379870&end=379997
svn log -r 262948:262974 http://llvm.org/svn/llvm-project

Not an LLVM change: the MSVC build shrank while the Clang build didn't move. I suspect it's this one: https://codereview.chromium.org/1763983002 "Change scoped_ptr to a type alias for std::unique_ptr on OS_WIN".
I verified that this CL reduces chrome_child.dll size by 70 KB with MSVC, and by 10 KB with Clang.

That doesn't account for all of the bytes above, but most of them. Still don't know why this differs between the two compilers, though.





Chromium: #379997--#380117
Clang: r262974--r263006
Ratio change: 0.003061
MSVC size change:    291840 bytes (0.653 %)
Clang size change:   470528 bytes (0.923 %)
http://test-results.appspot.com/revision_range?start=379997&end=380117
svn log -r 262974:263006 http://llvm.org/svn/llvm-project

This one shows significant growth in both the MSVC and Clang builds, but the Clang build grew much more. Nothing interesting in the LLVM range. Possible Chromium candidates, but none of them look like a smoking gun:

https://codereview.chromium.org/1767343004
https://codereview.chromium.org/1699773002





Chromium: #380117--#380426
Clang: r263006--r263135
Ratio change: 0.002872
MSVC size change:    205312 bytes (0.457 %)
Clang size change:   364544 bytes (0.709 %)
http://test-results.appspot.com/revision_range?start=380117&end=380426
svn log -r 263006:263135 http://llvm.org/svn/llvm-project

LLVM candidates:
r263047 "InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2"
Chandler landed some SROA patches, I hope it's not one of those..

Chromium candidates:
https://codereview.chromium.org/1774443002 - Replace template_util.h stuff with C++11 <type_traits>

Not really sure what happened here..




Chromium: #380655--#380954
Clang: r263255--r263418
Ratio change: -0.002615
MSVC size change:    -32768 bytes (-0.073 %)
Clang size change:  -155648 bytes (-0.300 %)
http://test-results.appspot.com/revision_range?start=380655&end=380954
svn log -r 263255:263418 http://llvm.org/svn/llvm-project

Yay, moving in the right direction!

LLVM candidates:
r263406 [CVP] Convert an SDiv to a UDiv if both operands are known to be nonnegative
Doesn't seem very likely.

Chromium candidates:
https://codereview.chromium.org/1783613004
Also doesn't seem very likely.



It's really hard to figure out what changed when there's nothing obvious in the logs. I'm trying to come up with ways to maybe craft a tryjob that would build and measure size at a specific Chromium and Clang revision. Might also be worth tracking sizes on more bots to have more data to compare with. E.g. maybe trybots in general could measure sizes so we could see the impact of each individual CL..
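
If such a tryjob existed, narrowing a regression down to a single revision would be a plain binary search. Below is a minimal Python sketch of that idea. It assumes a hypothetical `measure_size(revision)` callback (for example, something that triggers a build and reads the reported DLL size) and assumes the regression is a single step that stays in place for all later revisions; neither assumption comes from the actual bots.

```python
from typing import Callable, Optional, Sequence


def first_regressing_revision(
    revisions: Sequence[int],
    measure_size: Callable[[int], int],
    threshold_bytes: int,
) -> Optional[int]:
    """Return the first revision whose size exceeds the size at
    revisions[0] by more than threshold_bytes, or None if the last
    revision in the range is still within the threshold."""
    baseline = measure_size(revisions[0])
    lo, hi = 0, len(revisions) - 1
    if measure_size(revisions[hi]) - baseline <= threshold_bytes:
        return None
    # Invariant: revisions[lo] is "good" and revisions[hi] is "bad".
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if measure_size(revisions[mid]) - baseline > threshold_bytes:
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Usage with made-up data: sizes jump by 300000 bytes at revision 106.
fake_sizes = {rev: 50000000 + (300000 if rev >= 106 else 0) for rev in range(100, 110)}
print(first_regressing_revision(list(range(100, 110)), fake_sizes.__getitem__, 100000))
```

Each probe costs one full build, so the number of builds grows with the logarithm of the range length rather than linearly. When both Chromium and Clang revisions vary, one of them would have to be pinned while bisecting the other.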

Comment 49 by bugdroid1@chromium.org, Mar 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/739b0037c0e967f08ce333ffffb5e96f7bd0323b

commit 739b0037c0e967f08ce333ffffb5e96f7bd0323b
Author: hans@chromium.org <hans@chromium.org>
Date: Fri Mar 18 17:42:11 2016

Run Sizes on Windows perf builders

I want to be able to track down Windows size regressions with the win perf
trybots.

BUG= 457078 

Review URL: https://codereview.chromium.org/1809783003

git-svn-id: svn://svn.chromium.org/chrome/trunk/tools/build@299385 0039d316-1c4b-4281-b951-d872f2087c98

[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_chromium_perf_Win_Builder.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_chromium_perf_Win_x64_Builder.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_8_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_fyi_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_perf_bisect_builder.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_x64_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_10_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_10_perf_cq.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_bisect_builder.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_zen_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64ati_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64intel_perf_bisect.json
[modify] https://crrev.com/739b0037c0e967f08ce333ffffb5e96f7bd0323b/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64nvidia_perf_bisect.json

Comment 50 by h...@chromium.org, Mar 22 2016

Blockedon: 596934

Comment 51 by bugdroid1@chromium.org, Mar 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/91a548ce90a89cabefd2048030c959a39e7bb534

commit 91a548ce90a89cabefd2048030c959a39e7bb534
Author: agable@chromium.org <agable@chromium.org>
Date: Tue Mar 22 17:15:05 2016

Revert of Run Sizes on Windows perf builders (patchset #1 id:1 of https://codereview.chromium.org/1809783003/ )

Reason for revert:
New sizes step always fails: https://build.chromium.org/p/chromium.perf/builders/Win%20Builder/builds/5630/steps/sizes/logs/stdio

See https://bugs.chromium.org/p/chromium/issues/detail?id=596145#c7 for additional details

Original issue's description:
> Run Sizes on Windows perf builders
> 
> I want to be able to track down Windows size regressions with the win perf
> trybots.
> 
> BUG= 457078 
> 
> Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=299385

TBR=eakuefner@chromium.org,dtu@chromium.org,eakuefner@google.com,prasadv@chromium.org,robertocn@chromium.org,thakis@chromium.org,hans@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG= 457078 , 596145 

Review URL: https://codereview.chromium.org/1827443002

git-svn-id: svn://svn.chromium.org/chrome/trunk/tools/build@299424 0039d316-1c4b-4281-b951-d872f2087c98

[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_chromium_perf_Win_Builder.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_chromium_perf_Win_x64_Builder.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_8_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_fyi_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_perf_bisect_builder.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_x64_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_10_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_10_perf_cq.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_bisect_builder.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_zen_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64ati_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64intel_perf_bisect.json
[modify] https://crrev.com/91a548ce90a89cabefd2048030c959a39e7bb534/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64nvidia_perf_bisect.json


Comment 52 by bugdroid1@chromium.org, Mar 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/233ab1db124001fe7f2d333b475ef97820dbce91

commit 233ab1db124001fe7f2d333b475ef97820dbce91
Author: hans@chromium.org <hans@chromium.org>
Date: Wed Mar 23 00:00:34 2016

Run Sizes on Windows perf builders (take 2)

(The first version of this patch was at https://codereview.chromium.org/1809783003/)

This is the same version as the first patch, but without passing --annotate=graphing
when running Sizes without a perf_id, as that apparently leads to failure (see  crbug.com/596145 ).

BUG= 457078 

Review URL: https://codereview.chromium.org/1823183002

git-svn-id: svn://svn.chromium.org/chrome/trunk/tools/build@299431 0039d316-1c4b-4281-b951-d872f2087c98

[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipe_modules/chromium/api.py
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_chrome_Google_Chrome_Linux_x64.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_chrome_Google_Chrome_Mac.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_chrome_Google_Chrome_Win.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTLinux.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTLinuxASan.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTLinuxUBSanVptr.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTLinux__dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTMac.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTMacASan.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTMac__dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTWin.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTWin64.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTWin64_dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTWin64_dll_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTWin_dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_fyi_ClangToTWin_dll_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_perf_Win_Builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_perf_Win_x64_Builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_chromium_perf_fyi_Win_Clang_Builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_8_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_fyi_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_perf_bisect_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_win_x64_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_10_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_10_perf_cq.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_bisect_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64_zen_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64ati_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64intel_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/chromium.expected/full_tryserver_chromium_perf_winx64nvidia_perf_bisect.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_arm64_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_arm64_builder__dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_armv6_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_builder__dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_data_reduction_proxy_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_mips_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_x86_builder.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/android_cronet_x86_builder__dbg_.json
[modify] https://crrev.com/233ab1db124001fe7f2d333b475ef97820dbce91/scripts/slave/recipes/cronet.expected/local_test.json

Comment 53 by h...@chromium.org, Mar 29 2016

I managed to bisect this regression:

Chromium: #380117--#380426
Clang: r263006--r263135
Ratio change: 0.002872
MSVC size change:    205312 bytes (0.457 %)
Clang size change:   364544 bytes (0.709 %)
http://test-results.appspot.com/revision_range?start=380117&end=380426
svn log -r 263006:263135 http://llvm.org/svn/llvm-project

The size is exactly the same with Clang r263006 and r263135, so this is only due to Chromium changes.

Bisection points to this V8 roll: https://codereview.chromium.org/1776013004
Within the roll, bisection points to: https://codereview.chromium.org/1770353002

For chrome_child.dll:
MSVC size before: 44972032 [1], after: 45169152 [2] (+197120 bytes, 0.4%)
Clang size before: 51431936 [3], after: 51761152 [4] (+329216 bytes, 0.6%)

The V8 change adds tracing around each entry and exit of the V8 run-time. My theory is that the instruction sequences for calling the tracing framework are responsible for the growth.

The function call arguments can be put on the stack with "push" or "mov" instructions. MSVC always pushes, which results in smaller code, but Clang only uses pushes at -Os, and V8 is one of those parts of Chromium that don't use -Os in official builds. Hence, the code growth with Clang was larger.

http://llvm.org/PR26325 is for enabling the "push lowering" more broadly.


Reid also pointed out that one explanation for other regressions could be if they add code which passes non-trivially copyable objects by value, for which Clang generates inefficient code. That's http://llvm.org/PR27076


 [1] https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect_builder/builds/13539/steps/sizes/logs/stdio
 [2] https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect_builder/builds/13538/steps/sizes/logs/stdio
 [3] https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect_builder/builds/13536/steps/sizes/logs/stdio
 [4] https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect_builder/builds/13537/steps/sizes/logs/stdio
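
As a sanity check on the chrome_child.dll numbers above, the growth and the resulting change in the Clang/MSVC size ratio can be recomputed directly. This is a minimal Python sketch; it assumes the "Ratio change" figure quoted in this bug is the difference in Clang size divided by MSVC size, which is consistent with the quoted numbers but is an assumption here, and the names are illustrative.

```python
def size_ratio(clang_bytes: int, msvc_bytes: int) -> float:
    """Clang size relative to MSVC size (1.0 would mean parity)."""
    return clang_bytes / msvc_bytes


# chrome_child.dll sizes before and after the V8 change, from this comment.
MSVC_BEFORE, MSVC_AFTER = 44972032, 45169152
CLANG_BEFORE, CLANG_AFTER = 51431936, 51761152

msvc_growth = MSVC_AFTER - MSVC_BEFORE
clang_growth = CLANG_AFTER - CLANG_BEFORE
ratio_change = size_ratio(CLANG_AFTER, MSVC_AFTER) - size_ratio(CLANG_BEFORE, MSVC_BEFORE)

print(f"MSVC growth:  {msvc_growth} bytes ({100.0 * msvc_growth / MSVC_BEFORE:.1f} %)")
print(f"Clang growth: {clang_growth} bytes ({100.0 * clang_growth / CLANG_BEFORE:.1f} %)")
print(f"Ratio change: {ratio_change:.6f}")
```

A positive ratio change means the Clang build grew proportionally more than the MSVC build for the same source change, which is what this V8 change shows.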

Comment 54 by h...@chromium.org, Mar 31 2016

Clang r264966 enabled the "push lowering" more broadly.
chrome_child.dll size before: 52605440 b [1], after: 51072000 b [2]. That's a drop of 1497.5 KB, or 2.9%!

 [1] https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect_builder/builds/13548/steps/sizes/logs/stdio
 [2] https://build.chromium.org/p/tryserver.chromium.perf/builders/win_perf_bisect_builder/builds/13549/steps/sizes/logs/stdio

Comment 55 by h...@chromium.org, Mar 31 2016

This is our new status:

MSVC (Chromium #384292) [0]
=============================================
chrome.exe: 910848 B
chrome.dll: 36209664 B
chrome_child.dll: 45765632 B
Total: 82886144 B

Clang (Chromium #384261, Clang r265011) [1]
=============================================
chrome.exe: 931328 B (+20480 B, +2.25 %)
chrome.dll: 40098816 B (+3889152 B, +10.74 %)
chrome_child.dll: 51073024 B (+5307392 B, +11.60 %)
Total: 92103168 B (+9217024 B, +11.12 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/5691
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7290


Attaching the latest size plot for chrome_child.dll. The blue line shows the ratio between Clang and MSVC size, and the big drop on the right is due to yesterday's change.

The huge green spikes on the right are due to https://codereview.chromium.org/1814423002 which landed and got reverted a few times. There's no corresponding red (MSVC) spike, as that build was broken at the time.
Attachment: plot.png (158 KB)

Comment 56 by h...@chromium.org, Apr 5 2016

Size dropped some more recently. This is our new status:

MSVC (Chromium #385194) [0]
=============================================
chrome.exe: 929792 B
chrome.dll: 36267520 B
chrome_child.dll: 45634560 B
Total: 82831872 B

Clang (Chromium #385174, Clang r265405) [1]
=============================================
chrome.exe: 946688 B (+16896 B, +1.82 %)
chrome.dll: 40078848 B (+3811328 B, +10.51 %)
chrome_child.dll: 50878976 B (+5244416 B, +11.49 %)
Total: 91904512 B (+9072640 B, +10.95 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/5855
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7357




I thought this was due to my change (r265345, merging of adjacent stack adjustments), but the effect turned out to be small (3 KB on chrome_child.dll) so something else must have changed:

Chromium: #384196--#384244
Clang: r264992--r265006
Ratio change: -0.001814
MSVC size change:     22528 bytes (0.049 %)
Clang size change:   -57856 bytes (-0.113 %)
http://test-results.appspot.com/revision_range?start=384196&end=384244
svn log -r 264992:265006 http://llvm.org/svn/llvm-project

Comment 57 by h...@chromium.org, Apr 14 2016

Cc: -majnemer@chromium.org
We've regressed a little since last time and are now above 11% again:

MSVC (Chromium #387330) [0]
=============================================
chrome.exe: 929792 B
chrome.dll: 36455936 B
chrome_child.dll: 46041600 B
Total: 83427328 B

Clang (Chromium #387268, Clang r266284) [1]
=============================================
chrome.exe: 946688 B (+16896 B, +1.82 %)
chrome.dll: 40442368 B (+3986432 B, +10.93 %)
chrome_child.dll: 51225088 B (+5183488 B, +11.26 %)
Total: 92614144 B (+9186816 B, +11.01 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/6159
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7455


The biggest change is in this range:

Chromium: #386016--#386308
Clang: r265772--r265881
Ratio change: 0.006898
MSVC size change:      6144 bytes (0.013 %)
Clang size change:   323072 bytes (0.635 %)
http://test-results.appspot.com/revision_range?start=386016&end=386308
svn log -r 265772:265881 http://llvm.org/svn/llvm-project

I'm currently investigating that.

Comment 58 by h...@chromium.org, Apr 18 2016

The regression in r265772--r265881 mentioned in #57 was actually two regressions:

r265790 grew chrome_child.dll about 100 KB. That has since been recovered by r265547.

r265836 grew chrome_child.dll by 240 KB. IIUC, that one is causing more expansion of memcpy calls, which makes me think we want to re-evaluate  X86TargetLowering::MaxStoresPerMemcpyOptSize etc. I'll look into that (http://llvm.org/PR27415).



Overall, we're now back below 11% again:

MSVC (Chromium #388037) [0]
=============================================
chrome.exe: 931840 B
chrome.dll: 36499968 B
chrome_child.dll: 46008320 B
Total: 83440128 B

Clang (Chromium #387950, Clang r266639) [1]
=============================================
chrome.exe: 950784 B (+18944 B, +2.03 %)
chrome.dll: 40484352 B (+3984384 B, +10.92 %)
chrome_child.dll: 51126272 B (+5117952 B, +11.12 %)
Total: 92561408 B (+9121280 B, +10.93 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/6265
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7516


Comment 59 by r...@chromium.org, Apr 18 2016

Well, r265547 got re-committed six days later in r266162. Did we see another regression?

Comment 60 by h...@chromium.org, Apr 18 2016

> Well, r265547 got re-committed six days later in r266162. Did we see another regression?

I think I messed up the text above.

r265790 reverted r265547, and we regressed 100 KB
r266162 re-committed r265547, and we got it back

Comment 61 by h...@chromium.org, Apr 22 2016

I've put some notes from looking at the memcpy thresholds at http://llvm.org/PR27415. Probably won't pursue that anymore right now.

Comment 62 by h...@chromium.org, May 3 2016

New numbers after http://llvm.org/viewvc/llvm-project?rev=268261&view=rev and http://llvm.org/viewvc/llvm-project?rev=268321&view=rev:

MSVC (Chromium #391265) [0]
=============================================
chrome.exe: 941568 B
chrome.dll: 36713984 B
chrome_child.dll: 46310912 B
Total: 83966464 B

Clang (Chromium #391225, Clang r268386) [1]
=============================================
chrome.exe: 949760 B (+8192 B, +0.87 %)
chrome.dll: 40065024 B (+3351040 B, +9.13 %)
chrome_child.dll: 50476544 B (+4165632 B, +8.99 %)
Total: 91491328 B (+7524864 B, +8.96 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/6715
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7695

Comment 63 by h...@chromium.org, May 3 2016

Nico pointed out we should take a look at the 64-bit numbers too, since that will become the important metric eventually:

MSVC (Chromium #391332 64-bit) [0]
=============================================
chrome.exe: 1130496 B
chrome.dll: 51691520 B
chrome_child.dll: 61723136 B
Total: 114545152 B

Clang (Chromium #391266 64-bit, Clang r268398) [1]
=============================================
chrome.exe: 1033728 B (-96768 B, -8.56 %)
chrome.dll: 47405568 B (-4285952 B, -8.29 %)
chrome_child.dll: 58556416 B (-3166720 B, -5.13 %)
Total: 106995712 B (-7549440 B, -6.59 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/7136
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/7827


Looks like we're doing alright :-)

Comment 64 by bugdroid1@chromium.org, May 4 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a3d3aa4d4d3bba1fae533eb88ca2fd2ef2ecf7e2

commit a3d3aa4d4d3bba1fae533eb88ca2fd2ef2ecf7e2
Author: hans <hans@chromium.org>
Date: Wed May 04 22:12:52 2016

build mini_installer as part of the chromium_builder_perf target on Windows

I want the perf builders to build this so we can track its size.

BUG= 457078 

Review-Url: https://codereview.chromium.org/1944103002
Cr-Commit-Position: refs/heads/master@{#391656}

[modify] https://crrev.com/a3d3aa4d4d3bba1fae533eb88ca2fd2ef2ecf7e2/BUILD.gn
[modify] https://crrev.com/a3d3aa4d4d3bba1fae533eb88ca2fd2ef2ecf7e2/build/all.gyp

Re comment 63: Nice! :-) But we need to keep in mind that we don't implement /GS yet ( bug 598767 )

Comment 66 by h...@chromium.org, May 9 2016

Monday numbers:

MSVC (Chromium #392398 32-bit) [0]
=============================================
chrome.exe: 942080 B
chrome.dll: 36799488 B
chrome_child.dll: 46494720 B
Total: 84236288 B
mini_installer.exe: 43708928 B

Clang (Chromium #392401 32-bit, Clang r268962) [1]
=============================================
chrome.exe: 950784 B (+8704 B, +0.92 %)
chrome.dll: 40167424 B (+3367936 B, +9.15 %)
chrome_child.dll: 50745856 B (+4251136 B, +9.14 %)
Total: 91864064 B (+7627776 B, +9.06 %)
mini_installer.exe: 45503488 B (+1794560 B, +4.11 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/6909
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7761

---

MSVC (Chromium #392426 64-bit) [0]
=============================================
chrome.exe: 1130496 B
chrome.dll: 51698688 B
chrome_child.dll: 61882880 B
Total: 114712064 B
mini_installer.exe: 51149312 B

Clang (Chromium #392394 64-bit, Clang r268958) [1]
=============================================
chrome.exe: 1034752 B (-95744 B, -8.47 %)
chrome.dll: 47607296 B (-4091392 B, -7.91 %)
chrome_child.dll: 59141632 B (-2741248 B, -4.43 %)
Total: 107783680 B (-6928384 B, -6.04 %)
mini_installer.exe: 48309248 B (-2840064 B, -5.55 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/7495
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/7900

---

Looks like we regressed a little. This seems to be the interesting range:

Chromium: #391502--#391621
Clang: r268497--r268539
Ratio change: 0.002232
MSVC size change:      5632 bytes (0.012 %)
Clang size change:   109568 bytes (0.217 %)
http://test-results.appspot.com/revision_range?start=391502&end=391621
svn log -r 268497:268539 http://llvm.org/svn/llvm-project

I suspect this one:
r268509 Do not disable completely loop unroll when optimizing for size.
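
The "Ratio change" figure is not defined in this thread. It is consistent with being the change in the Clang/MSVC size ratio across the range, and the sketch below assumes that reading. `ratio_change` is a hypothetical helper name; the inputs are the 32-bit chrome_child.dll sizes from comments 62 and 66, used as two sample endpoints:

```python
# Sketch only: assumes "Ratio change" means the change in (Clang size / MSVC size).
# ratio_change is a hypothetical helper name.

def ratio_change(msvc_before, clang_before, msvc_after, clang_after):
    """Change in the Clang/MSVC size ratio between two measurements."""
    return clang_after / msvc_after - clang_before / msvc_before


# chrome_child.dll sizes in bytes, 32-bit, from comments 62 and 66.
delta = ratio_change(
    msvc_before=46310912, clang_before=50476544,
    msvc_after=46494720, clang_after=50745856,
)
print(f"Ratio change: {delta:.6f}")
```

Under this definition a positive value means the Clang build grew relative to the MSVC build, even if both grew in absolute terms.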

Comment 67 by r...@chromium.org, May 9 2016

Yes, I reviewed r268509. We should change LLVM to reduce its threshold in Os. The author of the patch just wanted the pass to be enabled so that it would honor explicit loop unrolling pragmas.

Comment 68 by h...@chromium.org, May 10 2016

Blockedon: 610772

Comment 69 by h...@chromium.org, May 16 2016

We recovered the size regression from the loop unroll threshold with r269124, but there are now three new regressions:

Chromium: #392880--#393237
Clang: r269162--r269291
Ratio change: 0.001573
MSVC size change:    -16384 bytes (-0.035 %)
Clang size change:    55296 bytes (0.109 %)
http://test-results.appspot.com/revision_range?start=392880&end=393237
svn log -r 269162:269291 http://llvm.org/svn/llvm-project

Chromium: #393237--#393465
Clang: r269291--r269398
Ratio change: 0.000622
MSVC size change:     13824 bytes (0.030 %)
Clang size change:    44032 bytes (0.087 %)
http://test-results.appspot.com/revision_range?start=393237&end=393465
svn log -r 269291:269398 http://llvm.org/svn/llvm-project

Chromium: #393509--#393736
Clang: r269412--r269575
Ratio change: 0.000886
MSVC size change:     96256 bytes (0.207 %)
Clang size change:   146432 bytes (0.288 %)
http://test-results.appspot.com/revision_range?start=393509&end=393736
svn log -r 269412:269575 http://llvm.org/svn/llvm-project

Comment 70 by h...@chromium.org, May 16 2016

None of the regressions in #69 are due to Clang changes. Bisecting Chromium now.

Comment 71 by h...@chromium.org, May 20 2016

Not much luck with the bisections this week, I'm afraid:

> #392880--#393237
Bisection points to this V8 roll: https://codereview.chromium.org/1971753002
I haven't bisected inside that yet.

>  #393237--#393465
Failed to bisect this, too many compile errors in the range.

> #393509--#393736
Largest growth is from this Skia roll: https://codereview.chromium.org/1977053002
Nothing conclusive there either.
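
For reference, bisecting a size jump like these amounts to a binary search over the revision range for the first revision whose binary is larger than some threshold. A rough sketch, assuming a single step change inside the range; `measure_size` is a hypothetical callback that would build a revision and return the binary size, and the sizes in the usage example are made up purely for illustration:

```python
# Sketch only: binary search for the first revision where the size jumps.
# measure_size is a hypothetical callback (build the revision, return bytes).

def first_large_revision(good, bad, measure_size, threshold):
    """Return the first revision in (good, bad] whose size exceeds threshold.

    Assumes size(good) <= threshold < size(bad) and a single step in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if measure_size(mid) > threshold:
            bad = mid   # the jump is at or before mid
        else:
            good = mid  # the jump is after mid
    return bad


# Toy usage with made-up sizes: the size steps up at revision 393100.
def fake_measure(rev):
    return 50_055_296 if rev >= 393100 else 50_000_000


print(first_large_revision(392880, 393237, fake_measure, threshold=50_027_648))
```

The sketch does not handle revisions that fail to compile, which is what stopped the second bisection above.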


Some optimizations landed this week:
r269949: More efficient dynamic alloca lowering, saved 77 KB on chrome_child.dll
r270109: Don't reset stack ptr after noreturn calls, saved 3 KB on chrome_child.dll


Overall, we're back below 9% for the total size:

MSVC (Chromium #395170 32-bit) [0]
=============================================
chrome.exe: 949248 B
chrome.dll: 36833792 B
chrome_child.dll: 47718400 B
Total: 85501440 B
mini_installer.exe: 43928064 B

Clang (Chromium #395131 32-bit, Clang r270253) [1]
=============================================
chrome.exe: 956928 B (+7680 B, +0.81 %)
chrome.dll: 40032256 B (+3198464 B, +8.68 %)
chrome_child.dll: 52049408 B (+4331008 B, +9.08 %)
Total: 93038592 B (+7537152 B, +8.82 %)
mini_installer.exe: 45728256 B (+1800192 B, +4.10 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/7293
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/7895


MSVC (Chromium #395175 64-bit) [0]
=============================================
chrome.exe: 1139712 B
chrome.dll: 51601408 B
chrome_child.dll: 63688192 B
Total: 116429312 B
mini_installer.exe: 51482112 B

Clang (Chromium #395124 64-bit, Clang r270250) [1]
=============================================
chrome.exe: 1043968 B (-95744 B, -8.40 %)
chrome.dll: 47617024 B (-3984384 B, -7.72 %)
chrome_child.dll: 60949504 B (-2738688 B, -4.30 %)
Total: 109610496 B (-6818816 B, -5.86 %)
mini_installer.exe: 48753152 B (-2728960 B, -5.30 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/8327
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/8041

Comment 72 by h...@chromium.org, Jun 21 2016

Cc: dxf@google.com
+dxf fyi

etienneb's stack protector patch got enabled in r272832

For 32-bit, it grew chrome_child.dll by 3.7%:

Chromium: #399992--#400097
Clang: r272825--r272874
Ratio change: 0.039557
MSVC size change:      8704 bytes (0.018 %)
Clang size change:  1884672 bytes (3.664 %)
http://test-results.appspot.com/revision_range?start=399992&end=400097
svn log -r 272825:272874 http://llvm.org/svn/llvm-project

Setting us back quite a bit on chrome_child.dll size, but the effect on mini_installer.exe was not as bad. This is our current status:

Clang (Chromium #401020 32-bit, Clang r273289)
=============================================
chrome.exe: 987136 B (-274944 B, -21.78 %)
chrome.dll: 42216960 B (+3887616 B, +10.14 %)
chrome_child.dll: 53447680 B (+6366208 B, +13.52 %)
Total: 96651776 B (+9978880 B, +11.51 %)
mini_installer.exe: 48955392 B (+2485248 B, +5.35 %)





On 64-bit, we're still ahead. This is the chrome_child.dll impact:

Chromium: #399396--#400083
Clang: r272519--r272872
Ratio change: 0.029185
MSVC size change:    141312 bytes (0.223 %)
Clang size change:  1987584 bytes (3.307 %)

And our current status:

Clang (Chromium #401020 64-bit, Clang r273287)
=============================================
chrome.exe: 1078784 B (-340480 B, -23.99 %)
chrome.dll: 49961984 B (-3166208 B, -5.96 %)
chrome_child.dll: 62221312 B (-1392128 B, -2.19 %)
Total: 113262080 B (-4898816 B, -4.15 %)
mini_installer.exe: 51965440 B (-2962944 B, -5.39 %)

It's a bit mysterious that mini_installer.exe is now smaller than it was before though, and chrome.exe went from -9% to -24% since my last post.

We're still not XOR-ing the stack cookie with esp/ebp, so that will grow us some more.
(We're inserting way more stack probes than MSVC now, so if need be we could hook up /GS to -fstack-protector instead of -fstack-protector-strong. That would give us some of that size back, for fewer stack probes -- but still more than MSVC.)

Comment 74 by ebra...@gnu.org, Jul 13 2016

Cc: ebra...@gnu.org

Comment 75 by h...@chromium.org, Aug 2 2016

Labels: -Clang clang
Fresh numbers below. TL;DR: Clang is still smaller on 64-bit and bigger on 32-bit.


MSVC (Chromium #409309 32-bit) [0]
=============================================
chrome.exe: 990208 B
chrome.dll: 37778944 B
chrome_child.dll: 48536064 B
Total: 87305216 B
mini_installer.exe: 46703104 B

Clang (Chromium #409208 32-bit, Clang r277481) [1]
=============================================
chrome.exe: 975360 B (-14848 B, -1.50 %)
chrome.dll: 42210304 B (+4431360 B, +11.73 %)
chrome_child.dll: 53893120 B (+5357056 B, +11.04 %)
Total: 97078784 B (+9773568 B, +11.19 %)
mini_installer.exe: 49054720 B (+2351616 B, +5.04 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/9555
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/9149



MSVC (Chromium #409321 64-bit) [0]
=============================================
chrome.exe: 1163264 B
chrome.dll: 52850176 B
chrome_child.dll: 64506368 B
Total: 118519808 B
mini_installer.exe: 53902848 B

Clang (Chromium #409208 64-bit, Clang r277481) [1]
=============================================
chrome.exe: 1052672 B (-110592 B, -9.51 %)
chrome.dll: 50437632 B (-2412544 B, -4.56 %)
chrome_child.dll: 63108608 B (-1397760 B, -2.17 %)
Total: 114598912 B (-3920896 B, -3.31 %)
mini_installer.exe: 52147200 B (-1755648 B, -3.26 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/12767
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/9315


The graphs have been moving a bit due to the gyp/gn flips, but will hopefully be calmer going forward.

Comment 76 by h...@chromium.org, Sep 6 2016

New numbers:

MSVC (Chromium #416627 32-bit) [0]
=============================================
chrome.exe: 3222528 B
chrome.dll: 39756288 B
chrome_child.dll: 49308672 B
Total: 92287488 B
mini_installer.exe: 40909312 B

Clang (Chromium #416614 32-bit, Clang r280704) [1]
=============================================
chrome.exe: 972800 B (-2249728 B, -69.81 %)
chrome.dll: 43546624 B (+3790336 B, +9.53 %)
chrome_child.dll: 54266368 B (+4957696 B, +10.05 %)
Total: 98785792 B (+6498304 B, +7.04 %)
mini_installer.exe: 42029568 B (+1120256 B, +2.74 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/10600
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/9752

-----

MSVC (Chromium #416628 64-bit) [0]
=============================================
chrome.exe: 4038144 B
chrome.dll: 55124480 B
chrome_child.dll: 65324032 B
Total: 124486656 B
mini_installer.exe: 44963840 B

Clang (Chromium #416614 64-bit, Clang r280704) [1]
=============================================
chrome.exe: 1045504 B (-2992640 B, -74.11 %)
chrome.dll: 51395584 B (-3728896 B, -6.76 %)
chrome_child.dll: 62758400 B (-2565632 B, -3.93 %)
Total: 115199488 B (-9287168 B, -7.46 %)
mini_installer.exe: 42396672 B (-2567168 B, -5.71 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/18912
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/9891

-----


It looks like the 32-bit size shrunk a bit recently. First, Clang size was reduced in this range:

Chromium: #415032--#416181
Clang: r280011--r280460
Ratio change: -0.003617
MSVC size change:     20480 bytes (0.042 %)
Clang size change:  -153088 bytes (-0.284 %)
http://test-results.appspot.com/revision_range?start=415032&end=416181
svn log -r 280011:280460 http://llvm.org/svn/llvm-project

The actual size decrease seems to have been between #415032 (Clang r280011) and #415179 (Clang r280058). I don't see anything that stands out there though.


Next, here's a range where the size increased a lot for both compilers, but more with MSVC than with Clang:

Chromium: #416513--#416561
Clang: r280649--r280672
Ratio change: -0.005535
MSVC size change:    709632 bytes (1.459 %)
Clang size change:   510976 bytes (0.951 %)
http://test-results.appspot.com/revision_range?start=416513&end=416561
svn log -r 280649:280672 http://llvm.org/svn/llvm-project

The actual increase seems to have occurred in #416544--#416548.
There is a webrtc roll in that range (#416547): https://codereview.chromium.org/2310963002 which seems to contain a compiler optimization change: https://codereview.webrtc.org/2307283002 I suspect that's the cause.

Comment 77 by h...@chromium.org, Jan 11 2017

New numbers:

(The perf waterfall, which provides the 64-bit numbers, is currently down http://crbug.com/680160)


MSVC (Chromium #442872 32-bit) [0]
=============================================
chrome.exe: 939520 B
chrome.dll: 41191424 B
chrome_child.dll: 51298816 B
Total: 93429760 B
mini_installer.exe: 41436672 B

Clang (Chromium #442831 32-bit, Clang r291658) [1]
=============================================
chrome.exe: 968704 B (+29184 B, +3.11 %)
chrome.dll: 46730240 B (+5538816 B, +13.45 %)
chrome_child.dll: 58712576 B (+7413760 B, +14.45 %)
Total: 106411520 B (+12981760 B, +13.89 %)
mini_installer.exe: 43323904 B (+1887232 B, +4.55 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/14113
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/11030


Compared to the last measurement, we've regressed some. Attaching a new plot of the chrome_child.dll size which shows the jumps. I'll dig into those.
size-plot.png (35.4 KB)

Comment 78 by h...@chromium.org, Jan 12 2017

I have buildbots running to bisect some of these still, but this is what I have so far:


The two rightmost big ratio increases are due to Bruce working around MSVC compiler deficiencies:
https://codereview.chromium.org/2617463003 (#441416)
https://codereview.chromium.org/2620653004 (#442751)


The big spike before that, around #432000, is due to changes to Clang's inliner
r286814 | jamesm  [InlineCost] Remove skew when calculating call costs
r288024 | jamesm  [InlineCost] Reduce inline thresholds to compensate for cost changes
The second commit recovered most of the difference, but not all of it, leaving the ratio up by about one percentage point.


https://codereview.chromium.org/2361263002/ (#420523) "libvpx: enable high bit depth for vp9" seems to be the kind of change where both builds grew significantly (about half a meg), but the clang one more than MSVC. I don't know why yet.
Cc: brucedaw...@chromium.org
Moving this from  bug 82385 ...

I've been looking at Chrome binary sizes and making fixes to reduce them in the context of VC++.  Some of the tooling might be applicable to investigating the size of the binaries produced by clang-cl.

The size increase is mostly from the code segment - about 5.6 MB in chrome.dll and 7.6 MB in chrome_child.dll in my local (non-official, non-branded) tests.

I used the SymbolSort -diff option to compare the PDBs and it found lots of claimed differences. Some of them may be due to different symbol visibility. For instance, it claims that the kPreloadedHSTSData array is present in clang-cl builds but not VC++ builds and I'm not sure that makes sense.

It claims that this function generates about three times as much code in clang-cl builds, and that seems real:

       22794         code  private: void __thiscall v8::internal::Genesis::InitializeGlobal(class v8::internal::Handle<class v8::internal::JSGlobalObject>,class v8::internal::Handle<class v8::internal::JSFunction>,enum v8::internal::GlobalContextType)

       66384         code  private: void __thiscall v8::internal::Genesis::InitializeGlobal(class v8::internal::Handle<class v8::internal::JSGlobalObject>,class v8::internal::Handle<class v8::internal::JSFunction>,enum v8::internal::GlobalContextType)

The SkColorSpaceXform_XYZ functions seem to generate much more code in clang-cl than in VC++.

There are also claims that the clang-cl binaries have many more duplicate globals, such as 4,550 copies of atomic_histogram_pointer. I'm not sure if these reports are real or spurious - I think VC++ sometimes makes these symbols invisible even to the PDB.

It looks like clang-cl is better at putting global variables in the read-only data segment than VC++ is - perhaps more instances of the const-member bug which causes grief for VC++, although I thought I fixed all the big instances of that.

I could share more results but it is probably more useful to just share a link to the crude documentation I've written which includes Python scripts for high-level comparison of binaries and links to the SymbolSort repo: https://www.chromium.org/developers/windows-binary-sizes

I'm happy to help with the tools or analysis.

Comment 80 by r...@chromium.org, Jan 13 2017

Cc: inglorion@chromium.org
Thanks for the analysis! I bet there's something in the V() expansions in InitializeGlobal that we codegen poorly. Usually we do worse if there's a call that passes a non-trivial C++ object by value (std::string). Either that or we have runaway inlining in that function.

Comment 81 by r...@chromium.org, Jan 13 2017

It looks like we do bad things when passing V8 Handle objects by value.

Consider this code:

struct HandleBase { HandleBase(void *p) : p(p) {} void *p; };
struct Handle : HandleBase { Handle(int *p) : HandleBase(p) {} };
void f(HandleBase o) {}
void g(Handle o) {}

Clang generates a 'byval' prototype for g but not for f. We should generate the same code for both. This is probably easily fixable.

Comment 82 by r...@chromium.org, Jan 13 2017

Clang r291917 shaves ~23K off of InitializeGlobal:
before: 73,486 bytes
after: 50,299 bytes

More and more I think that if we want to optimize code size, we should focus on call sequences. Things like the store to push conversion changes have been huge wins for us in the past and I think there's more to do here. I filed https://llvm.org/bugs/show_bug.cgi?id=31634 to do something about the harder byval cases that we can't handle today.

I also noticed that we can pass MaybeHandle more efficiently if we remove its destructor. I sent a CL to V8 to do that: https://codereview.chromium.org/2632713003

Comment 83 by r...@chromium.org, Jan 13 2017

Blockedon: 681103

Comment 84 by r...@chromium.org, Jan 13 2017

I looked into atomic_histogram_pointer, and it's interesting enough that I split it off as http://crbug.com/681103. I think those globals are also present in MSVC builds. Either way it seems like there are some size optimization opportunities here.

Maybe the ~4500 duplicate globals number comes from some difference in clang's debug info, though.
Re comment 78: I asked on the vp9 review and that jump is known and tracked in  bug 650028 .

Comment 86 by h...@chromium.org, Jan 14 2017

The perf waterfall seems to be back up, so I got 64-bit numbers:

MSVC (Chromium #443742 64-bit) [0]
=============================================
chrome.exe: 1124352 B
chrome.dll: 57402880 B
chrome_child.dll: 68862976 B
Total: 127390208 B
mini_installer.exe: 45754368 B

Clang (Chromium #443555 64-bit, Clang r291904) [1]
=============================================
chrome.exe: 1043456 B (-80896 B, -7.19 %)
chrome.dll: 54485504 B (-2917376 B, -5.08 %)
chrome_child.dll: 67484160 B (-1378816 B, -2.00 %)
Total: 123013120 B (-4377088 B, -3.44 %)
mini_installer.exe: 43510784 B (-2243584 B, -4.90 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/46182
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/11278
size-plot.png (24.2 KB)

Comment 87 by h...@chromium.org, Feb 28 2017

There was a bump here:

Chromium: #445998--#446302
Clang: r293049--r293173
Ratio change: 0.014896
MSVC size change:   -588800 bytes (-1.154 %)
Clang size change:    90112 bytes (0.157 %)
http://test-results.appspot.com/revision_range?start=445998&end=446302
svn log -r 293049:293173 http://llvm.org/svn/llvm-project

Looking closer, the MSVC drop occurs in two steps:
445990	-	51416576.0	-	-
445994	-	51044352.0	-	-
Guessing: https://codereview.chromium.org/2652923002 "Devirtualize Visitor and remove inline visitor specialization."

446044	-	51046912.0	-	-
446054	-	50360832.0	-	-
Probably: https://codereview.chromium.org/2653073002 "Make sure DCHECK compiles out completely if DCHECKS aren't enabled."
This is addressing an MSVC deficiency which explains why the Clang build's size didn't decrease.
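
Locating steps like these in a size series can be sketched as a scan for adjacent data points whose difference exceeds a threshold. The data below is the four MSVC chrome_child.dll points quoted above; `find_steps` and the 100 KB threshold are hypothetical choices for illustration, not the tooling used here:

```python
# Sketch only: find adjacent points where the size moves by more than a threshold.
# find_steps and the threshold are hypothetical, chosen for illustration.

# (Chromium revision, MSVC chrome_child.dll size in bytes) from the rows above.
POINTS = [
    (445990, 51416576),
    (445994, 51044352),
    (446044, 51046912),
    (446054, 50360832),
]


def find_steps(points, threshold):
    """Return (start_rev, end_rev, delta) for each adjacent pair with |delta| > threshold."""
    steps = []
    for (rev_a, size_a), (rev_b, size_b) in zip(points, points[1:]):
        delta = size_b - size_a
        if abs(delta) > threshold:
            steps.append((rev_a, rev_b, delta))
    return steps


for start, end, delta in find_steps(POINTS, threshold=100 * 1024):
    print(f"#{start}--#{end}: {delta:+d} bytes")
```

Each reported pair is then a narrow revision range to inspect by hand, as done with the two code reviews above.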


The ratio later improved slightly in Clang's favour:

Chromium: #450741--#451205
Clang: r295211--r295409
Ratio change: -0.004902
MSVC size change:    871936 bytes (1.729 %)
Clang size change:   742912 bytes (1.291 %)
http://test-results.appspot.com/revision_range?start=450741&end=451205
svn log -r 295211:295409 http://llvm.org/svn/llvm-project

The MSVC size jumped here:
451108	-	50421760.0	-	-
451120	-	51308544.0	-	-
The range includes https://codereview.chromium.org/2693193005 "[Windows MSVC CFG] Turning on linker CFG for all.  Disabling CFG compile."
The "CFG compile" part only affected chrome_elf, so didn't affect these size numbers (they are for chrome_child.dll). Looks like "/guard:cf" just for the linker imposes ~800 KB chrome_child.dll size increase with MSVC.
It's curious that the Clang size increased less..

Comment 88 by h...@chromium.org, Mar 22 2017

MSVC (Chromium #458862 32-bit) [0]
=============================================
chrome.exe: 3814400 B
chrome.dll: 35646976 B
chrome_child.dll: 51580928 B
Total: 91042304 B
mini_installer.exe: 42200576 B

Clang (Chromium #458789 32-bit, Clang r298521) [1]
=============================================
chrome.exe: 1083392 B (-2731008 B, -71.60 %)
chrome.dll: 39921664 B (+4274688 B, +11.99 %)
chrome_child.dll: 58294784 B (+6713856 B, +13.02 %)
Total: 99299840 B (+8257536 B, +9.07 %)
mini_installer.exe: 43857920 B (+1657344 B, +3.93 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/16456
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/11747



For some reason it seems the MSVC build of chrome.exe has grown from ~1MB to almost 4MB?

I can't get 64-bit numbers at the moment because after the move to logdog, the chromium.perf bot I was using requires authentication and my script can't handle that.

Comment 89 by h...@chromium.org, Mar 22 2017

I don't see the strange chrome.exe size in shipping builds (I checked 59.0.3030.0/win-pgo).
I just happened to notice on recent canary chrome.exe has regressed https://bugs.chromium.org/p/chromium/issues/detail?id=704300, so I guess somewhere between 3030 and 3048.
The sudden bloat of chrome.exe was previously noted and is being tracked by  crbug.com/703622 . Just ignore it and it will go away (it's being worked on).

Comment 92 by h...@chromium.org, Mar 24 2017

> The sudden bloat of chrome.exe was previously noted and is being tracked by  crbug.com/703622 . Just ignore it and it will go away (it's being worked on).

Thanks! Looks like it's back to normal now.


MSVC (Chromium #459395 32-bit) [0]
=============================================
chrome.exe: 1009664 B
chrome.dll: 35691008 B
chrome_child.dll: 51629056 B
Total: 88329728 B
mini_installer.exe: 41851392 B

Clang (Chromium #459380 32-bit, Clang r298699) [1]
=============================================
chrome.exe: 1060864 B (+51200 B, +5.07 %)
chrome.dll: 39960576 B (+4269568 B, +11.96 %)
chrome_child.dll: 58288640 B (+6659584 B, +12.90 %)
Total: 99310080 B (+10980352 B, +12.43 %)
mini_installer.exe: 43758592 B (+1907200 B, +4.56 %)
  
 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/16522
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/11760





MSVC (Chromium #459399 64-bit) [0]
=============================================
chrome.exe: 1200128 B
chrome.dll: 51021824 B
chrome_child.dll: 70262272 B
Total: 122484224 B
mini_installer.exe: 46183936 B
  
Clang (Chromium #459380 64-bit, Clang r298698) [1]
=============================================
chrome.exe: 1148928 B (-51200 B, -4.27 %)
chrome.dll: 48293888 B (-2727936 B, -5.35 %)
chrome_child.dll: 68576256 B (-1686016 B, -2.40 %)
Total: 118019072 B (-4465152 B, -3.65 %)
mini_installer.exe: 44411904 B (-1772032 B, -3.84 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/61758
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/11930

Comment 93 by h...@chromium.org, Apr 24 2017

MSVC (Chromium #466727 32-bit) [0]
=============================================
chrome.exe: 1127936 B
chrome.dll: 35540480 B
chrome_child.dll: 57838592 B
Total: 94507008 B
mini_installer.exe: 42716160 B

Clang (Chromium #466680 32-bit, Clang r301206) [1]
=============================================
chrome.exe: 1178112 B (+50176 B, +4.45 %)
chrome.dll: 39928320 B (+4387840 B, +12.35 %)
chrome_child.dll: 66022912 B (+8184320 B, +14.15 %)
Total: 107129344 B (+12622336 B, +13.36 %)
mini_installer.exe: 44883456 B (+2167296 B, +5.07 %)

 [0]. https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Win/builds/17503
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin/builds/12097



MSVC (Chromium #466728 64-bit) [0]
=============================================
chrome.exe: 1326080 B
chrome.dll: 50353152 B
chrome_child.dll: 79011840 B
Total: 130691072 B
mini_installer.exe: 47134208 B

Clang (Chromium #466668 64-bit, Clang r301198) [1]
=============================================
chrome.exe: 1271296 B (-54784 B, -4.13 %)
chrome.dll: 47489024 B (-2864128 B, -5.69 %)
chrome_child.dll: 76768768 B (-2243072 B, -2.84 %)
Total: 125529088 B (-5161984 B, -3.95 %)
mini_installer.exe: 45317120 B (-1817088 B, -3.86 %)

 [0]. https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/69111
 [1]. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWin64/builds/12200




Looks like 32-bit regressed slightly over the last month.. where?

Chromium: #457737--#458739
Clang: r298069--r298509
Ratio change: 0.003840
MSVC size change:   -138240 bytes (-0.267 %)
Clang size change:    42496 bytes (0.073 %)
http://test-results.appspot.com/revision_range?start=457737&end=458739
svn log -r 298069:298509 http://llvm.org/svn/llvm-project

Looks like MSVC dropped size in two main places:
458586	-	51673088.0	-	-
458602	-	51665920.0	-	-
Can't see anything obvious: http://test-results.appspot.com/revision_range?start=458586&end=458602

458711	-	51667968.0	-	-
458716	-	51596288.0	-	-
Can't see anything obvious: http://test-results.appspot.com/revision_range?start=458711&end=458716

---

Chromium: #462849--#463120
Clang: r299771--r299809
Ratio change: 0.007656
MSVC size change:   -791552 bytes (-1.523 %)
Clang size change:  -501248 bytes (-0.855 %)
http://test-results.appspot.com/revision_range?start=462849&end=463120
svn log -r 299771:299809 http://llvm.org/svn/llvm-project

Narrow range and msvc sizes:
463026	-	52005888.0	-	-
463046	-	51194368.0	-	-
http://test-results.appspot.com/revision_range?start=463026&end=463046
It's not clear to me what changed.. maybe https://codereview.chromium.org/2795883002?

---

Chromium: #464049--#464648
Clang: r300072--r300290
Ratio change: 0.004862
MSVC size change:    977408 bytes (1.909 %)
Clang size change:  1362432 bytes (2.345 %)
http://test-results.appspot.com/revision_range?start=464049&end=464648
svn log -r 300072:300290 http://llvm.org/svn/llvm-project

msvc changed a lot here:
464104	-	51209728.0	-	-
464119	-	53055488.0	-	-
http://test-results.appspot.com/revision_range?start=464104&end=464119
How can I not see what happened here? It's a 1.76 MB growth :-(

---

Chromium: #466026--#466319
Clang: r300854--r300970
Ratio change: 0.001786
MSVC size change:   5677568 bytes (10.873 %)
Clang size change:  6573568 bytes (11.047 %)
http://test-results.appspot.com/revision_range?start=466026&end=466319
svn log -r 300854:300970 http://llvm.org/svn/llvm-project

466251	-	52239360.0	-	-
466259	-	57895424.0	-	-
http://test-results.appspot.com/revision_range?start=466251&end=466259

It's a 5 MB growth! On my graph it looks like something landed and was reverted multiple times. Looks like this one: https://codereview.chromium.org/2762593002
Wow - 5 MB growth is impressive! The size bots should be monitored and bugs filed, although I don't see any. Maybe it's one of those size regressions that only happens on some build configurations, and this is not the configuration that is monitored? Or else that change got reverted so quickly that the size bots ignored it.

With before/after PDBs it's pretty straightforward to compare them with SymbolSort's -diff option, as described here, if somebody wants to drill down and figure out exactly what is going on. Source sets? That's often the problem.

rnk@ https://chromium-review.googlesource.com/c/538463/ seems to cause performance regression, can you take a look at the following bug?

https://bugs.chromium.org/p/chromium/issues/detail?id=735910

Comment 98 by r...@chromium.org, Jun 28 2017

@loorongjie I'd say that I don't have time to investigate the blink_perf.bindings regression. If you're confident that this is a real regression, feel free to revert that CL. It should be a simple net improvement, but it could be causing some other bad inlining behavior.
Status: Fixed (was: Started)
Comparing 64.0.3278.0 clang and 64.0.3278.2 win-pgo builds, the mini_installer.exe is on par for 32-bit and 6% smaller for 64-bit. I think we're done here.
