Issue metadata
Sign in to add a comment
|
0.6%-1.5% regression in sizes when switching from link.exe to lld |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
May 3 2018
hans: Looks like your linker change had a large step-up in the overall binary size. PTAL, or WontFix if this was expected and is acceptable.
,
May 4 2018
Thanks! The linker change was reverted shortly afterwards, but we'll need to figure out what's going on before landing the change again.
,
May 9 2018
Taking a look.
,
May 10 2018
Can reproduce locally. /mnt/c/src/c/src/out$ cat official_link/args.gn use_lld = false is_debug = false is_official_build = true strip_absolute_paths_from_debug_symbols = true /mnt/c/src/c/src/out$ cat official_lld/args.gn use_lld = true is_debug = false is_official_build = true strip_absolute_paths_from_debug_symbols = true /mnt/c/src/c/src/out$ ls -l official_l*/chrome_child.dll -rwxrwxrwx 1 root root 76966912 May 9 16:34 official_link/chrome_child.dll -rwxrwxrwx 1 root root 77565440 May 9 16:19 official_lld/chrome_child.dll /mnt/c/src/c/src/out$ ls -l official_l*/chrome.dll -rwxrwxrwx 1 root root 52494848 May 9 16:25 official_link/chrome.dll -rwxrwxrwx 1 root root 53294592 May 9 14:38 official_lld/chrome.dll /mnt/c/src/c/src/out$ ls -l official_l*/chrome.exe -rwxrwxrwx 1 root root 1506816 May 9 16:34 official_link/chrome.exe -rwxrwxrwx 1 root root 1489408 May 9 16:20 official_lld/chrome.exe
,
May 10 2018
The main cause is that our .pdata sections are much larger. $ size -A official_*/chrome_child.dll official_link/chrome_child.dll : section size addr .text 61190316 6442455040 .rdata 12262174 6503649280 .data 669184 6515912704 .pdata 2167860 6518079488 .rodata 3840 6520250368 CPADinfo 56 6520254464 prot 37 6520258560 .rsrc 52376 6520262656 .reloc 617184 6520315904 Total 76963027 official_lld/chrome_child.dll : section size addr .text 61261010 6442455040 .rdata 12218708 6503718912 .data 668160 6515941376 .pdata 2708616 6518108160 .00cfg 32 6520819712 .gfids 2820 6520823808 .rodata 3840 6520827904 .tls 45 6520832000 CPADinfo 56 6520836096 prot 41 6520840192 .rsrc 52376 6520844288 .reloc 644236 6520897536 Total 77559940 $ size -A official_*/chrome.dll official_link/chrome.dll : section size addr .text 42332808 6442455040 .rdata 7526084 6484791296 .data 69632 6492319744 .pdata 1946364 6493257728 CPADinfo 56 6495207424 prot 28 6495211520 .rodata 3584 6495215616 .rsrc 125984 6495219712 .reloc 486800 6495346688 Total 52491340 official_lld/chrome.dll : section size addr .text 42386103 6442455040 .rdata 7541052 6484844544 .data 68608 6492389376 .pdata 2652444 6493339648 .00cfg 32 6495993856 .gfids 2792 6495997952 .rodata 3584 6496002048 .tls 37 6496006144 CPADinfo 56 6496010240 prot 32 6496014336 .rsrc 125976 6496018432 .reloc 508872 6496145408 Total 53289588 Now to figure out why that is.
,
May 10 2018
At least one reason is that we don't ICF pdata sections when ICFing text sections.
C:\src\tmp\pdata-test>type 1.cpp
void g();
void f1() {
g();
}
void f2() {
g();
}
C:\src\tmp\pdata-test>type 2.cpp
void g() {}
C:\src\tmp\pdata-test>cl /c /EHsc 1.cpp 2.cpp /Gy
Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25835 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
1.cpp
2.cpp
Generating Code...
C:\src\tmp\pdata-test>link /noentry /opt:icf 1.obj 2.obj /dll /map:1.map /include:?f1@@YAXXZ /include:?f2@@YAXXZ
Microsoft (R) Incremental Linker Version 14.12.25835.0
Copyright (C) Microsoft Corporation. All rights reserved.
C:\src\tmp\pdata-test>wsl grep f[12] 1.map
0001:00000000 ?f2@@YAXXZ 0000000180001000 f 1.obj
0001:00000000 ?f1@@YAXXZ 0000000180001000 f 1.obj
0002:0000007c $unwind$?f2@@YAXXZ 000000018000207c 1.obj
0002:0000007c $unwind$?f1@@YAXXZ 000000018000207c 1.obj
0003:00000000 $pdata$?f2@@YAXXZ 0000000180003000 1.obj
0003:00000000 $pdata$?f1@@YAXXZ 0000000180003000 1.obj
C:\src\tmp\pdata-test>..\..\l\ra\bin\lld-link /noentry /opt:icf 1.obj 2.obj /dll /lldmap:1l.map /include:?f1@@YAXXZ /include:?f2@@YAXXZ
C:\src\tmp\pdata-test>wsl grep f[12] 1l.map
00001000 000000f2 4096 .text
00001000 00000000 0 "void __cdecl f1(void)" (?f1@@YAXXZ)
00001000 00000000 0 "void __cdecl f2(void)" (?f2@@YAXXZ)
00002100 00000000 0 $unwind$?f1@@YAXXZ
00002100 00000000 0 $unwind$?f2@@YAXXZ
00004000 00000000 0 $pdata$?f1@@YAXXZ
0000400c 00000000 0 $pdata$?f2@@YAXXZ
,
May 10 2018
If we fix this, we should try to fix pr35337 at the same time.
,
May 10 2018
I have a fix in progress for pdata. With that the chrome_child.dll size looks like this: chrome_child.dll : section size addr .text 61261218 6442455040 .rdata 12218788 6503718912 .data 668160 6515941376 .pdata 2176392 6518108160 .00cfg 32 6520287232 .gfids 2820 6520291328 .rodata 3840 6520295424 .tls 45 6520299520 CPADinfo 56 6520303616 prot 41 6520307712 .rsrc 52376 6520311808 .reloc 644236 6520365056 Total 77028004 So we're much closer to link.exe's size. Our .text is still 70KB larger, and our .reloc is 27KB larger for some reason, but we're getting there.
,
May 10 2018
,
May 10 2018
10KB in chrome_child.dll .text: https://reviews.llvm.org/D46673
,
May 10 2018
,
May 10 2018
With the above changes, link.exe's PDB publics stream (i.e. symbol table) has about the same number of unique entries as ours: /mnt/c/src/c/src/out$ ../../../l/ra/bin/llvm-pdbutil.exe dump -publics official_lld/chrome_child.dll.pdb | grep S_PUB32 | cut -d'`' -f2 | sort | uniq | wc -l 408814 /mnt/c/src/c/src/out$ ../../../l/ra/bin/llvm-pdbutil.exe dump -publics official_link/chrome_child.dll.pdb | grep S_PUB32 | cut -d'`' -f2 | sort | uniq | wc -l 409203 But theirs has significantly fewer unique addresses: /mnt/c/src/c/src/out$ ../../../l/ra/bin/llvm-pdbutil.exe dump -publics official_lld/chrome_child.dll.pdb | grep 'addr =' | cut -d, -f2 | sort | uniq | wc -l 336541 /mnt/c/src/c/src/out$ ../../../l/ra/bin/llvm-pdbutil.exe dump -publics official_link/chrome_child.dll.pdb | grep 'addr =' | cut -d, -f2 | sort | uniq | wc -l 331031 Hypothesis: link.exe is ICFing more than we do.
,
May 10 2018
It looks like link.exe is ICFing vtables (i.e. sections with leader symbol names prefixed with "??_7").
,
May 10 2018
With a patch that enables ICF on vtables our size now looks like this: chrome_child.dll : section size addr .text 61188978 6442455040 .rdata 12076884 6503645184 .data 668160 6515724288 .pdata 2167284 6517891072 .00cfg 32 6520061952 .gfids 2820 6520066048 .rodata 3840 6520070144 .tls 45 6520074240 CPADinfo 56 6520078336 prot 41 6520082432 .rsrc 52376 6520086528 .reloc 611260 6520139776 Total 76771776 So we're now 200KB smaller than link.exe.
,
May 10 2018
That we produce smaller executable than link.exe might be a signal that we are doing something more aggressive than link.exe, though as long as all tests pass, I'm not too worried.
,
May 10 2018
The smaller rodata is most likely due to string tail merging (i.e. issue 810154).
,
May 10 2018
Ah, that makes sense. Nice.
,
May 10 2018
ICF on vtables: https://reviews.llvm.org/D46734
,
May 11 2018
Awesome progress. If you still have your build directories around, how does mini_installer size compare now? (Since that's the bits we ship to users for the initial install, that's probably the size we care about most. mini_installer.exe mostly contains of a 7zip-compressed resource containing setup.exe)
,
May 11 2018
Sizes: Directory of C:\src\c\src\out\official_lld 05/11/2018 12:26 PM 46,179,840 mini_installer.exe Directory of C:\src\c\src\out\official_link 05/11/2018 12:16 PM 45,998,080 mini_installer.exe So our mini_installer is still around 180KB larger. That's strange because all of chrome.dll/chrome_child.dll/chrome.exe/setup.exe are smaller uncompressed. Are there any other files that I should be looking at?
,
May 11 2018
The full list of files seems to be in chrome.7z, and almost all of them are smaller uncompressed (one is larger by a few bytes). The totals are: /mnt/c/src/c/src/out/official_link$ ls -l chrome.7z -rwxrwxrwx 1 root root 188779025 May 11 12:14 chrome.7z /mnt/c/src/c/src/out/official_lld$ ls -l chrome.7z -rwxrwxrwx 1 root root 188378273 May 11 12:54 chrome.7z Here's what they look like after compression: /mnt/c/src/c/src/out/official_link$ ls -l chrome.packed.7z -rwxrwxrwx 1 root root 45108820 May 11 12:15 chrome.packed.7z /mnt/c/src/c/src/out/official_lld$ ls -l chrome.packed.7z -rwxrwxrwx 1 root root 45305317 May 11 12:55 chrome.packed.7z So it does indeed seem to be a compression issue. I compressed chrome_child.dll on its own using the same flags as the script that created chrome.packed.7z, and the compressed file was 80KB larger. /mnt/c/src/c/src/out/official_lld$ 7z a -t7z -m0=BCJ2 -m1=LZMA:d27:fb128 -m2=LZMA:d22:fb128:mf=bt2 -m3=LZMA:d22:fb128:mf=bt2 -mb0:1 -mb0s1:2 -mb0s2:3 chrome_child.7z chrome_child.dll 7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,40 CPUs) Scanning Creating archive chrome_child.7z Compressing chrome_child.dll Everything is Ok /mnt/c/src/c/src/out/official_lld$ ls -l chrome_child.dll -rwxrwxrwx 1 root root 76776448 May 11 12:53 chrome_child.dll /mnt/c/src/c/src/out/official_lld$ ls -l chrome_child.7z -rwxrwxrwx 1 root root 22919759 May 11 13:41 chrome_child.7z /mnt/c/src/c/src/out/official_lld$ cd ../official_link /mnt/c/src/c/src/out/official_link$ 7z a -t7z -m0=BCJ2 -m1=LZMA:d27:fb128 -m2=LZMA:d22:fb128:mf=bt2 -m3=LZMA:d22:fb128:mf=bt2 -mb0:1 -mb0s1:2 -mb0s2:3 chrome_child.7z chrome_child.dll 7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,40 CPUs) Scanning Creating archive chrome_child.7z Compressing chrome_child.dll Everything is Ok /mnt/c/src/c/src/out/official_link$ ls -l chrome_child.7z -rwxrwxrwx 1 root root 22837535 May 11 13:42 chrome_child.7z /mnt/c/src/c/src/out/official_link$ ls -l chrome_child.dll -rwxrwxrwx 1 root root 76966912 May 9 16:34 chrome_child.dll I'm not aware of any major differences in output at the moment, with the exception of string tail merging. Let's see what happens if I turn that off.
,
May 11 2018
Disabling string tail merging does this to the size: 05/11/2018 02:19 PM 46,039,040 mini_installer.exe 05/11/2018 02:18 PM 188,669,211 chrome.7z 05/11/2018 02:19 PM 45,142,913 chrome.packed.7z So it helps with compressed size: our mini_installer is still larger, but only by 40KB. Our uncompressed size is still smaller by 100KB. For now I think we should put string tail merging behind a flag and then disable it in Chrome. Later it may be possible to reimplement string tail merging in a way that doesn't disturb section layout as much as the current implementation does, which should hopefully improve compressibility. Another thing is that we still have more unique addresses in our publics stream than link.exe (currently 331910 in chrome_child.dll) so we may be able to improve ICF to hopefully get us some of that last 40KB.
,
May 11 2018
Tail merging flag: https://reviews.llvm.org/D46780
,
May 11 2018
To help me find missing ICF opportunities I wrote a program (attached) that takes llvm-pdbutil output and prints the names of symbols with non-unique addresses. One of the things that it told me was that link.exe is ICFing sections with different alignments, and we aren't. It's an easy fix (and was already implemented in the ELF linker) so I should have a patch soon.
,
May 12 2018
Max alignment: https://reviews.llvm.org/D46786 Sizes look like this now: 05/11/2018 04:59 PM 46,011,904 mini_installer.exe 05/11/2018 04:58 PM 188,609,638 chrome.7z 05/11/2018 04:59 PM 45,115,355 chrome.packed.7z
,
May 12 2018
That puts us at 330562 unique addresses, which is less than link.exe.
,
May 14 2018
All patches have landed as of r332273.
,
May 16 2018
I looked at builds with the latest Clang roll (crrev.com/559001) to see where we're at. args.gn: is_chrome_branded = true is_debug = false is_official_build = true strip_absolute_paths_from_debug_symbols = true use_goma = true use_lld = false link.exe: 05/16/2018 10:57 AM 50,569,216 mini_installer.exe lld: 05/16/2018 11:05 AM 50,739,200 mini_installer.exe Same builds args.gn with an added 'target_cpu = "x86"': link.exe: 05/16/2018 11:31 AM 51,068,416 mini_installer.exe lld: 05/16/2018 11:41 AM 51,290,112 mini_installer.exe Seems a bit unfair given that our DLLs are actually smaller, but trying to emit fluffy but compressible binaries also seems weird so I'm not sure what else we should do..
,
May 16 2018
Is the section ordering very different with link.exe and lld? Worse compression can be that similar blocks are further apart in the binary. But it is not fair considering compression tools are optimized to do as good job as possible on the exact kind of binaries that cl.exe+link.exe generates. Also the constants in CompressUsingLZMA in chrome/tools/build/win/create_installer_archive.py are chosen to work as well as possible with the binaries at the time (2011 judging from the git history).
,
May 16 2018
Re-running the measurements with string tail merging disabled as suggested by Nico:
diff --git a/build/config/win/BUILD.gn b/build/config/win/BUILD.gn
index ee8e5e833b0d..1fc409771b13 100644
--- a/build/config/win/BUILD.gn
+++ b/build/config/win/BUILD.gn
@@ -118,6 +118,11 @@ config("compiler") {
"/FIXED:NO",
]
+ if (use_lld) {
+ # XXX
+ ldflags += [ "/OPT:NOLLDTAILMERGE" ]
+ }
+
# TODO(siggi): Is this of any use anymore?
# /PROFILE ensures that the PDB file contains FIXUP information (growing the
# PDB file by about 5%) but does not otherwise alter the output binary. It
64-bit:
link.exe: 05/16/2018 02:49 PM 50,586,624 mini_installer.exe
lld: 05/16/2018 02:59 PM 50,609,664 mini_installer.exe
32-bit:
link.exe: 05/16/2018 11:31 AM 51,068,416 mini_installer.exe
lld: 05/16/2018 01:50 PM 51,156,992 mini_installer.exe
,
May 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9a07080c87ed7c5ed7dc450c2a7a842fa3ada6e9 commit 9a07080c87ed7c5ed7dc450c2a7a842fa3ada6e9 Author: Hans Wennborg <hans@chromium.org> Date: Thu May 17 08:45:37 2018 Disable string tail merging when linking with lld String tail merging leads to smaller binaries, but they don't compress as well, leading to increased mini_installer size. This reduces mini_installer.exe size by 130 KB and increases chrome.dll and chrome_child.dll by 72 KB and 154 KB, respectively for 32-bit (64-bit is in the same ballpark). (This depends on Clang/LLD being rolled past r332149.) Bug: 838449 Change-Id: I1653e8d522371edb263f742f55ca1aa27ed5dd62 Reviewed-on: https://chromium-review.googlesource.com/1061733 Commit-Queue: Hans Wennborg <hans@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#559459} [modify] https://crrev.com/9a07080c87ed7c5ed7dc450c2a7a842fa3ada6e9/build/config/win/BUILD.gn
,
May 17 2018
Let's call this fixed.
,
May 19 2018
Issue 844607 has been merged into this issue. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, May 1 2018