LLD is 3.2x slower when linking Chrome with FullLTO than Gold |
||
Issue description
Chrome Version: tip
OS: Linux x86-64
What steps will reproduce the problem?
(1) Build chrome with Gold:
$ gn gen out/gold '--args=is_chrome_branded=true is_official_build=true is_debug=false is_cfi=true use_lld=false allow_posix_link_time_opt=true' --check
$ ninja -C out/gold chrome
(2) Build chrome with LLD:
$ gn gen out/lld '--args=is_chrome_branded=true is_official_build=true is_debug=false is_cfi=true use_lld=true allow_posix_link_time_opt=true' --check
$ ninja -C out/lld chrome
(3) Compare link times:
$ cat out/gold/.ninja_log | grep -i "chrome[^/._a-z-]"
{start_time_in_ms} {end_time_in_ms} 0 chrome 3a8f93a3257da044
$ cat out/lld/.ninja_log | grep -i "chrome[^/._a-z-]"
{start_time_in_ms} {end_time_in_ms} 0 chrome 3b1e8419b639d435
What is the expected result?
LLD and Gold have the same performance. If LLD is slightly faster, that's a nice bonus.
What happens instead?
LLD is 3.2x slower. It takes 5537 seconds vs 1704 in the Gold case.
,
Apr 27 2017
Previous measurements didn't show anything wrong (Dec 2016): https://bugs.chromium.org/p/chromium/issues/detail?id=672158
,
Apr 27 2017
Tested the performance of linking cc_unittests: Gold: 280s LLD: 322s The 15% difference might be explained by the fact that Gold is linked with tcmalloc, and LLD is not. Last time is provided speed up of 25%, see https://crbug.com/583551 , so LLD might be actually faster is linked with tcmalloc. Which makes me wonder what's wrong with chrome target. The primary suspect is its size, and an O(n^2) algorithm somewhere inside may only be visible on larger targets. I will build a ToT toolchain and try to collect profile for LLD.
,
Apr 27 2017
This seems still odd to me as LLD is usually 2x or even 3x faster than gold.
,
Apr 27 2017
I am as surprised as you (if not more) as I have found this as a last-minute safety check before enabling LLD for all LTO configurations on Chrome Linux: https://codereview.chromium.org/2844883002/ (obviously, I decided not to do so)
,
Apr 27 2017
I am working towards having a profile. Will take some time.
,
Apr 28 2017
N^2 found. It's inside void lld::elf::writeMapFile<llvm::object::ELFType<(llvm::support::endianness)1, true> >(llvm::ArrayRef<lld::elf::OutputSection*>). It's a recent regression in a sense that it only hits when -Map is specified *and* we have a lot of sections (true for LTO case, as we just enabled function and data sections)
,
Apr 28 2017
,
Apr 28 2017
Function sections and data sections were enabled by pcc@ in https://codereview.chromium.org/2840723003/ (just 2 days ago)
,
Apr 28 2017
Actions to be taken: 1. Rui (ruiu@) promised to implement a more optimal way to generate Map files that does not have O(n^2) complexity 2. We need to understand why does this map file is even needed. Andrew, can we avoid doing this by default?
,
Apr 28 2017
I've written a tool for collecting binary size information from .map files: https://cs.chromium.org/chromium/src/tools/binary_size/README.md It allows diffing the set of symbols between two builds, and also running queries over the set of all symbols. The original version of the tool gets all of the symbol data from the linker's .map file, but I recently found you can get most of it from nm as well. But still, what .map has that nm does not: - Entries for ** merge strings - Object path attribution for anonymous symbols (.map shows a symbol with object path) Our focus so far has mostly been for Android, but it's been nice to verify that the tool works as well on Linux (although I haven't tested with lld). If this is blocking linux in any way though, disabling would be fine. If we ever get around to caring about Linux binary size, I may try an turn the .map generation back on, or just live with the slightly reduced accuracy of what nm provides.
,
Apr 28 2017
As an aside (since I might have the right audience for this question): Does anyone know why linker maps contain symbols like:
.text._ZN5blink18PerformanceMonitor4WillERKNS_5probe12UpdateLayoutE
0xffffffffffffffff 0x2a obj/third_party/WebKit/Source/core/frame/libframe.a(PerformanceMonitor.o)
0x00695e19 blink::PerformanceMonitor::Will(blink::probe::UpdateLayout const&)
.text._ZN5blink9FrameView21PerformPreLayoutTasksEv
0x006a454c 0x14c obj/third_party/WebKit/Source/core/frame/libframe.a(FrameView.o)
0x006a454d blink::FrameView::PerformPreLayoutTasks()
Looking through the code a bit here: https://github.com/gittup/binutils/blob/HEAD/gold/mapfile.cc
There's just a comment that "special handling" is required.
There seems to be little consistency between builds as to where these symbols occur, and they come with large gaps. Furthermore, we had a recent perf alert ( bug 709050 ), where a gap grew by 60kb as a result of changing the order in which .o files are fed to the linker.
The list of them for libchrome.so from my build today (via supersize tool):
t@0x6a454c 59146 third_party/WebKit/Source/core/frame/FrameView.cpp
** symbol gap 0
t@0xaa0a6c 34096 third_party/icu/source/i18n/nfsubs.cpp
** symbol gap 1
t@0xe949c0 672 device/bluetooth/bluetooth_remote_gatt_characteristic.cc
** symbol gap 2
t@0x1288cb0 760 v8/src/compiler/js-operator.cc
** symbol gap 3
t@0x1683fb8 29470 third_party/WebKit/Source/web/WebAXObject.cpp
** symbol gap 4
t@0x1a8f2c6 95038 third_party/libvpx/source/libvpx/vp9/encoder/vp9_multi_thread.c
** symbol gap 5
t@0x1e97878 83390 components/invalidation/impl/invalidation_logger.cc
** symbol gap 6
t@0x22aeb44 144152 third_party/WebKit/Source/core/input/TouchEventManager.cpp
** symbol gap 7
The second field is the size. Together, about 440kb!
The nm output for address range of symbol gap 0 show:
00695e19 t blink::PerformanceMonitor::Will(blink::probe::UpdateLayout const&)
00696004 N $d
00697390 N $d
0069743c N $d
00697500 N $d
0069799c N $d
006979bc N $d
00697a74 N $d
00697d10 N $d
00697f34 N $d
00697f9c N $d
006984a8 N $d
006985c4 N $d
00698980 N $d
0069945c N $d
00699658 N $d
00699718 N $d
006998a8 N $d
0069995c N $d
006999cc N $d
0069a6fc N $d
0069a778 N $d
0069a908 N $d
0069a9c0 N $d
0069aaa4 N $d
0069b178 N $d
0069b578 N $d
0069b830 N $d
0069bb50 N $d
0069bbe8 N $d
0069c904 N $d
0069d1bc N $d
0069ea94 N $d
0069eb74 N $d
006a00d4 N $d
006a02c8 N $d
006a0990 N $d
006a1174 N $d
006a1edc N $d
006a21e4 N $d
006a2204 N $d
006a2238 N $d
006a28f0 N $d
006a29f0 N $d
006a2b40 N $d
006a2d74 N $d
006a30cc N $d
006a3298 N $d
006a3394 N $d
006a3490 N $d
006a34f8 N $d
006a3cd8 N $d
006a3d6c N $d
006a454c t $t
006a454d t blink::FrameView::PerformPreLayoutTasks()
According to nm's manpage, "N" is a debugging symbol.
So.. yeah.. anyone have insights or know the best place to ask?
,
Apr 28 2017
Rui landed a fix for the map file generator in r301659.
,
Apr 28 2017
Re: #12. Andrew, this is from the regular, non-LTO build correct? (LTO = LinkTimeOptimization).
,
Apr 28 2017
I believe that will fix the issue that the feature significantly slowed down the build, but it is still better to remove the -Map option from the command line because 1) the existing script that reads -Map outputs will stop working after switching to LLD because LLD's -Map format is different from gold's, and 2) even if the -Map option is not very slow it is still fairly heavy option as it generates a hundreds megabytes of text file. Always adding that option is probably not a good idea.
,
Apr 28 2017
Hi Rui, thank you for fixing it! Yes, I am disabling it for now for exactly these reasons: https://codereview.chromium.org/2849583005/ (also, because we'll only get your fix only after the next Clang roll; want to make the switch to LLD before that)
,
Apr 28 2017
re #15 - the size of the files isn't too bad since the build rules gzips it. For libchrome.so, the .map.gz is ~24MB for a 630MB .so file. The fact that it will break for lld is a good enough reason though. re #14 - This is for an android build. So, gcc+gold without lto.
,
Apr 28 2017
Sorry, I meant for the linker (or any other program) it is not cheap to produce a hundreds of megabytes of text data (in the uncompressed form) containing tons of demangled symbols because symbol demangling is fairly complicated and slow.
,
Apr 28 2017
Re #17: just an fyi, but chrome build with LinkTimeOptimization is 2.5 GB. Re #18: can linker do this concurrently?
,
Apr 28 2017
Yes, and that's my most recent patch does.
,
Apr 28 2017
Yikes! 2.5 GB is quite the jump from 225MB (which is what Android's currently is non-gzipped). If lld is not going to follow the format of ld / gold's linker file, maybe this is an opportunity to make a smaller format? E.g. don't bother with demangling, separate out discarded sections as it's own --option, skip outputting debug sections from section map (maybe behind --option)
,
Apr 28 2017
Re #20: I have only now taken a look at your fix (https://reviews.llvm.org/rL301659). Thank you for your forward thinking.
,
Apr 28 2017
Re #25: this is before strip; after strip is around 105 MB, I believe. Will look up the real number in a sec.
,
Apr 28 2017
From sizes step on the bot: https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Linux%20x64/builds/17583 RESULT chrome: chrome= 1922789048 bytes RESULT chrome-stripped: stripped= 118788016 bytes RESULT chrome-text: text= 112688145 bytes RESULT chrome-data: data= 6094976 bytes RESULT chrome-bss: bss= 1982577 bytes RESULT chrome-si: initializers= 8 files RESULT chrome-textrel: textrel= 0 relocs It's 2 GB before strip and 118 MB after.
,
Apr 28 2017
Re #21: You can disable demangling by passing --no-demangle to the linker, but I'm not sure if you want it. I don't think we want to change the format only to reduce the size, as the format was chosen to be readable for human.
,
Apr 28 2017
aaah, okay, I thought you were saying the .map file was 2.5GB.
,
Apr 28 2017
Re #26: no, I meant the binary size. Sorry for the confusion. :)
,
Apr 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/317d9edab7ba87a5c05564ca8c66a12911c46872 commit 317d9edab7ba87a5c05564ca8c66a12911c46872 Author: krasin <krasin@chromium.org> Date: Fri Apr 28 20:34:01 2017 Disable linker map for Clang toolchains. Right now, we are transitioning to LLD, and LLD has a different map format (so, the analysis tool will need to be updated), as well as suboptimal implementation that adds 60 minutes to the Chrome link time. This will be reenabled when both of the blockers above are resolved. BUG= 716209 , 607968 Review-Url: https://codereview.chromium.org/2849583005 Cr-Commit-Position: refs/heads/master@{#468113} [modify] https://crrev.com/317d9edab7ba87a5c05564ca8c66a12911c46872/build/toolchain/gcc_toolchain.gni
,
May 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0f24feacbffe77af7d69f64e739c73d834fa19b4 commit 0f24feacbffe77af7d69f64e739c73d834fa19b4 Author: estevenson <estevenson@chromium.org> Date: Thu May 18 16:17:42 2017 Use a GN arg for controlling linker map file generation. Map files aren't generated on Linux by default now that linker map file generation is hooked into !use_lld ( https://crbug.com/716209 ). This CL adds a GN arg, enable_linker_map, that defaults to true for Android and makes it easy to generate map files on Linux. BUG= 717550 , 716209 Review-Url: https://codereview.chromium.org/2888623003 Cr-Commit-Position: refs/heads/master@{#472834} [modify] https://crrev.com/0f24feacbffe77af7d69f64e739c73d834fa19b4/build/toolchain/gcc_toolchain.gni [modify] https://crrev.com/0f24feacbffe77af7d69f64e739c73d834fa19b4/build/toolchain/toolchain.gni [modify] https://crrev.com/0f24feacbffe77af7d69f64e739c73d834fa19b4/tools/binary_size/diagnose_bloat.py
,
May 18 2017
Fixed as reported in https://bugs.chromium.org/p/chromium/issues/detail?id=607968#c80 .
,
Nov 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9d63558169c2945762c05b2809200736a40dcee2 commit 9d63558169c2945762c05b2809200736a40dcee2 Author: Daniel Bratell <bratell@opera.com> Date: Tue Nov 20 14:10:57 2018 Update some documentation for the binary_size tool The documentation didn't include the crucial information that generate_linker_map has to be set to true in gn for the tool to work in Linux. Bug: 716209 Change-Id: I62919e26a80a0b5d592f6b8953762e35575142d2 Reviewed-on: https://chromium-review.googlesource.com/c/1338085 Reviewed-by: agrieve <agrieve@chromium.org> Commit-Queue: Daniel Bratell <bratell@opera.com> Cr-Commit-Position: refs/heads/master@{#609686} [modify] https://crrev.com/9d63558169c2945762c05b2809200736a40dcee2/build/toolchain/toolchain.gni [modify] https://crrev.com/9d63558169c2945762c05b2809200736a40dcee2/tools/binary_size/README.md [modify] https://crrev.com/9d63558169c2945762c05b2809200736a40dcee2/tools/binary_size/libsupersize/archive.py |
||
►
Sign in to add a comment |
||
Comment 1 by krasin@chromium.org
, Apr 27 2017Summary: LLD is 3.2x slower when linking Chrome with FullLTO than Gold (was: LLD is 3.2x slower when linking Chrome)