
Issue 661019

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Feb 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




clang in chroot is slow

Project Member Reported by laszio@chromium.org, Nov 1 2016

Issue description

When configuring coreutils:

outside chroot:
gcc: 55 sec
clang-3.9: 58 sec

inside chroot:
gcc: 64 sec
clang-3.9: 290 sec

The gcc versions are slightly different. The clang builds come from the same revision but carry slightly different patches.
 

Comment 1 by g...@chromium.org, Nov 1 2016

This is interesting to me, since I was bitten by it when dealing with FORTIFY.

Running perf on clang shows that a lot of time is spent in ld-2.19.so. Looking at what ld has to say about this: 

$ LD_DEBUG=all gcc |& wc -l
6533

$ LD_DEBUG=all clang |& wc -l
4239341

...So, it looks like the linker is doing *a ton* of work just to get clang ready to go.

Looking at the output of LD_DEBUG, it's sorta-clear why: clang loads *way* more symbols, and has **way** more .sos to load from. Picking a random symbol:

$ grep 'symbol=_ZTVN5clang4ento15AnalysisManagerE;  lookup in' ld_debug.clang | wc -l
120

So we looked in 120 different .sos just to find this one clang symbol.

A dumb script I wrote says that loading clang requires resolving 33,825 symbols, while GCC needs 1,073. This script also tells me that GCC needs to perform 4,605 lookups to resolve all of its symbols, whereas clang needs to do 4,175,281 (!!). Meaning that clang not only has 33x more symbols to load, but it needs to do ~30x more lookups per symbol (on average) than GCC.

Attached is what the linker has to say about where it looked for a randomly selected symbol.

(FWIW, it looks like nearly 10,000 of the symbols we're looking up are in the `llvm` namespace; I'm not sure where the other 2/3 of them are coming from, but making those go away might be a good start...)
Attachment: sample_lookup.txt (8.8 KB)
Cc: vapier@chromium.org
The clang / llvm in the cros chroot is built with -DBUILD_SHARED_LIBS=ON, which is pulled from upstream. Removing it makes it almost as fast as gcc. Any objections?

https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/sys-devel/llvm/llvm-3.9_pre265926-r13.ebuild#323
did you check the disk size and the installed set of files? and what it changes about binaries being linked?
Did you mean the size and files installed on the DUT? They should be the same. The option only affects LLVM itself; there should be no change to the outputs.
no, i mean the sdk itself
Considering the files reported by `equery f llvm`:

         shared static
tgz size    43M   262M
tar size   139M   770M

Changes: libLLVM*.so -> libLLVM*.a
         libclang*.so -> libclang*.a

Comment 7 Deleted

Attachments: llvm.mono.txt (86.3 KB), llvm.shared.txt (84.8 KB)
Some reference: http://llvm.org/docs/CMake.html

"""
BUILD_SHARED_LIBS:BOOL
Flag indicating if each LLVM component (e.g. Support) is built as a shared library (ON) or as a static library (OFF). Its default value is OFF. On Windows, shared libraries may be used when building with MinGW, including mingw-w64, but not when building with the Microsoft toolchain.

Note
BUILD_SHARED_LIBS is only recommended for use by LLVM developers. If you want to build LLVM as a shared library, you should use the LLVM_BUILD_LLVM_DYLIB option.
"""

So probably it's pretty safe, if the size difference is tolerable?
Another possibility is to build a shared monolithic libLLVM.so and have most executables dynamically linked to it:

-DLLVM_BUILD_LLVM_DYLIB=ON
-DLLVM_LINK_LLVM_DYLIB=ON

The tar size would be 355M; however, the load time is also longer (23.5% longer to configure coreutils).
Attached is the file list for the DYLIB flavor.

For what it's worth, Android seems to take the DYLIB approach, judging by the file lists and sizes.
Attachment: llvm.dylib.txt (86.4 KB)
I like fast build times.

How much do we care about the sdk size? 
What is your preference, Vapier?
For comparison, gcc for all target architectures is much larger:

172M    gcc.aarch64.tar
56M     gcc.armv6j.tar
94M     gcc.armv7a.tar
108M    gcc.x86_64.tar
Total: 430M

compressed size

49M     gcc.aarch64.tar.gz
18M     gcc.armv6j.tar.gz
30M     gcc.armv7a.tar.gz
35M     gcc.x86_64.tar.gz
Total: 132M

Not to mention that llvm covers much of the functionality of binutils:

30M     binutils.aarch64.tar
28M     binutils.armv6j.tar
28M     binutils.armv7a.tar
32M     binutils.x86_64.tar
Total: 118M

8.6M    binutils.aarch64.tar.gz
8.4M    binutils.armv6j.tar.gz
8.4M    binutils.armv7a.tar.gz
8.8M    binutils.x86_64.tar.gz
Total: 34.2M
There is some advantage to building the same way as Android, as mentioned in #10.

@vapier,

What was your concern about size, and can we move ahead with either the fully archived (static) build or the approach in #11?
have you guys tried turning off the forced "now" bindings? what if you build llvm/clang with -Wl,-z,nonow?

Comment 16 by lloz...@google.com, Nov 19 2016

That is an interesting idea, but I prefer to stay with one of the default ways of building.
If we are going to report compile-time issues, we had better stay with one of the supported ways of building LLVM.
If size is a real concern (why?), then let's go with the way Android builds.
if it wasn't a supported build mode, then it wouldn't have a flag in the first place.  further, you're talking about changing from what Gentoo uses all the time and is supported there to something else.  so that argument doesn't quite hold water.

the SDK is already a bit ridiculously large at 1.6GB *compressed*.  uncompressed, it's 4.1GB, and that's not even on-disk usage.  we should be trying to shrink it, not increasing it by like 20%.

yes, we all want things to be fast all the time.  but making a larger SDK adds overhead in places too -- the sdk bot needs to compress & upload it, and all the other bots need to pull that down & decompress & unpack it.  it's a one-time cost, but that cost is often paid once per-build for bots that recreate their chroots.

if the issue boils down entirely to symbol resolution overhead, and we're forcing bind-now resolution on it, then let's try lazy resolution and see what happens.  seems fairly trivial to test out.
Is there a flag for building LLVM with -Wl,-z,nonow?

As far as I can tell from http://llvm.org/docs/CMake.html, I can only see 3 ways of building, controlled by these options:

LLVM_BUILD_LLVM_DYLIB, LLVM_LINK_LLVM_DYLIB, BUILD_SHARED_LIBS

What Gentoo is doing seems wrong to me (probably an oversight?). Building everything as shared libraries is only meant for LLVM developers, not for production environments.

Your lazy resolution idea is pretty nice, but if we want to report build-time issues to LLVM, I prefer to use one of their supported ways of building it so that they can reproduce our reports.


-Wl,-z,now is passed in ldwrapper.hardened. I don't know why, but removing it makes clang even slower.
ah! So my argument does not hold, since -z,now is added by our wrapper.

I don't understand why it becomes slower... are most of the symbols being resolved anyway, even in lazy mode?

I'm not sure we want to spend a lot of time understanding that. OK to go with the solution in #11, which improves build time and uses less space than the fully archived solution?

Comment 21 by lloz...@google.com, Nov 23 2016

This has been open for far too long.
Can we close it?

OK to go with the solution in #11, which is what Android uses?
Labels: M-57

Comment 23 by bugdroid1@chromium.org (Project Member), Dec 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/04ed5c284fa5a46b20e7bfacacc0e50499b06f74

commit 04ed5c284fa5a46b20e7bfacacc0e50499b06f74
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Tue Nov 01 23:00:16 2016

llvm: build a single libLLVM.so

Building every llvm modules into individual shared objects dramatically
increase it's load time. For example, it's 2.5x faster when configuring
coreutils with a single libLLVM.so.

TEST=cbuildbot --hwtest falco-release veyron-minnie-release
BUG= chromium:661019 

Change-Id: Ic277c123315dfdd3c95c0f13094df10a227b91ce
Reviewed-on: https://chromium-review.googlesource.com/406467
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Luis Lozano <llozano@chromium.org>

[rename] https://crrev.com/04ed5c284fa5a46b20e7bfacacc0e50499b06f74/sys-devel/llvm/llvm-3.9_pre265926-r20.ebuild

Comment 24 by bugdroid1@chromium.org (Project Member), Jan 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/b8d4040c8c935ba4e1f3c90adc2be76efe50664a

commit b8d4040c8c935ba4e1f3c90adc2be76efe50664a
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Tue Nov 01 23:00:16 2016

Re-land "llvm: build a single libLLVM.so"

Re-land 04ed5c284fa5a46b20e7bfacacc0e50499b06f74 since the goma
dependency problem is fixed.

TEST=cbuildbot --hwtest falco-release veyron-minnie-release
BUG= chromium:661019 

Change-Id: I1f486a86afa394fd8a5776fd3db60de1f040b8ec
Reviewed-on: https://chromium-review.googlesource.com/425868
Commit-Ready: Yunlian Jiang <yunlian@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Yunlian Jiang <yunlian@chromium.org>

[rename] https://crrev.com/b8d4040c8c935ba4e1f3c90adc2be76efe50664a/sys-devel/llvm/llvm-3.9_pre265926-r21.ebuild

Comment 25 by bugdroid1@chromium.org (Project Member), Feb 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/08b7401985800bfb22e574648851ac532d8ec096

commit 08b7401985800bfb22e574648851ac532d8ec096
Author: Ting-Yuan Huang <laszio@chromium.org>
Date: Thu Feb 02 20:33:14 2017

Re-land "llvm: build a single libLLVM.so"

Re-land 04ed5c284fa5a46b20e7bfacacc0e50499b06f74 since
1. the goma dependency problem (libffi) is fixed and
2. chell-chrome-pfq no longer depends on a different version of
   libLLVM.so.

TEST=cbuildbot --hwtest chell-release veyron_minnie-release elm-release
BUG= chromium:661019 

Change-Id: Iab4d28cda7dda3d5a42f322da9511da737c26981
Reviewed-on: https://chromium-review.googlesource.com/435472
Commit-Ready: Ting-Yuan Huang <laszio@chromium.org>
Tested-by: Ting-Yuan Huang <laszio@chromium.org>
Reviewed-by: Shinya Kawanaka <shinyak@chromium.org>
Reviewed-by: Takuto Ikuta <tikuta@chromium.org>
Reviewed-by: Fumitoshi Ukai <ukai@chromium.org>
Reviewed-by: Yoshisato Yanagisawa <yyanagisawa@chromium.org>
Reviewed-by: Yunlian Jiang <yunlian@chromium.org>

[rename] https://crrev.com/08b7401985800bfb22e574648851ac532d8ec096/sys-devel/llvm/llvm-4.0_pre285905-r3.ebuild

Status: Fixed (was: Untriaged)
These are the times needed to build Chrome on a 48-core machine.
The wall-clock speedup relative to gcc / clang_multiple_libs is 26.6% / 17.8%.
The CPU (user + sys) time speedup is 30.8% / 27.8%.

GCC                 CLANG_SHLIB                CLANG_DYLIB
real  59m6.884s     real         55m1.628s     real  46m41.878s
user  1725m55.205s  user         1790m16.563s  user  1381m26.155s
sys   236m10.588s   sys          127m27.318s   sys   119m3.234s

real  59m35.396s    real         55m23.133s    real  47m36.645s
user  1724m27.779s  user         1789m34.014s  user  1381m59.670s
sys   230m36.910s   sys          128m19.207s   sys   119m56.977s

real  59m4.086s                                real  46m41.123s
user  1725m24.401s                             user  1383m4.388s
sys   234m45.593s                              sys   118m53.745s

real  58m59.861s                               real  47m7.542s
user  1723m2.621s                              user  1382m26.704s
sys   228m44.352s                              sys   119m32.277s

real  60m22.226s                               real  47m1.475s
user  1724m35.345s                             user  1383m46.556s
sys   235m25.407s                              sys   119m22.441s

Owner: laszio@chromium.org
Numbers on arm:

gcc-4.9.2
real    54m9.653s
user    1651m40.682s
sys     210m23.819s

clang-4.0, -DLLVM_BUILD_LLVM_DYLIB=ON, -DLLVM_LINK_LLVM_DYLIB=ON
real    42m27.895s
user    1258m8.988s
sys     106m19.619s

The BuildPackages stage seems to benefit even more:

squawks-R58-9247: 2h59m37s (chrome by gcc, everything else by clang/shared)
squawks-R58-9255: 2h19m48s (chrome by gcc, everything else by clang/dylib)
squawks-R58-9281: 2h5m48s  (everything by clang/dylib)

So we saved nearly 55 mins / 42.8% in the gcc -> clang migration :)


Luis reminded me that the wording may look confusing. Let's make it super clear:

Speedup = Latency_old / Latency_new = 2h59m37s / 2h5m48s = 10777s / 7548s ≈ 1.428
according to https://en.wikipedia.org/wiki/Speedup

The time reduced is 29.96% of the original build time.
