clang: kernel "no-op" compile time is much worse than gcc |
||||||||||
Issue descriptionFrom discussion in crosreview.com/673005: A "no-op" compile of the kernel (AKA re-running an emerge where you didn't change the source code at all) is much higher for gcc than for clang. For gcc it was measured at about 45 seconds (already quite slow) For clang it was measured at about 1 minute. My exact command line that I used was: time FEATURES="-buildpkg" USE="kmemleak lockdebug kgdb vtconsole -firmware_install" emerge-${BOARD} --nodeps chromeos-kernel-4_4 I ran that command once and ignored the time, then ran it again and looked at the time. --- Possibly this is due to the sheer number of calls that the kernel Makefile makes to clang. Each call takes 100 ms: $ time aarch64-cros-linux-gnu-clang --version > /dev/null real 0m0.101s user 0m0.072s sys 0m0.028s $ time aarch64-cros-linux-gnu-gcc --version > /dev/null real 0m0.066s user 0m0.052s sys 0m0.008s $ time sh --version > /dev/null real 0m0.004s user 0m0.000s sys 0m0.000s --- I'll try to dig a little bit to see if this is really correct. --- Note that the "no-op" build time is added to every iteration of modify / deploy / test. A 15 second increase in this time will impact developer productivity significantly.
,
Sep 20 2017
I did a quick test and found that a "no-op" build spends ~24 seconds calling "CC". I didn't check how much time was spent calling "LD" or any other tools.
I tested by creating a file in "usr/local/bin/aarch64-cros-linux-gnu-clang" that contained:
#!/bin/bash
echo "args: " "$@" >> /build/kevin/var/cache/portage/sys-kernel/chromeos-kernel-4_4/clang.log
/b/tip/tmp/time -o /build/kevin/var/cache/portage/sys-kernel/chromeos-kernel-4_4/clang.log -a /usr/bin/aarch64-cros-linux-gnu-clang "$@"
...then I ran a no-op kernel build. The results are attached. I ran a bit of python to analyze:
f = open("/b/tip/tmp/clang.log", "r")
elapsed = 0.0
for line in f.read().splitlines():
if "elapsed" in line:
word = line.split()[2]
elapsed += float(word[2:-7])
print elapsed
,
Sep 20 2017
,
Sep 20 2017
-- Copying from the CL for the sake of completness llozano@ said: But, does this point to a problem in the make pipeline? isn't there a 'configure' step? why doing a 'configure' is part of the single change, recompile process? dianders@ replied: I'll dig now and comment on the bug to see. I remember looking this type of thing ages ago and I even remember having it in my mind to implement some sort of cache of some of the type of stuff that the kernel queries about (I think u-boot does this), but I never got around to it.
,
Sep 20 2017
what happens with the time if you do the "no-op" build time several times? I am asking because we use ccache. So, ideally on the second iteration compiler should not even be called and there should not be any time difference wrt to GCC.
,
Sep 20 2017
@5: it makes no difference
,
Sep 20 2017
Even without generating a single object file the compiler is invoked plenty of times to determine which options are supported. That's what all the calls like "aarch64-cros-linux-gnu-clang -B/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/binutils-bin/2.27 -Werror -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -fno-PIE -c -x c /dev/null -o .1098.tmp" are about.
,
Sep 20 2017
@7 yes, but I thought ccache would cache those results.. which should change the time for the second iteration. Anyway, we can look at the compile time issues (laszio@ looked and improved them before). But it may take some time and it would only take you to close to what the GCC time is. the better solution would be to have some cache for this 'configure' step as dianders@ mentions. That would drastically reduce the 45 seconds time.
,
Sep 20 2017
,
Sep 20 2017
The kernel 'configure' step only generates the kernel configuration file(s) and doesn't involve the target compiler. The compiler flags are determined during the build step and are currently only held in Makefile variables, which is why ccache doesn't help here.
,
Sep 20 2017
About ccache:
OK, so ccache is certainly getting invoked. The ccache should be at "/var/cache/chromeos-cache/distfiles/ccache". You can look at the stats with:
# CCACHE_DIR=/var/cache/chromeos-cache/distfiles/ccache/ ccache -s
...and just typing "aarch64-cros-linux-gnu-clang" on the command line hits the cache (you can see an uptick in "no input file").
---
...boy, when you strace the invocation you can see why it takes so long! It's doing A LOT of stuff...
Actually, it looks like "/usr/bin/aarch64-cros-linux-gnu-clang" is some type of thin wrapper. Somewhere in there it invokes "/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/gcc-bin/4.9.x/aarch64-cros-linux-gnu-clang" which points to "sysroot_wrapper.hardened". ...but even that somehow still calls ccache since invoking that directly still makes ccache hit.
Whoa: is it seriously doing this?
$ strace -s 128 /usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/gcc-bin/4.9.x/aarch64-cros-linux-gnu-clang --version
...
...
...
execve("/usr/bin/python2", ["python2", "/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/gcc-bin/4.9.x/aarch64-cros-linux-gnu-clang", "--version"], [/* 45 vars */]) = 0
Oh, that is a python script! Yeah, that's part of our problem. Invoking python has a massive amount of overhead.
---
So, as far as I can tell:
* clang doesn't exactly have low overhead. In the chroot just running "time gcc --version > /dev/null" gives ~10ms and "time clang --version > /dev/null" gives 40ms.
* If anything ccache actually seems to increase our overhead in the "no files to process". :(
* We've got at least two levels of wrappers involved here, one of which invokes python (!).
---
I'm still trying to figure out what to do about all this.
,
Sep 20 2017
the python wrapper is expected, for better or worse. if it helps, the previous wrapper was written in bash and was much slower ;). if you're running commands by hand, you'll want to make sure to set the SYSROOT env var first. if you don't, the wrapper will log warnings on each invocation which increases the overhead.
,
Sep 20 2017
I tried with -c hello_world.c, without ccache, and the result is:
gcc + wrapper: 0m0.079s
gcc: 0m0.034s
wrapper: 0m0.045s
clang + wrapper:0m0.109s
clang: 0m0.077s
wrapper: 0m0.032s
I believe that most of the difference between gcc and llvm comes from dynamic linker; clang is linked against libLLVM-5.0svn.so (which has ~50MB text) and spends a considerable amount of time in ld.so in the beginning.
By the way, there is a built-in resource monitoring script in the wrapper. It can be enabled by an environment variable:
$ GETRUSAGE=/tmp/resource_usage.txt aarch64-cros-linux-gnu-{clang,gcc} ...
It will log the the elapsed time, user time, sys time, peak memory usage, full command line to invoke the real compiler, in the file. The above numbers were measured with that script and --sysroot was specified.
I can spend more time on this after finishing implementing the cwp + afdo things, which will likely take me one or two weeks.
,
Sep 21 2017
Some more tests about ccache:
$ aarch64-cros-linux-gnu-{clang,gcc} -x c /dev/null ...
--> ccache miss (0.07s / 0.035s for clang / gcc)
$ aarch64-cros-linux-gnu-{clang,gcc} -x c empty.c ...
--> ccache hit (0.003s for both)
So I guess it would be very interesting to make ccache work as expected :)
,
Sep 21 2017
so, if you modify the compiler commands in the makefile so that they are ccache-able, then that is the cache dianders referred to in #4 (and it does not matter which compiler we are using). Also, don't use --version to measure anything. --version is not ccache-able.
,
Oct 2 2017
BTW: I posted my caching upstream if anyone had any thoughts. * https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/679445 * https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/683228/1
,
Oct 3 2017
cool! what is the time of a NO-OP build with this?
,
Oct 3 2017
@17: I believe I was at between 35 and 40 seconds, so it shaved over 20 seconds off a build. Not too shabby. ...the issue is that the patches really need some review (and testing, of course). I have no idea if upstream is going to say "hooray!" or if they're going to say "go to hell". :-P I don't think we'd want to carry these patches if upstream doesn't want them...
,
Oct 10 2017
Just a heads up that I've posted a v2 of the series and it seems to have a pretty good amount of upstream support. The two patches are here (can be found on patchwork.kernel.org/patch/...) 9985795 New [v2,1/2] kbuild: Add a cache for generated variables 9985791 New [v2,2/2] kbuild: Cache a few more calls to the compiler I updated the gerrit patches too. I'm continuing to use my patches locally and I'll plan to backport them to our tree after they land upstream unless someone is antsy for me to land them earlier. I forked the python wrapper gripes to bug #773138
,
Oct 19 2017
I think what is remaining of this bug is on dianders hands
,
Oct 19 2017
@20: yup, I'll try to post patches today. v4 landed upstream over 24 hours ago and (this time) didn't get yanked right away. ...so I think we should be good now. I'm planning on just posting patches to kernel 4.4 and (eventually) kernel 4.14. I might do 3.18 too if it's easy. If anyone else wants to port them to other kernels they can, but I don't build them often enough myself to justify the work.
,
Oct 20 2017
OK, posted here. Looking for reviews now. There were a few extra optimizations that landed upstream, namely: 2c1f4f125159 kbuild: re-order the code to not parse unnecessary variables 1b8142748fa4 kbuild: do not call cc-option before KBUILD_CFLAGS initialization ...but those are slightly trickier to backport and they are not critical, so I'm skipping them. Maybe in 4.14 we'll take them? Here's what I picked for kernel 4.4. Hoping for a +2 now, finally. :) * https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/713459 FROMGIT: kbuild: move "_all" target out of $(KBUILD_SRC) conditional * https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/713458 FROMGIT: kbuild: add forward declaration of default target to * https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/679445 FROMGIT: kbuild: Add a cache for generated variables * https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/683228 FROMGIT: kbuild: Cache a few more calls to the compiler * https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/729356 FROMGIT: kbuild: shrink .cache.mk when it exceeds 1000 lines
,
Oct 20 2017
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ba947f056846b16152cf66b959ad59d0e146cd2f commit ba947f056846b16152cf66b959ad59d0e146cd2f Author: Douglas Anderson <dianders@chromium.org> Date: Fri Oct 20 08:51:11 2017 FROMGIT: kbuild: Add a cache for generated variables While timing a "no-op" build of the kernel (incrementally building the kernel even though nothing changed) in the Chrome OS build system I found that it was much slower than I expected. Digging into things a bit, I found that quite a bit of the time was spent invoking the C compiler even though we weren't actually building anything. Currently in the Chrome OS build system the C compiler is called through a number of wrappers (one of which is written in python!) and can take upwards of 100 ms to invoke even if we're not doing anything difficult, so these invocations of the compiler were taking a lot of time. Worse the invocations couldn't seem to take advantage of the multiple cores on my system. Certainly it seems like we could make the compiler invocations in the Chrome OS build system faster, but only to a point. Inherently invoking a program as big as a C compiler is a fairly heavy operation. Thus even if we can speed the compiler calls it made sense to track down what was happening. It turned out that all the compiler invocations were coming from usages like this in the kernel's Makefile: KBUILD_CFLAGS += $(call cc-option,-fno-delete-null-pointer-checks,) Due to the way cc-option and similar statements work the above contains an implicit call to the C compiler. ...and due to the fact that we're storing the result in KBUILD_CFLAGS, a simply expanded variable, the call will happen every time the Makefile is parsed, even if there are no users of KBUILD_CFLAGS. Rather than redoing this computation every time, it makes a lot of sense to cache the result of all of the Makefile's compiler calls just like we do when we compile a ".c" file to a ".o" file. Conceptually this is quite a simple idea. ...and since the calls to invoke the compiler and similar tools are centrally located in the Kbuild.include file this doesn't even need to be super invasive. Implementing the cache in a simple-to-use and efficient way is not quite as simple as it first sounds, though. To get maximum speed we really want the cache in a format that make can natively understand and make doesn't really have an ability to load/parse files. ...but make _can_ import other Makefiles, so the solution is to store the cache in Makefile format. This requires coming up with a valid/unique Makefile variable name for each value to be cached, but that's solvable with some cleverness. After this change, we'll automatically create a ".cache.mk" file that will contain our cached variables. We'll load this on each invocation of make and will avoid recomputing anything that's already in our cache. The cache is stored in a format that it shouldn't need any invalidation since anything that might change should affect the "key" and any old cached value won't be used. BUG= chromium:767073 TEST=Time builds, check caches, echo variables before and after Conflicts: Makefile scripts/Kbuild.include ...due to the fact that we just have slightly older Makefiles. Change-Id: Id009b296b6123225178b31d17b7bc259f99ed258 Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Ingo Molnar <mingo@kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> (cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild commit 732739e83e0473115ad8dfcbf9d2693d1c4770e9) Reviewed-on: https://chromium-review.googlesource.com/679445 Reviewed-by: Simon Glass <sjg@chromium.org> [modify] https://crrev.com/ba947f056846b16152cf66b959ad59d0e146cd2f/Makefile [modify] https://crrev.com/ba947f056846b16152cf66b959ad59d0e146cd2f/scripts/Kbuild.include
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8f2e10de310d8b05ed068a2dcd6528ba5ac52821 commit 8f2e10de310d8b05ed068a2dcd6528ba5ac52821 Author: Douglas Anderson <dianders@chromium.org> Date: Fri Oct 20 08:51:12 2017 FROMGIT: kbuild: Cache a few more calls to the compiler These are a few stragglers that I left out of the original patch to cache calls to the C compiler ("kbuild: Add a cache for generated variables") because they bleed out into the main Makefile and thus uglify things a little bit. The idea is the same here, though. BUG= chromium:767073 TEST=Time builds, check caches, echo variables before and after Change-Id: I7f8eb943179d55e182fa9ac881e7c520706b8b94 Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Ingo Molnar <mingo@kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> (cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild commit 1061fb916c35a4eaa10a2be0294e7f6441228b8a) Reviewed-on: https://chromium-review.googlesource.com/683228 Reviewed-by: Simon Glass <sjg@chromium.org> [modify] https://crrev.com/8f2e10de310d8b05ed068a2dcd6528ba5ac52821/Makefile
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/98901fd8743806d6cf5814d7c0dc6fa8c74619b2 commit 98901fd8743806d6cf5814d7c0dc6fa8c74619b2 Author: Masahiro Yamada <yamada.masahiro@socionext.com> Date: Fri Oct 20 08:51:10 2017 FROMGIT: kbuild: add forward declaration of default target to Makefile.asm-generic $(kbuild-file) and Kbuild.include are included before the default target "all". We will add a target into Kbuild.include. In advance, add a forward declaration of the default target. BUG= chromium:767073 TEST=Time builds, check caches, echo variables before and after Change-Id: If78092b160fef48629f25086e6d3ffda260549f7 Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild commit fa9acf703d6464fd4b214fd4739cd72ddaf9e612) Reviewed-on: https://chromium-review.googlesource.com/713458 Reviewed-by: Simon Glass <sjg@chromium.org> [modify] https://crrev.com/98901fd8743806d6cf5814d7c0dc6fa8c74619b2/scripts/Makefile.asm-generic
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f0c096bfae6696110d3c508fcd2ea5baf880165a commit f0c096bfae6696110d3c508fcd2ea5baf880165a Author: Masahiro Yamada <yamada.masahiro@socionext.com> Date: Fri Oct 20 08:51:09 2017 FROMGIT: kbuild: move "_all" target out of $(KBUILD_SRC) conditional The first "_all" occurrence around line 120 is only visible when KBUILD_SRC is unset. If O=... is specified, the working directory is relocated, then the only second occurrence around line 193 is visible, that is not set to PHONY. Move the first one to an always visible place. This clarifies "_all" is our default target and it is always set to PHONY. BUG= chromium:767073 TEST=Time builds, check caches, echo variables before and after Change-Id: I0eb06af5ba87acb50e7d1bb3ae4367db5ebca7dd Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild commit ba634eceb535d95e87ef09caae7814b3687c6036) Reviewed-on: https://chromium-review.googlesource.com/713459 Reviewed-by: Simon Glass <sjg@chromium.org> [modify] https://crrev.com/f0c096bfae6696110d3c508fcd2ea5baf880165a/Makefile
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/1229d65826b91f0d81e81c68f4a3dd0151944a62 commit 1229d65826b91f0d81e81c68f4a3dd0151944a62 Author: Masahiro Yamada <yamada.masahiro@socionext.com> Date: Fri Oct 20 08:51:14 2017 FROMGIT: kbuild: shrink .cache.mk when it exceeds 1000 lines The cache files are only cleaned away by "make clean". If you continue incremental builds, the cache files will grow up little by little. It is not a big deal in general use cases because compiler flags do not change quite often. However, if you do build-test for various architectures, compilers, and kernel configurations, you will end up with huge cache files soon. When the cache file exceeds 1000 lines, shrink it down to 500 by "tail". The Least Recently Added lines are cut. (not Least Recently Used) I hope it will work well enough. BUG= chromium:767073 TEST=Time builds, check caches, echo variables before and after Change-Id: I6d964e4c49cdc97ddc9eaa902f6726d6f4006b23 Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild commit 681211724576488b5efa6c2677645d0e6676af16) Reviewed-on: https://chromium-review.googlesource.com/729356 Reviewed-by: Simon Glass <sjg@chromium.org> [modify] https://crrev.com/1229d65826b91f0d81e81c68f4a3dd0151944a62/scripts/Kbuild.include
,
Oct 23 2017
I'm going to call this "Fixed", but if anyone working on kernel 4.12 is interested in picking these patches to speed up their development I'm happy to help pick them there.
,
Oct 23 2017
,
Oct 23 2017
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by manojgupta@chromium.org
, Sep 20 2017