New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 767073 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

clang: kernel "no-op" compile time is much worse than gcc

Project Member Reported by diand...@chromium.org, Sep 20 2017

Issue description

From discussion in crosreview.com/673005:

A "no-op" compile of the kernel (AKA re-running an emerge where you didn't change the source code at all) is much higher for gcc than for clang.

For gcc it was measured at about 45 seconds (already quite slow)
For clang it was measured at about 1 minute.


My exact command line that I used was:

  time FEATURES="-buildpkg" USE="kmemleak lockdebug kgdb vtconsole -firmware_install" emerge-${BOARD} --nodeps chromeos-kernel-4_4

I ran that command once and ignored the time, then ran it again and looked at the time.

---

Possibly this is due to the sheer number of calls that the kernel Makefile makes to clang.  Each call takes 100 ms:

$ time aarch64-cros-linux-gnu-clang --version > /dev/null
real    0m0.101s
user    0m0.072s
sys     0m0.028s

$ time aarch64-cros-linux-gnu-gcc --version > /dev/null
real    0m0.066s
user    0m0.052s
sys     0m0.008s

$ time sh --version > /dev/null
real    0m0.004s
user    0m0.000s
sys     0m0.000s

---

I'll try to dig a little bit to see if this is really correct.

---

Note that the "no-op" build time is added to every iteration of modify / deploy / test.  A 15 second increase in this time will impact developer productivity significantly.
 
Cc: manojgupta@chromium.org
I did a quick test and found that a "no-op" build spends ~24 seconds calling "CC".  I didn't check how much time was spent calling "LD" or any other tools.

I tested by creating a file in "usr/local/bin/aarch64-cros-linux-gnu-clang" that contained:

  #!/bin/bash
  echo "args: " "$@" >> /build/kevin/var/cache/portage/sys-kernel/chromeos-kernel-4_4/clang.log
  /b/tip/tmp/time -o /build/kevin/var/cache/portage/sys-kernel/chromeos-kernel-4_4/clang.log -a /usr/bin/aarch64-cros-linux-gnu-clang "$@"

...then I ran a no-op kernel build.  The results are attached.  I ran a bit of python to analyze:

f = open("/b/tip/tmp/clang.log", "r")
elapsed = 0.0
for line in f.read().splitlines():
  if "elapsed" in line:
    word = line.split()[2]
    elapsed += float(word[2:-7])
print elapsed


clang.log
173 KB View Download
Cc: laszio@chromium.org
-- Copying from the CL for the sake of completness
llozano@ said:
But, does this point to a problem in the make pipeline? isn't there a 'configure' step? why doing a 'configure' is part of the single change, recompile process?

dianders@ replied:
I'll dig now and comment on the bug to see.  I remember looking this type of thing ages ago and I even remember having it in my mind to implement some sort of cache of some of the type of stuff that the kernel queries about (I think u-boot does this), but I never got around to it.
what happens with the time if you do the "no-op" build time several times? 
I am asking because we use ccache. So, ideally on the second iteration compiler should not even be called and there should not be any time difference wrt to GCC.


@5: it makes no difference

Comment 7 by mka@chromium.org, Sep 20 2017

Even without generating a single object file the compiler is invoked plenty of times to determine which options are supported. That's what all the calls like "aarch64-cros-linux-gnu-clang -B/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/binutils-bin/2.27 -Werror -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -fno-PIE -c -x c /dev/null -o .1098.tmp" are about.
@7 yes, but I thought ccache would cache those results.. which should change the time for the second iteration. 

Anyway, we can look at the compile time issues (laszio@ looked and improved them before). But it may take some time and it would only take you to close to what the GCC time is. 

the better solution would be to have some cache for this 'configure' step as dianders@ mentions. That would drastically reduce the 45 seconds time.

Labels: Build-Toolchain

Comment 10 by mka@chromium.org, Sep 20 2017

The kernel 'configure' step only generates the kernel configuration file(s) and doesn't involve the target compiler.

The compiler flags are determined during the build step and are currently only held in Makefile variables, which is why ccache doesn't help here.
About ccache:

OK, so ccache is certainly getting invoked.  The ccache should be at "/var/cache/chromeos-cache/distfiles/ccache".  You can look at the stats with:

# CCACHE_DIR=/var/cache/chromeos-cache/distfiles/ccache/ ccache -s

...and just typing "aarch64-cros-linux-gnu-clang" on the command line hits the cache (you can see an uptick in "no input file").

---

...boy, when you strace the invocation you can see why it takes so long!  It's doing A LOT of stuff...

Actually, it looks like "/usr/bin/aarch64-cros-linux-gnu-clang" is some type of thin wrapper.  Somewhere in there it invokes "/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/gcc-bin/4.9.x/aarch64-cros-linux-gnu-clang" which points to "sysroot_wrapper.hardened".  ...but even that somehow still calls ccache since invoking that directly still makes ccache hit.


Whoa: is it seriously doing this?

$ strace -s 128 /usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/gcc-bin/4.9.x/aarch64-cros-linux-gnu-clang --version
...
...
...
execve("/usr/bin/python2", ["python2", "/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/gcc-bin/4.9.x/aarch64-cros-linux-gnu-clang", "--version"], [/* 45 vars */]) = 0


Oh, that is a python script!  Yeah, that's part of our problem.  Invoking python has a massive amount of overhead.

---

So, as far as I can tell:

* clang doesn't exactly have low overhead.  In the chroot just running "time gcc --version  > /dev/null" gives ~10ms and "time clang --version  > /dev/null" gives 40ms.

* If anything ccache actually seems to increase our overhead in the "no files to process".  :(  

* We've got at least two levels of wrappers involved here, one of which invokes python (!).

---

I'm still trying to figure out what to do about all this.  

the python wrapper is expected, for better or worse.  if it helps, the previous wrapper was written in bash and was much slower ;).

if you're running commands by hand, you'll want to make sure to set the SYSROOT env var first.  if you don't, the wrapper will log warnings on each invocation which increases the overhead.
I tried with -c hello_world.c, without ccache, and the result is:

gcc + wrapper:  0m0.079s
gcc:            0m0.034s
wrapper:        0m0.045s

clang + wrapper:0m0.109s
clang:          0m0.077s
wrapper:        0m0.032s

I believe that most of the difference between gcc and llvm comes from dynamic linker; clang is linked against libLLVM-5.0svn.so (which has ~50MB text) and spends a considerable amount of time in ld.so in the beginning.

By the way, there is a built-in resource monitoring script in the wrapper. It can be enabled by an environment variable:

$ GETRUSAGE=/tmp/resource_usage.txt aarch64-cros-linux-gnu-{clang,gcc} ...

It will log the the elapsed time, user time, sys time, peak memory usage, full command line to invoke the real compiler, in the file. The above numbers were measured with that script and --sysroot was specified.

I can spend more time on this after finishing implementing the cwp + afdo things, which will likely take me one or two weeks.
Some more tests about ccache:

$ aarch64-cros-linux-gnu-{clang,gcc} -x c /dev/null ... 
--> ccache miss (0.07s / 0.035s for clang / gcc)

$ aarch64-cros-linux-gnu-{clang,gcc} -x c empty.c ... 
--> ccache hit (0.003s for both)

So I guess it would be very interesting to make ccache work as expected :)
so, if you modify the compiler commands in the makefile so that they are ccache-able, then that is the cache dianders referred to in #4 (and it does not matter which compiler we are using). 
Also, don't use --version to measure anything. --version is not ccache-able. 
cool! 
what is the time of a NO-OP build with this?
@17: I believe I was at between 35 and 40 seconds, so it shaved over 20 seconds off a build.  Not too shabby.

...the issue is that the patches really need some review (and testing, of course).  I have no idea if upstream is going to say "hooray!" or if they're going to say "go to hell".  :-P  I don't think we'd want to carry these patches if upstream doesn't want them...
Just a heads up that I've posted a v2 of the series and it seems to have a pretty good amount of upstream support.  The two patches are here (can be found on patchwork.kernel.org/patch/...)

9985795 New          [v2,1/2] kbuild: Add a cache for generated variables
9985791 New          [v2,2/2] kbuild: Cache a few more calls to the compiler

I updated the gerrit patches too.

I'm continuing to use my patches locally and I'll plan to backport them to our tree after they land upstream unless someone is antsy for me to land them earlier.


I forked the python wrapper gripes to bug #773138
Labels: -Build-Toolchain
Owner: diand...@chromium.org
Status: Assigned (was: Untriaged)
I think what is remaining of this bug is on dianders hands
Components: OS>Kernel
Summary: clang: kernel "no-op" compile time is much worse than gcc (was: clang: "no-op" compile time is much worse than gcc)
@20: yup, I'll try to post patches today.  v4 landed upstream over 24 hours ago and (this time) didn't get yanked right away.  ...so I think we should be good now.

I'm planning on just posting patches to kernel 4.4 and (eventually) kernel 4.14.  I might do 3.18 too if it's easy.  If anyone else wants to port them to other kernels they can, but I don't build them often enough myself to justify the work.
OK, posted here.  Looking for reviews now.  There were a few extra optimizations that landed upstream, namely:
  2c1f4f125159 kbuild: re-order the code to not parse unnecessary variables
  1b8142748fa4 kbuild: do not call cc-option before KBUILD_CFLAGS initialization

...but those are slightly trickier to backport and they are not critical, so I'm skipping them.  Maybe in 4.14 we'll take them?

Here's what I picked for kernel 4.4.  Hoping for a +2 now, finally.  :)

* https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/713459
  FROMGIT: kbuild: move "_all" target out of $(KBUILD_SRC) conditional

* https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/713458
  FROMGIT: kbuild: add forward declaration of default target to

* https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/679445
  FROMGIT: kbuild: Add a cache for generated variables

* https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/683228
  FROMGIT: kbuild: Cache a few more calls to the compiler

* https://chromium-review.googlesource.com/#/c/chromiumos/third_party/kernel/+/729356
  FROMGIT: kbuild: shrink .cache.mk when it exceeds 1000 lines

Cc: sjg@chromium.org
Project Member

Comment 24 by bugdroid1@chromium.org, Oct 20 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ba947f056846b16152cf66b959ad59d0e146cd2f

commit ba947f056846b16152cf66b959ad59d0e146cd2f
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Oct 20 08:51:11 2017

FROMGIT: kbuild: Add a cache for generated variables

While timing a "no-op" build of the kernel (incrementally building the
kernel even though nothing changed) in the Chrome OS build system I
found that it was much slower than I expected.

Digging into things a bit, I found that quite a bit of the time was
spent invoking the C compiler even though we weren't actually building
anything.  Currently in the Chrome OS build system the C compiler is
called through a number of wrappers (one of which is written in
python!) and can take upwards of 100 ms to invoke even if we're not
doing anything difficult, so these invocations of the compiler were
taking a lot of time.  Worse the invocations couldn't seem to take
advantage of the multiple cores on my system.

Certainly it seems like we could make the compiler invocations in the
Chrome OS build system faster, but only to a point.  Inherently
invoking a program as big as a C compiler is a fairly heavy
operation.  Thus even if we can speed the compiler calls it made sense
to track down what was happening.

It turned out that all the compiler invocations were coming from
usages like this in the kernel's Makefile:

KBUILD_CFLAGS += $(call cc-option,-fno-delete-null-pointer-checks,)

Due to the way cc-option and similar statements work the above
contains an implicit call to the C compiler.  ...and due to the fact
that we're storing the result in KBUILD_CFLAGS, a simply expanded
variable, the call will happen every time the Makefile is parsed, even
if there are no users of KBUILD_CFLAGS.

Rather than redoing this computation every time, it makes a lot of
sense to cache the result of all of the Makefile's compiler calls just
like we do when we compile a ".c" file to a ".o" file.  Conceptually
this is quite a simple idea.  ...and since the calls to invoke the
compiler and similar tools are centrally located in the Kbuild.include
file this doesn't even need to be super invasive.

Implementing the cache in a simple-to-use and efficient way is not
quite as simple as it first sounds, though.  To get maximum speed we
really want the cache in a format that make can natively understand
and make doesn't really have an ability to load/parse files. ...but
make _can_ import other Makefiles, so the solution is to store the
cache in Makefile format.  This requires coming up with a valid/unique
Makefile variable name for each value to be cached, but that's
solvable with some cleverness.

After this change, we'll automatically create a ".cache.mk" file that
will contain our cached variables.  We'll load this on each invocation
of make and will avoid recomputing anything that's already in our
cache.  The cache is stored in a format that it shouldn't need any
invalidation since anything that might change should affect the "key"
and any old cached value won't be used.

BUG= chromium:767073 
TEST=Time builds, check caches, echo variables before and after

Conflicts:
	Makefile
	scripts/Kbuild.include
...due to the fact that we just have slightly older Makefiles.

Change-Id: Id009b296b6123225178b31d17b7bc259f99ed258
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
 commit 732739e83e0473115ad8dfcbf9d2693d1c4770e9)
Reviewed-on: https://chromium-review.googlesource.com/679445
Reviewed-by: Simon Glass <sjg@chromium.org>

[modify] https://crrev.com/ba947f056846b16152cf66b959ad59d0e146cd2f/Makefile
[modify] https://crrev.com/ba947f056846b16152cf66b959ad59d0e146cd2f/scripts/Kbuild.include

Project Member

Comment 25 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8f2e10de310d8b05ed068a2dcd6528ba5ac52821

commit 8f2e10de310d8b05ed068a2dcd6528ba5ac52821
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Oct 20 08:51:12 2017

FROMGIT: kbuild: Cache a few more calls to the compiler

These are a few stragglers that I left out of the original patch to
cache calls to the C compiler ("kbuild: Add a cache for generated
variables") because they bleed out into the main Makefile and thus
uglify things a little bit.  The idea is the same here, though.

BUG= chromium:767073 
TEST=Time builds, check caches, echo variables before and after

Change-Id: I7f8eb943179d55e182fa9ac881e7c520706b8b94
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
 commit 1061fb916c35a4eaa10a2be0294e7f6441228b8a)
Reviewed-on: https://chromium-review.googlesource.com/683228
Reviewed-by: Simon Glass <sjg@chromium.org>

[modify] https://crrev.com/8f2e10de310d8b05ed068a2dcd6528ba5ac52821/Makefile

Project Member

Comment 26 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/98901fd8743806d6cf5814d7c0dc6fa8c74619b2

commit 98901fd8743806d6cf5814d7c0dc6fa8c74619b2
Author: Masahiro Yamada <yamada.masahiro@socionext.com>
Date: Fri Oct 20 08:51:10 2017

FROMGIT: kbuild: add forward declaration of default target to Makefile.asm-generic

$(kbuild-file) and Kbuild.include are included before the default
target "all".

We will add a target into Kbuild.include.  In advance, add a forward
declaration of the default target.

BUG= chromium:767073 
TEST=Time builds, check caches, echo variables before and after

Change-Id: If78092b160fef48629f25086e6d3ffda260549f7
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
 commit fa9acf703d6464fd4b214fd4739cd72ddaf9e612)
Reviewed-on: https://chromium-review.googlesource.com/713458
Reviewed-by: Simon Glass <sjg@chromium.org>

[modify] https://crrev.com/98901fd8743806d6cf5814d7c0dc6fa8c74619b2/scripts/Makefile.asm-generic

Project Member

Comment 27 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f0c096bfae6696110d3c508fcd2ea5baf880165a

commit f0c096bfae6696110d3c508fcd2ea5baf880165a
Author: Masahiro Yamada <yamada.masahiro@socionext.com>
Date: Fri Oct 20 08:51:09 2017

FROMGIT: kbuild: move "_all" target out of $(KBUILD_SRC) conditional

The first "_all" occurrence around line 120 is only visible when
KBUILD_SRC is unset.

If O=... is specified, the working directory is relocated, then the
only second occurrence around line 193 is visible, that is not set
to PHONY.

Move the first one to an always visible place.  This clarifies "_all"
is our default target and it is always set to PHONY.

BUG= chromium:767073 
TEST=Time builds, check caches, echo variables before and after

Change-Id: I0eb06af5ba87acb50e7d1bb3ae4367db5ebca7dd
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
 commit ba634eceb535d95e87ef09caae7814b3687c6036)
Reviewed-on: https://chromium-review.googlesource.com/713459
Reviewed-by: Simon Glass <sjg@chromium.org>

[modify] https://crrev.com/f0c096bfae6696110d3c508fcd2ea5baf880165a/Makefile

Project Member

Comment 28 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/1229d65826b91f0d81e81c68f4a3dd0151944a62

commit 1229d65826b91f0d81e81c68f4a3dd0151944a62
Author: Masahiro Yamada <yamada.masahiro@socionext.com>
Date: Fri Oct 20 08:51:14 2017

FROMGIT: kbuild: shrink .cache.mk when it exceeds 1000 lines

The cache files are only cleaned away by "make clean".  If you continue
incremental builds, the cache files will grow up little by little.
It is not a big deal in general use cases because compiler flags do not
change quite often.

However, if you do build-test for various architectures, compilers, and
kernel configurations, you will end up with huge cache files soon.

When the cache file exceeds 1000 lines, shrink it down to 500 by "tail".
The Least Recently Added lines are cut. (not Least Recently Used)
I hope it will work well enough.

BUG= chromium:767073 
TEST=Time builds, check caches, echo variables before and after

Change-Id: I6d964e4c49cdc97ddc9eaa902f6726d6f4006b23
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from git.kernel.org masahiroy/linux-kbuild kbuild
 commit 681211724576488b5efa6c2677645d0e6676af16)
Reviewed-on: https://chromium-review.googlesource.com/729356
Reviewed-by: Simon Glass <sjg@chromium.org>

[modify] https://crrev.com/1229d65826b91f0d81e81c68f4a3dd0151944a62/scripts/Kbuild.include

Cc: djkurtz@chromium.org
I'm going to call this "Fixed", but if anyone working on kernel 4.12 is interested in picking these patches to speed up their development I'm happy to help pick them there.
Status: Fixed (was: Assigned)
Labels: M-64

Sign in to add a comment