clang: kernel build fails with "'stdarg.h' file not found"
Issue description
USE="-cros-debug" emerge-scarlet --verbose chromeos-kernel-4_4
...
aarch64-cros-linux-gnu-clang -B/usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/binutils-bin/2.27.0 -Wp,-MD,kernel/.bounds.s.d -nostdinc -isystem /usr/lib64/clang/6.0.0/include -I/mnt/host/source/src/third_party/kernel/v4.4/arch/arm64/include -Iarch/arm64/include/generated/uapi -Iarch/arm64/include/generated -I/mnt/host/source/src/third_party/kernel/v4.4/include -Iinclude -I/mnt/host/source/src/third_party/kernel/v4.4/arch/arm64/include/uapi -Iarch/arm64/include/generated/uapi -I/mnt/host/source/src/third_party/kernel/v4.4/include/uapi -Iinclude/generated/uapi -include /mnt/host/source/src/third_party/kernel/v4.4/include/linux/kconfig.h -I/mnt/host/source/src/third_party/kernel/v4.4/. -I. -D__KERNEL__ -mlittle-endian -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mgeneral-regs-only -fno-pic -Wno-asm-operand-widths -Wno-maybe-uninitialized -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-int-in-bool-context -Oz --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fstack-protector-strong -target aarch64-cros-linux-gnu -gcc-toolchain /usr/x86_64-pc-linux-gnu/aarch64-cros-linux-gnu/binutils-bin -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-duplicate-decl-specifier -Wno-tautological-compare -mno-global-merge -no-integrated-as -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg -Werror -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -fprofile-sample-use=/build/scarlet/tmp/portage/sys-kernel/chromeos-kernel-4_4-9999/work/chromeos-kernel-4_4-R67-10452.11-1521453969.gcov -fdebug-info-for-profiling -D"KBUILD_STR(s)=#s" 
-D"KBUILD_BASENAME=KBUILD_STR(bounds)" -D"KBUILD_MODNAME=KBUILD_STR(bounds)" -fverbose-asm -S -o kernel/bounds.s /mnt/host/source/src/third_party/kernel/v4.4/kernel/bounds.c
In file included from /mnt/host/source/src/third_party/kernel/v4.4/kernel/bounds.c:9:
In file included from /mnt/host/source/src/third_party/kernel/v4.4/include/linux/page-flags.h:9:
In file included from /mnt/host/source/src/third_party/kernel/v4.4/include/linux/bug.h:4:
In file included from /mnt/host/source/src/third_party/kernel/v4.4/arch/arm64/include/asm/bug.h:67:
In file included from /mnt/host/source/src/third_party/kernel/v4.4/include/asm-generic/bug.h:13:
/mnt/host/source/src/third_party/kernel/v4.4/include/linux/kernel.h:5:10: fatal error: 'stdarg.h' file not found
#include <stdarg.h>
^~~~~~~~~~
1 error generated.
This happened repeatedly, but after I switched to GCC and back, it was no longer reproducible.
Mathias tells me this happened to him before, and he was just told to clobber some cached directories. That doesn't seem like a good long-term solution.
,
Mar 22 2018
Hmm, actually it's still reproducible. Don't know what I was saying. Maybe I was still building with USE=-clang when the build "succeeded."
,
Mar 22 2018
I think it is a portage issue, not a toolchain one. Can you remove the /build/$BOARD/var/cache/portage/sys-kernel and /build/$BOARD/tmp/portage/sys-kernel/ directories and try?
,
Mar 22 2018
What makes you say a portage issue? Do you just mean general build caching? Kernel builds do quite a bit of smarts there, so maybe we caused something. Removing /build/$BOARD/var/cache/portage/sys-kernel alone seems to have been sufficient.
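
For reference, the workaround discussed above could be scripted roughly as follows. This is only a sketch: the two directory paths come from the comments above, while the function names and the `root` parameter (there so the helper can be tried against a scratch directory) are made up for illustration.

```python
import shutil
from pathlib import Path
from typing import List


def kernel_cache_dirs(board: str, root: Path = Path("/")) -> List[Path]:
    """Return the two per-board kernel directories named in the thread."""
    base = root / "build" / board
    return [
        base / "var" / "cache" / "portage" / "sys-kernel",
        base / "tmp" / "portage" / "sys-kernel",
    ]


def clobber_kernel_caches(board: str, root: Path = Path("/")) -> List[Path]:
    """Delete whichever of those directories exist; return the ones removed."""
    removed = []
    for path in kernel_cache_dirs(board, root):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Per the comment above, removing only the first directory (`var/cache/portage/sys-kernel`) was enough in this case.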
,
Mar 22 2018
I suspect that the interaction between the kernel-specific build smarts and portage in CrOS is causing the problem here.
,
Mar 23 2018
It's certainly plausible that the Makefile caching stuff that I picked back to the 4.4 kernel is somehow affecting things, but IMHO it probably isn't. The cache takes into account the string used to invoke the compiler, so it should handle things OK. I'd guess instead that this has to do with the kernel's innate ability to detect that it ought to do a clean build when you change compilers. As far as I know the kernel doesn't try to do this at all--it relies on you to know that you should clean things when you change compilers. Probably the easiest way to solve this is for portage to know that it shouldn't allow incremental builds if USE flags change. That seems sane.
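
To illustrate the point about the cache being keyed on the compiler invocation string, here is a toy model (not the kernel's actual Makefile cache). It shows how such a cache can go stale when the compiler is replaced in place and the command text stays the same. `CommandCache`, `fake_run`, and the 7.0.0 path are hypothetical; the 6.0.0 include path is the one from the failing command line.

```python
from typing import Callable, Dict


class CommandCache:
    """Toy cache: remembers a command's output, keyed only by the command text."""

    def __init__(self, run: Callable[[str], str]) -> None:
        self._run = run
        self._results: Dict[str, str] = {}

    def output(self, command: str) -> str:
        if command not in self._results:
            self._results[command] = self._run(command)
        return self._results[command]


# Hypothetical stand-in for the installed compiler. Upgrading it in place
# changes what the very same command prints.
installed = {"include_dir": "/usr/lib64/clang/6.0.0/include"}


def fake_run(command: str) -> str:
    return installed["include_dir"]


cache = CommandCache(fake_run)
query = "clang -print-file-name=include"
before = cache.output(query)

# Simulated in-place upgrade (the new path is an assumption for illustration).
installed["include_dir"] = "/usr/lib64/clang/7.0.0/include"
after = cache.output(query)

# The cache still answers with the old directory, which may no longer exist.
stale = after != fake_run(query)
```

In this model the key never changes, so nothing invalidates the entry. A switch from gcc to clang would change the command text and miss the cache, but an upgrade behind the same compiler name would not.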
,
Mar 23 2018
Do any USE flags change if we're just upgrading clang as part of a 'repo sync'?
,
Mar 23 2018
USE="clang" flag was made default a few weeks back. If your last sync/build predates that, it could very well be the case.
,
Mar 23 2018
Oh, wait! You're not saying that you changed from gcc to clang, but from one version of clang to another. Yeah, that's probably the makefile cache stuff. :( We'd either need to revert that (and suffer slower incremental builds) or pick <https://patchwork.kernel.org/patch/10277791/>. That fix will just make the error message clearer. Upstream won't land that patch, since they're ripping out the Makefile cache in 4.17. --- Sorry for misreading earlier. :(
,
Mar 23 2018
I had everything below the '---' typed earlier but didn't hit send. I'm not actually 100% sure if I changed clang versions. I also may or may not have changed USE flags, but I think not. It seems like we have a good guess that <X> (below) is the upgraded clang version? --- The relevant kernel here (4.4) was entirely using clang for a while now, no? And anyway, the point is if we know a problem like this happens every time <X> happens, it seems like we should try to resolve it. As of now, I don't think <X> necessarily includes a USE flag change; what makes you think that? Does anyone have a good idea of what <X> is? It sounds like this has happened before. I can try to dig a little more if no one has any specific knowledge.
,
Mar 23 2018
Yeah, I updated the clang version a couple of days back. So that must be it.
,
Mar 23 2018
In theory the next time it happens we can even look at the Makefile cache and confirm the version changed, but it seems pretty sure. The error message about stdarg is even the same one as from my patch. What do you prefer to do? The incremental build speedup from the cache in Chrome OS was really significant. We could pick my patch which would just say "hey you, try make clean", or we could do something else in the ebuild to notice that clang updated and do a make clean for us. Interestingly enough: it might be sorta wise for _something_ in the system to know that we shouldn't be doing incremental builds when the compiler version changed. It seems like asking for trouble.
,
Mar 23 2018
I encountered this error multiple times, typically when switching back between the current CrOS llvm version and llvm-next. So yes, most likely the recent update of the clang version 'caused' this.
,
Mar 23 2018
Ugh. I just tried, and I don't think my patch caught this clang upgrade. :( I think the version number didn't rev.
,
Mar 23 2018
I'm not sure what a great solution for this is then. Just revert the compiler cache stuff in 4.4? IIUC, we don't even have it in 4.14, and it's going to be pulled out upstream. If I get a chance, maybe I'll do some testing with your cache reverted and see if it's that annoying to me.
,
Mar 23 2018
fwiw, the kernel eclass is written such that, if there is something causing a weird failure like this, it'll pessimistically nuke the cache so the next run passes. this helps bots & devs automatically recover at the expense of a periodic one-off weirdness.
,
Mar 23 2018
Removing the cache would be significantly less painful if clang was invoked directly and not through the python wrapper.
manojgupta@: I wonder if we strictly need the ${arch}-cros-linux-gnu-clang wrapper for kernel builds. The kernel can be built with CC=clang and determines the target triple based on the kernel configuration (this doesn't currently work on v4.4, but does on v4.14; it shouldn't take more than a couple of patches to backport). For a kernel build, does the wrapper do anything else useful besides determining the target triple?
,
Mar 23 2018
It is definitely possible to do without the wrapper as long as the kernel build can provide all the options to clang, e.g. target triple, sysroot, binutils path, etc. But the wrapper has a few useful features that we (toolchain team) use often. In particular, the ability to bisect object files has been extremely helpful in finding a bad/miscompiled file. I am planning to add more features to the wrapper, for instance the ability to pick a different version of clang (once I add support for installing multiple clang versions in the chroot).
,
Mar 24 2018
> fwiw, the kernel eclass is written such that, if there is something causing a weird failure like this, it'll pessimistically nuke the cache so the next run passes.
Really? I remember seeing log notes that say something like "if something went wrong, try nuking this directory", but I don't remember it autocleaning. Also, when I saw this failure, that was after several failed build_packages in a row; I'd think one of those would have nuked it already, if that was working as you described?
,
Mar 24 2018
ah looks like we changed it so it only cleans up on bots now: https://chromium-review.googlesource.com/37538
,
Mar 26 2018
> and it's going to be pulled out upstream
...and replaced with something better. They're now going to do all of the compiler tests at config time so they don't need to run them at every build.
---
> we don't even have it in 4.14
It was on my list to pick it soon, ideally before we switch to clang. The clang builds were just too slow otherwise.
---
Could we add something into the "cros-workon.eclass" to keep track of the compiler version when CROS_WORKON_INCREMENTAL_BUILD is set? Then it can nuke the cache if it detects the change? IMHO even if the Makefile cache didn't die in such a strange way, it's not advisable to do an incremental build across a compiler upgrade. It might work a large percentage of the time, but I could sorta imagine weird issues cropping up...
,
Mar 29 2018
Regarding wrapper overhead, I can try implementing it in Go in Q2.
,
Mar 29 2018
fingerprinting the active toolchain and blowing the cache away pessimistically sounds fine
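
A minimal sketch of what "fingerprint the toolchain and wipe the cache pessimistically" might look like. It assumes the fingerprint is a hash of the compiler binary's contents rather than its version string, since an earlier comment notes the version number did not change across the upgrade. The stamp file name and both function names are hypothetical. This is not how cros-workon.eclass currently behaves.

```python
import hashlib
import shutil
from pathlib import Path

STAMP_NAME = ".toolchain-fingerprint"  # hypothetical stamp file name


def toolchain_fingerprint(compiler: Path) -> str:
    """Hash the compiler binary's bytes, so an upgrade that keeps the same
    version string still produces a different fingerprint."""
    return hashlib.sha256(compiler.read_bytes()).hexdigest()


def refresh_cache(cache_dir: Path, compiler: Path) -> bool:
    """Wipe cache_dir unless its stamp matches the current compiler.

    Returns True if an existing cache directory was wiped.
    """
    current = toolchain_fingerprint(compiler)
    stamp = cache_dir / STAMP_NAME
    if stamp.is_file() and stamp.read_text() == current:
        return False  # same toolchain: keep the incremental build state
    # Pessimistic path: a missing or mismatched stamp means start clean.
    wiped = cache_dir.exists()
    if wiped:
        shutil.rmtree(cache_dir)
    cache_dir.mkdir(parents=True)
    stamp.write_text(current)
    return wiped
```

One caveat under these assumptions: `compiler` would need to point at the real clang binary. Hashing the Python wrapper script would not notice a clang upgrade behind it.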
,
Nov 16
I just ran into this issue again. Blowing away the cache dir solved the issue (c#4).
Comment 1 by briannorris@chromium.org, Mar 22 2018