Issue metadata
Sign in to add a comment
|
clang: CONFIG_ARM64_LSE_ATOMICS kernel support |
||||||||||||||||||||||||
Issue description
Upstream commit 7bd99b403405d3 ("arm64: Kconfig: Enable LSE atomics by default") in the Linux kernel enabled CONFIG_ARM64_LSE_ATOMICS by default. This breaks clang builds for arm64 kernels with the default configuration, since clang doesn't support several options specified in the Makefile:
clang-7: error: unknown argument: '-fcall-used-x0'
clang-7: error: unknown argument '-ffixed-x1', did you mean '-ffixed-x18'?
clang-7: error: unknown argument '-ffixed-x2', did you mean '-ffixed-x20'?
clang-7: error: unknown argument: '-ffixed-x3'
clang-7: error: unknown argument: '-ffixed-x4'
clang-7: error: unknown argument: '-ffixed-x5'
clang-7: error: unknown argument: '-ffixed-x6'
clang-7: error: unknown argument: '-ffixed-x7'
clang-7: error: unknown argument: '-fcall-saved-x8'
clang-7: error: unknown argument: '-fcall-saved-x9'
clang-7: error: unknown argument: '-fcall-saved-x10'
clang-7: error: unknown argument: '-fcall-saved-x11'
clang-7: error: unknown argument: '-fcall-saved-x12'
clang-7: error: unknown argument: '-fcall-saved-x13'
clang-7: error: unknown argument: '-fcall-saved-x14'
clang-7: error: unknown argument: '-fcall-saved-x15'
clang-7: error: unknown argument: '-fcall-saved-x18'
The corresponding upstream LLVM bug is https://bugs.llvm.org/show_bug.cgi?id=9457
A comment suggests not implementing the missing options but use alternate calling conventions like "coldcc" or "preserve_allcc" instead.
The Makefile entry is:
CFLAGS_atomic_ll_sc.o := -fcall-used-x0 -ffixed-x1 -ffixed-x2 \
-ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 \
-ffixed-x7 -fcall-saved-x8 -fcall-saved-x9 \
-fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12 \
-fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15 \
-fcall-saved-x18 -fomit-frame-pointer
Not sure if there is an equivalent configuration in LLVM, the documentation is a bit vague about the specifics of the alternate calling conventions.
,
Jul 18
From atomic_ll_sc.h: * NOTE: these functions do *not* follow the PCS and must explicitly * save any clobbered registers other than x0 (regardless of return * value). This is achieved through -fcall-saved-* compiler flags for * this file, which unfortunately don't work on a per-function basis * (the optimize attribute silently ignores these options). And the clang doc of 'preserve_all' which was mentioned a few times: This calling convention also behaves identical to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This removes the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers. On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Furthermore it also preserves all floating-point registers (XMMs/YMMs). The idea behind this convention is to support calls to runtime functions that don’t need to call out to any other functions. This sounds similar to what the header says, except for x0. X86-64 has the scratch register, but this doesn't seem to be the case for arm64. The match seems close enough to give an upstream patch a shot, if it doesn't fit the bill we might at least learn why.
,
Jul 18
It seems the 'preserve_all' calling convention can only be specified through an attribute, and not as compiler option. It probably make sense from a clang POV since it is usually only applied to specific functions, however it might cause resistance from kernel folks, because it clutters the code with clang specific attributes (even when abstracting it with a #define, since this would be empty for gcc).
,
Jul 19
It seems 'preserve_all' isn't supported for arm64, at least in our version of LLVM. Attached is a minimalistic function using the calling convention attribute, the compiler options are derived from the kernel build: clang -Wp,-MD,cc_test.o.d -nostdinc -isystem /usr/lib64/clang/7.0.0/include -mlittle-endian -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 --target=aarch64-cros-linux-gnu --gcc-toolchain=/usr -no-integrated-as -fno-PIE -mgeneral-regs-only -DCONFIG_AS_LSE=1 -fno-asynchronous-unwind-tables -O2 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -mno-global-merge -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-merge-all-constants -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -fomit-frame-pointer -c -o cc_test.o cc_test.c fatal error: error in backend: Unsupported calling convention. clang-7: error: clang frontend command failed with exit code 70 (use -v to see invocation) The compilation succeeds without '--target=aarch64-cros-linux-gnu'
,
Jul 19
Opened upstream bug https://bugs.llvm.org/show_bug.cgi?id=38229 for the broken calling convention.
,
Jul 19
preserve_all is only implemented for x86_64. I'm honestly a little hesitant relying on preserve_all anyway, for a few reasons: (1) I'm scared about relying on a calling convention that just seems "close enough". The caller code is written in inline assembly, using a bl instruction with a hard-coded clobber list. If clang is too aggressive and clobbers something outside of that list, it will lead to weird bugs at runtime. If clang is too conservative and saves one of the registers that the caller expects to be clobbered anyway, that's extra overhead on fairly performance-sensitive code. (2) docs/LangRefs.rst says preserve_allcc is experimental and was added for a future version of the Objective-C runtime, which I take to mean its semantics could change in the future. (3) If upstream decides to tweak atomic_ll_sc.o's custom calling convention, we're back where we started.
,
Jul 19
Some background here on what the ARM64 atomics layer is doing: When CONFIG_ARM64_LSE_ATOMICS=n, all the kernel's atomic primitives are implemented as macros that expand to inline assembly. When CONFIG_ARM64_LSE_ATOMICS=y, the kernel moves these primitives into out-of-line functions. The standard atomic macros now expand to assembly branch instructions into these functions. If the kernel detects that your machine supports LSE atomics at boot time, then the alternatives system rewrites these branches into LSE atomic instructions. This lets the kernel use LSE instructions where available, while maintaining backwards compatibility with ARMv8.0 CPUs. Moving the LL/SC atomic primitives out-of-line incurs a performance penalty on CPUs without LSE instructions, and these atomic operations are fairly common in the kernel. So to minimize this penalty, the inlined caller code uses a custom calling convention that only allows the out-of-line LL/SC routines to clobber three registers (x16, x17, and x30).
,
Jul 19
I believe we'd be better off implementing a calling convention in clang, and changing the documentation to say what the exact behavior is (removing the cop-out "might change" wording, too, while at it). Then, just ensure that the clobber list in the asm matches the documentation, and it should be fine. I don't believe the exact calling convention is important here, but it should be minimally-clobbering, and of course needs to match the asm callsite.
,
Jul 19
Thanks Greg for providing these details! I was/am not really familiar with LSE atomics and my knowledge of inline assembly is very limited. I mainly looked at atomic_ll_sc.h which doesn't use the hard-coded clobber list. With the inline assembly in https://elixir.bootlin.com/linux/v4.18-rc5/source/arch/arm64/include/asm/atomic_lse.h I agree that using 'preserve_all' is at least risky and we probably need to be as specific as with gcc.
,
Jul 19
> I believe we'd be better off implementing a calling convention in clang >> (3) If upstream decides to tweak atomic_ll_sc.o's custom calling convention, we're back where we started. So if we implemented the custom calling convention in clang, then the kernel changed the calling convention, we'd have to add yet another one-off calling convention to clang? The -ffixed-reg flags seem more fine grained against future changes to the calling convention (though, the calling convention might not ever change for the code in question, hard to say).
,
Jul 19
This code can use _any_ calling convention, it doesn't actually require a particular unique one -- as long as the call-site asm code matches the convention used by the definition. The kernel just wants the code to be efficient, which in this case means not clobbering too many registers. So I don't think there's a particular danger here of needing to keep adding one-off calling conventions to match the kernel code here. Actually, for now, I'd suggest that you can simply use the already-existing preserve_most calling convention, and just add some additional registers to the clobber list. You may well be able to just stop there -- it seems entirely possible to me that having x0-x8 also clobbered could turn out to not even MATTER for performance in practice, in which case, you're done.
,
Aug 3
,
Aug 6
From what I can tell from a brief scan of the code there is an implementation of preserve-most in AArch64 (it adds X9 - X15) to the callee save registers in addition to the AAPCS X19 - X28, LR and FP. There is some vestigial support for preserve-all in AArch64CallingConvention.td but there does not seem to be any way of selecting it. It is possible that this support was lost in the merging of the ARM64 and AArch64 backends. If I've understood the proposal above then there would be another calling convention that adds X0 - X8 to the calleesave list with a description of something like: - On AArch64 the callee preserves all core registers, except for X16 and X17. X16 and X17 can be used as a scratch registers. Floating-point/AdvancedSIMD registers are not preserved and need to be saved by the caller. For what its worth I think a new calling convention fits much better with Clang/LLVM than adding more fixed registers, and the risk of more of these is low. Having said that I'm probably not one of the people that are likely to need convincing.
,
Aug 6
> So I don't think there's a particular danger here of needing to keep adding one-off calling conventions to match the kernel code here. Note that a patch was just sent to change the calling convention to add X15 to the list of scratch registers: https://www.spinics.net/lists/arm-kernel/msg670180.html It seems brittle to add support for 1 calling convention that hard codes the register set when they can be changed so rapidly.
,
Aug 6
If you made it adhere to a given cc (e.g. clang's preserve_most), and then added a comment saying what the calling convention is, and why it shouldn't just be randomly changed, that seems like it'd solve that.
,
Sep 14
cmtice@ Can you cherry-pick https://reviews.llvm.org/rL341706 and https://reviews.llvm.org/rL342100 to llvm-next as well?
,
Sep 14
,
Sep 17
Fixes have been cherry picked to llvm-next. Is there anything more I need to do?
,
Sep 17
Thanks cmtice@. mka@ Can you verify that upstream ARM64 kernel can be built with llvm-next llvm? i.e. $ USE=llvm-next sudo emerge llvm clang
,
Sep 17
Note that only -ffixed-x* has been implemented for some registers for arm64 (including the ones we need), but not yet fcall-used-x* or -fcall-saved-x*, so CONFIG_ARM64_LSE_ATOMICS wont work yet.
,
Sep 17
As Nick mentioned, there are still missing options: clang-8: error: unknown argument: '-fcall-used-x0' clang-8: error: unknown argument: '-fcall-saved-x8' clang-8: error: unknown argument: '-fcall-saved-x9' clang-8: error: unknown argument: '-fcall-saved-x10' clang-8: error: unknown argument: '-fcall-saved-x11' clang-8: error: unknown argument: '-fcall-saved-x12' clang-8: error: unknown argument: '-fcall-saved-x13' clang-8: error: unknown argument: '-fcall-saved-x14' clang-8: error: unknown argument: '-fcall-saved-x15' clang-8: error: unknown argument: '-fcall-saved-x18'
,
Sep 19
Proposal to remove -fcall-used-x0 from the kernel: http://lists.infradead.org/pipermail/linux-arm-kernel/2018-September/602609.html -fcall-saved-x* for aarch64 LLVM backend: http://go/llvm-cr/D52216 (code review help wanted) Then we'll just need to land -fcall-saved-x* support in Clang.
,
Sep 25
https://github.com/ClangBuiltLinux/linux/issues/25#issuecomment-419574229 TL;DR 2 of the 3 flags are implemented 1 flag was dropped upstream (will need to cherry pick a patch from the for-next branch of the arm64 tree, link to patch in above link).
,
Sep 25
If CrOS-LLVM team can cherry pick https://reviews.llvm.org/rL342824 and https://reviews.llvm.org/rL342990, and CrOS-Linux kernel team can cherry pick https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/core&id=2a6c7c367de82951c98a290a21156770f6f82c84, then we can probably close this out.
,
Sep 25
Assign to cmtice@ for cherry-picking https://reviews.llvm.org/rL342824 and https://reviews.llvm.org/rL342990 to llvm-next.
,
Oct 9
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/03a1fac54db397fdcae36e49faff9ce7a5aafd16 commit 03a1fac54db397fdcae36e49faff9ce7a5aafd16 Author: Caroline Tice <cmtice@google.com> Date: Tue Oct 09 23:22:59 2018 Add two more cherry picks to LLVM. Missed these two when we upgraded the compiler. These add support for the X[8-15,18] registers on aarch64. BUG= chromium:865188 TEST=Successfully ran llvm-next-tryjob on kevin. Change-Id: I54c784198245795f2fed9dfb7c1f0e82b9c8ef59 Reviewed-on: https://chromium-review.googlesource.com/c/1265092 Reviewed-by: Manoj Gupta <manojgupta@chromium.org> Tested-by: Caroline Tice <cmtice@chromium.org> Commit-Queue: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> [rename] https://crrev.com/03a1fac54db397fdcae36e49faff9ce7a5aafd16/sys-devel/llvm/llvm-8.0_pre339409_p20180926-r3.ebuild [add] https://crrev.com/03a1fac54db397fdcae36e49faff9ce7a5aafd16/sys-devel/llvm/files/cherry/75dc9f32d0b4f441d2e1b980445e9b7d2d74505c.patch [add] https://crrev.com/03a1fac54db397fdcae36e49faff9ce7a5aafd16/sys-devel/llvm/files/cherry/260dbbf3855227c827be14b15cac86126f1d22fe.patch [modify] https://crrev.com/03a1fac54db397fdcae36e49faff9ce7a5aafd16/sys-devel/llvm/llvm-8.0_pre339409_p20180926.ebuild
,
Oct 22
Is there anything further that needs to be done for this bug, or can I close it?
,
Oct 22
mka@ should pick https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/core&id=2a6c7c367de82951c98a290a21156770f6f82c84 into the relevant kernels (and/or we should send to LTS kernel maintainers and take them from there).
,
Oct 22
,
Oct 23
CrOS v4.14 builds with the CL from #28 and LSE atomics enabled, however the following warning is raised: arch/arm64/Makefile:40: LSE atomics not supported by binutils it stems from: # Check for binutils support for specific extensions lseinstr := $(call as-instr,.arch_extension lse,-DCONFIG_AS_LSE=1) ifeq ($(CONFIG_ARM64_LSE_ATOMICS), y) ifeq ($(lseinstr),) $(warning LSE atomics not supported by binutils) endif endif This is not a clang problem though, so I think we can close this issue and create a separate one for binutils.
,
Oct 23
> I think we can close this issue and create a separate one for binutils. Yep, likely our version of binutils doesn't understand the assembler directive `.arch_extension lse`.
,
Oct 23
Filed https://bugs.chromium.org/p/chromium/issues/detail?id=898252, please help me move it to the correct component (I'm a noob at CrOS' issue tracker). |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by manojgupta@chromium.org
, Jul 18