ThinLTO Linux ToT bot should build with debug information |
|||||
Issue descriptionAs we're mostly concerned about doing ThinLTO builds in the official mode, and the official mode implies symbol_level=2, we should enable the same on 'ThinLTO Linux ToT' bot. Currently, something is broken on the LLVM side about ThinLTO + debug symbols (Peter knows the details), so this is blocked on that issue being resolved first.
,
Jan 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bf6c6a31bafed2f7bc5ed6083e32b21956152221 commit bf6c6a31bafed2f7bc5ed6083e32b21956152221 Author: krasin <krasin@chromium.org> Date: Thu Jan 19 22:34:10 2017 Build with full debug info on ThinLTO Linux ToT bot. This will be a better proxy for the official builds, which we are mostly worried about in the context of ThinLTO builds. BUG= 682773 , 660216 Review-Url: https://codereview.chromium.org/2644803003 Cr-Commit-Position: refs/heads/master@{#444866} [modify] https://crrev.com/bf6c6a31bafed2f7bc5ed6083e32b21956152221/tools/mb/mb_config.pyl
,
Jan 20 2017
The bot went red, and here is how to reproduce the failure:
$ gn gen out/thin-lto-tot '--args=is_chrome_branded=false is_official_build=false is_clang=true is_component_build=false is_debug=false allow_posix_link_time_opt=true is_cfi=false use_t
hin_lto=true symbol_level=2 clang_use_chrome_plugins=false use_lld=true clang_base_path="/usr/local/google/home/krasin/src/llvm.org/release-build"' --check
$ ninja -C out/thin-lto-tot/ angle_unittests
ninja: Entering directory `out/thin-lto-tot/'
[1195/1195] LINK ./angle_unittests
FAILED: angle_unittests
../../../../src/llvm.org/release-build/bin/clang++ -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -flto=thin -Wl,--thinlto-jobs=8 -Wl,-plugin-opt,-function-sections -m64 -pthread -Werror -Wl,-O1 -Wl,--gc-sections --sysroot=../../build/linux/debian_wheezy_amd64-sysroot -L/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/lib/x86_64-linux-gnu -Wl,-rpath-link=/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/lib/x86_64-linux-gnu -L/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/x86_64-linux-gnu -L/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath-link=/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib -Wl,-rpath-link=/usr/local/google/home/krasin/chr33/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,--export-dynamic -o "./angle_unittests" -Wl,--start-group @"./angle_unittests.rsp" -Wl,--end-group -ldl -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lX11 -lXi -lXext -lX11-xcb -lxcb -lXcomposite -lXcursor -lXdamage -lXfixes -lXrender -lXtst
ld.lld: /usr/local/google/home/krasin/src/llvm.org/llvm/lib/Bitcode/Reader/MetadataLoader.cpp:772: void llvm::MetadataLoader::MetadataLoaderImpl::lazyLoadOneMetadata(unsigned int, {anonymous}::{anonymous}::PlaceholderQueue&): Assertion `ID < (MDStringRef.size()) + GlobalMetadataBitPosIndex.size()' failed.
ld.lld: /usr/local/google/home/krasin/src/llvm.org/llvm/lib/Bitcode/Reader/MetadataLoader.cpp:772: void llvm::MetadataLoader::MetadataLoaderImpl::lazyLoadOneMetadata(unsigned int, {anonymous}::{anonymous}::PlaceholderQueue&): Assertion `ID < (MDStringRef.size()) + GlobalMetadataBitPosIndex.size()' failed.
clang-4.0: error: unable to execute command: Aborted (core dumped)
clang-4.0: error: linker command failed due to signal (use -v to see invocation)
ninja: build stopped: subcommand failed.
,
Jan 21 2017
We found that this can only be reproduced with an lld built with gcc or a bootstrapped LTO build. Smaller reproducer: $ ld.lld obj/third_party/libxml/libxml/parser.o obj/third_party/libxml/libxml/xmlIO.o
,
Jan 21 2017
$ cat 1.c
xmlInitParser() { xmlRegisterDefaultOutputCallbacks(); }
$ cat 2.c
typedef xmlCharEncodingHandlerPtr;
a, b, c;
xmlCharEncodingHandlerPtr d;
void xmlRegisterDefaultOutputCallbacks() {
if (a)
return;
xmlRegisterOutputCallbacks();
}
xmlOutputBufferCreateFile() { xmlRegisterDefaultOutputCallbacks(); }
$ clang -O2 -g2 -flto=thin -c 1.c -o 1.o
$ clang -O2 -g2 -flto=thin -c 2.c -o 2.o
$ ld.lld 1.o 2.o
ld.lld: lib/Bitcode/Reader/MetadataLoader.cpp:785: void llvm::MetadataLoader::MetadataLoaderImpl::lazyLoadOneMetadata(unsigned int, {anonymous}::{anonymous}::PlaceholderQueue&): Assertion `ID < (MDStringRef.size()) + GlobalMetadataBitPosIndex.size()' failed.
,
Jan 24 2017
Per Peter's advice, I have taken two LLVM builds. One build was done by GCC 4.8.4, and the other one by Clang ToT. The issue is reproducible with the GCC build and not reproducible with Clang ToT. After making a few franken-builds, where I took some object files and archives from one build and some from the one one, I was able to detect that the issue is within MetadataLoader.cpp.o. This is still very coarse, of course, but a useful bit of information anyway.
,
Jan 24 2017
To clarify, I made builds of LLD which then tried to link the reproducer from #5.
,
Jan 26 2017
Today I ran per-method bisection, and the method that behaves differently between regular and bootstrap builds is MetadataLoader::MetadataLoaderImpl::parseOneMetadata. Unfortunately, the method is more than 700 lines long, so more work is ahead.
,
Jan 27 2017
This issue is now fixed by https://reviews.llvm.org/rL293291. The new issue that was previously maked by the parseOneMetadata's one has the following stack trace and steps to reproduce: $ gn gen out/thin-lto-tot '--args=is_chrome_branded=false is_official_build=false is_clang=true is_component_build=false is_debug=false allow_posix_link_time_opt=true is_cfi=fa[59/3253]hin_lto=true symbol_level=2 clang_use_chrome_plugins=false use_lld=true clang_base_path="/usr/local/google/home/krasin/src/llvm.org/release-build"' --check $ ninja -C out/thin-lto-tot font_service.service ... [2062/2062] LINK ./font_service.service FAILED: font_service.service ../../../../src/llvm.org/release-build/bin/clang++ -Wl,--fatal-warnings -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -flto=thin -Wl,--thinlto-jobs=8 -Wl,-plugin-opt,-function-sections -m64 -pthread -Werror -Wl,-O1 -Wl,--gc-sections --sysroot=../../build/linux/debian_wheezy_amd64-sysroot -L/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/lib/x86_64-linux-gnu -Wl,-rpath-link=/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/lib/x86_64-linux-gnu -L/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/x86_64-linux-gnu -Wl,-rpath-link=/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/x86_64-linux-gnu -L/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath-link=/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib -Wl,-rpath-link=/usr/local/google/home/krasin/chr32/src/build/linux/debian_wheezy_amd64-sysroot/usr/lib -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,--export-dynamic -o "./font_service.service" -Wl,--start-group @"./font_service.service.rsp" -Wl,--end-group -ldl -lrt -lgmodule-2.0 -lgobject-2.0 -lgthread-2.0 -lglib-2.0 -lnss3 -lnssutil3 -lsmime3 -lplds4 -lplc4 -lnspr4 -lpangocairo-1.0 -lpango-1.0 -lcairo -lfontconfig -lfreetype -lexpat -lgio-2.0 All DICompileUnits must be listed in llvm.dbg.cu ld.lld: /usr/local/google/home/krasin/src/llvm.org/llvm/lib/IR/Verifier.cpp:4516: virtual bool {anonymous}::VerifierLegacyPass::doFinalization(llvm::Module&): Assertion `!V->hasBrokenDebugInfo() && "Module contains invalid debug info"' failed. #0 0x000000000051fd18 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/Support/Unix/Signals.inc:406:0 #1 0x000000000051dc1e llvm::sys::RunSignalHandlers() /usr/local/google/home/krasin/src/llvm.org/llvm/lib/Support/Signals.cpp:45:0 #2 0x000000000051dd8c SignalHandler(int) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/Support/Unix/Signals.inc:246:0 #3 0x00007f33d0ca3330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330) #4 0x00007f33cfa9ec37 gsignal /build/eglibc-oGUzwX/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0 #5 0x00007f33cfaa2028 abort /build/eglibc-oGUzwX/eglibc-2.19/stdlib/abort.c:91:0 #6 0x00007f33cfa97bf6 __assert_fail_base /build/eglibc-oGUzwX/eglibc-2.19/assert/assert.c:92:0 #7 0x00007f33cfa97ca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2) #8 0x0000000002028e4a (anonymous namespace)::VerifierLegacyPass::doFinalization(llvm::Module&) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/IR/Verifier.cpp:4516:0 #9 0x0000000001fd0c5c llvm::FPPassManager::doFinalization(llvm::Module&) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1552:0 #10 0x0000000001fdb0b0 /usr/local/google/home/krasin/src/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1608:0 #11 0x0000000001fdb0b0 llvm::legacy::PassManagerImpl::run(llvm::Module&) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/IR/LegacyPassManager.cpp:1693:0 #12 0x000000000152e4d6 /usr/local/google/home/krasin/src/llvm.org/llvm/lib/LTO/LTOBackend.cpp:230:0 #13 0x000000000152e4d6 (anonymous namespace)::opt(llvm::lto::Config&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex&) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/LTO/LTOBackend.cpp:263:0 #14 0x000000000152e9be llvm::lto::backend(llvm::lto::Config&, std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>, unsigned int, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, llvm::ModuleSummaryIndex&) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/LTO/LTOBackend.cpp:370:0 #15 0x0000000001520e55 /usr/include/c++/4.8/functional:2029:0 #16 0x0000000001520e55 /usr/include/c++/4.8/functional:2174:0 #17 0x0000000001520e55 llvm::lto::LTO::runRegularLTO(std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/LTO/LTO.cpp:604:0 #18 0x0000000001529663 /usr/include/c++/4.8/functional:2029:0 #19 0x0000000001529663 /usr/include/c++/4.8/functional:2174:0 #20 0x0000000001529663 llvm::lto::LTO::run(std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>, std::function<std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)> (unsigned int, llvm::StringRef)>) /usr/local/google/home/krasin/src/llvm.org/llvm/lib/LTO/LTO.cpp:541:0 #21 0x00000000005e87f9 lld::elf::BitcodeCompiler::compile() /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/ELF/LTO.cpp:144:0 #22 0x0000000000638018 /usr/include/c++/4.8/bits/stl_iterator.h:726:0 #23 0x0000000000638018 /usr/include/c++/4.8/bits/stl_vector.h:539:0 #24 0x0000000000638018 lld::elf::SymbolTable<llvm::object::ELFType<(llvm::support::endianness)1, true> >::addCombinedLTOObject() /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/ELF/SymbolTable.cpp:120:0 #25 0x000000000059f5c8 void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::support::endianness)1, true> >(llvm::opt::InputArgList&) /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/ELF/Driver.cpp:832:0 #26 0x00000000004aa8d4 lld::elf::LinkerDriver::main(llvm::ArrayRef<char const*>, bool) /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/ELF/Driver.cpp:343:0 #27 0x00000000005a066c /usr/include/c++/4.8/bits/stl_iterator.h:726:0 #28 0x00000000005a066c /usr/include/c++/4.8/bits/stl_vector.h:557:0 #29 0x00000000005a066c /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/ELF/Memory.h:60:0 #30 0x00000000005a066c lld::elf::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&) /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/ELF/Driver.cpp:79:0 #31 0x00000000004a8949 main /usr/local/google/home/krasin/src/llvm.org/llvm/tools/lld/tools/lld/lld.cpp:104:0 #32 0x00007f33cfa89f45 __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:321:0 #33 0x000000000051028d _start (/usr/local/google/home/krasin/chr32/src/out/thin-lto-tot/../../../../src/llvm.org/release-build/bin/ld.lld+0x51028d) This only happens, if debug info is enabled.
,
Jan 27 2017
Thanks, I'll start digging.
,
Jan 27 2017
,
Jan 27 2017
This should be fixed when https://reviews.llvm.org/D28913 is submitted.
,
Jan 27 2017
For the record, the problematic object file is out/thin-lto-tot/obj/skia/skia/GrFragmentProcessor.o
,
Jan 27 2017
$ cat p.cpp
class C;
class A {
public:
A(C *);
};
class B {
public:
template <typename> void m_fn2() { static int a; }
virtual void m_fn1();
};
class C : public B {};
void fn1() {
class D : public C {
public:
D() { m_fn2<D>(); }
};
A(new D);
}
$ clang++ -target x86_64-unknown-linux -fvisibility=hidden -O2 -g2 -flto=thin -c p.cpp -o p.o
$ llvm-modextract -b -n 1 p.o -o - | opt -o /dev/null
All DICompileUnits must be listed in llvm.dbg.cu
,
Jan 28 2017
https://reviews.llvm.org/D29240 is the fix.
,
Feb 8 2017
The fix landed upstream. That fixed the build, but it looks like we still have a few test failures. https://build.chromium.org/p/chromium.fyi/builders/ThinLTO%20Linux%20ToT/builds/1195 Not clear if they're related to debug info or just regressions from the past few weeks.
,
Feb 8 2017
Thank you, Peter. I will investigate this tomorrow.
,
Feb 8 2017
Confirmed that the issue exists without debug info too. Also spotted the fact that ThinLTO with debug info eats significantly more memory than without. It might be even larger number than for FullLTO + debug symbols, but the latter fact is not proven.
,
Feb 24 2017
My current hypothesis (not tested yet) is that ThinLTO somehow confuses these two static symbols from two different compilation units: https://cs.chromium.org/chromium/src/components/data_reduction_proxy/core/browser/data_reduction_proxy_data.cc?q=DataReductionProxyData::GetDataAndCreateIfNecessary&dr=CSs&l=11 const void* const kDataReductionProxyUserDataKey = &kDataReductionProxyUserDataKey; and https://cs.chromium.org/chromium/src/chrome/browser/loader/chrome_navigation_data.cc?gsn=ChromeNavigationData const void* const kChromeNavigationDataUserDataKey = &kChromeNavigationDataUserDataKey; What I observe is that DataReductionProxyData* is created, placed into a map for user values in a request and then it's static cast to ChromeNavigationData*. Both of the types are derived from base::SupportsUserData::Data. Somehow, we don't see any CFI or UBSan complaints.
,
Feb 24 2017
Which probably can be explained by the fact that we don't build CFI or UBSan with ThinLTO
,
Feb 24 2017
Confirmed (the keys collision): DataReductionProxyData::GetDataAndCreateIfNecessary, 0, kDataReductionProxyUserDataKey: 0xbc3c40 ChromeNavigationData::GetDataAndCreateIfNecessary, 0, kChromeNavigationDataUserDataKey: 0xbc3c40 I am trying to reproduce it locally, but failing so far.
,
Feb 24 2017
Could it be ICF that is merging these two variables?
,
Feb 24 2017
Correct. Passing -Wl,--icf=all to ldd allowed me to reproduce the issue with a standalone program. That's a bug, is it?
,
Feb 24 2017
Repro:
$ cat first.cc
#include <stdio.h>
const void* const kFirstKey = &kFirstKey;
void PrintFirstKey() {
printf("kFirstKey: %p\n", kFirstKey);
}
$ cat second.cc
#include <stdio.h>
const void* const kSecondKey = &kSecondKey;
void PrintSecondKey() {
printf("kSecondKey: %p\n", kSecondKey);
}
$ cat main.cc
#include <stdio.h>
void PrintFirstKey();
void PrintSecondKey();
int main(void) {
PrintFirstKey();
PrintSecondKey();
return 0;
}
Build script:
#!/bin/bash
set -ue
mkdir -p thin
cd thin
readonly CFLAGS="-flto=thin -Werror -Wall -std=c++11"
clang++ -c ../main.cc -o main.o $CFLAGS
clang++ -c ../first.cc -o first.o $CFLAGS
clang++ -c ../second.cc -o second.o $CFLAGS
clang++ -fuse-ld=lld -flto=thin -o run main.o first.o second.o -Wl,--lto-O2 -Wl,--icf=all
Running it:
$ ./thin/run
kFirstKey: 0x2002d8
kSecondKey: 0x2002d8
,
Feb 24 2017
FYI, it's a very common trick in Chrome: https://cs.chromium.org/search/?q=const.k.*Key.%3D&type=cs
,
Feb 24 2017
Maybe. By passing --icf=all the linker is assuming that pointer addresses are not significant, but linkers have historically made that assumption only for code pointers. LLD's implementation of --icf=all is more aggressive than gold's because it looks at read-only data as well as code. I think Reid mentioned that he ran into similar issues in lld's COFF linker because it was more aggressive than Microsoft's linker in exactly the same way. I wonder how large the actual benefit of ICF is in practice for read-only data and whether it would be more friendly for lld to only consider code for ICF. Reid/Rui: any thoughts?
,
Feb 24 2017
FWIW, this can be reproduced easily without LTO:
$ cat icftest.c
static void *const foo = &foo;
static void *const bar = &bar;
int main() {
printf("%p %p\n", &foo, &bar);
}
$ clang -O3 -fuse-ld=lld -Wl,--icf=all icftest.c -fdata-sections
$ ./a.out
0x2002c0 0x2002c0
,
Feb 25 2017
I'm not sure how much we would loss by limiting ICF to only functions, but issues caused by ICF are sometimes hard to debug, it might make sense to be conservative.
Making LLD ICF merge only executable sections should be pretty easy. Could you apply this patch and see how much is the loss in size?
--- a/lld/ELF/ICF.cpp
+++ b/lld/ELF/ICF.cpp
@@ -163,8 +163,8 @@ template <class ELFT> static bool isEligible(InputSection *S) {
// .init and .fini contains instructions that must be executed to
// initialize and finalize the process. They cannot and should not
// be merged.
- return S->Live && (S->Flags & SHF_ALLOC) && !(S->Flags & SHF_WRITE) &&
- S->Name != ".init" && S->Name != ".fini";
+ return S->Live && (S->Flags & SHF_ALLOC) && (S->Flags & SHF_EXECINSTR)
+ !(S->Flags & SHF_WRITE) && S->Name != ".init" && S->Name != ".fini";
}
// Split an equivalence class into smaller classes.
,
Feb 25 2017
First, yes: the patch fixes the bug. The difference in size (before stripping) is insignificant, on unit_tests target: before: 250167320 after: 251125784 So, it's like 0.4% (before strip). After strip it's: before: 194651080 after: 195609544 So, 0.5%. I think the correctness is worth it.
,
Feb 25 2017
Also, a minor correction to the patch in #28: && is missing at the end of the first inserted line.
,
Feb 25 2017
https://reviews.llvm.org/D30365 is sent out to make ICF merge only functions.
,
Mar 22 2017
This is fixed a long time ago. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by p...@chromium.org
, Jan 19 2017