Precise x64 release builders produce defective binaries under certain conditions |
|||||||
Issue descriptionPlease see minimal reproduction here: https://codereview.chromium.org/2093203002/ When above target is built using the following GN args, and then the produced executable is run on Ubuntu Precise x64 / Debian Wheezy, it will SEGFAULT on calling pthread_open() which is invoked from std::locale in from libstdc++. target_cpu = "x64" is_debug = false is_chrome_branded = true is_official_build = true is_component_build = true This environment is set up on https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20Linux%20x64, but for testing, the issue can be easily reproduced locally: OUTPUTDIR=/path/to/src/out/gnoutdir CHROMIUM_SRC=/path/to/src cp ${OUTPUTDIR}/flto_repro ${CHROMIUM_SRC}/build/linux/debian_wheezy_amd64-sysroot/usr/bin sudo chroot ${CHROMIUM_SRC}/build/linux/debian_wheezy_amd64-sysroot/ /usr/bin/flto_repro
,
Jun 24 2016
FWIW, I have done quite a bit of investigation, but can't quite put my finger on what exactly is at fault here. With the build steps generated by GN, the presence of `-flto` seemed relevant. When trying to reproduce with a single command, I found that some subsets of the following options worked, while some didn't: clang++ -pthread -O2 -fuse-ld=gold -Wl,--as-needed repro.cc -B/path/to/src/third_party/binutils/Linux_x64/Release/bin --sysroot=path/to/usr/lib/x86_64-linux-gnu (+ corresponding -L + rpath-link flags) flto_repro.cc With some configurations, `ldd` did not list libpthread as an SO-dependency of the executable, in those cases it's not surprising that pthread_open() cannot be resolved.
,
Jun 24 2016
Re #1: Hmm, then it's rather unfortunate that we have the "Google Chrome Linux x64" builder running under Precise, and being a tree-closer at the same time.
,
Jun 24 2016
s/pthread_open/pthread_once/g
,
Jun 24 2016
Oh, ya, those machines. I guess they still need to make sure tests pass. If only we stopped dragging our feet on the Trusty upgrade.
,
Jun 24 2016
To be fair, there may be more to this than just Precise being old. This happens locally on Trusty, which I find surprising: out/ReleaseGN/$ ../../third_party/llvm-build/Release+Asserts/bin/clang++ -pthread -O2 -fuse-ld=gold \ -B../../third_party/binutils/Linux_x64/Release/bin -Wl,--as-needed ../../flto_repro/flto_repro.cc out/ReleaseGN/$ a.out Segmentation fault (core dumped) And now it also seem like during the experiments mentioned in #2, the failure always occurred with -B. Maybe issue with gold 2.26.20160125?
,
Jun 25 2016
Okay, I have done more digging, it's not related to ld.gold, instead it seems to be a combination of two things: First, we are triggering a bug in glibc that is described here: http://www.openwall.com/lists/musl/2014/10/18/5. Essentially, it tries to detect if it is running in a multithreaded program by probing whether __pthread_key_create() is linked in. If so, then it won't do any checking before calling other weak references to symbols such as __pthread_once(). This is normally only a problem when you try to statically link libpthread, and not with dynamic linking in which case either all symbols from the shared library is linked in, or nothing. Therefore the second, more interesting question is why we are triggering this bug in first place with dynamic linkage. For some reason, due to -O2, the invocation in #6 produces an executable that itself exports a weak, but defined, __pthread_key_create symbol: $readelf --dyn-syms ./a.out Symbol table '.dynsym' contains 13 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2) 2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__ 3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab 4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable 5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses 6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4 (3) 7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSsC1EPKcRKSaIcE@GLIBCXX_3.4 (3) 8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSs4_Rep10_M_destroyERK@GLIBCXX_3.4 (3) 9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __cxa_atexit@GLIBC_2.2.5 (2) 10: 0000000000400700 0 FUNC WEAK DEFAULT UND __pthread_key_create 11: 0000000000402080 32 OBJECT UNIQUE DEFAULT 26 _ZNSs4_Rep20_S_empty_rep_@GLIBCXX_3.4 (3) 12: 0000000000400730 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3) This is then bound to the weak reference in glibc upon execution, which encourages glibc to call pthread_once, which, of course, segfaults, as it is not bound: $ LD_DEBUG=all ./a.out 2>&1 | grep pthread_ 96388: symbol=__pthread_key_create; lookup in file=./a.out [0] 96388: binding file /lib/x86_64-linux-gnu/libgcc_s.so.1 [0] to ./a.out [0]: normal symbol `__pthread_key_create' 96388: symbol=__pthread_key_create; lookup in file=./a.out [0] 96388: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to ./a.out [0]: normal symbol `__pthread_key_create' 96388: symbol=pthread_once; lookup in file=./a.out [0] 96388: symbol=pthread_once; lookup in file=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] 96388: symbol=pthread_once; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0] 96388: symbol=pthread_once; lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0] 96388: symbol=pthread_once; lookup in file=/lib64/ld-linux-x86-64.so.2 [0] 96388: symbol=pthread_once; lookup in file=/lib/x86_64-linux-gnu/libgcc_s.so.1 [0] When building without -pthread, or without -O2, or with -fno-inline-functions, the extra copy of __pthread_key_create is not created/exported, so glibc correctly assumes that libpthread is not present/used. Of course it can be argued if the symbol should be in the other cases, but I don't think it actually violates any specs. (It's not there when building with GCC, however.) When building without `-Wl,--as-needed`, a DT_NEEDED entry declaring dependency on libpthread is added to the dynamic section of the executable itself, which, in a weird instance of overlinking, masks the issue. This is what probably allows all other Chromium build targets that actually use libpthread to run without problems. Finally, the reason why the problem first manifest as a SEGFAULT is that the normal safeguards are all disabled for functions declared as a weak symbol. By default, there would be dependency checking during linking (unless -Wl,--allow-shlib-undefined), as well as ld-linux would terminate with an informative error message if it failed to resolve a symbol. See also: https://sourceware.org/bugzilla/show_bug.cgi?id=16417#c6
,
Jun 26 2016
Since this sounds LTO-related, +krasin
,
Jun 26 2016
Peter, can you please take a look? Nico, sorry, I am on vacation until July 7 in UTC+0 timezone, and have limited connectivity. If the current issue is not resolved until tomorrow, I will come to the local Google office and work from here.
,
Jun 26 2016
,
Jun 26 2016
If you're on vacation, don't come into the office! We'll figure it out somehow.
,
Jun 26 2016
ack, thx!
,
Jun 27 2016
engedy@, thanks for the detailed investigation! It seems that the definition of __pthread_key_create in the main executable is not a regular definition but what I'm going to call a "pseudo PLT entry". The static linker creates these in response to seeing a reference to the symbol that does not go via the PLT or GOT. (The reason why this needs to be a definition is that the symbol needs to have a unique address in order to allow function pointer comparisons to work correctly.) I discussed this with Rafael (LLVM's resident object file expert) and I think this is a linker bug: either the linker should not be creating the pseudo PLT entry, or it should create the DT_NEEDED entry. The same issue exists in lld, and I've created an LLVM bug with more details here: https://llvm.org/bugs/show_bug.cgi?id=28335 While this issue should ultimately be resolved in the linker, we can work around it by ensuring that the linker always emits a DT_NEEDED entry for libpthread. I believe that can do that by always including the following in ldflags on Linux platforms: "-Wl,--no-as-needed -lpthread -Wl,--as-needed". I'll create a CL that does that.
,
Jun 28 2016
Peter, thanks for unraveling the toughest part of this mystery! Dominic, Pavel, FYI.
,
Jun 29 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7ebfaa0c5067c77db1b4891ac6571f8d1b97fc4d commit 7ebfaa0c5067c77db1b4891ac6571f8d1b97fc4d Author: pcc <pcc@chromium.org> Date: Wed Jun 29 01:58:50 2016 Disable --as-needed for libpthread to work around linker bug. If a DSO is linked against with --as-needed and the program contains no strong references to any symbols in the DSO, the documented behaviour is that the linker should not create a DT_NEEDED entry for that DSO. However, if the program contains weak references to symbols in the DSO, this results in an incorrect resolution for that symbol. In Chromium this happens in particular for the symbol __pthread_key_create (as a result of code inlined from a pthread header), which causes the program to segfault. For more details, see the associated bug and the related LLVM bug: http://llvm.org/pr28335 Work around this by linking against libpthread with --no-as-needed. Because the problem only affects glibc, only do this when not targeting NaCl or Android, neither of which normally uses glibc as the C runtime library (NaCl can use glibc, but we haven't observed this problem there). BUG= 623236 R=thakis@chromium.org,engedy@chromium.org Review-Url: https://codereview.chromium.org/2094273003 Cr-Commit-Position: refs/heads/master@{#402650} [modify] https://crrev.com/7ebfaa0c5067c77db1b4891ac6571f8d1b97fc4d/build/common.gypi [modify] https://crrev.com/7ebfaa0c5067c77db1b4891ac6571f8d1b97fc4d/build/config/compiler/BUILD.gn
,
Jun 1 2017
Closing this since there are no more precise builders
,
Apr 24 2018
thomasanderson: Does that mean we can remove https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?type=cs&q=wd4996&sq=package:chromium&l=398 now?
,
May 21 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ae768dbf271fec5c84411ca195bf1bab07b7ed45 commit ae768dbf271fec5c84411ca195bf1bab07b7ed45 Author: Nico Weber <thakis@chromium.org> Date: Mon May 21 22:23:55 2018 attempt to remove a workaround Bug: 623236 Change-Id: I7d914fd314685b0e874e515a48dfcd1819c895d2 Reviewed-on: https://chromium-review.googlesource.com/1067602 Reviewed-by: Thomas Anderson <thomasanderson@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#560376} [modify] https://crrev.com/ae768dbf271fec5c84411ca195bf1bab07b7ed45/build/config/compiler/BUILD.gn |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by thestig@chromium.org
, Jun 24 2016