32-bit Android builds fail when linker output is >4GB
Issue description

(Forked from issue 638485, which is private due to discussing lots of internal infrastructure details.)

Summary
=======

Building Android targets for 32-bit architectures using default release build settings causes weird build errors like:

    readelf: Error: Reading 0xfce6f7fc bytes extends past end of file for string table
    readelf: Error: no .dynamic section in the dynamic segment

The issue is that in a non-component build with the default amount of debug information (symbol_level=2), the shared library produced by the linker is now over 4GiB for some targets. The toolchains for 32-bit architectures produce ELF32 format binaries, which cannot represent file offsets larger than 4GiB due to using 32-bit header fields. The resulting binaries are therefore corrupted, and cannot be processed correctly by other tools that run after the linker.

Workarounds
===========

The immediate workarounds for developer builds are:

1) Use a component build (the default for debug builds) since it produces a number of separate smaller shared libraries instead of one very big one. Works fine, but has the "usual" disadvantages of a component build (less realistic performance/startup time).

2) Set symbol_level = 1 in your GN configuration. This will produce *much* smaller binaries by omitting some of the debug information. You will still be able to generate and symbolise correct stack backtraces from the binary, but if you attempt to debug the binary with gdb, there will be no information about local variables and the detail/accuracy of other interactive debugging information may be reduced. If you don't intend to debug with gdb, it's perfectly safe to set symbol_level=1 without concern (and it will, in fact, decrease disk space usage and linker memory usage, and increase linker performance).

3) Build for a 64-bit architecture instead, which doesn't have a 4GiB file limitation.

Aside - why doesn't this fail during linking?
=============================================

It should actually fail to link, rather than the linker "succeeding" but producing an invalid ELF binary. We filed https://sourceware.org/bugzilla/show_bug.cgi?id=20481 to track this - it's a bug in gold.

Actually solving the problem
============================

Several options:

1) Linux uses debug fission, where the majority of the debug information is not linked into the final binary, but left in separate files that are just referred to by the main binary. Enabling this for android would solve the problem, because it would significantly reduce the size of the binary without losing any information. Two issues:
1a) I don't know if the Android toolchain supports debug fission or not
1b) I don't know if Android debugging tools, stack symbolisation tools, Breakpad, etc actually support debugging properly with debug-fissioned binaries.

2) There appears to be a way to compress the debug sections with -Wa,--compress-debug-sections that results in them being smaller. I don't know a lot about this, and I don't think we use it on any other platforms. Likely has the same two potential issues as 1) - does the toolchain support it and do tools that we use support it.

3) It's possible (in theory) to link the binary as an ELF64 binary, even though it's for a 32-bit architecture. ELF32/ELF64 refer to the *file format* of the binary, not the bit-ness of the code inside the binary, so using ELF64 allows for larger binary files (64-bit header fields in the ELF data structures). Current versions of the Linux kernel will successfully load ELF64 binaries on 32-bit architectures as long as the actual architecture defined in the header is appropriate; older linux kernel versions will not (I'm not sure where this changed). We could convert it back to ELF32 after stripping the debug information? Problem here is again I don't know if the toolchain will support it and whether gdb/etc will get confused by this.

Debug fission is probably the first option to look into as we already use this on Linux.
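
For anyone who wants to check a local build for this condition, here is a minimal Python sketch of the ELF32 limit described in the summary. It reads the EI_CLASS byte from the ELF identification header and compares the file size against 2^32. The function names are hypothetical, and comparing the total file size against the limit is a rough heuristic rather than what readelf or the linker actually check.

```python
import os

ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1  # EI_CLASS value for ELF32 files
ELFCLASS64 = 2  # EI_CLASS value for ELF64 files
ELF32_LIMIT = 2**32  # 32-bit header fields cannot hold offsets beyond this


def elf_class(header: bytes) -> int:
    """Return the EI_CLASS byte (index 4) from the start of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return header[4]


def exceeds_elf32_limit(cls: int, file_size: int) -> bool:
    """Heuristic: an ELF32 file larger than 4GiB cannot be described correctly."""
    return cls == ELFCLASS32 and file_size > ELF32_LIMIT


def wrapped_offset(offset: int) -> int:
    """Show what a 64-bit offset becomes when stored in a 32-bit field."""
    return offset & 0xFFFFFFFF


def check_file(path: str) -> bool:
    """Return True if the file at `path` is ELF32 and larger than 4GiB."""
    with open(path, "rb") as f:
        cls = elf_class(f.read(16))
    return exceeds_elf32_limit(cls, os.path.getsize(path))
```

Calling `check_file` on the unstripped .so from a failing build should return True if this bug is the cause. `wrapped_offset` is only there to illustrate why tools reading the file afterwards can end up with nonsensical offsets and sizes.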
,
Sep 28 2016
Are we hitting the limit now on all of the 32-bit platforms, or just arm?
,
Sep 28 2016
I haven't actually checked any of the others, but very few people (or bots) build the other platforms, and last I looked debug x86 binaries generally seemed to be larger than arm, so I am assuming that either they've also hit the limit or will soon. If someone wants to check that'd be somewhat interesting, but not a lot. :)
,
Sep 28 2016
- On ChromeOS we are using split debug (fission). We had to fix breakpad and ccache for it. But just recently we ran into issues with the .pdb (combination of .dwo files) being larger than 4 GB. I assume that will not be a problem in Android since it is using llvm and the debug info is smaller than the GCC generated debug info.
- We also tried using -g1, but the crash dumps looked different and the ChromeOS crash team did not like this.
- Compressing debug sections looked promising but we ran into an issue with the linker, so we did not follow that path.
- We also considered using an ELF64 binary but it seems the tools are not ready to handle incoming ELF32 with output ELF64. We did not explore this much.
- This is a big problem in ChromeOS, so now we are forced to use -femit-struct-debug-reduced. The crash dumps look ok but debuggability within GDB may be compromised. We are hoping when we migrate to LLVM we don't have to use this option.
- So, I think your first bet should be fission (split-dwarf).
,
Sep 28 2016
Android is not using LLVM, we still build with gcc 4.9. If you're talking about breakpad when you had a problem with -g1, then this is because breakpad didn't used to support DWARF4 (see issue 638485 for me discovering that one) and so was inadvertently depending on some info that's only in -g2 to symbolise things correctly. I have temporarily switched Android to DWARF3 to avoid this, and breakpad output is now not affected by -g1 vs -g2. Someone has now fixed breakpad to support DWARF4, so when I have a moment to test it, I'll change it back. So, if that's what you mean, then you should actually be able to use -g1 now if that's useful to you (at least for breakpad purposes, it still harms gdb usage of course).
,
Sep 28 2016
ok, good to know. we will retry g1... but, are you still having this problem even after using g1? I thought g1 was way more compact?
,
Sep 28 2016
People still want to be able to build with -g2 to debug effectively with gdb. Using -g1 is a short term workaround (the bots are using it, and many developers are too, but it impairs debugging). The point of this bug is to actually solve the problem so that people can build with -g2 again if they want to.
,
Sep 28 2016
ah, ok, I understand now. On ChromeOS we have 2 workflows:
1) For developers, who can build images and fully debug. This one uses fission, which does not have the 4GB limit and improves link time greatly. Debug info does not need to be delivered anywhere.
2) Release workflow. Debug info needs to be delivered with the image. We use debug fission but then the .pdb file is still larger than 4 GB. So, we use the -femit-struct-debug-reduced option.
-femit-struct-debug-reduced is less aggressive than -g1. So, you could use that one temporarily. Anyway, I think you should try fission next.
,
Sep 28 2016
re: comment 5 about still building with GCC. Are you looking into switching to Clang now? GCC is EOL for Android NDK, and it would be good to know what else is preventing you from transitioning. Please feel free to file internal bugs. We don't really have great answers for performance differences (those essentially must be handled by partners in upstream LLVM), but we can certainly help with correctness problems.
,
Sep 28 2016
Sorry, I made a mistake in the name of suffixes. the collected set of .dwo files for fission is called .dwp not .pdb.
,
Sep 28 2016
I'm not really involved with the LLVM switchover but my understanding from last time I heard about it was that the code was still larger and slower than gcc output and we couldn't afford the regression. I don't know to what degree exactly, and it was some time ago, so things may have changed.
,
Sep 28 2016
On Android we never ship any debug information with our release at all, so the only two consumers of the debug info are developers doing local builds and the breakpad symbol server. The switch to DWARF3 made it so that -g1 was sufficient for breakpad's symbol purposes, and -g1 also solves buildability for developers, but reduces their debugging capabilities in the process. Debug fission hasn't yet been proven to work on android AFAIK and I'm not sure what the current state of it is, so yes, I think we should pursue that, but I don't want to (or have time to) do it myself :)
,
Sep 28 2016
re: switching to clang. The bug for tracking that is bug 481675, and it doesn't look like there's been any activity in some time. We probably need to bump that and get the migration actively on someone's plate, especially since CrOS is also working on switching over.
,
Sep 28 2016
for #12. I think the only thing you need for fission is the correct binutils. The android binutils is the same as for ChromeOS. So, it *should* work on Android. Regarding the switch to Clang/LLVM, we are starting to dig deeper on the performance issues while building Chrome browser. So, we may be able to collaborate there.
,
Sep 29 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/090bc3fbad2a085f1eaffa27921072e769729377 commit 090bc3fbad2a085f1eaffa27921072e769729377 Author: torne <torne@chromium.org> Date: Thu Sep 29 17:29:11 2016 Assert that the chosen Android config is safe. The default Android release build configuration fails to link on 32-bit devices, because the combination of a 32-bit target CPU, a non-component build, and symbol_level=2 produces a binary that's >4GiB. Assert that at least one of these conditions doesn't apply at gn time, to prevent people being surprised by the inevitable link failure. BUG=648948 Review-Url: https://codereview.chromium.org/2377013002 Cr-Commit-Position: refs/heads/master@{#421860} [modify] https://crrev.com/090bc3fbad2a085f1eaffa27921072e769729377/build/config/compiler/compiler.gni
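
The condition this commit message describes can be modelled in a few lines of Python. This is only a sketch of the logic as stated in the message (at least one of "32-bit target CPU", "non-component build", "symbol_level=2" must not apply); it is not the contents of compiler.gni, and the function names and the set of 64-bit `target_cpu` values are assumptions made for illustration.

```python
# Assumed set of 64-bit GN target_cpu values; illustrative only,
# not copied from build/config/compiler/compiler.gni.
SIXTY_FOUR_BIT_CPUS = {"arm64", "x64", "mips64el"}


def android_config_is_safe(target_cpu: str,
                           is_component_build: bool,
                           symbol_level: int) -> bool:
    """True if at least one of the three risky conditions does not apply."""
    return (target_cpu in SIXTY_FOUR_BIT_CPUS  # not a 32-bit target
            or is_component_build              # many small libraries
            or symbol_level < 2)               # reduced debug info


def assert_android_config(target_cpu: str,
                          is_component_build: bool,
                          symbol_level: int) -> None:
    """Fail early, at configuration time, instead of at link time."""
    if not android_config_is_safe(target_cpu, is_component_build, symbol_level):
        raise ValueError(
            "32-bit non-component Android build with symbol_level=2 "
            "is likely to produce a binary over 4GiB")
```

The point of the early check is the same as in the commit: the failure is reported when the build is configured, before a long compile and link ends with a corrupted binary.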
,
Sep 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/v8/v8.git/+/679409e800a173ef7830ba8781dfa83506c59ccf commit 679409e800a173ef7830ba8781dfa83506c59ccf Author: machenbach <machenbach@chromium.org> Date: Fri Sep 30 11:58:58 2016 [build] Use same symbol level as chromium for android This makes our configuration similar to Chromium's for android performance testing. This blocks deps'ing in: https://codereview.chromium.org/2377013002 BUG=chromium:648948 NOTRY=true Review-Url: https://codereview.chromium.org/2383743002 Cr-Commit-Position: refs/heads/master@{#39915} [modify] https://crrev.com/679409e800a173ef7830ba8781dfa83506c59ccf/infra/mb/mb_config.pyl
,
Oct 5 2016
FYI, https://uberchromegw.corp.google.com/i/internal.client.clank/builders/clang-release-builder fails with the new assertion about symbols>=2 on a non-component non-64-bit Android build. The builder was green before, so I assume the assertion doesn't take some condition into account? Should it only check for debug builds?
,
Oct 5 2016
No, it doesn't have anything to do with debug vs release. I assume clang's output is simply smaller. How much smaller is it? If it's reasonably close to 4GiB then we should just assume it will stop building soon and reconfigure the builder to do something else. If it's actually significantly less for some real reason, then we could exclude clang, but that doesn't seem likely to me.
,
Oct 5 2016
The last successful build from that builder produced binaries for chrome and webview that are not even over 1GB, but appear to contain full debug info. Huhhh. So either clang can represent the same debugging info in ~20% of the space, or it's producing less info and I'm just not spotting the difference, or something else is weird.
,
Oct 5 2016
,
Oct 5 2016
Just FYI, clang was written/designed to be much more efficient in its debug information, so it is known that Clang's debug information is significantly smaller than GCC's... GCC had some inefficiencies/redundancies in its debug information that Clang was careful to avoid. Clang's debug information is also not quite as complete as GCC's but for most debugging purposes it is good enough.
,
Oct 6 2016
OK; I'll amend the assertion so that is_clang is also sufficient, then.
,
Oct 6 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6d32a26068321e5976494501c51a33636a42c13f commit 6d32a26068321e5976494501c51a33636a42c13f Author: xunjieli <xunjieli@chromium.org> Date: Thu Oct 06 16:58:03 2016 Add a ignore_elf32_limitations flag in build/config/compiler/compiler.gni Add a ignore_elf32_limitations flag in build/config/compiler/compiler.gni to turn off assertion for Cronet builds. This CL additionally adds is_clang to the assertion per comment in 648948. BUG= 651887 ,648948 Review-Url: https://codereview.chromium.org/2395603003 Cr-Commit-Position: refs/heads/master@{#423564} [modify] https://crrev.com/6d32a26068321e5976494501c51a33636a42c13f/build/config/compiler/compiler.gni [modify] https://crrev.com/6d32a26068321e5976494501c51a33636a42c13f/components/cronet/tools/cr_cronet.py
,
Oct 6 2016
srhines: clang produces arm binaries that are 12% bigger than gcc's (at least for chrome), that's the main blocker.
,
Oct 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fd77254d72956d5e469857b2242fefc7f95fc407 commit fd77254d72956d5e469857b2242fefc7f95fc407 Author: kjellander <kjellander@chromium.org> Date: Wed Oct 26 06:25:50 2016 Move ignore_elf32_limitations into build_overrides/build.gni Having this it will be easy for client projects to override the ignore_elf32_limitations flag that was added in https://codereview.chromium.org/2395603003 by setting it to true in their own build_overrides/build.gni. Then it will still be possible to build with symbol_level=2 as long as the project isn't affected by the 4GB size limit. BUG=648948, webrtc:6596 Review-Url: https://codereview.chromium.org/2448453002 Cr-Commit-Position: refs/heads/master@{#427608} [modify] https://crrev.com/fd77254d72956d5e469857b2242fefc7f95fc407/build/config/compiler/compiler.gni [modify] https://crrev.com/fd77254d72956d5e469857b2242fefc7f95fc407/build_overrides/build.gni
,
Nov 4 2016
[Automated comment] removing mislabelled merge-merged-2840
,
Nov 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c4ffc5d95c07e6777d4c57dc086d1b681a830a34 commit c4ffc5d95c07e6777d4c57dc086d1b681a830a34 Author: agrieve <agrieve@chromium.org> Date: Tue Nov 08 15:32:39 2016 Android: Default symbol_level=1 for GN configs where 2 breaks things BUG=648948 Review-Url: https://codereview.chromium.org/2481523002 Cr-Commit-Position: refs/heads/master@{#430618} [modify] https://crrev.com/c4ffc5d95c07e6777d4c57dc086d1b681a830a34/build/config/compiler/compiler.gni
,
Feb 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7be003ae95486a55fc2a9ef5fb34e26f702fc241 commit 7be003ae95486a55fc2a9ef5fb34e26f702fc241 Author: thakis <thakis@chromium.org> Date: Thu Feb 09 16:43:41 2017 android: Default symbol_level to 1 in 32-bit builds for clang too clang finally hit the same file size wall that gcc hit a while ago. (Locally, building lib_android_webview_unittests__library.so goes from 6.5min to 3.8min on my linux box and file size of the .so goes from over 4.1GB to 320MB -- at the cost of no detailed debug info of course. Feels like investigating fission or similar would probably be beneficial for the android build.) BUG=648948, 685259 Review-Url: https://codereview.chromium.org/2685033002 Cr-Commit-Position: refs/heads/master@{#449314} [modify] https://crrev.com/7be003ae95486a55fc2a9ef5fb34e26f702fc241/build/config/compiler/compiler.gni
,
Feb 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f54eb60f2783c67c77808d7e0aa3e9d39f8c600a commit f54eb60f2783c67c77808d7e0aa3e9d39f8c600a Author: thakis <thakis@chromium.org> Date: Fri Feb 10 19:03:51 2017 Address comment from https://codereview.chromium.org/2685033002/ BUG=648948, 68525 TBR=torne Review-Url: https://codereview.chromium.org/2682173006 Cr-Commit-Position: refs/heads/master@{#449679} [modify] https://crrev.com/f54eb60f2783c67c77808d7e0aa3e9d39f8c600a/build/config/compiler/compiler.gni
,
Feb 23 2017
Does anyone know if remove_webcore_debug_symbols = true keeps the .so under 4GB?
,
Feb 24 2017
To answer my own question, chrome links fine with symbol_level = 2 if webcore symbols are removed. My GN args:

    target_os = "android"
    is_debug = false
    is_component_build = false
    enable_nacl = false
    ffmpeg_branding = "Chrome"
    proprietary_codecs = true
    use_goma = true
    enable_profiling = true
    remove_webcore_debug_symbols = true
    symbol_level = 2
    ignore_elf32_limitations = true

Size of lib.unstripped/libchrome.so: 2840028620 (2.64 GiB)
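
As a quick sanity check of the size quoted above, the following Python sketch converts the byte count to GiB and computes how much room is left under the 4GiB ELF32 limit. The helper names are hypothetical; the only input taken from this thread is the reported size of libchrome.so.

```python
GIB = 2**30
ELF32_LIMIT = 4 * GIB  # 4294967296 bytes


def to_gib(size_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return size_bytes / GIB


def headroom_bytes(size_bytes: int) -> int:
    """Bytes remaining before the file would reach the 4GiB ELF32 limit."""
    return ELF32_LIMIT - size_bytes


reported_size = 2840028620  # lib.unstripped/libchrome.so, from the comment above
print(f"{to_gib(reported_size):.2f} GiB")                     # prints "2.64 GiB"
print(f"{to_gib(headroom_bytes(reported_size)):.2f} GiB left")
```

So with webcore symbols removed there was, at that point, a bit over 1.3 GiB of margin before the binary would hit the limit again.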
,
Feb 24 2017
You could amend the assert/comment/error message/etc to reflect this option if you think it's useful, then. I still generally think that symbol_level=1 is perfectly acceptable for almost all people, though - the impact on even interactive debugging seems to be pretty minimal, and the performance gain from building this way is worth it all by itself to me :)
,
Mar 9 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Sep 8
No longer on the Chrome team, e-mail me @google.com if any attention still required from me here, otherwise good luck!
Comment 1 by dpranke@chromium.org
, Sep 28 2016