Issue 648948

Starred by 16 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug




32-bit Android builds fail when linker output is >4GB

Project Member Reported by torne@chromium.org, Sep 21 2016

Issue description

(forked from issue 638485 which is private due to discussing lots of internal infrastructure details)

Summary
=======

Building Android targets for 32-bit architectures using default release build settings causes weird build errors like:
readelf: Error: Reading 0xfce6f7fc bytes extends past end of file for string table
readelf: Error: no .dynamic section in the dynamic segment

The issue is that in a non-component build with the default amount of debug information (symbol_level=2), the shared library produced by the linker is now over 4GiB for some targets. The toolchains for 32-bit architectures produce ELF32 format binaries, which cannot represent file offsets larger than 4GiB due to using 32-bit header fields. The resulting binaries are therefore corrupted, and cannot be processed correctly by other tools that run after the linker.
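To make the limit concrete, here is a minimal Python sketch (not from the original report) that reads the EI_CLASS byte of an ELF header and flags an ELF32 file whose total size exceeds 2^32 - 1 bytes, the largest value a 32-bit offset field can hold. The function names are hypothetical. File size is only a heuristic here: it assumes that anything stored past the 4GiB mark needs an offset that no longer fits.

```python
import os

# Values from the ELF specification's e_ident array.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4                  # index of the class byte in e_ident
ELFCLASS32 = 1
ELFCLASS64 = 2
MAX_ELF32_OFFSET = 2**32 - 1  # largest value an Elf32_Off field can hold


def elf_class(header: bytes) -> int:
    """Return ELFCLASS32 or ELFCLASS64 from the first bytes of an ELF file."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return header[EI_CLASS]


def elf32_size_problem(header: bytes, file_size: int) -> bool:
    """Hypothetical heuristic: True if an ELF32 file is larger than its offsets can address."""
    return elf_class(header) == ELFCLASS32 and file_size > MAX_ELF32_OFFSET


def check_file(path: str) -> bool:
    """Apply the heuristic to a file on disk (e.g. an unstripped .so)."""
    with open(path, "rb") as f:
        header = f.read(16)
    return elf32_size_problem(header, os.path.getsize(path))
```

The same size would be harmless for an ELF64 file, which is why the 64-bit builds in this report are unaffected.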


Workarounds
===========

The immediate workarounds for developer builds are:

1) Use a component build (the default for debug builds) since it produces a number of separate smaller shared libraries instead of one very big one. Works fine, but has the "usual" disadvantages of a component build (less realistic performance/startup time).

2) Set symbol_level = 1 in your GN configuration. This will produce *much* smaller binaries by omitting some of the debug information. You will still be able to generate and symbolise correct stack backtraces from the binary, but if you attempt to debug the binary with gdb, there will be no information about local variables and the detail/accuracy of other interactive debugging information may be reduced. If you don't intend to debug with gdb, it's perfectly safe to set symbol_level=1 without concern (and it will, in fact, decrease disk space usage and linker memory usage, and increase linker performance).

3) Build for a 64-bit architecture instead, which doesn't have a 4GiB file limitation.


Aside - why doesn't this fail during linking?
=============================================

It should actually fail to link, rather than the linker "succeeding" but producing an invalid ELF binary. We filed https://sourceware.org/bugzilla/show_bug.cgi?id=20481 to track this - it's a bug in gold.


Actually solving the problem
============================

Several options:

1) Linux uses debug fission, where the majority of the debug information is not linked into the final binary, but left in separate files that are just referred to by the main binary. Enabling this for Android would solve the problem, because it would significantly reduce the size of the binary without losing any information. Two issues:
1a) I don't know if the Android toolchain supports debug fission or not
1b) I don't know if Android debugging tools, stack symbolisation tools, Breakpad, etc actually support debugging properly with debug-fissioned binaries.

2) There appears to be a way to compress the debug sections with -Wa,--compress-debug-sections that results in them being smaller. I don't know a lot about this, and I don't think we use it on any other platforms. Likely has the same two potential issues as 1) - does the toolchain support it and do tools that we use support it.

3) It's possible (in theory) to link the binary as an ELF64 binary, even though it's for a 32-bit architecture. ELF32/ELF64 refer to the *file format* of the binary, not the bit-ness of the code inside the binary, so using ELF64 allows for larger binary files (64-bit header fields in the ELF data structures). Current versions of the Linux kernel will successfully load ELF64 binaries on 32-bit architectures as long as the actual architecture defined in the header is appropriate; older Linux kernel versions will not (I'm not sure where this changed). Perhaps we could convert it back to ELF32 after stripping the debug information. The problem here, again, is that I don't know whether the toolchain will support it, or whether gdb and other tools will get confused by it.
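The ELF32/ELF64 distinction above comes down to field widths: in the ELF specification, Elf32_Off is 4 bytes and Elf64_Off is 8 bytes. The following Python sketch (little-endian byte order assumed, helper names hypothetical) shows that an offset 5GiB into the file packs into an 8-byte field but not into a 4-byte one, and what a reader would see if only the low 32 bits were kept. It illustrates the arithmetic only; it is not a claim about exactly what gold writes.

```python
import struct

# Elf32_Off is a 4-byte unsigned field; Elf64_Off is 8 bytes.
ELF32_OFF = "<I"  # little-endian unsigned 32-bit
ELF64_OFF = "<Q"  # little-endian unsigned 64-bit


def encode_offset(offset: int, fmt: str) -> bytes:
    """Pack a file offset into an ELF header field of the given width."""
    return struct.pack(fmt, offset)


def truncated_offset(offset: int) -> int:
    """What a reader would see if only the low 32 bits of an offset were stored."""
    return offset & 0xFFFFFFFF


offset = 5 * 2**30  # a section starting 5 GiB into the file

print(len(encode_offset(offset, ELF64_OFF)), "bytes as Elf64_Off")
try:
    encode_offset(offset, ELF32_OFF)
except struct.error as exc:
    print("does not fit in Elf32_Off:", exc)
print("low 32 bits only:", hex(truncated_offset(offset)))
```

A wrapped offset like this points somewhere unrelated inside the file, which is the kind of inconsistency that tools such as readelf then trip over.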


Debug fission is probably the first option to look into as we already use this on Linux.
 
Cc: thakis@chromium.org llozano@chromium.org
Adding +llozano to this bug, as I believe CrOS has hit similar issues and maybe he has some ideas (I know he has experience with debug fission).
Are we hitting the limit now on all of the 32-bit platforms, or just arm?

Comment 3 by torne@chromium.org, Sep 28 2016

I haven't actually checked any of the others, but very few people (or bots) build the other platforms, and last I looked debug x86 binaries generally seemed to be larger than arm, so I am assuming that either they've also hit the limit or will soon. If someone wants to check that'd be somewhat interesting, but not a lot. :)
Cc: srhines@google.com yunlian@chromium.org
- on ChromeOS we are using split debug (fission). We had to fix breakpad and ccache for it. But just recently we ran into issues with the .pdb (combination of .dwo files) being larger than 4 GB. I assume that will not be a problem in Android since it is using LLVM and the debug info is smaller than the GCC-generated debug info.
- We also tried using -g1, but the crash dumps looked different and ChromeOS crash team did not like this.
- compressing debug sections looked promising but we ran into an issue with the linker. So we did not follow that path.
- We also considered using an ELF64 binary but it seems the tools are not ready to handle incoming ELF32 with output ELF64. We did not explore this much.
- This is a big problem in ChromeOS, so now we are forced to use -femit-struct-debug-reduced. The crash dumps look ok but debuggability within GDB may be compromised. We are hoping when we migrate to LLVM we don't have to use this option.
- So, I think your first bet should be fission (split-dwarf)

Comment 5 by torne@chromium.org, Sep 28 2016

Android is not using LLVM, we still build with gcc 4.9.

If you're talking about breakpad when you had a problem with -g1, then this is because breakpad didn't use to support DWARF4 (see issue 638485 for me discovering that one) and so was inadvertently depending on some info that's only in -g2 to symbolise things correctly. I have temporarily switched Android to DWARF3 to avoid this, and breakpad output is now not affected by -g1 vs -g2. Someone has now fixed breakpad to support DWARF4, so when I have a moment to test it, I'll change it back. So, if that's what you mean, then you should actually be able to use -g1 now if that's useful to you (at least for breakpad purposes; it still harms gdb usage, of course).
ok, good to know. we will retry g1...

but, are you still having this problem even after using g1? I thought g1 was way more compact?

Comment 7 by torne@chromium.org, Sep 28 2016

People still want to be able to build with -g2 to debug effectively with gdb. Using -g1 is a short term workaround (the bots are using it, and many developers are too, but it impairs debugging). The point of this bug is to actually solve the problem so that people can build with -g2 again if they want to.
ah, ok, I understand now. 

on ChromeOS we have 2 workflows: 1) for developers, who can build images and fully debug. This one uses fission, which does not have the 4GB limit and improves link time greatly. Debug info does not need to be delivered anywhere. 2) release workflow. Debug info needs to be delivered with the image. We use debug fission, but then the .pdb file is still larger than 4 GB. So, we use the -femit-struct-debug-reduced option.

-femit-struct-debug-reduced is less aggressive than -g1. So, you could use that one temporarily.

Anyway, I think you should try fission next.

Comment 9 by srhines@google.com, Sep 28 2016

re: comment 5 about still building with GCC. Are you looking into switching to Clang now? GCC is EOL for Android NDK, and it would be good to know what else is preventing you from transitioning. Please feel free to file internal bugs. We don't really have great answers for performance differences (those essentially must be handled by partners in upstream LLVM), but we can certainly help with correctness problems.
Sorry, I made a mistake in the name of suffixes. the collected set of .dwo files for fission is called .dwp not .pdb. 

Comment 11 by torne@chromium.org, Sep 28 2016

I'm not really involved with the LLVM switchover but my understanding from last time I heard about it was that the code was still larger and slower than gcc output and we couldn't afford the regression. I don't know to what degree exactly, and it was some time ago, so things may have changed.

Comment 12 by torne@chromium.org, Sep 28 2016

On Android we never ship any debug information with our release at all, so the only two consumers of the debug info are developers doing local builds and the breakpad symbol server. The switch to DWARF3 made it so that -g1 was sufficient for breakpad's symbol purposes, and -g1 also solves buildability for developers, but reduces their debugging capabilities in the process.

Debug fission hasn't yet been proven to work on android AFAIK and I'm not sure what the current state of it is, so yes, I think we should pursue that, but I don't want to (or have time to) do it myself :)
re: switching to clang. The bug for tracking that is bug 481675, and it doesn't look like there's been any activity in some time. We probably need to bump that and get migrating actively on someone's plate, especially since CrOS is also working on switching over.
for #12. I think the only thing you need for fission is the correct binutils. Android uses the same binutils as ChromeOS, so it *should* work on Android.

regarding switch to Clang/LLVM, we are starting to dig deeper on the performance issues while building Chrome browser. So, we may be able to collaborate there.

Comment 15 by bugdroid1@chromium.org, Sep 29 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/090bc3fbad2a085f1eaffa27921072e769729377

commit 090bc3fbad2a085f1eaffa27921072e769729377
Author: torne <torne@chromium.org>
Date: Thu Sep 29 17:29:11 2016

Assert that the chosen Android config is safe.

The default Android release build configuration fails to link on 32-bit
devices, because the combination of a 32-bit target CPU, a non-component
build, and symbol_level=2 produces a binary that's >4GiB. Assert that at
least one of these conditions doesn't apply at gn time, to prevent
people being surprised by the inevitable link failure.

BUG=648948

Review-Url: https://codereview.chromium.org/2377013002
Cr-Commit-Position: refs/heads/master@{#421860}

[modify] https://crrev.com/090bc3fbad2a085f1eaffa27921072e769729377/build/config/compiler/compiler.gni
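The condition this commit describes can be restated as a small predicate: the build is only flagged when all three risk factors (32-bit CPU, non-component build, symbol_level=2) apply together. This is a Python sketch for illustration, not the GN code that landed in build/config/compiler/compiler.gni; BuildConfig, is_elf32_safe, check and the list of 64-bit CPU names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BuildConfig:
    """Hypothetical stand-in for the relevant GN args."""
    target_cpu: str           # e.g. "arm", "arm64", "x86", "x64"
    is_component_build: bool
    symbol_level: int         # 0, 1 or 2


# Assumed set of 64-bit CPU names for this illustration.
SIXTY_FOUR_BIT_CPUS = {"arm64", "x64", "mips64el"}


def is_elf32_safe(cfg: BuildConfig) -> bool:
    """True if at least one of the three risk factors from the commit is absent."""
    is_64bit = cfg.target_cpu in SIXTY_FOUR_BIT_CPUS
    return is_64bit or cfg.is_component_build or cfg.symbol_level < 2


def check(cfg: BuildConfig) -> None:
    """Fail at configuration time instead of producing a corrupt binary at link time."""
    if not is_elf32_safe(cfg):
        raise AssertionError(
            "32-bit non-component build with symbol_level=2 "
            "would produce an ELF32 binary over 4 GiB")
```

Checking this before any compilation starts is the point of the change: a configuration mistake surfaces in seconds rather than after a long build.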


Comment 16 by bugdroid1@chromium.org, Sep 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/679409e800a173ef7830ba8781dfa83506c59ccf

commit 679409e800a173ef7830ba8781dfa83506c59ccf
Author: machenbach <machenbach@chromium.org>
Date: Fri Sep 30 11:58:58 2016

[build] Use same symbol level as chromium for android

This makes our configuration similar to Chromium's for
android performance testing.

This blocks deps'ing in:
https://codereview.chromium.org/2377013002

BUG=chromium:648948
NOTRY=true

Review-Url: https://codereview.chromium.org/2383743002
Cr-Commit-Position: refs/heads/master@{#39915}

[modify] https://crrev.com/679409e800a173ef7830ba8781dfa83506c59ccf/infra/mb/mb_config.pyl

FYI, https://uberchromegw.corp.google.com/i/internal.client.clank/builders/clang-release-builder fails with the new assertion about symbols>=2 on a non-component non-64-bit Android build.

The builder was green before so I assume the assertion doesn't take some condition into an account? Should it only check for debug builds?
No, it doesn't have anything to do with debug vs release. I assume clang's output is simply smaller. How much smaller is it? If it's reasonably close to 4GiB then we should just assume it will stop building soon and reconfigure the builder to do something else. If it's actually significantly less for some real reason, then we could exclude clang, but that doesn't seem likely to me.
The last successful build from that builder produced binaries for chrome and webview that are not even over 1GB, but appear to contain full debug info. Huhhh.

So either clang can represent the same debugging info in ~20% of the space, or it's producing less info and I'm just not spotting the difference, or something else is weird.
Cc: -llozano@chromium.org cmt...@chromium.org
Just FYI, clang was written/designed to be much more efficient in its debug information, so it is known that Clang's debug information is significantly smaller than GCC's. GCC had some inefficiencies/redundancies in its debug information that Clang was careful to avoid.

Clang's debug information is also not quite as complete as GCC's but for most debugging purposes it is good enough.
OK; I'll amend the assertion so that is_clang is also sufficient, then.

Comment 23 by bugdroid1@chromium.org, Oct 6 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d32a26068321e5976494501c51a33636a42c13f

commit 6d32a26068321e5976494501c51a33636a42c13f
Author: xunjieli <xunjieli@chromium.org>
Date: Thu Oct 06 16:58:03 2016

Add a ignore_elf32_limitations flag in build/config/compiler/compiler.gni

Add a ignore_elf32_limitations flag in
build/config/compiler/compiler.gni to turn off
assertion for Cronet builds.

This CL additionally adds is_clang to the
assertion per comment in 648948.

BUG= 651887 ,648948

Review-Url: https://codereview.chromium.org/2395603003
Cr-Commit-Position: refs/heads/master@{#423564}

[modify] https://crrev.com/6d32a26068321e5976494501c51a33636a42c13f/build/config/compiler/compiler.gni
[modify] https://crrev.com/6d32a26068321e5976494501c51a33636a42c13f/components/cronet/tools/cr_cronet.py

srhines: clang produces arm binaries that are 12% bigger than GCC's (at least for chrome), that's the main blocker.

Comment 25 by bugdroid1@chromium.org, Oct 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fd77254d72956d5e469857b2242fefc7f95fc407

commit fd77254d72956d5e469857b2242fefc7f95fc407
Author: kjellander <kjellander@chromium.org>
Date: Wed Oct 26 06:25:50 2016

Move ignore_elf32_limitations into build_overrides/build.gni

Having this it will be easy for client projects to override
the ignore_elf32_limitations flag that was added in
https://codereview.chromium.org/2395603003 by setting it to true
in their own build_overrides/build.gni.
Then it will still be possible to build with symbol_level=2 as
long as the project isn't affected by the 4GB size limit.

BUG=648948,  webrtc:6596 

Review-Url: https://codereview.chromium.org/2448453002
Cr-Commit-Position: refs/heads/master@{#427608}

[modify] https://crrev.com/fd77254d72956d5e469857b2242fefc7f95fc407/build/config/compiler/compiler.gni
[modify] https://crrev.com/fd77254d72956d5e469857b2242fefc7f95fc407/build_overrides/build.gni


Comment 26 by bugdroid1@chromium.org, Oct 27 2016

Labels: merge-merged-2840
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d32a26068321e5976494501c51a33636a42c13f

commit 6d32a26068321e5976494501c51a33636a42c13f
Author: xunjieli <xunjieli@chromium.org>
Date: Thu Oct 06 16:58:03 2016

Add a ignore_elf32_limitations flag in build/config/compiler/compiler.gni

Add a ignore_elf32_limitations flag in
build/config/compiler/compiler.gni to turn off
assertion for Cronet builds.

This CL additionally adds is_clang to the
assertion per comment in 648948.

BUG= 651887 ,648948

Review-Url: https://codereview.chromium.org/2395603003
Cr-Commit-Position: refs/heads/master@{#423564}

[modify] https://crrev.com/6d32a26068321e5976494501c51a33636a42c13f/build/config/compiler/compiler.gni
[modify] https://crrev.com/6d32a26068321e5976494501c51a33636a42c13f/components/cronet/tools/cr_cronet.py

Comment 27 by dimu@google.com, Nov 4 2016

Labels: -merge-merged-2840
[Automated comment] removing mislabelled merge-merged-2840

Comment 28 by bugdroid1@chromium.org, Nov 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c4ffc5d95c07e6777d4c57dc086d1b681a830a34

commit c4ffc5d95c07e6777d4c57dc086d1b681a830a34
Author: agrieve <agrieve@chromium.org>
Date: Tue Nov 08 15:32:39 2016

Android: Default symbol_level=1 for GN configs where 2 breaks things

BUG=648948

Review-Url: https://codereview.chromium.org/2481523002
Cr-Commit-Position: refs/heads/master@{#430618}

[modify] https://crrev.com/c4ffc5d95c07e6777d4c57dc086d1b681a830a34/build/config/compiler/compiler.gni


Comment 29 by bugdroid1@chromium.org, Feb 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7be003ae95486a55fc2a9ef5fb34e26f702fc241

commit 7be003ae95486a55fc2a9ef5fb34e26f702fc241
Author: thakis <thakis@chromium.org>
Date: Thu Feb 09 16:43:41 2017

android: Default symbol_level to 1 in 32-bit builds for clang too

clang finally hit the same file size wall that gcc hit a while ago.

(Locally, building lib_android_webview_unittests__library.so goes
from 6.5min to 3.8min on my linux box and file size of the .so goes
from over 4.1GB to 320MB -- at the cost of no detailed debug info
of course. Feels like investigating fission or similar would probably
be beneficial for the android build.)

BUG=648948, 685259 

Review-Url: https://codereview.chromium.org/2685033002
Cr-Commit-Position: refs/heads/master@{#449314}

[modify] https://crrev.com/7be003ae95486a55fc2a9ef5fb34e26f702fc241/build/config/compiler/compiler.gni
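Taken together, the two default-changing commits suggest logic roughly like the following. This Python sketch is a hypothetical reading of the commit titles, not the actual compiler.gni code: the function name, the baseline default of 2, and the 64-bit CPU list are assumptions. An explicit setting still wins, which matches how a GN arg set by the user overrides a declared default.

```python
from typing import Optional

# Assumed set of 64-bit CPU names for this illustration.
SIXTY_FOUR_BIT_CPUS = {"arm64", "x64", "mips64el"}
BASELINE_SYMBOL_LEVEL = 2  # assumption: full debug info unless something forces less


def effective_symbol_level(target_os: str, target_cpu: str,
                           is_component_build: bool,
                           user_value: Optional[int] = None) -> int:
    """Hypothetical default selection after the Nov 2016 and Feb 2017 changes.

    An explicit user setting always wins. Otherwise a 32-bit, non-component
    Android build drops to symbol_level=1 regardless of compiler, since both
    gcc and clang eventually produced binaries over 4 GiB.
    """
    if user_value is not None:
        return user_value
    breaks_elf32 = (target_os == "android"
                    and target_cpu not in SIXTY_FOUR_BIT_CPUS
                    and not is_component_build)
    return 1 if breaks_elf32 else BASELINE_SYMBOL_LEVEL
```

Changing the default rather than only asserting means a plain `gn gen` with no special args produces a working 32-bit release build, while anyone who explicitly asks for symbol_level=2 still hits the assertion unless they also opt out via ignore_elf32_limitations.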


Comment 30 by bugdroid1@chromium.org, Feb 10 2017

Does anyone know if remove_webcore_debug_symbols = true keeps the .so under 4GB?
To answer my own question, chrome links fine with symbol_level = 2 if webcore symbols are removed.

My GN args:
target_os = "android"
is_debug = false
is_component_build = false
enable_nacl = false
ffmpeg_branding = "Chrome"
proprietary_codecs = true
use_goma = true
enable_profiling = true
remove_webcore_debug_symbols = true
symbol_level = 2
ignore_elf32_limitations = true

Size of lib.unstripped/libchrome.so: 2840028620 (2.64 GiB)
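As a quick check of the numbers above, the arithmetic below (Python, using only the byte count reported in this comment) converts the size to GiB and computes the remaining headroom under the 4GiB ELF32 limit.

```python
GIB = 2**30
ELF32_LIMIT = 2**32  # 4 GiB: one past the largest 32-bit offset

size = 2840028620  # lib.unstripped/libchrome.so, as reported above
headroom = ELF32_LIMIT - size

print(f"size:     {size / GIB:.2f} GiB")
print(f"headroom: {headroom} bytes ({headroom / GIB:.2f} GiB)")
print(f"used:     {size / ELF32_LIMIT:.0%} of the ELF32 range")
```

That leaves roughly a third of the 4GiB range free, which is consistent with the link succeeding at symbol_level = 2 once the webcore symbols are removed.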

Comment 33 by torne@chromium.org, Feb 24 2017

You could amend the assert/comment/error message/etc to reflect this option if you think it's useful, then.

I still generally think that symbol_level=1 is perfectly acceptable for almost all people, though - the impact on even interactive debugging seems to be pretty minimal, and the performance gain from building this way is worth it all by itself to me :)

Comment 34 by sheriffbot@chromium.org, Mar 9 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: -amineer@chromium.org
No longer on the Chrome team, e-mail me @google.com if any attention still required from me here, otherwise good luck!
