ICF: compare savings of icf=safe and icf=all for ChromeOS and use safe if possible |
||||||
Issue descriptionwe are currently using "all" but it maybe better to use safe if the space savings are not significant. There is some missing functionality.
,
Oct 10 2016
I have gathered some numbers on chrome binary sizes (amd64) with or without PIE and with different ICF options: a) pie, icf=all: 97,676 sections folded. chrome binary size (stripped): 165,175,920 bytes (158M). b) pie, icf=safe: 16,494 sections folded. chrome binary size (stripped): 176,345,712 bytes (169M). c) pie, icf=none: 0 sections folded. chrome binary size (stripped): 177,906,288 bytes (170M). d) nopie, icf=all: 97,676 sections folded. chrome binary size (stripped): 151,773,808 bytes (145M). e) nopie, icf=safe: 77,978 sections folded. chrome binary size (stripped): 153,940,592 bytes (147M). f) nopie, icf=none: 0 sections folded. chrome binary size (stripped): 164,504,176 bytes (157M). As we can see, nopie/icf=safe (e) is able to fold many more sections than pie/icf=safe (b). These are cases where in the pie binary, the icf logic isn't able to figure out if a function reference is a function call or if a function pointer is being taken. In such cases, it makes the conservative decision to not fold that function into another one. For nopie, icf=safe is very close to icf=all (147M vs. 145M). For pie, icf=safe is very close to icf=none (169M vs. 170M). If we were to make icf=safe work better with pie (so that it folds roughly the same number of sections that nopie/icf=safe does), we can expect that the final binary size will be closer to (a) instead of (c). i.e. switching from icf=all to icf=safe for pie chrome binary should result in ~20,000 less sections folded, at the cost of ~1-2MB increase in chrome binary size.
,
Oct 13 2016
thanks for the analysis. Good to know we loose very little in terms of code size by going to "safe".
,
Oct 18 2016
,
Dec 17 2016
I have updated numbers below, this time for all three architectures. Also included are the numbers for amd64/icf=safe with pie/nopie after applying pie-safe-icf.patch (attached): amd64 nopie --icf=all: Chrome binary size: 151M. 104,044 sections folded. amd64 nopie --icf=safe: Chrome binary size: 153M. 83,542 sections folded. amd64 nopie --icf=safe: Chrome binary size: 157M. 45,670 sections folded. <<---- With pie-safe-icf.patch. amd64 nopie --icf=none: Chrome binary size: 164M. 0 sections folded. amd64 pie --icf=all: Chrome binary size: 164M. 104,044 sections folded. amd64 pie --icf=safe: Chrome binary size: 176M. 16,798 sections folded. amd64 pie --icf=safe: Chrome binary size: 171M. 45,670 sections folded. <<---- With pie-safe-icf.patch. amd64 pie --icf=none: Chrome binary size: 177M. 0 sections folded. x86 nopie --icf=all: Chrome binary size: 164M. 102,447 sections folded. x86 nopie --icf=safe: Chrome binary size: 170M. 54,219 sections folded. x86 nopie --icf=none: Chrome binary size: 178M. 0 sections folded. x86 pie --icf=all: Chrome binary size: 168M. 102,447 sections folded. x86 pie --icf=safe: Chrome binary size: 174M. 54,219 sections folded. x86 pie --icf=none: Chrome binary size: 183M. 0 sections folded. arm nopie --icf=all: Chrome binary size: 107M. 98,155 sections folded. arm nopie --icf=safe: Chrome binary size: 109M. 56,042 sections folded. arm nopie --icf=none: Chrome binary size: 113M. 0 sections folded. arm pie --icf=all: Chrome binary size: 111M. 98,155 sections folded. arm pie --icf=safe: Chrome binary size: 114M. 56,042 sections folded. arm pie --icf=none: Chrome binary size: 118M. 0 sections folded. Some observations: 1) For both x86 and arm, both pie/nopie fold the same number of sections with --icf=safe. This is to say, --icf=safe works just as well when building PIE binaries as it does when building regular executables. 2) For x86, the penalty for switching to --icf=safe is 6MB (chrome size increases from 168M to 174M). 3) For arm, the penalty for switching to --icf=safe is 3MB (chrome size increases from 111M to 114M). 4) For amd64, --icf=safe folds fewer sections when building PIE binaries than when building regular executables (~17K vs. ~83K). This is the issue the attached patch is attempting to solve. 5) After the patch is applied to binutils/gold, --icf=safe folds the same number of sections for pie/nopie. With this patch, the penalty for switching to --icf=safe is 7MB for amd64 (chrome size increases from 164M to 171M). Without the patch, the increase was to 176M. The patch is saving us 5MB. 6) Comparing just the nopie cases for all architectures, --icf=all folds 104K, 102K, and 98K sections on amd64, x86, and arm respectively. 7) Comparing just the nopie cases for all architectures, --icf=safe folds 83K, 54K, and 56K sections on amd64, x86, and arm respectively. 8) Something is off about the above numbers in (6) and (7). The --icf=all numbers are similar for all three architectures, but the --icf=safe numbers are not. With --icf=safe, much fewer sections get folded for x86 and arm than for amd64. Note that all architectures are building pretty much the same source code. For any function reference in the C/C++ code, whether it is a call to the function, or is saving the pointer to the function, the answer should be the same for all three architectures. We're either being too conservative on x86/arm and treating references as function pointers when they're not (thus resulting in fewer sections folded), or we have a bug for amd64 --icf=safe and are mistakenly treating function pointers as function calls and folding those functions. 9) With the patch, the number of sections folded for amd64 --icf=safe is 45K. This is down from 83K (nopie) without the patch. Comparing with x86/arm, this number should be somewhere in the 50K-60K range. The patch is currently only checking for one relocation type (R_X86_64_PC32) and one instruction opcode (call: 0xe8) to distinguish between a function call and function pointer. I'll continue working on the patch to recognize more relocation types and opcodes as needed to get more sections folded.
,
Dec 17 2016
Something to maybe keep in mind: I hear y'all are thinking about switching to lld. lld only has ICF on/off, no "safe" vs "all" business. So if that switch is in the near future, maybe it's not worth investing much here.
,
Dec 19 2016
I built lld locally and did some experiments with a small testcase. lld supports --icf=all and it is equivalent to gold's --icf=all. There's no equivalent to gold's --icf=safe. We may need to add support for it before the switch to using lld, whenever that is. The list relocation types/opcodes that we need to recognize for enabling amd64 --icf=safe in gold will be directly useful in the lld implementation as well.
,
Dec 19 2016
ruiu (main lld author) prefers to not have multiple ICF modes. If =all doesn't work for you, send him examples of where it doesn't work, and maybe it can be tweaked.
,
Dec 19 2016
One idea was that the compiler would emit a "this function has its address taken" bit, so that the linker doesn't have to reverse that information from relocation types. (This may or may not make lld icf blocked on switching to clang.) So work on relocation types/opcodes will likely not be useful for lld.
,
Dec 19 2016
we should probably split this bug in 2 (ie: have a separate issue for the possible migration to LLD) and keep this one for binutils/gold.
,
Dec 22 2016
Updated patch attached. Updated numbers below (note that I did a sync for chrome sources, so all numbers have changed a bit):
amd64 nopie --icf=all: Chrome binary size: 151M. 104565 sections folded.
amd64 nopie --icf=safe: Chrome binary size: 153M. 83910 sections folded.
amd64 nopie --icf=safe: Chrome binary size: 156M. 55709 sections folded. <== with pie-safe-icf.patch
amd64 nopie --icf=none: Chrome binary size: 164M. 0 sections folded.
amd64 pie --icf=all: Chrome binary size: 164M. 104565 sections folded.
amd64 pie --icf=safe: Chrome binary size: 170M. 55709 sections folded. <== with pie-safe-icf.patch
amd64 pie --icf=safe: Chrome binary size: 176M. 16975 sections folded.
amd64 pie --icf=none: Chrome binary size: 177M. 0 sections folded.
x86 nopie --icf=all: Chrome binary size: 164M. 102955 sections folded.
x86 nopie --icf=safe: Chrome binary size: 171M. 54445 sections folded.
x86 nopie --icf=none: Chrome binary size: 179M. 0 sections folded.
x86 pie --icf=all: Chrome binary size: 169M. 102955 sections folded.
x86 pie --icf=safe: Chrome binary size: 175M. 54445 sections folded.
x86 pie --icf=none: Chrome binary size: 183M. 0 sections folded.
arm nopie --icf=all: Chrome binary size: 107M. 98672 sections folded.
arm nopie --icf=safe: Chrome binary size: 110M. 56252 sections folded.
arm nopie --icf=none: Chrome binary size: 113M. 0 sections folded.
arm pie --icf=all: Chrome binary size: 112M. 98672 sections folded.
arm pie --icf=safe: Chrome binary size: 114M. 56252 sections folded.
arm pie --icf=none: Chrome binary size: 118M. 0 sections folded.
The patch checks if the R_X86_64_PC32 relocation is in a call, jmp, or any of the conditional jump instructions. If not, it is considered to be taking the address of a function.
I found one more relocation that is frequently used for function calls: R_X86_64_PLT32. However, I found no instance of it being used in a non call/jmp instruction, so I didn't add the checking logic for this relocation. It is always assumed to be used for function calls.
Note that with this patch, the number of sections folded on amd64 --icf=safe is 55709 (for both pie/nopie). This is in line with the number of sections folded with --icf=safe on other architectures. The earlier number for amd64/nopie case (83910) was an anomaly. It looks like even when not building a PIE binary, the R_X86_64_PC32 relocation is sometimes used for taking the address of a function.
Also, we finally know the cost for switching chrome build to use --icf=safe instead of --icf=all:
amd64: 6MB
x86: 6MB
arm: 2MB
,
Feb 28 2017
The patch was merged into the upstream binutils tree: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4aebb6312eb5dcd12f2f8420028547584b708907
,
Mar 15 2017
Marking as fixed since the gold linker now does a better job when using --icf=safe with -pie. We decided not to enable this for chromeos-chrome at this time. The file size increase (6MB for amd64) is a too high. Besides, chrome on Linux still links with --icf=all, so both should switch together if we do decide to switch to --icf=safe at a later date.
,
May 30 2017
,
Aug 1 2017
,
Aug 3 2017
Closing. Please reopen it if its not fixed. Thanks! |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by llozano@chromium.org
, Sep 21 2016