supersize: Figure out how to break down ** merge strings |
||
Issue description
As of now, ** merge strings is about 3MB within the binary.
These come from inline string literals (e.g. SomeFunc("msg")).
The easiest way to improve attribution is by converting strings to proper symbols by assigning each literal to a variable. While this is obviously unreasonable, it's probably doable at least for generated code.
Beyond that, I'm hopeful that we could improve our attribution by using the per-object-file nm output (or perhaps some other per-object-file command is needed?). Looking per-file would cause us to over count identical strings, but I'd guess the number of strings that actually get merged when linked to be < 5% (a guess, but certainly we'll see)
,
Jul 20 2017
Actually, it looks like using const char[] makes strings not participate in merging: https://godbolt.org/g/BnUeH3 #include <iostream> const char foo[] = "asdf"; const char* const bar = "asdf2"; const char foo2[] = "asdf"; const char* const bar2 = "asdf2"; int square(int num) { std::cout << foo << bar << foo2 << bar2; } .LC0: .string "asdf2" square(int): sub rsp, 8 mov esi, OFFSET FLAT:foo mov edi, OFFSET FLAT:std::cout call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) mov esi, OFFSET FLAT:.LC0 mov rdi, rax call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) mov esi, OFFSET FLAT:foo2 mov rdi, rax call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) mov esi, OFFSET FLAT:.LC0 mov rdi, rax call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) pop rax ret _GLOBAL__sub_I__Z6squarei: sub rsp, 8 mov edi, OFFSET FLAT:std::__ioinit call std::ios_base::Init::Init() pop rax mov edx, OFFSET FLAT:__dso_handle mov esi, OFFSET FLAT:std::__ioinit mov edi, OFFSET FLAT:std::ios_base::Init::~Init() jmp __cxa_atexit foo2: .string "asdf" foo: .string "asdf"
,
Jul 20 2017
Example commit of giving string literals names in jni_generator.py: https://chromium-review.googlesource.com/578217
,
Jul 20 2017
Interestingly, it seems the same problem does not exist for merged globals. The linker map has an entry with the name .L_MergedGlobals for each object file.
,
Jul 20 2017
Note: #1 and #2 were with GCC. Here's one with clang (but same result): https://godbolt.org/g/mhpmEg
,
Jul 22 2017
How much of .rodata is strings?
objdump -h libmonochrome.so| grep .rodata | awk '{print "dd if=libmonochrome.so of=rodata bs=1 count=$[0x" $3 "] skip=$[0x" $6 "]"}' | bash
strings rodata > rodata.strings
ls -l rodata*
-rw-r----- 1 agrieve eng 6194582 Jul 21 22:42 rodata
-rw-r----- 1 agrieve eng 3836093 Jul 21 22:42 rodata.strings
,
Jul 23 2017
Let's consider these two declarations: const char foo[] = "asdf"; const char* const bar = "asdf2"; The first one requires that the compiler dedicate five bytes to the string. Any uses of the string must load the address of those five bytes which often/typically requires no data accesses until the characters are examined. The second one requires that the compiler dedicate sizeof(void*) bytes to a pointer, plus five bytes to the string. So, the extra space required is typically four or eight bytes. And, and uses of the string must load the pointer from the data segment before later loading the actual characters. This implies an extra cache miss. Now, in some cases the compiler can optimize case two to be as efficient as case one. For instance, if the pointer is tagged as const and if it is not tagged as extern then the compiler knows that it has internal linkage and it can potentially avoid the pointer. However the first case is fundamentally more efficient, and relies less on the optimizer and should be preferred. If there is a belief that case two is more efficient then I would be highly skeptical. Such a claim would need to be tested across multiple compilers. In particular, the extra level of indirection that case two implies will be unavoidable on any string pointers with external linkage - that is any that are non-const or are tagged with extern. > Actually, it looks like using const char[] makes strings not participate in merging This is probably because each array is supposed to have its own address, and merging would prevent that, therefore merging is not allowed. There may be a compiler switch to override that - I don't know. But, I'm not sure that optimizing for string merging is ideal, unless you know that strings are duplicated. It is interesting that non-extern const pointers seem to be handled as efficiently by some compilers as arrays. We would need to test across multiple compilers and architectures before counting on that. Even then I worry that the guidance is too complex, given that non-const pointers are less efficient.
,
Oct 3 2017
Plan of action: When running nm on *.o, also look for strings: E.g.: nm -p out/Debug/components/autofill/core/browser/browser/state_names.o 00000000 r .L.str 00000008 r .L.str.1 00000066 r .L.str.10 0000000b r .L__func__.evtag_init ... The __fun__ ones come when the code uses "__func__" Running: readelf -S out/Debug/components/autofill/core/browser/browser/state_names.o Section Headers: [Nr] Name Type Addr Off Size ES Flg Lk Inf Al ... [17] .rodata.str1.1 PROGBITS 00000000 00036c 0001f9 01 AMS 0 0 1 ... So, looking at the bytes within the .o at 00036c + 00000008 gives the string for .L.str.1 Note: For .a files, readelf -S does similar to nm, and outputs for each .o within separated with a header: File: ./base/third_party/libevent/libevent.a(event.o) There are 149 section headers, starting at offset 0x50b4: Note2: There can be multiple .rodata sections. E.g.: [295] .rodata.str1.1 PROGBITS 00000000 00290c 0005ab 01 AMS 0 0 1 [296] .rodata.str1.8 PROGBITS 00000000 002eb8 000009 01 AMS 0 0 8 With this information, we can create a list of strings for each .o file. There are now two approaches we could take: 1) Don't bother trying to figure out the final address within the .so. * Instead, create a symbol for each string with the correct size. * Use a map to de-dupe identical strings so as to not double-count merged strings (by too much anyways) 2) Each subprocess will scan the .so file's **merge strings data to find their string * Record the actual address & for each string In either case, symbol aliases should be created for merged strings. I suspect #2 is feasible, so I'll attempt that first.
,
Oct 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/92655b27a8dac362d27abf66ec9724ffd8a703b1 commit 92655b27a8dac362d27abf66ec9724ffd8a703b1 Author: Andrew Grieve <agrieve@chromium.org> Date: Wed Oct 18 20:12:30 2017 Supersize: Break ** merge strings symbols into individual literals * nm.py now uses multiprocessing.Pipe() and os.fork() rather than subprocess.Popen(). * nm.py now directly executable (for testing its logic on single .o/.a files). * Adds "ReadStringLiterals()" console function for dumping values of literals. * "supersize archive" now runs in 40s on my machine, up from 25s with --no-string-literals * libmonochrome.apk.size file now 9.67mb, up from 9.30mb Stats for libmonochrome.so: Section .rodata: has 100.0% of 6184870 bytes accounted for from 172885 symbols. 0 bytes are unaccounted for. * Padding accounts for 19684 bytes (0.3%) * Contains 156446 string literals. Total size=2888602, padding=13275 * Contains 73269 aliases, mapped to 17774 unique addresses (308060 bytes) * 550 symbols have shared ownership (285821 bytes) Comparing .rodata symbols with & without string literals, these two paths now show in the top 20: 15) 1879143 (30.4%) r@Group 45520 ui/gl/gl_bindings_autogen_gl.cc (count=1) 16) 1923623 (31.1%) r@Group 44480 components/autofill/core/browser/address_rewriter_rules.cc (count=2) It works! Bug: 746761 Change-Id: I8aee0cafffc8f2941ea0ff7dd4e5f825b8165bc0 Reviewed-on: https://chromium-review.googlesource.com/721465 Commit-Queue: agrieve <agrieve@chromium.org> Reviewed-by: Samuel Huang <huangs@chromium.org> Cr-Commit-Position: refs/heads/master@{#509849} [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/archive.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/concurrent.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/concurrent_test.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/console.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/describe.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/integration_test.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/models.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/nm.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Archive.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Archive_Elf.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Archive_OutputDirectory.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Console.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Csv.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Diff_Basic.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/Diff_NullDiff.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/FullDescription.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/SymbolGroupMethods.golden [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/mock_output_directory/elf [add] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/mock_output_directory/obj/base/base/page_allocator.o [add] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/mock_output_directory/obj/third_party/icu/icuuc/ucnv_ext.o [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/mock_toolchain/mock_nm.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/mock_toolchain/mock_readelf.py [modify] https://crrev.com/92655b27a8dac362d27abf66ec9724ffd8a703b1/tools/binary_size/libsupersize/testdata/test.map
,
Oct 18 2017
,
Oct 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/18bc8502b61bc45f17b1af4a2bead3e292359745 commit 18bc8502b61bc45f17b1af4a2bead3e292359745 Author: Andrew Grieve <agrieve@chromium.org> Date: Tue Oct 24 03:22:03 2017 Supersize: Make Diff() smarter about aliases and string literals. Before, adding a single string literal to a file would cause diffs to say that all following literals have changed sizes. Now, it agressively looks for symbols of the same name+size to match up, before falling back on matching up symbols of different sizes. This also changes the clustering of .symbols for diffs to group by path aliases. It's useful to have path aliases ungrouped when querying by path, but for symbol diffs, their low PSS cause inline symbols to appear as many small symbols, rather than one symbol that just happened to be defined in a header. Grouping by aliases of the same name fixes this for diffs by having these symbols merged back into a single group. Bug: 746761 , 776032 Change-Id: Id871b5cc1e3aeb6d86d9e864088633a8d5a2f6fd Reviewed-on: https://chromium-review.googlesource.com/731187 Commit-Queue: agrieve <agrieve@chromium.org> Reviewed-by: Samuel Huang <huangs@chromium.org> Cr-Commit-Position: refs/heads/master@{#511028} [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/console.py [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/describe.py [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/diff.py [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/models.py [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/testdata/Console.golden [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/testdata/Diff_Basic.golden [modify] https://crrev.com/18bc8502b61bc45f17b1af4a2bead3e292359745/tools/binary_size/libsupersize/testdata/Diff_NullDiff.golden |
||
►
Sign in to add a comment |
||
Comment 1 by agrieve@chromium.org
, Jul 20 2017