SuperSize: Not properly recognizing v8 embedded builtins (inline assembly using exported labels) |
||||
Issue descriptionYou can see what these look like in code here: https://cs.chromium.org/chromium/src/out/Debug/gen/v8/embedded.cc The linker map file shows: Example from v8: de56c0 de56c0 11e900 32 thinlto-cache/llvmcache-5573B54B4EDC1E7F3AF7D65CC252AD2E88FA98CA:(.text) de56c0 de56c0 0 1 v8_Default_embedded_blob_ de8360 de8360 0 1 Builtins_RecordWrite de8680 de8680 0 1 Builtins_AdaptorWithExitFrame de86c0 de86c0 0 1 Builtins_AdaptorWithBuiltinExitFrame de8700 de8700 0 1 Builtins_ArgumentsAdaptorTrampoline de8800 de8800 0 1 Builtins_CallFunction_ReceiverIsNullOrUndefined ... So, it does look like we could use the start/end addresses to create individual symbols for all of these (and we should).
,
Nov 15
,
Nov 28
,
Dec 5
Oops the following commit should refer to this bug, rather than 891576: ******** The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/76da8c7057d89bee93be5828790eb27119d2e056 commit 76da8c7057d89bee93be5828790eb27119d2e056 Author: Samuel Huang <huangs@chromium.org> Date: Tue Dec 04 19:38:07 2018 [Supersize] Simplify linker_map_parser.py by removing _SymbolMaker. MapFileParserLld.Parse() traverses 3 indent levels: * Level 1 with indent 0: Sections names. * Level 2 with indent 8: Object path (may be '<internal>') and .o section name. * Level 3 with indent 16: Symbol names. Previously _SymbolMaker manages LLD .map file parsing by providing the following services: * Maintain a 'current symbol' that's modified when new data (perhaps at a deeper layer) are found. * Flush(): On detecting new symbol, stores the old 'current symbol' to to start anew (perhaps on emerging to a shallower layer). * Create(): Creates new 'current symbol'. Unfortunately, the code assumes that each contiguous Level 3 block corresponds to only one symbol. But to fix Bug 891576, we will need to break this assumption, and _SymbolMaker gets in the way! This CL remove _SymbolMaker without changing Supersize output. It turns out that since Create() is only called once (in Level 2), we don't really need Flush(). The code can be simplified by keeping a |syms| list, and directly modifying |syms[-1]|. Also add --dump argument to linker_map_parser.py's stand-alone parsing mode, to ensure identical results before and after this CL. Bug: 891576 Change-Id: I34b879c7e816049ec55f90d30f84213f6eb1acf9 Reviewed-on: https://chromium-review.googlesource.com/c/1361591 Reviewed-by: Samuel Huang <huangs@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Commit-Queue: Samuel Huang <huangs@chromium.org> Cr-Commit-Position: refs/heads/master@{#613647} [modify] https://crrev.com/76da8c7057d89bee93be5828790eb27119d2e056/tools/binary_size/libsupersize/linker_map_parser.py ========
,
Dec 6
The canonical case that we'd like to handle, as stated above (but for the case when .o file is available), is 83c640 83c640 13aee0 32 obj/v8/v8_external_snapshot/embedded.o:(.text) 83c640 83c640 0 1 v8_Default_embedded_blob_data_ 83f4a0 83f4a0 0 1 Builtins_RecordWrite 83f7c0 83f7c0 0 1 Builtins_AdaptorWithExitFrame 83f800 ... ... 9772a0 9772a0 0 1 Builtins_ResumeGeneratorExtraWideHandler 9773e0 9773e0 0 1 Builtins_IncBlockCounterExtraWideHandler 9774a0 9774a0 0 1 Builtins_AbortExtraWideHandler 977520 977520 18 16 obj/third_party/blink/renderer/platform/heap/asm/asm/SaveRegisters_arm.o:(.text) The emitted symbols (from object_path=obj/v8/v8_external_snapshot/embedded.o) should be: text@83c640(size_without_padding=(83f4a0-83c640), full_name=v8_Default_embedded_blob_data_, ...) text@83f4a0(size_without_padding=(83f7c0-83f4a0), full_name=Builtins_RecordWrite, ...) text@83f7c0(size_without_padding=(83f800-83f7c0), full_name=Builtins_AdaptorWithExitFrame, ...) ... text@9772a0(size_without_padding=(9773e0-9772a0), full_name=Builtins_ResumeGeneratorExtraWideHandler, ...) text@9773e0(size_without_padding=(9774a0-9773e0), full_name=Builtins_IncBlockCounterExtraWideHandler, ...) text@9774a0(size_without_padding=(977520-9774a0), full_name=Builtins_AbortExtraWideHandler, ...) The size is obtained from the "span" of each input row, i.e., difference from the *next* address to the current address. However, quite a few weird cases exist (will expand on these).
,
Dec 6
Weird case:
VMA LMA Size Align Out In Symbol
...
99cad8 99cad8 fc 8 ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a(e_sinh.o):(.text.sinh)
99cad8 99cad8 0 1 $t
99cad9 99cad9 fc 1 sinhl
99cad9 99cad9 fc 1 sinh
99cbb8 99cbb8 0 1 $d
99cbd4 99cbd4 168 4 ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a(e_sqrt.o):(.text.sqrt)
Highlights:
* sinhl() starts from 99cad9, which is +1 from 99cad8.
* sinhl() and sinh() have the same address.
* '$d' also pop up in arbitrary places.
Analysis:
* The +1 comes from the linker convention of giving ARM (arm32) symbols even addresses, and Thumb symbols an odd value. Source:
https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf
(See 4.5.3 Symbol Values)
Since Thumb symbols are 2-byte aligned, the address is obtained by removing the LSB (least significant bit).
* By inspection, it seems the .map uses "$t" (size = 0) to annotate this; and "$a" (size = 0) to annotate arm32 symbols.
* sinhl() and sinh() are aliases, and for Supersize it makes sense to create both symbols.
* However, downstream nm can discover this as well! So for simplicity, for linker_map_parsing we should output just one. We choose to take the last one (sinh()).
* The desired emitted symbols from this are therefore:
text@99cad8(size_without_padding=0xfc, full_name=sinh, object_path=../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a(e_sinh.o))
^ Note: 8, not 9.
Implementation plan:
* Goals:
* Somehow compute "span" of each relevant input line, obtained by difference of a later address with the current address.
* Recognize "$t" and "$a" annotations, to adjust for Thumb2 accordingly (more robust then blindly removing the LSB).
* Denoise "$d", which messes with "span". For example, sinh() has length 0xFC, and 0x99CAD8 + 0xFC = 0x99CBD4, as seen in the last line. But if we naively consider the $d line, then we'd have the incorrect "span" of (0x99CBB8 - 0x99CAD8) = 0xE0, which is too small. Worse, $d might appear at the same address of the symbol, and resulting in a "span" of 0!
* Proposal:
* In MapFileParserLld, add a generator that filters lines from the .map file (with possible look-ahead for "span"), and yields parsed components:
* address (if $t, properly adjusted)
* size
* "span": Computed by looking ahead, *and* skipping annotations like {$t, $a, $d}.
* level: Depends on the indent: 0 => 1, 8 => 2, 16 => 3, with:
* Level 1: For "Out" column: Section names, e.g., .text
* Level 2: For "In" column: Object path (may be '<internal>') and .o section name.
* Level 3: For "Symbol" column: Actual symbol name.
* tok: The text after the indent.
* Annotations, which have size 0, have prefix "$", and may have ".#", e.g., "$a.6", "$t.21", should affect parsing, but not be emitted explicitly.
* Now MapFileParserLld.Parse() can deal with high-level details. There will still be some complexity regarding dynamics relating to size, span, and level -- more on this later.
,
Dec 6
Addendum to comment #6: * sinhl() would have "span" = 0, whereas sinh() has "span" = 0x99CBD4 - (0x99CAD9 - 1) = 0xFC (after skipping $d). Therefore "span" allows us to skip aliases, and just emit the last symbol (recall that nm will take care of filling in the aliases).
,
Dec 6
Weird case:
VMA LMA Size Align Out In Symbol
...
97c834 97c834 4dc 4 obj/third_party/libvpx/libvpx_assembly_arm.a(libvpx_assembly_arm/vpx_convolve8_avg_horiz_filter_type1_neon.asm.o):(.text)
97c834 97c834 0 1 $a.0
97c834 97c834 0 1 _vpx_convolve8_avg_horiz_filter_type1_neon
97c834 97c834 4dc 1 vpx_convolve8_avg_horiz_filter_type1_neon
97c848 97c848 0 1 start_loop_count
97c8d0 97c8d0 0 1 outer_loop8_residual
97c8f0 97c8f0 0 1 outer_loop_8
97c900 97c900 0 1 inner_loop_8
97c9c8 97c9c8 0 1 end_inner_loop_8
97c9e4 97c9e4 0 1 end_loops
97c9e8 97c9e8 0 1 outer_loop_16
97ca54 97ca54 0 1 inner_loop_16
97cbcc 97cbcc 0 1 epilog_16
97cc08 97cc08 0 1 end_loops1
97cc0c 97cc0c 0 1 outer_loop4_residual
97cc2c 97cc2c 0 1 outer_loop_4
97cc3c 97cc3c 0 1 inner_loop_4
97ccf8 97ccf8 0 1 end_inner_loop_4
97cd08 97cd08 0 1 end_func
Highlights:
* _vpx_convolve8_avg_horiz_filter_type1_neon() has "span" of 0, so it's correctly ignored.
* start_loop_count, outer_loop8_residual, etc. look like the cases we want to handle in comment #5! However, as seen from the source .asm file:
https://cs.chromium.org/chromium/src/third_party/libvpx/source/libvpx/vpx_dsp/arm/vpx_convolve8_avg_horiz_filter_type1_neon.asm
These are just labels, and should NOT be emitted as symbols.
* So the correct symbols to emit are:
text@97c834(size_without_padding=0x4dc, full_name=vpx_convolve8_avg_horiz_filter_type1_neon, object_path=obj/third_party/libvpx/libvpx_assembly_arm.a)
* However, note that vpx_convolve8_avg_horiz_filter_type1_neon() has size = 0x4DC, which distinguishes it from the other stuff.
Implementation Plan:
* The size field should be used: So once vpx_convolve8_avg_horiz_filter_type1_neon() is extracted, it "shadows" all subsequent Level 3 toks!
* So the plan from comment #5 (if size = 0 then extract toks with "span" != 0 as symbols, with size equals to their "span") still works, but are overrode by "shadowing" above.
,
Dec 6
Weird case:
VMA LMA Size Align Out In Symbol
4d7920 4d7920 266c 4 <internal>:(.rodata)
4d9f90 4d9f90 18 16 ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libc++_static.a(locale.o):(.rodata._ZTSNSt6__ndk110ctype_baseE)
4d9f90 4d9f90 0 1 $d
...
7218ba 7218ba 127 1 thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.90)
7219e1 7219e1 5c 1 thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.93)
721a3d 721a3d 28 1 thinlto-cache/Thin-e99da7.tmp.o:(.rodata._ZN24certificate_transparency5prefs16kCTRequiredHostsE)
721a3d 721a3d 28 1 certificate_transparency::prefs::kCTRequiredHosts
Highlights:
* Level 2 lines need not be followed by a Level 3 line. Examples above are:
4d7920 4d7920 266c 4 <internal>:(.rodata)
7218ba 7218ba 127 1 thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.90)
7219e1 7219e1 5c 1 thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.93)
* We already handle <internal> as a special case. In fact we expect it to not have Level 3.
* The other case seems to always have .rodata..L.str.#. However, the converse is not true (i.e., symbol can have .rodata..L.str.#, but still have Level 3).
Implementation Plan:
* There are 2 types of symbols:
* Level 2 symbols, which uses data from Level 2, but full_name is not from Level 3 tok.
* <internal> is already handled:
* Inside .rodata: full_name := '** lld merge strings'
* Else full_name := '** ' + mangled name.
* Level 3 symbols, which takes full_name as tok from parsing Level 3 lines, and also has address and size assigned.
* Gets object_path from Level 2.
* Complexity: On encountering a Level 2 line, we create a Level 2 symbol.
* If <internal> then we assign everything and move on (assumes no Level 3 line follows).
* Otherwise full_name is left blank. If another Level 2 line is found, it's left as that.
* Once Level 3 line is encountered:
* The Level 2 line is now populated, and becoming a Level 3 line!
* Subsequent symbols for the current run of Level 3 lines are created as Level 3 symbols.
,
Dec 6
Addendum to comment #9: The symbols created from the data shown should be: .rodata@4d7920(size_without_padding=0x266c, full_name=** lld merge strings, object_path=, ...) .rodata@7218ba(size_without_padding=0x127, full_name=, object_path=, ...) .rodata@7219e1(size_without_padding=0x5c, full_name=, object_path=, ...) Note that there's no object_path, due to thinlto-cache/ usage. (Also, I forgot to add '.' before text@ in the previous examples..)
,
Dec 7
Weird case:
VMA LMA Size Align Out In Symbol
...
831000 831000 a28 32 obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o:(.text)
831004 831004 0 1 $d.0
831040 831040 0 1 $a.1
831040 831040 0 1 .Lenc_key
831040 831040 218 1 aes_hw_set_encrypt_key
8310b0 8310b0 0 1 $d.2
8310b4 8310b4 0 1 $a.3
8310f0 8310f0 0 1 $d.4
8310f4 8310f4 0 1 $a.5
831124 831124 0 1 $d.6
831128 831128 0 1 $a.7
831180 831180 0 1 $d.8
831184 831184 0 1 $a.9
8311f0 8311f0 0 1 $d.10
8311f4 8311f4 0 1 $a.11
831228 831228 0 1 $d.12
83122c 83122c 0 1 $a.13
831260 831260 60 1 aes_hw_set_decrypt_key
831294 831294 0 1 $d.14
83129c 83129c 0 1 $a.15
...
Highlights:
* .Lenc_key is a label from the assembler code (e.g., aesv8-armx32.S). But since it has "span" = 0, so it gets ignored.
* aes_hw_set_encrypt_key() has address = 831040 -- this is different from Level 2 address of 831000!
Analysis:
* The extra bytes that appear before aes_hw_set_encrypt_key() arises from static data added before the function:
https://cs.chromium.org/chromium/src/third_party/boringssl/linux-arm/crypto/fipsmodule/aesv8-armx32.S?rcl=fbe75ef510a9d9123f6c7f8cbd54d5f5b5680277&l=35
* Also verified in the binary, the bytes are:
.long 0x01,0x01,0x01,0x01
.long 0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d @ rotate-n-splat
.long 0x1b,0x1b,0x1b,0x1b
Implementation plans:
* Note that $d.0 hints that there is some data. But it would be odd if we rely on it, since we ignore other $d instances
* Perhaps we can just bundle this into the Level 2 symbol. Then when aes_hw_set_encrypt_key() is created as a Level 3 symbol, it becomes stand-alone, rather than modifying the Level 2 symbols (which happens if the first Level 3 symbol has the same address as the Level 2 symbol).
* So created symbols should be:
.text@831000(size_without_padding=0x40, full_name=, object_path=obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o, ...)
.text@831040(size_without_padding=0x218, full_name=aes_hw_set_encrypt_key, object_path=obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o, ...)
.text@831260(size_without_padding=0x60, full_name=aes_hw_set_decrypt_key,object_path=obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o, ...)
,
Dec 7
Weird case:
1401788 1401788 a4 4 thinlto-cache/Thin-826b9f.tmp.o:(.text._ZNK4$_21clEP10FamilyDataPKcPS3_)
1401788 1401788 0 1 $t.78
1401789 1401789 a4 1 $_21::operator()(FamilyData*, char const*, char const**) const
1401818 1401818 0 1 $d.79
Highlights:
* We have a (Thumb22) symbol that starts with '$'. So the created symbol should be:
.text@1401788(size_without_padding=0xA4, full_name=$_21::operator()(FamilyData*, char const*, char const**) const, object_path=, ...)
Analysis:
* This weird case only appears twice (the other is $_21::__invoke(FamilyData*, char const*, char const**))
* '$a', '$d', and '$t' seems to be the only annotations there are. Command to verify:
$ cat libchrome.so.map | grep ' \$' | cut -c50- | sed s/[0-9]\\+$//g | sort -u
$_21::__invoke(FamilyData*, char const*, char const**)
$_21::operator()(FamilyData*, char const*, char const**) const
$a
$a.
$d
$d.
$t
$t.
Implementation plans:
* In the data generator when determining whether |tok| is an annotation, we should first check for tok.startswith('$'), then also restrict to length == 2 or (length >= 3, tok[2] == '.').
,
Dec 7
For the case in #11 - would it be possible to omit the outer symbol such that the 40 bytes gets picked up as padding for aes_hw_set_encrypt_key?
,
Dec 11
Yeah that's possible. The disadvantage is that the 40 bytes are not padding, so technically it would be incorrect. However, an advantage would be that it might simplify the code re. Level 2 symbols: * Obvious Level 2 symbols like '<internal>' can just be handled directly (so no Level 3 lines are assumed; maybe display warnings if found). * The logic to demote Level 2 symbols to Level 3 can be removed. Not sure how much simplification there will be though; will worry about that after some code cleanup to see.
,
Dec 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4cb24b816ce84e34e37fc456bbaeea3d258d9015 commit 4cb24b816ce84e34e37fc456bbaeea3d258d9015 Author: Samuel Huang <huangs@chromium.org> Date: Thu Dec 13 01:38:40 2018 [Supersize] Add linker_map_parser_test.py and test_util.py. We plan to update linker_map_parser.py to extract more Level 3 symbols from .map files. As seen in crbug.com/892648 , there are many cases that need to be handled. This CL adds linker_map_parser_test.py, which takes the identified issues from Android Chrome's .map file as test cases. Details: * Currently focus on ARM (32-bit) builds using LLD with ThinLTO. * Similar to integration_test.py, the '--update' can be specified to overwrite "golden" output files with current output. * Add test_util.py to factor common code involving usage and management of "golden" files. * The initial Parser.golden is obtained from running the test for unmodified linker_map_parser.py. This allows upcoming parser changes to be tracked. Bug: 892648 Change-Id: Ia8a4dbce4c235f2b08df2bb848409caffcedac32 Reviewed-on: https://chromium-review.googlesource.com/c/1373873 Commit-Queue: agrieve <agrieve@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#616152} [modify] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/integration_test.py [add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/linker_map_parser_test.py [add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/test_util.py [add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/testdata/linker_map_parser/Parser.golden [add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/testdata/linker_map_parser/test_lld-lto_v1.map
,
Dec 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/514e34fa47261afc25fd6f732c7d8628df1ca7ba commit 514e34fa47261afc25fd6f732c7d8628df1ca7ba Author: Samuel Huang <huangs@chromium.org> Date: Thu Dec 13 14:33:25 2018 [SuperSize] Fix missing sizes in LinkerMapParserTest.test_Parser(). LinkerMapParserTest.test_Parser() creates an iterator of processed test .map file. Previously this iterator was shared between calls to (1) DetectLinkerNameFromMapFile(), (2) .MapFileParser().Parse(). This causes a bug in the test: (2) requires a fresh iterator, and it explicitly it skips the first line! Consequently, a number of section sizes ({.interp, .ARM.exidx, .dynsyn}) were missing. This CL fixes the test by creating fresh iterators for (1) and (2). Also, the section sizes are sorted to reduce arbitrariness of dict iteration. Bug: 892648 Change-Id: Ib52f17b0446012fe37951c86409c450d59b57e4a Reviewed-on: https://chromium-review.googlesource.com/c/1375610 Commit-Queue: agrieve <agrieve@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#616304} [modify] https://crrev.com/514e34fa47261afc25fd6f732c7d8628df1ca7ba/tools/binary_size/libsupersize/linker_map_parser_test.py [modify] https://crrev.com/514e34fa47261afc25fd6f732c7d8628df1ca7ba/tools/binary_size/libsupersize/testdata/linker_map_parser/Parser.golden
,
Dec 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c957e24190c983d437126399606dd9bf743b873b commit c957e24190c983d437126399606dd9bf743b873b Author: Samuel Huang <huangs@chromium.org> Date: Fri Dec 14 05:56:28 2018 [SuperSize] Refactor MapFileParserLld.Parse() using a filtering generator. We wish to extract more symbols from Level 3 lines from .map files. Some lines have |size == 0| (this can happen for symbols from assembly sources), but we can approximate actual size from |address| difference between successive lines. We call this approximation "span". This CL refactors MapFileParserLld.Parse() by extracting some parsing logic into a new generator MapFileParserLld.Tokenize(). Details: * Tokenize() is a generator that: * Consumes .map file lines and yields parsed fields. * Applies a one-step lookhead, and maintains "next" and "current" tokens from successive lines to compute "span" (some complication exist). * Skips Level 3 annotations ('$a', '$t.123', etc.), which also has |size == 0| but produces no symbols, and so should be ignored for "span" computation. * Detects Thumb2 annotations '$t' and adjusts |address| accordingly. * Add 2 new tests to LinkerMapParserTest: * test_ParseArmAnnotation() for annotation parsing. * test_Tokenize() to dump Tokenize() output, and compare against new Tokenize.golden file. As a refactoring CL, MapFileParserLld output remains unchanged. Actual usage of "span" will be implemented in follow-ups. Bug: 892648 Change-Id: Ia630027b85e45a234101396917ea4626b9dba1bc Reviewed-on: https://chromium-review.googlesource.com/c/1377189 Reviewed-by: Samuel Huang <huangs@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Commit-Queue: Samuel Huang <huangs@chromium.org> Cr-Commit-Position: refs/heads/master@{#616594} [modify] https://crrev.com/c957e24190c983d437126399606dd9bf743b873b/tools/binary_size/libsupersize/linker_map_parser.py [modify] https://crrev.com/c957e24190c983d437126399606dd9bf743b873b/tools/binary_size/libsupersize/linker_map_parser_test.py [add] https://crrev.com/c957e24190c983d437126399606dd9bf743b873b/tools/binary_size/libsupersize/testdata/linker_map_parser/Tokenize.golden
,
Dec 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a0cee45fde3a11fde81a75af6ad469d93af90f0b commit a0cee45fde3a11fde81a75af6ad469d93af90f0b Author: Samuel Huang <huangs@chromium.org> Date: Tue Dec 18 17:04:08 2018 [SuperSize] MapFileParserLld: Extract mutiple symbosl from Level 3 lines. Previously MapFileParserLld.Parse() extracts one symbol per section per object file. In terms of .map file, this means each Level 2 line can produce one symbol only, regardless of the number of nested Level 3 lines it nests. This causes symbols to be missing (e.g., for code from assembly sources), and makes some function appear larger than they are (since they subsumed its siblings). This CL fixes MapFileParserLld.Parse() by allowing multiple symbols to be emitted from Level 3 lines nested under the same Level 2 line. This helps SuperSize recover missing symbols under LLD. Details: * Each Level 2 line initially produces a "partial" symbol, which has an empty |full_name|. * If no Level 3 lines follow then the symbol is left as-is. * On encountering the first (non-annotation) Level 3 line, the symbol adopts the parsed |tok| as |full_name|. * On encountering second+ (non-annotation) Level 3 lines, new symbols are created (copies object path parsed from Level 2). * An exception of the above are Level 2 lines with '<internal>'. These should not nest Level 3 lines anyway. * Special case: If the first Level 3 line has |address| different from the "partial" symbol's, then "partial" symbol has its |address| turncated, and |full_name| left blank. The Level 3 line is then used to create a new symbol. * This handles the rare case where data precedes Level 3 lines. * Level 3 line |span| is used instead of |size| (which can be 0) to decide whether the line should form a symbol. * To handle assembly labels, which create as Level 3 lines but should NOT should form symbols: Each Level 3 symbol supporesses subsequent Level 3 lines whose |address| is covered by the Level 3's address range. * Update some comments in test_lld-lto_v1.map. * Update test output Parser.golden. Bug: 892648 Change-Id: I9b3e06563e5df4fcf83ec899a4ec8274d11e6023 Reviewed-on: https://chromium-review.googlesource.com/c/1378679 Commit-Queue: Samuel Huang <huangs@chromium.org> Reviewed-by: Samuel Huang <huangs@chromium.org> Reviewed-by: agrieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#617535} [modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/archive.py [modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/linker_map_parser.py [modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/testdata/linker_map_parser/Parser.golden [modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/testdata/linker_map_parser/test_lld-lto_v1.map
,
Jan 9
This is fixed I believe. Re-open if I'm wrong. |
||||
►
Sign in to add a comment |
||||
Comment 1 by agrieve@google.com
, Oct 5