New issue
Advanced search Search tips
Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 9
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug

Blocked on:
issue 905329



Sign in to add a comment
link

Issue 892648: SuperSize: Not properly recognizing v8 embedded builtins (inline assembly using exported labels)

Reported by agrieve@chromium.org, Oct 5 Project Member

Issue description

You can see what these look like in code here:
https://cs.chromium.org/chromium/src/out/Debug/gen/v8/embedded.cc

The linker map file shows:

Example from v8:
  de56c0   de56c0   11e900    32         thinlto-cache/llvmcache-5573B54B4EDC1E7F3AF7D65CC252AD2E88FA98CA:(.text)
  de56c0   de56c0        0     1                 v8_Default_embedded_blob_
  de8360   de8360        0     1                 Builtins_RecordWrite
  de8680   de8680        0     1                 Builtins_AdaptorWithExitFrame
  de86c0   de86c0        0     1                 Builtins_AdaptorWithBuiltinExitFrame
  de8700   de8700        0     1                 Builtins_ArgumentsAdaptorTrampoline
  de8800   de8800        0     1                 Builtins_CallFunction_ReceiverIsNullOrUndefined
...



So, it does look like we could use the start/end addresses to create individual symbols for all of these (and we should).
 

Comment 1 by agrieve@google.com, Oct 5

Summary: SuperSize: Not properly recognizing v8 embedded builtins (inline assembly using exported labels) (was: SupersSize: Not properly recognizing v8 embedded builtins (inline assembly using exported labels))

Comment 2 by hua...@chromium.org, Nov 15

Blockedon: 905329

Comment 3 by agrieve@google.com, Nov 28

Labels: -Pri-3 Pri-2
Status: Assigned (was: Untriaged)

Comment 4 by hua...@chromium.org, Dec 5

Oops the following commit should refer to this bug, rather than 891576:

********
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/76da8c7057d89bee93be5828790eb27119d2e056

commit 76da8c7057d89bee93be5828790eb27119d2e056
Author: Samuel Huang <huangs@chromium.org>
Date: Tue Dec 04 19:38:07 2018

[Supersize] Simplify linker_map_parser.py by removing _SymbolMaker.

MapFileParserLld.Parse() traverses 3 indent levels:
* Level 1 with indent 0: Sections names.
* Level 2 with indent 8: Object path (may be '<internal>') and .o
  section name.
* Level 3 with indent 16: Symbol names.

Previously _SymbolMaker manages LLD .map file parsing by providing the
following services:
* Maintain a 'current symbol' that's modified when new data (perhaps
  at a deeper layer) are found.
* Flush(): On detecting new symbol, stores the old 'current symbol' to
  to start anew (perhaps on emerging to a shallower layer).
* Create(): Creates new 'current symbol'.

Unfortunately, the code assumes that each contiguous Level 3 block
corresponds to only one symbol. But to fix Bug 891576, we will need to
break this assumption, and _SymbolMaker gets in the way!

This CL remove _SymbolMaker without changing Supersize output. It turns
out that since Create() is only called once (in Level 2), we don't
really need Flush(). The code can be simplified by keeping a |syms|
list, and directly modifying |syms[-1]|.

Also add --dump argument to linker_map_parser.py's stand-alone parsing
mode, to ensure identical results before and after this CL.

Bug: 891576
Change-Id: I34b879c7e816049ec55f90d30f84213f6eb1acf9
Reviewed-on: https://chromium-review.googlesource.com/c/1361591
Reviewed-by: Samuel Huang <huangs@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Commit-Queue: Samuel Huang <huangs@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613647}
[modify] https://crrev.com/76da8c7057d89bee93be5828790eb27119d2e056/tools/binary_size/libsupersize/linker_map_parser.py
========

Comment 5 by hua...@chromium.org, Dec 6

The canonical case that we'd like to handle, as stated above (but for the case when .o file is available), is

  83c640   83c640   13aee0    32         obj/v8/v8_external_snapshot/embedded.o:(.text)
  83c640   83c640        0     1                 v8_Default_embedded_blob_data_
  83f4a0   83f4a0        0     1                 Builtins_RecordWrite
  83f7c0   83f7c0        0     1                 Builtins_AdaptorWithExitFrame
  83f800   ...
  ...
  9772a0   9772a0        0     1                 Builtins_ResumeGeneratorExtraWideHandler
  9773e0   9773e0        0     1                 Builtins_IncBlockCounterExtraWideHandler
  9774a0   9774a0        0     1                 Builtins_AbortExtraWideHandler
  977520   977520       18    16         obj/third_party/blink/renderer/platform/heap/asm/asm/SaveRegisters_arm.o:(.text)

The emitted symbols (from object_path=obj/v8/v8_external_snapshot/embedded.o) should be:

  text@83c640(size_without_padding=(83f4a0-83c640), full_name=v8_Default_embedded_blob_data_, ...)
  text@83f4a0(size_without_padding=(83f7c0-83f4a0), full_name=Builtins_RecordWrite, ...)
  text@83f7c0(size_without_padding=(83f800-83f7c0), full_name=Builtins_AdaptorWithExitFrame, ...)
  ...
  text@9772a0(size_without_padding=(9773e0-9772a0), full_name=Builtins_ResumeGeneratorExtraWideHandler, ...)
  text@9773e0(size_without_padding=(9774a0-9773e0), full_name=Builtins_IncBlockCounterExtraWideHandler, ...)
  text@9774a0(size_without_padding=(977520-9774a0), full_name=Builtins_AbortExtraWideHandler, ...)

The size is obtained from the "span" of each input row, i.e., difference from the *next* address to the current address. However, quite a few weird cases exist (will expand on these).

Comment 6 by hua...@chromium.org, Dec 6

Weird case:

     VMA      LMA     Size Align Out     In      Symbol
...
  99cad8   99cad8       fc     8         ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a(e_sinh.o):(.text.sinh)
  99cad8   99cad8        0     1                 $t
  99cad9   99cad9       fc     1                 sinhl
  99cad9   99cad9       fc     1                 sinh
  99cbb8   99cbb8        0     1                 $d
  99cbd4   99cbd4      168     4         ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a(e_sqrt.o):(.text.sqrt)


Highlights:
* sinhl() starts from 99cad9, which is +1 from 99cad8.
* sinhl() and sinh() have the same address.
* '$d' also pop up in arbitrary places.

Analysis:
* The +1 comes from the linker convention of giving ARM (arm32) symbols even addresses, and Thumb symbols an odd value. Source:

  https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf
  (See 4.5.3 Symbol Values)

  Since Thumb symbols are 2-byte aligned, the address is obtained by removing the LSB (least significant bit).

* By inspection, it seems the .map uses "$t" (size = 0) to annotate this; and "$a" (size = 0) to annotate arm32 symbols.
* sinhl() and sinh() are aliases, and for Supersize it makes sense to create both symbols.
  * However, downstream nm can discover this as well!  So for simplicity, for linker_map_parsing we should output just one. We choose to take the last one (sinh()).

* The desired emitted symbols from this are therefore:

  text@99cad8(size_without_padding=0xfc, full_name=sinh, object_path=../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a(e_sinh.o))

            ^ Note: 8, not 9.

Implementation plan:
* Goals:
  * Somehow compute "span" of each relevant input line, obtained by difference of a later address with the current address.
  * Recognize "$t" and "$a" annotations, to adjust for Thumb2 accordingly (more robust then blindly removing the LSB).
  * Denoise  "$d", which messes with "span". For example, sinh() has length 0xFC, and 0x99CAD8 + 0xFC = 0x99CBD4, as seen in the last line. But if we naively consider the $d line, then we'd have the incorrect "span" of (0x99CBB8 - 0x99CAD8) = 0xE0, which is too small. Worse, $d might appear at the same address of the symbol, and resulting in a "span" of 0!
* Proposal:
  * In MapFileParserLld, add a generator that filters lines from the .map file (with possible look-ahead for "span"), and yields parsed components:
    * address (if $t, properly adjusted)
    * size
    * "span": Computed by looking ahead, *and* skipping annotations like {$t, $a, $d}.
    * level: Depends on the indent: 0 => 1, 8 => 2, 16 => 3, with:
      * Level 1: For "Out" column: Section names, e.g., .text
      * Level 2: For "In" column: Object path (may be '<internal>') and .o section name.
      * Level 3: For "Symbol" column: Actual symbol name.
    * tok: The text after the indent.
  * Annotations, which have size 0, have prefix "$", and may have ".#", e.g., "$a.6", "$t.21", should affect parsing, but not be emitted explicitly.
  * Now MapFileParserLld.Parse() can deal with high-level details. There will still be some complexity regarding dynamics relating to size, span, and level -- more on this later.

Comment 7 by hua...@chromium.org, Dec 6

Addendum to comment #6:

* sinhl() would have "span" = 0, whereas sinh() has "span" = 0x99CBD4 - (0x99CAD9 - 1) = 0xFC (after skipping $d). Therefore "span" allows us to skip aliases, and just emit the last symbol (recall that nm will take care of filling in the aliases).

Comment 8 by hua...@chromium.org, Dec 6

Weird case:

     VMA      LMA     Size Align Out     In      Symbol
...
  97c834   97c834      4dc     4         obj/third_party/libvpx/libvpx_assembly_arm.a(libvpx_assembly_arm/vpx_convolve8_avg_horiz_filter_type1_neon.asm.o):(.text)
  97c834   97c834        0     1                 $a.0
  97c834   97c834        0     1                 _vpx_convolve8_avg_horiz_filter_type1_neon
  97c834   97c834      4dc     1                 vpx_convolve8_avg_horiz_filter_type1_neon
  97c848   97c848        0     1                 start_loop_count
  97c8d0   97c8d0        0     1                 outer_loop8_residual
  97c8f0   97c8f0        0     1                 outer_loop_8
  97c900   97c900        0     1                 inner_loop_8
  97c9c8   97c9c8        0     1                 end_inner_loop_8
  97c9e4   97c9e4        0     1                 end_loops
  97c9e8   97c9e8        0     1                 outer_loop_16
  97ca54   97ca54        0     1                 inner_loop_16
  97cbcc   97cbcc        0     1                 epilog_16
  97cc08   97cc08        0     1                 end_loops1
  97cc0c   97cc0c        0     1                 outer_loop4_residual
  97cc2c   97cc2c        0     1                 outer_loop_4
  97cc3c   97cc3c        0     1                 inner_loop_4
  97ccf8   97ccf8        0     1                 end_inner_loop_4
  97cd08   97cd08        0     1                 end_func

Highlights:
* _vpx_convolve8_avg_horiz_filter_type1_neon() has "span" of 0, so it's correctly ignored.
* start_loop_count, outer_loop8_residual, etc. look like the cases we want to handle in comment #5! However, as seen from the source .asm file:

  https://cs.chromium.org/chromium/src/third_party/libvpx/source/libvpx/vpx_dsp/arm/vpx_convolve8_avg_horiz_filter_type1_neon.asm

  These are just labels, and should NOT be emitted as symbols.
* So the correct symbols to emit are:

  text@97c834(size_without_padding=0x4dc, full_name=vpx_convolve8_avg_horiz_filter_type1_neon, object_path=obj/third_party/libvpx/libvpx_assembly_arm.a)
* However, note that vpx_convolve8_avg_horiz_filter_type1_neon() has size = 0x4DC, which distinguishes it from the other stuff.

Implementation Plan:
* The size field should be used: So once vpx_convolve8_avg_horiz_filter_type1_neon() is extracted, it "shadows" all subsequent Level 3 toks!
* So the plan from comment #5 (if size = 0 then extract toks with "span" != 0 as symbols, with size equals to their "span") still works, but are overrode by "shadowing" above.

Comment 9 by hua...@chromium.org, Dec 6

Weird case:

     VMA      LMA     Size Align Out     In      Symbol

  4d7920   4d7920     266c     4         <internal>:(.rodata)
  4d9f90   4d9f90       18    16         ../../third_party/android_ndk/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libc++_static.a(locale.o):(.rodata._ZTSNSt6__ndk110ctype_baseE)
  4d9f90   4d9f90        0     1                 $d
  ...
  7218ba   7218ba      127     1         thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.90)
  7219e1   7219e1       5c     1         thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.93)
  721a3d   721a3d       28     1         thinlto-cache/Thin-e99da7.tmp.o:(.rodata._ZN24certificate_transparency5prefs16kCTRequiredHostsE)
  721a3d   721a3d       28     1                 certificate_transparency::prefs::kCTRequiredHosts

Highlights:
* Level 2 lines need not be followed by a Level 3 line. Examples above are:
  4d7920   4d7920     266c     4         <internal>:(.rodata)
  7218ba   7218ba      127     1         thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.90)
  7219e1   7219e1       5c     1         thinlto-cache/Thin-c38380.tmp.o:(.rodata..L.str.93)

* We already handle <internal> as a special case. In fact we expect it to not have Level 3.
* The other case seems to always have .rodata..L.str.#. However, the converse is not true (i.e., symbol can have .rodata..L.str.#, but still have Level 3).

Implementation Plan:
* There are 2 types of symbols:
  * Level 2 symbols, which uses data from Level 2, but full_name is not from Level 3 tok.
    * <internal> is already handled:
      * Inside .rodata: full_name := '** lld merge strings'
      * Else full_name := '** ' + mangled name.
  * Level 3 symbols, which takes full_name as tok from parsing Level 3 lines, and also has address and size assigned.
    * Gets object_path from Level 2.
* Complexity: On encountering a Level 2 line, we create a Level 2 symbol.
  * If <internal> then we assign everything and move on (assumes no Level 3 line follows).
  * Otherwise full_name is left blank. If another Level 2 line is found, it's left as that.
  * Once Level 3 line is encountered:
    * The Level 2 line is now populated, and becoming a Level 3 line!
    * Subsequent symbols for the current run of Level 3 lines are created as Level 3 symbols.

Comment 10 by hua...@chromium.org, Dec 6

Addendum to comment #9: The symbols created from the data shown should be:

  .rodata@4d7920(size_without_padding=0x266c, full_name=** lld merge strings, object_path=, ...)
  .rodata@7218ba(size_without_padding=0x127, full_name=, object_path=, ...)
  .rodata@7219e1(size_without_padding=0x5c, full_name=, object_path=, ...)

Note that there's no object_path, due to thinlto-cache/ usage.

(Also, I forgot to add '.' before text@ in the previous examples..)

Comment 11 by hua...@chromium.org, Dec 7

Weird case: 

     VMA      LMA     Size Align Out     In      Symbol
  ...
  831000   831000      a28    32         obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o:(.text)
  831004   831004        0     1                 $d.0
  831040   831040        0     1                 $a.1
  831040   831040        0     1                 .Lenc_key
  831040   831040      218     1                 aes_hw_set_encrypt_key
  8310b0   8310b0        0     1                 $d.2
  8310b4   8310b4        0     1                 $a.3
  8310f0   8310f0        0     1                 $d.4
  8310f4   8310f4        0     1                 $a.5
  831124   831124        0     1                 $d.6
  831128   831128        0     1                 $a.7
  831180   831180        0     1                 $d.8
  831184   831184        0     1                 $a.9
  8311f0   8311f0        0     1                 $d.10
  8311f4   8311f4        0     1                 $a.11
  831228   831228        0     1                 $d.12
  83122c   83122c        0     1                 $a.13
  831260   831260       60     1                 aes_hw_set_decrypt_key
  831294   831294        0     1                 $d.14
  83129c   83129c        0     1                 $a.15
  ...

Highlights:
* .Lenc_key is a label from the assembler code (e.g., aesv8-armx32.S). But since it has "span" = 0, so it gets ignored.
* aes_hw_set_encrypt_key() has address = 831040 -- this is different from Level 2 address of 831000!

Analysis:
* The extra bytes that appear before aes_hw_set_encrypt_key() arises from static data added before the function:

  https://cs.chromium.org/chromium/src/third_party/boringssl/linux-arm/crypto/fipsmodule/aesv8-armx32.S?rcl=fbe75ef510a9d9123f6c7f8cbd54d5f5b5680277&l=35

  * Also verified in the binary, the bytes are:

.long	0x01,0x01,0x01,0x01
.long	0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d,0x0c0f0e0d	@ rotate-n-splat
.long	0x1b,0x1b,0x1b,0x1b

Implementation plans:
* Note that $d.0 hints that there is some data. But it would be odd if we rely on it, since we ignore other $d instances
* Perhaps we can just bundle this into the Level 2 symbol. Then when aes_hw_set_encrypt_key() is created as a Level 3 symbol, it becomes stand-alone, rather than modifying the Level 2 symbols (which happens if the first Level 3 symbol has the same address as the Level 2 symbol).
* So created symbols should be:

  .text@831000(size_without_padding=0x40, full_name=, object_path=obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o, ...)
  .text@831040(size_without_padding=0x218, full_name=aes_hw_set_encrypt_key, object_path=obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o, ...)
  .text@831260(size_without_padding=0x60, full_name=aes_hw_set_decrypt_key,object_path=obj/third_party/boringssl/boringssl_asm/aesv8-armx32.o, ...)

Comment 12 by hua...@chromium.org, Dec 7

Weird case:

 1401788  1401788       a4     4         thinlto-cache/Thin-826b9f.tmp.o:(.text._ZNK4$_21clEP10FamilyDataPKcPS3_)
 1401788  1401788        0     1                 $t.78
 1401789  1401789       a4     1                 $_21::operator()(FamilyData*, char const*, char const**) const
 1401818  1401818        0     1                 $d.79

Highlights:
* We have a (Thumb22) symbol that starts with '$'. So the created symbol should be:

  .text@1401788(size_without_padding=0xA4, full_name=$_21::operator()(FamilyData*, char const*, char const**) const, object_path=, ...)

Analysis:
* This weird case only appears twice (the other is $_21::__invoke(FamilyData*, char const*, char const**))
* '$a', '$d', and '$t' seems to be the only annotations there are. Command to verify:

$ cat libchrome.so.map | grep '         \$' | cut -c50- | sed s/[0-9]\\+$//g | sort -u
$_21::__invoke(FamilyData*, char const*, char const**)
$_21::operator()(FamilyData*, char const*, char const**) const
$a
$a.
$d
$d.
$t
$t.

Implementation plans:
* In the data generator when determining whether |tok| is an annotation, we should first check for tok.startswith('$'), then also restrict to length == 2 or (length >= 3, tok[2] == '.').

Comment 13 by agrieve@google.com, Dec 7

For the case in #11 - would it be possible to omit the outer symbol such that the 40 bytes gets picked up as padding for aes_hw_set_encrypt_key?

Comment 14 by hua...@chromium.org, Dec 11

Yeah that's possible. The disadvantage is that the 40 bytes are not padding, so technically it would be incorrect. However, an advantage would be that it might simplify the code re. Level 2 symbols:
* Obvious Level 2 symbols like '<internal>' can just be handled directly (so no Level 3 lines are assumed; maybe display warnings if found).
* The logic to demote Level 2 symbols to Level 3 can be removed.

Not sure how much simplification there will be though; will worry about that after some code cleanup to see.

Comment 15 by bugdroid1@chromium.org, Dec 13

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4cb24b816ce84e34e37fc456bbaeea3d258d9015

commit 4cb24b816ce84e34e37fc456bbaeea3d258d9015
Author: Samuel Huang <huangs@chromium.org>
Date: Thu Dec 13 01:38:40 2018

[Supersize] Add linker_map_parser_test.py and test_util.py.

We plan to update linker_map_parser.py to extract more Level 3 symbols
from .map files. As seen in  crbug.com/892648 , there are many cases that
need to be handled.

This CL adds linker_map_parser_test.py, which takes the identified
issues from Android Chrome's .map file as test cases. Details:
* Currently focus on ARM (32-bit) builds using LLD with ThinLTO.
* Similar to integration_test.py, the '--update' can be specified to
  overwrite "golden" output files with current output.
  * Add test_util.py to factor common code involving usage and
    management of "golden" files.
* The initial Parser.golden is obtained from running the test for
  unmodified linker_map_parser.py. This allows upcoming parser changes
  to be tracked.

Bug:  892648 
Change-Id: Ia8a4dbce4c235f2b08df2bb848409caffcedac32
Reviewed-on: https://chromium-review.googlesource.com/c/1373873
Commit-Queue: agrieve <agrieve@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616152}
[modify] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/integration_test.py
[add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/linker_map_parser_test.py
[add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/test_util.py
[add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/testdata/linker_map_parser/Parser.golden
[add] https://crrev.com/4cb24b816ce84e34e37fc456bbaeea3d258d9015/tools/binary_size/libsupersize/testdata/linker_map_parser/test_lld-lto_v1.map

Comment 16 by bugdroid1@chromium.org, Dec 13

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/514e34fa47261afc25fd6f732c7d8628df1ca7ba

commit 514e34fa47261afc25fd6f732c7d8628df1ca7ba
Author: Samuel Huang <huangs@chromium.org>
Date: Thu Dec 13 14:33:25 2018

[SuperSize] Fix missing sizes in LinkerMapParserTest.test_Parser().

LinkerMapParserTest.test_Parser() creates an iterator of processed
test .map file. Previously this iterator was shared between calls to
  (1) DetectLinkerNameFromMapFile(),
  (2) .MapFileParser().Parse().

This causes a bug in the test: (2) requires a fresh iterator, and it
explicitly it skips the first line! Consequently, a number of section
sizes ({.interp, .ARM.exidx, .dynsyn}) were missing.

This CL fixes the test by creating fresh iterators for (1) and (2).
Also, the section sizes are sorted to reduce arbitrariness of dict
iteration.

Bug:  892648 
Change-Id: Ib52f17b0446012fe37951c86409c450d59b57e4a
Reviewed-on: https://chromium-review.googlesource.com/c/1375610
Commit-Queue: agrieve <agrieve@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616304}
[modify] https://crrev.com/514e34fa47261afc25fd6f732c7d8628df1ca7ba/tools/binary_size/libsupersize/linker_map_parser_test.py
[modify] https://crrev.com/514e34fa47261afc25fd6f732c7d8628df1ca7ba/tools/binary_size/libsupersize/testdata/linker_map_parser/Parser.golden

Comment 17 by bugdroid1@chromium.org, Dec 14

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c957e24190c983d437126399606dd9bf743b873b

commit c957e24190c983d437126399606dd9bf743b873b
Author: Samuel Huang <huangs@chromium.org>
Date: Fri Dec 14 05:56:28 2018

[SuperSize] Refactor MapFileParserLld.Parse() using a filtering generator.

We wish to extract more symbols from Level 3 lines from .map files.
Some lines have |size == 0| (this can happen for symbols from assembly
sources), but we can approximate actual size from |address| difference
between successive lines. We call this approximation "span".

This CL refactors MapFileParserLld.Parse() by extracting some parsing
logic into a new generator MapFileParserLld.Tokenize(). Details:
* Tokenize() is a generator that:
  * Consumes .map file lines and yields parsed fields.
  * Applies a one-step lookhead, and maintains "next" and "current"
    tokens from successive lines to compute "span" (some complication
    exist).
  * Skips Level 3 annotations ('$a', '$t.123', etc.), which also has
    |size == 0| but produces no symbols, and so should be ignored for
    "span" computation.
  * Detects Thumb2 annotations '$t' and adjusts |address| accordingly.
* Add 2 new tests to LinkerMapParserTest:
  * test_ParseArmAnnotation() for annotation parsing.
  * test_Tokenize() to dump Tokenize() output, and compare against new
    Tokenize.golden file.

As a refactoring CL, MapFileParserLld output remains unchanged. Actual
usage of "span" will be implemented in follow-ups.

Bug:  892648 
Change-Id: Ia630027b85e45a234101396917ea4626b9dba1bc
Reviewed-on: https://chromium-review.googlesource.com/c/1377189
Reviewed-by: Samuel Huang <huangs@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Commit-Queue: Samuel Huang <huangs@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616594}
[modify] https://crrev.com/c957e24190c983d437126399606dd9bf743b873b/tools/binary_size/libsupersize/linker_map_parser.py
[modify] https://crrev.com/c957e24190c983d437126399606dd9bf743b873b/tools/binary_size/libsupersize/linker_map_parser_test.py
[add] https://crrev.com/c957e24190c983d437126399606dd9bf743b873b/tools/binary_size/libsupersize/testdata/linker_map_parser/Tokenize.golden

Comment 18 by bugdroid1@chromium.org, Dec 18

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a0cee45fde3a11fde81a75af6ad469d93af90f0b

commit a0cee45fde3a11fde81a75af6ad469d93af90f0b
Author: Samuel Huang <huangs@chromium.org>
Date: Tue Dec 18 17:04:08 2018

[SuperSize] MapFileParserLld: Extract mutiple symbosl from Level 3 lines.

Previously MapFileParserLld.Parse() extracts one symbol per section per
object file. In terms of .map file, this means each Level 2 line can
produce one symbol only, regardless of the number of nested Level 3
lines it nests. This causes symbols to be missing (e.g., for code from
assembly sources), and makes some function appear larger than they are
(since they subsumed its siblings).

This CL fixes MapFileParserLld.Parse() by allowing multiple symbols to
be emitted from Level 3 lines nested under the same Level 2 line. This
helps SuperSize recover missing symbols under LLD. Details:
* Each Level 2 line initially produces a "partial" symbol,  which has
  an empty |full_name|.
  * If no Level 3 lines follow then the symbol is left as-is.
  * On encountering the first (non-annotation) Level 3 line, the symbol
    adopts the parsed |tok| as |full_name|.
  * On encountering second+ (non-annotation) Level 3 lines, new symbols
    are created (copies object path parsed from Level 2).
* An exception of the above are Level 2 lines with '<internal>'. These
  should not nest Level 3 lines anyway.
* Special case: If the first Level 3 line has |address| different from
  the "partial" symbol's, then "partial" symbol has its |address|
  turncated, and |full_name| left blank. The Level 3 line is then used
  to create a new symbol.
  * This handles the rare case where data precedes Level 3 lines.
* Level 3 line |span| is used instead of |size| (which can be 0) to
  decide whether the line should form a symbol.
* To handle assembly labels, which create as Level 3 lines but should
  NOT should form symbols: Each Level 3 symbol supporesses subsequent
  Level 3 lines whose |address| is covered by the Level 3's address
  range.
* Update some comments in test_lld-lto_v1.map.
* Update test output Parser.golden.

Bug:  892648 
Change-Id: I9b3e06563e5df4fcf83ec899a4ec8274d11e6023
Reviewed-on: https://chromium-review.googlesource.com/c/1378679
Commit-Queue: Samuel Huang <huangs@chromium.org>
Reviewed-by: Samuel Huang <huangs@chromium.org>
Reviewed-by: agrieve <agrieve@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617535}
[modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/archive.py
[modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/linker_map_parser.py
[modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/testdata/linker_map_parser/Parser.golden
[modify] https://crrev.com/a0cee45fde3a11fde81a75af6ad469d93af90f0b/tools/binary_size/libsupersize/testdata/linker_map_parser/test_lld-lto_v1.map

Comment 19 by agrieve@chromium.org, Jan 9

Status: Fixed (was: Assigned)
This is fixed I believe. Re-open if I'm wrong.

Sign in to add a comment