Issue metadata
Sign in to add a comment
|
Investigate how AFDO profile gathering works with CFI/ThinLTO |
||||||||||||||||||||||||
Issue description(Assigning to you, laszio, at llozano's request. If you can't get to it, I'm happy to pick it up :) ) When doing other AFDO investigations, I found that CFI can cause us to emit different mangled symbols (`chrome` is Chrome/Linux in this case): ~/l/l/AFDO-3481 [ 3.92s ] ~> nm chrome | wc -l 617054 ~/l/l/AFDO-3481 [ 4.00s ] ~> nm chrome | grep '\.cfi$' | wc -l 75477 It appears that ThinLTO will also (potentially) tag symbols with a hash-like suffix: ~/l/l/AFDO-3481 [ 4.14s ] ~> nm chrome | grep '\.cfi$' | head -n20 | cut -d' ' -f3 a2i_ipadd.cfi aac_decode_close$936cd9b7879c207b48f2913effcacacf.cfi aac_decode_frame$936cd9b7879c207b48f2913effcacacf.cfi aac_decode_init$936cd9b7879c207b48f2913effcacacf.cfi aac_parse_init$cbfa93a464bf2b41dc7181fd39bfa2f6.cfi aac_static_table_init$936cd9b7879c207b48f2913effcacacf.cfi aac_sync$cbfa93a464bf2b41dc7181fd39bfa2f6.cfi ABGRToI420.cfi ABGRToUVRow_Any_SSSE3.cfi ABGRToUVRow_C.cfi ABGRToUVRow_SSSE3.cfi ABGRToYRow_Any_SSSE3.cfi ABGRToYRow_C.cfi ABGRToYRow_SSSE3.cfi abort_message.cfi absFunc$c0ebdb0c05b4ae8b8d2321e6f41a5205.cfi ACCESS_DESCRIPTION_free.cfi access_virt_barray$6d492c98ee523c524cb677da1abfc522.cfi access_virt_sarray$6d492c98ee523c524cb677da1abfc522.cfi AddGreenToBlueAndRed_SSE2$2f46114d91e02a1a72b3290089da97d9.cfi It would likely be good to verify that neither of these keeps our `perf` traces from matching with symbols in AFDO (since profile data matching presumably happens before cfi/this hash are applied, and "this is `aac_decode_close`, not `aac_decode_close$936cd9b7879c207b48f2913effcacacf.cfi`! Guess I don't have any profile data for it, ..." seems like a subtle issue) It's also interesting to note that we may emit both a cfi'ed and non-cfi'ed variant of the same symbol: ~/l/l/AFDO-3481 [ 7.91s ] ~> nm chrome | cut -d' ' -f3 | sort -u | wc -l 611644 ~/l/l/AFDO-3481 [ 8.12s ] ~> nm chrome | cut -d' ' -f3 | sed 's/\.cfi$//' | sort -u | wc -l 536504 So uh, even if we see samples from a function that was CFI'ed in the final profile, it may be that we're calling the non-CFI'ed variant? I dunno how CFI works in enough detail to say anything about that.
,
Aug 14
> 1. Are they additional symbols, or are they transformed from "the original" symbols? To be clear, do they replace the original symbols?
,
Aug 14
If we don't see this, that's great. :) Like said, I dunno much about cfi's implementation details, so it would be really nice if this isn't something we'll ever encounter. If we do decide to rely on that (and decide that it does potentially break afdo/etc), it may be worth a test somewhere? > 1. Are they additional symbols, or are they transformed from "the original" symbols? It appears to be a mixture of both. e.g. from my Linux chrome build: 00000000019603f8 t a2i_ipadd 0000000003e8ca90 t a2i_ipadd.cfi a2i_ipadd boils down to tailcall to a2i_ipadd.cfi: "19603f8: e9 93 c6 52 02 jmpq 3e8ca90 <a2i_ipadd.cfi>". a2i_ipadd.cfi looks like it does a lot of work, so I assume that's the "real" version of this function. There's quite a few cases where we don't have both symbols. As noted above, I count 75,477 CFI-like symbols (all in .text). In 75,140 cases, we have a non-".cfi" suffix symbol to match. So, in 337 cases, we only have the .cfi variant. 10 examples of this are: BIO_snprintf.cfi WelsSleep.cfi _Z13SkStrEndsWithPKcS0_.cfi _Z15FX_CharSetIsCJKh.cfi _Z26SkNotifyBitmapGenIDIsStalej.cfi _ZN10DevToolsUI11GetProxyURLERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE.cfi _ZN10SkTextBlob14MakeFromBufferER12SkReadBuffer.cfi _ZN10disk_cache11simple_util15GetEntryHashKeyERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE.cfi _ZN10extensions9Extension25GetBaseURLFromExtensionIdERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE.cfi _ZN10spellcheck27GetSpellCheckLanguageRegionEN4base16BasicStringPieceINSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEEE.cfi ...I'm failing to see why these are special, though. The non-.cfi versions of the above have no references (obviously), but a2i_ipadd has no grepable references in disassembly, either (though I can't tell what references may lie in e.g. tables sitting in rodata). > 2. Can you show addresses and sizes of them? I think the notes above gives you what you were looking for with this? If not, happy to dump some output here (or you can come poke me and I'm happy to chat/look at chrome interactively)
,
Aug 14
> (since profile data matching presumably happens before cfi/this hash ... Is cfi applied before most of the optimization passes? If not, IIUC this is the life of a sample in afdo: 1. sampled on target 2. saved in virtual address in perf.data 3. resolved into a symbol + offset and then source line by create_llvm_prof 4. corresponding source line count += 1 then compiler uses the aggregated profile, which is represented in source line counts, to make decisions. Chances are, if cfi handles debug info correctly (s.t. offsets can be correctly mapped to source lines), we should be fine. If cfi is applied before the optimization passes, I guess some translation has to be done. Having said that, only performance test tells. We should definitely do performance tests before enabling more cfi flags.
,
Aug 14
The purpose of the .cfi symbols is to point the "real" definition of the function. The non-.cfi symbols point to jump tables that are used by indirect calls. The debug info is preserved when the real function definition is renamed so assuming that the lookup is based on source lines I don't see why things wouldn't work.
,
Sep 3
According to #5, it doesn't look like a problem. Marking it won't fix for now. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by laszio@chromium.org
, Aug 14