
Issue 916130

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



0.7%-233.5% regression in sizes at 617437:617440

Project Member Reported by mustaq@chromium.org, Dec 18

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=916130

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=5f6a02a089acecf11ecf2c90fb8d3bd041c64a6cc5612d06e386f36d6051d61b


Bot(s) for this bug's original alert(s):

Google Chrome Linux x64
Cc: thestig@chromium.org
thestig@chromium.org: Any clue why I can't bisect any of these?
Not sure. What are you using to bisect?
Owner: g...@chromium.org
Well, this is very likely r617437.
Well, I've certainly never seen an AFDO roll bump us by anything near *that* much. Looks like the subsequent roll in r617511 didn't undo this, which is odd.

Investigating now -- thanks for the heads-up.
Doesn't immediately appear to be AFDO-related:
$ du Linux{,OldProf}/chrome.stripped
139544  Linux/chrome.stripped
139632  LinuxOldProf/chrome.stripped
$ md5sum *.prof
99f55761af26dff96be889a4efc2f5f8  afdo-old.prof
28c6fb47cea2a44d7e500b06faae259b  afdo.prof
$ md5sum *.afdo
654f6ec943e1753f5a0cf79c80d1504a  chromeos-chrome-amd64-73.0.3641.0_rc-r1.afdo
99f55761af26dff96be889a4efc2f5f8  chromeos-chrome-amd64-73.0.3642.0_rc-r1.afdo
2b6a924596dfad17b233d337764f5e87  chromeos-chrome-amd64-73.0.3643.0_rc-r1.afdo
28c6fb47cea2a44d7e500b06faae259b  chromeos-chrome-amd64-73.0.3644.0_rc-r1.afdo

Please note that r617437 was 3642 -> 3643, and r617511 was 3643 -> 3644.
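
(The pairing above falls straight out of sorting the checksums, since identical md5sums mean identical profile contents -- a trivial sketch using the same files listed above:)

$ md5sum *.prof *.afdo | sort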

Since the commit range is so small, I'm happy to bump through them and find out more (including if I made a mistake on the above, since all of those other commit messages do sound pretty innocuous ;) ).
I looked through the blame list and I can't imagine this being any of the other CLs:

r617440: Added an extra logging statement. (Chrome OS only)
r617439: Updated a comment and changed base_unittests.
r617438: Java changes. (Android only)
I totally agree, which is why I wanted to look into it more. :)

Looks like staring at Chrome was my mistake; I assumed totals-* meant that Chrome would show a substantial difference. chrome.perf.fyi indicates that this is > 100% localized to nacl (Chrome actually saw a 180KB size decrease from the AFDO roll in question).

From our very own linux.perf.fyi builder: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/linux-builder-perf/31891 (31892 is the change in question)

chrome: build 31891 == 5,479,171,264 bytes; build 31892 == 5,476,629,360 bytes. Diff: -0.046%
chrome-bss: build 31891 == 2,006,420 bytes; build 31892 == 2,006,420 bytes. Diff: 0.000%
chrome-data: build 31891 == 6,492,936 bytes; build 31892 == 6,493,000 bytes. Diff: 0.001%
chrome-stripped: build 31891 == 142,891,352 bytes; build 31892 == 142,715,288 bytes. Diff: -0.123%
chrome-text: build 31891 == 136,388,271 bytes; build 31892 == 136,211,887 bytes. Diff: -0.129%
Skipping chrome-textrel
nacl_helper: build 31891 == 387,043,144 bytes; build 31892 == 431,989,168 bytes. Diff: 11.613%
nacl_helper-bss: build 31891 == 217,520 bytes; build 31892 == 243,344 bytes. Diff: 11.872%
nacl_helper-data: build 31891 == 49,672 bytes; build 31892 == 181,236 bytes. Diff: 264.866%
nacl_helper-stripped: build 31891 == 2,596,112 bytes; build 31892 == 8,656,328 bytes. Diff: 233.434%
nacl_helper-text: build 31891 == 2,538,258 bytes; build 31892 == 8,464,202 bytes. Diff: 233.465%
Skipping nacl_helper-textrel
nacl_helper_bootstrap: build 31891 == 35,120 bytes; build 31892 == 35,120 bytes. Diff: 0.000%
nacl_helper_bootstrap-bss: build 31891 == 4,113 bytes; build 31892 == 4,113 bytes. Diff: 0.000%
nacl_helper_bootstrap-data: build 31891 == 40 bytes; build 31892 == 40 bytes. Diff: 0.000%
nacl_helper_bootstrap-stripped: build 31891 == 8,744 bytes; build 31892 == 8,744 bytes. Diff: 0.000%
nacl_helper_bootstrap-text: build 31891 == 6,559 bytes; build 31892 == 6,559 bytes. Diff: 0.000%
Skipping nacl_helper_bootstrap-textrel
nacl_irt_x86_64.nexe: build 31891 == 3,802,672 bytes; build 31892 == 3,802,672 bytes. Diff: 0.000%
resources.pak: build 31891 == 12,820,204 bytes; build 31892 == 12,820,204 bytes. Diff: 0.000%
totals-bss: build 31891 == 2,228,053 bytes; build 31892 == 2,253,877 bytes. Diff: 1.159%
totals-data: build 31891 == 6,542,648 bytes; build 31892 == 6,674,276 bytes. Diff: 2.012%
totals-size: build 31891 == 5,882,872,404 bytes; build 31892 == 5,925,276,524 bytes. Diff: 0.721%
totals-stripped: build 31891 == 145,496,208 bytes; build 31892 == 151,380,360 bytes. Diff: 4.044%
totals-text: build 31891 == 138,933,088 bytes; build 31892 == 144,682,648 bytes. Diff: 4.138%
Skipping totals-textrel
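
(Sanity-checking the nacl_helper-stripped delta above -- a minimal sketch, assuming `bc` is available -- the math reproduces the reported 233.434%, which also roughly matches the 233.5% upper bound in the bug title:)

$ echo "scale=3; (8656328 - 2596112) * 100 / 2596112" | bc
233.434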

Digging more now
(...s/linux.perf.fyi/linux-builder-perf/. Names are hard :) )
nacl_helper is shipped as part of the Linux Chrome package. The relevant lines are:

nacl_helper-stripped: build 31891 == 2,596,112 bytes; build 31892 == 8,656,328 bytes. Diff: 233.434%
totals-stripped: build 31891 == 145,496,208 bytes; build 31892 == 151,380,360 bytes. Diff: 4.044%

Why did nacl_helper go from 2.5 MB to 8.3 MB?
Thanks for the explanation.

> Why did nacl_helper go from 2.5 MB to 8.3 MB?
Because we're putting ~17x more distinct pieces of code/data into the final binary, for reasons unclear to me:

$ ~/environments/bloaty/build/bloaty -s vm -d symbols -n 0 Linux/nacl_helper -- LinuxOldProf/nacl_helper | grep -cF '[NEW]'
12755
$ ~/environments/bloaty/build/bloaty -s vm -d symbols -n 0 Linux/nacl_helper -- LinuxOldProf/nacl_helper | grep -vcF '[NEW]'
741
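
(The ~17x above is just the ratio of those two counts -- a trivial check, again assuming `bc`:)

$ echo "scale=1; 12755 / 741" | bc
17.2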

This only appears to occur with all three of {CFI, ThinLTO, AFDO} enabled. AFDO-only gives us:
     VM SIZE                     FILE SIZE      
 --------------               --------------    
  [ = ]       0 .debug_ranges +31.4Ki  +0.2%    
  +0.3% +3.62Ki .text         +3.62Ki  +0.3%    
  [ = ]       0 .strtab       +2.28Ki  +0.4%    
   +18% +1.02Ki google_malloc +1.02Ki   +18%    
  [ = ]       0 .symtab          +384  +0.1%    
  [ = ]       0 [Unmapped]       -659 -11.9%    
  [ = ]       0 .debug_str    -1.50Ki  -0.0%    
  [ = ]       0 .debug_frame  -1.78Ki  -0.0%    
  [ = ]       0 .debug_abbrev -1.95Ki  -0.0%    
  [ = ]       0 .gdb_index    -6.29Ki  -0.0%    
  [ = ]       0 .debug_line   -34.8Ki  -0.1%    
  [ = ]       0 .debug_loc    -74.1Ki  -0.1%
  [ = ]       0 .debug_info   -88.9Ki  -0.0%
  +0.2% +4.64Ki TOTAL          -171Ki  -0.0%                                      

And ThinLTO+AFDO-only:
     VM SIZE                         FILE SIZE
 --------------                   --------------
  [ = ]       0 .debug_loc        +12.4Mi  +287%
  [ = ]       0 .debug_info       +12.1Mi  +5.5%
  [ = ]       0 .debug_line       +3.90Mi   +41%
  [ = ]       0 .gdb_index        +2.70Mi  +7.5%
  [ = ]       0 .debug_str        +2.44Mi  +2.7%
  [ = ]       0 .debug_ranges     +1.81Mi  +237%
  [ = ]       0 .debug_frame       +515Ki  +244%
  [ = ]       0 .debug_abbrev      +464Ki  +7.8%
  [ = ]       0 .strtab           +2.77Ki  +0.5%
  [ = ]       0 .symtab           +1.15Ki  +0.3%
   +19% +1.09Ki google_malloc     +1.09Ki   +19%
  +1.4%    +248 .eh_frame            +248  +1.4%
  +0.2%    +168 .rela.dyn            +168  +0.2%
  +3.1%    +152 .gcc_except_table    +152  +3.1%
  +0.0%    +128 .rodata              +128  +0.0%
  +1.9%     +56 .eh_frame_hdr         +56  +1.9%
  +0.0%     +56 [3 Others]            +27  +0.4%
  +0.1%     +48 .data.rel.ro          +48  +0.1%
 -12.2%     -48 [LOAD [RW]]             0  [ = ]
  [ = ]       0 [Unmapped]           -821 -19.5%
  -0.1% -1.05Ki .text             -1.05Ki  -0.1%
  +0.0%    +856 TOTAL             +36.3Mi  +9.8%                               

...The large increase in debug info is interesting, given the otherwise-stable sizes.

Turning all three on:

     VM SIZE                     FILE SIZE  
 --------------               --------------                                      
  [ = ]       0 .debug_loc    +12.1Mi  +281%  
  [ = ]       0 .debug_info   +11.6Mi  +5.3%    
  +389% +4.57Mi .text         +4.57Mi  +389%    
  [ = ]       0 .debug_line   +3.83Mi   +41%    
  [ = ]       0 .gdb_index    +2.85Mi  +7.9%    
  [ = ]       0 .debug_str    +2.41Mi  +2.7%    
  [ = ]       0 .debug_ranges +1.89Mi  +239%    
  [ = ]       0 .strtab        +975Ki  +154%    
   +76%  +856Ki .rodata        +856Ki   +76%    
  [ = ]       0 .debug_frame   +522Ki  +244%    
  [ = ]       0 .debug_abbrev  +464Ki  +7.8%    
  [ = ]       0 .symtab        +451Ki  +131%    
  +249%  +239Ki .rela.dyn      +239Ki  +249%    
  +322%  +121Ki .data.rel.ro   +121Ki  +322%    
   +12% +25.2Ki .bss                0  [ = ]    
   +41% +7.49Ki .eh_frame     +7.49Ki   +41%    
   +79% +6.30Ki .data         +6.30Ki   +79%    
   +14% +4.29Ki [15 Others]   +4.95Ki   +12%    
   +73% +2.42Ki .eh_frame_hdr +2.42Ki   +73%    
 -13.8%    -492 [LOAD [RW]]   +1.70Ki +27e2%    
   +18% +1.11Ki google_malloc +1.11Ki   +18%    
  +216% +5.80Mi TOTAL         +42.9Mi   +12%    

`du`ing the object files in each build directory shows no meaningful changes between old/new profiles (the best we got was a single object file getting a whopping 72K larger), so I assume this is going to be confined to ThinLTO. Looking more into that now.
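
(For anyone trying to reproduce the three-way comparison above, here is a minimal sketch of the configuration axes expressed as GN args. The arg names are my assumption of the standard Chromium switches for these features, and a real perf-builder config sets plenty more; dropping individual args from this set gives the AFDO-only and ThinLTO+AFDO builds compared above:

$ gn gen out/LinuxRepro --args='
    is_official_build = true                 # assumed baseline
    is_cfi = true                            # CFI
    use_cfi_icall = true                     # indirect-call checking
    use_thin_lto = true                      # ThinLTO
    clang_use_default_sample_profile = true  # AFDO
'
)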
(BTW, if you think we should stop the roller and revert until we better understand this issue, I'm happy to. I don't know how critical binary size is for this one component, and we're far enough from a branch that I'm not sure what our policy is for how quickly we want to fix upticks like this.

As alluded to above, the roller may also just roll in a profile that makes this vanish. It still warrants an investigation, because +216% binary size seems like a ton, but ...)
The bottom line is we don't want to ship M-73 to users with a size regression. So you probably do not have to stop the roller, but please resolve this issue before M-73 branches.
Will do -- thanks!
Status: Assigned (was: Untriaged)
Update: this size increase appears to be enabled by more aggressive inlining prompted by our AFDO profiles. It also only happens if -fsanitize=cfi-icall is enabled (vcall need not apply).

I've noted that tons of the symbols we're bringing in are from places like skia, libxml, etc., none of which were previously linked statically into this binary. I originally suspected that we were speculatively inlining the wrong thing in one place, but disabling ICP doesn't help our repro.

Instead, this only happens if the direct call inlining in SampleProfile is enabled. Oddly, disabling all hotness/coldness bonuses inside of the inliner does not make this go away.
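
Independent of the build, the rolled profile itself can be dumped to see what it attributes to this code (a sketch: `llvm-profdata show -sample` and `-function=` are real options, but whether and how these symbols appear in the profile is an assumption):

$ llvm-profdata show -sample -function=SyncChannel \
    chromeos-chrome-amd64-73.0.3644.0_rc-r1.afdo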

Currently using name-based "bisection" to cut down on the number of things we inline. The steps after that will depend on how many things we inline after filtering out all of the trivially OK inlining decisions. :)

I'm out all next week, but this'll be at the top of my to-do list when I'm back on the first week of January.
obj/ipc/ipc/ipc_sync_channel.o is what enables this. There are three inlining decisions in that TU, each inlining a callee into `base::internal::Invoker<base::internal::BindState<void (IPC::SyncChannel::*)(base::WaitableEvent*), base::internal::UnretainedWrapper<IPC::SyncChannel> >, void (base::WaitableEvent*)>::RunOnce(base::internal::BindStateBase*, base::WaitableEvent*)`.

(Invoker::RunOnce)

These three functions are:

void base::internal::Invoker<base::internal::BindState<void (IPC::SyncChannel::*)(base::WaitableEvent*), base::internal::UnretainedWrapper<IPC::SyncChannel> >, void (base::WaitableEvent*)>::RunImpl<void (IPC::SyncChannel::*)(base::WaitableEvent*), std::__1::tuple<base::internal::UnretainedWrapper<IPC::SyncChannel> >, 0ul>(void (IPC::SyncChannel::*&&)(base::WaitableEvent*), std::__1::tuple<base::internal::UnretainedWrapper<IPC::SyncChannel> >&&, std::__1::integer_sequence<unsigned long, 0ul>, base::WaitableEvent*&&)

(Invoker::RunImpl)

void base::internal::InvokeHelper<false, void>::MakeItSo<void (IPC::SyncChannel::*)(base::WaitableEvent*), IPC::SyncChannel*, base::WaitableEvent*>(void (IPC::SyncChannel::*&&)(base::WaitableEvent*), IPC::SyncChannel*&&, base::WaitableEvent*&&)

(InvokeHelper::MakeItSo)

base::internal::Invoker<base::internal::BindState<void (IPC::SyncChannel::*)(base::WaitableEvent*), base::internal::UnretainedWrapper<IPC::SyncChannel> >, void (base::WaitableEvent*)>::RunOnce(base::internal::BindStateBase*, base::WaitableEvent*)

(Invoker::RunOnce, but with different template args than the caller).

Disabling any one of these inlining decisions makes the regression go away.
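
A hedged way to watch these decisions directly is to re-run the compile for just that object with inliner remarks enabled; `ninja -t commands` prints the exact command the build already uses, and `-Rpass=inline` is a standard clang flag (assuming `Linux/` is the ninja out dir from the earlier comparisons, and that the grep pattern matches the remark wording):

$ ninja -C Linux -t commands obj/ipc/ipc/ipc_sync_channel.o | tail -n1
$ <that command, plus> -Rpass=inline 2>&1 | grep 'inlined into'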

Will continue investigating when I'm back. Happy Holidays!
Project Member Comment 18 by bugdroid1@chromium.org, Jan 7

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/af31b7d5a7c5c831c65a418ae305a4aad520ceea

commit af31b7d5a7c5c831c65a418ae305a4aad520ceea
Author: George Burgess IV <gbiv@chromium.org>
Date: Mon Jan 07 22:39:11 2019

ipc: disable AFDO on Linux

To work around a 200% (!) binary size increase caused by AFDO in
nacl_helper, locally disable it.

This may have a small performance impact, though it's likely that any
perf delta will be negligible.

Bug: 916130
Change-Id: Idf37c813215a8d5a19431499b671f7579b573277
Reviewed-on: https://chromium-review.googlesource.com/c/1396565
Commit-Queue: George Burgess <gbiv@chromium.org>
Reviewed-by: Ken Rockot <rockot@google.com>
Cr-Commit-Position: refs/heads/master@{#620506}
[modify] https://crrev.com/af31b7d5a7c5c831c65a418ae305a4aad520ceea/ipc/BUILD.gn

It appears that later profile rolls (e.g. r618661) fixed and then rebroke this a few times.

For now, I've put ^ in place, which should fix this entirely, assuming ipc/ is the only problem area. If that isn't a complete fix, then we know that there are more holes to plug. :)

I plan to keep digging for the root cause later this week.
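
For reference, the workaround is a small, self-contained change to ipc/BUILD.gn; in a chromium/src checkout the hash above shows the full diff (same as the crrev link in the commit message):

$ git show af31b7d5a7c5c831c65a418ae305a4aad520ceea -- ipc/BUILD.gn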

Comment 20 by g...@chromium.org, Today (9 hours ago)

As an update, it appears there are multiple places in LLVM where we try to do some sort of speculative inlining of potential targets for indirect calls. When we see this substantial size regression, one of the candidates for indirect call promotion is gpu::raster::RasterImplementation::PaintOpSerializer::Serialize(cc::PaintOp const*, cc::PaintOp::SerializeOptions const&). Because Skia is big in the gpu-y world, it makes sense that Skia symbols would pop up once we assume this code is somehow reachable.

Were this solely a regression in nacl_helper, I'd say 'mystery solved', but somehow this indirect call promotion is causing us to pull 5.5MB of text into Chrome itself, which is odd to me, since I'd expect a profile of Chrome to only mention code that hasn't been stripped out of Chrome by the linker.
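
One way to check that directly is to point the same bloaty comparison used earlier at the two chrome binaries and filter for the suspicious libraries (a sketch, assuming unstripped chrome binaries exist in the same Linux/ and LinuxOldProf/ build dirs, and guessing at the symbol-name substrings):

$ ~/environments/bloaty/build/bloaty -s vm -d symbols -n 0 Linux/chrome -- LinuxOldProf/chrome \
    | grep -iE 'skia|PaintOp' | head -n 20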

I'll look more. On the bright side, the workaround in r620506 appears to be WAI, at least. :)
