0.7%-233.5% regression in sizes at 617437:617440
Issue description: See the link to graphs below.
Dec 18
Not sure. What are you using to bisect?
Dec 18
Well, I've certainly never seen an AFDO roll bump us by anything near *that* much. Looks like the subsequent roll in r617511 didn't undo this, which is odd. Investigating now -- thanks for the heads-up.
Dec 18
Doesn't immediately appear to be AFDO-related:
$ du Linux{,OldProf}/chrome.stripped
139544 Linux/chrome.stripped
139632 LinuxOldProf/chrome.stripped
$ md5sum *.prof
99f55761af26dff96be889a4efc2f5f8 afdo-old.prof
28c6fb47cea2a44d7e500b06faae259b afdo.prof
$ md5sum *.afdo
654f6ec943e1753f5a0cf79c80d1504a chromeos-chrome-amd64-73.0.3641.0_rc-r1.afdo
99f55761af26dff96be889a4efc2f5f8 chromeos-chrome-amd64-73.0.3642.0_rc-r1.afdo
2b6a924596dfad17b233d337764f5e87 chromeos-chrome-amd64-73.0.3643.0_rc-r1.afdo
28c6fb47cea2a44d7e500b06faae259b chromeos-chrome-amd64-73.0.3644.0_rc-r1.afdo
Please note that r617437 was 3642 -> 3643, and r617511 was 3643 -> 3644.
Since the commit range is so small, I'm happy to bump through them and find out more (including if I made a mistake on the above, since all of those other commit messages do sound pretty innocuous ;) ).
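(For the record, here's roughly how I'd bump through them. This is only a sketch: it assumes the clang_sample_profile_path GN arg is the right knob for pointing the build at a local .afdo file, and the out/Linux paths here are made up.)

# Sketch: rebuild nacl_helper against each candidate profile and record its stripped size.
for prof in chromeos-chrome-amd64-73.0.364{1,2,3,4}.0_rc-r1.afdo; do
  gn gen out/Linux --args="clang_sample_profile_path=\"/path/to/$prof\""   # plus the usual release args
  autoninja -C out/Linux nacl_helper
  strip -o /tmp/nacl_helper.stripped out/Linux/nacl_helper
  echo "$prof: $(stat -c%s /tmp/nacl_helper.stripped) bytes"
done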
Dec 18
I totally agree, which is why I wanted to look into it more. :) Looks like staring at Chrome was my mistake; I assumed totals-* meant that Chrome would show a substantial difference. chrome.perf.fyi indicates that this is > 100% localized to nacl (Chrome actually saw a 180KB size decrease from the AFDO roll in question).

From our very own linux.perf.fyi builder: https://ci.chromium.org/p/chrome/builders/luci.chrome.ci/linux-builder-perf/31891 (31892 is the change in question)

chrome: build 31891 == 5,479,171,264 bytes; build 31892 == 5,476,629,360 bytes. Diff: -0.046%
chrome-bss: build 31891 == 2,006,420 bytes; build 31892 == 2,006,420 bytes. Diff: 0.000%
chrome-data: build 31891 == 6,492,936 bytes; build 31892 == 6,493,000 bytes. Diff: 0.001%
chrome-stripped: build 31891 == 142,891,352 bytes; build 31892 == 142,715,288 bytes. Diff: -0.123%
chrome-text: build 31891 == 136,388,271 bytes; build 31892 == 136,211,887 bytes. Diff: -0.129%
Skipping chrome-textrel
nacl_helper: build 31891 == 387,043,144 bytes; build 31892 == 431,989,168 bytes. Diff: 11.613%
nacl_helper-bss: build 31891 == 217,520 bytes; build 31892 == 243,344 bytes. Diff: 11.872%
nacl_helper-data: build 31891 == 49,672 bytes; build 31892 == 181,236 bytes. Diff: 264.866%
nacl_helper-stripped: build 31891 == 2,596,112 bytes; build 31892 == 8,656,328 bytes. Diff: 233.434%
nacl_helper-text: build 31891 == 2,538,258 bytes; build 31892 == 8,464,202 bytes. Diff: 233.465%
Skipping nacl_helper-textrel
nacl_helper_bootstrap: build 31891 == 35,120 bytes; build 31892 == 35,120 bytes. Diff: 0.000%
nacl_helper_bootstrap-bss: build 31891 == 4,113 bytes; build 31892 == 4,113 bytes. Diff: 0.000%
nacl_helper_bootstrap-data: build 31891 == 40 bytes; build 31892 == 40 bytes. Diff: 0.000%
nacl_helper_bootstrap-stripped: build 31891 == 8,744 bytes; build 31892 == 8,744 bytes. Diff: 0.000%
nacl_helper_bootstrap-text: build 31891 == 6,559 bytes; build 31892 == 6,559 bytes. Diff: 0.000%
Skipping nacl_helper_bootstrap-textrel
nacl_irt_x86_64.nexe: build 31891 == 3,802,672 bytes; build 31892 == 3,802,672 bytes. Diff: 0.000%
resources.pak: build 31891 == 12,820,204 bytes; build 31892 == 12,820,204 bytes. Diff: 0.000%
totals-bss: build 31891 == 2,228,053 bytes; build 31892 == 2,253,877 bytes. Diff: 1.159%
totals-data: build 31891 == 6,542,648 bytes; build 31892 == 6,674,276 bytes. Diff: 2.012%
totals-size: build 31891 == 5,882,872,404 bytes; build 31892 == 5,925,276,524 bytes. Diff: 0.721%
totals-stripped: build 31891 == 145,496,208 bytes; build 31892 == 151,380,360 bytes. Diff: 4.044%
totals-text: build 31891 == 138,933,088 bytes; build 31892 == 144,682,648 bytes. Diff: 4.138%
Skipping totals-textrel

Digging more now.
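(In case anyone wants to recompute those deltas from raw dumps: the Diff column is just (new - old) / old * 100. A throwaway sketch, assuming two name-sorted "name bytes" files; the file names are made up:)

$ join old.sizes new.sizes | awk '{ printf "%s: %d -> %d bytes (%+.3f%%)\n", $1, $2, $3, ($3 - $2) / $2 * 100 }'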
Dec 18
(...s/linux.perf.fyi/linux-builder-perf/. Names are hard :) )
Dec 18
nacl_helper is shipped as part of the Linux Chrome package. The relevant lines are:

nacl_helper-stripped: build 31891 == 2,596,112 bytes; build 31892 == 8,656,328 bytes. Diff: 233.434%
totals-stripped: build 31891 == 145,496,208 bytes; build 31892 == 151,380,360 bytes. Diff: 4.044%

Why did nacl_helper go from 2.5 MB to 6.5 MB?
Dec 18
Thanks for the explanation.
> Why did nacl_helper go from 2.5 MB to 6.5 MB?
Because we're putting ~17x more distinct pieces of code/data into the final binary, for reasons unclear to me:
$ ~/environments/bloaty/build/bloaty -s vm -d symbols -n 0 Linux/nacl_helper -- LinuxOldProf/nacl_helper | grep -cF '[NEW]'
12755
$ ~/environments/bloaty/build/bloaty -s vm -d symbols -n 0 Linux/nacl_helper -- LinuxOldProf/nacl_helper | grep -vcF '[NEW]'
741
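(Same diff, but listing the biggest new symbols rather than counting them; bloaty sorts by the -s column, so the top of the grep output is the worst offenders:)

$ ~/environments/bloaty/build/bloaty -s vm -d symbols -n 0 Linux/nacl_helper -- LinuxOldProf/nacl_helper | grep -F '[NEW]' | head -n 25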
This only appears to occur with all three of {CFI, ThinLTO, AFDO} enabled. AFDO-only gives us:
VM SIZE FILE SIZE
-------------- --------------
[ = ] 0 .debug_ranges +31.4Ki +0.2%
+0.3% +3.62Ki .text +3.62Ki +0.3%
[ = ] 0 .strtab +2.28Ki +0.4%
+18% +1.02Ki google_malloc +1.02Ki +18%
[ = ] 0 .symtab +384 +0.1%
[ = ] 0 [Unmapped] -659 -11.9%
[ = ] 0 .debug_str -1.50Ki -0.0%
[ = ] 0 .debug_frame -1.78Ki -0.0%
[ = ] 0 .debug_abbrev -1.95Ki -0.0%
[ = ] 0 .gdb_index -6.29Ki -0.0%
[ = ] 0 .debug_line -34.8Ki -0.1%
[ = ] 0 .debug_loc -74.1Ki -0.1%
[ = ] 0 .debug_info -88.9Ki -0.0%
+0.2% +4.64Ki TOTAL -171Ki -0.0%
And ThinLTO+AFDO-only:
VM SIZE FILE SIZE
-------------- --------------
[ = ] 0 .debug_loc +12.4Mi +287%
[ = ] 0 .debug_info +12.1Mi +5.5%
[ = ] 0 .debug_line +3.90Mi +41%
[ = ] 0 .gdb_index +2.70Mi +7.5%
[ = ] 0 .debug_str +2.44Mi +2.7%
[ = ] 0 .debug_ranges +1.81Mi +237%
[ = ] 0 .debug_frame +515Ki +244%
[ = ] 0 .debug_abbrev +464Ki +7.8%
[ = ] 0 .strtab +2.77Ki +0.5%
[ = ] 0 .symtab +1.15Ki +0.3%
+19% +1.09Ki google_malloc +1.09Ki +19%
+1.4% +248 .eh_frame +248 +1.4%
+0.2% +168 .rela.dyn +168 +0.2%
+3.1% +152 .gcc_except_table +152 +3.1%
+0.0% +128 .rodata +128 +0.0%
+1.9% +56 .eh_frame_hdr +56 +1.9%
+0.0% +56 [3 Others] +27 +0.4%
+0.1% +48 .data.rel.ro +48 +0.1%
-12.2% -48 [LOAD [RW]] 0 [ = ]
[ = ] 0 [Unmapped] -821 -19.5%
-0.1% -1.05Ki .text -1.05Ki -0.1%
+0.0% +856 TOTAL +36.3Mi +9.8%
...The large increase of debuginfo is interesting, given the otherwise-stable sizes.
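(If it helps narrow down where that debug info growth comes from, bloaty can also diff by compile unit. A sketch, assuming its compileunits data source can parse these binaries' DWARF:)

$ ~/environments/bloaty/build/bloaty -s file -d compileunits -n 25 Linux/nacl_helper -- LinuxOldProf/nacl_helper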
Turning all three on:
VM SIZE FILE SIZE
-------------- --------------
[ = ] 0 .debug_loc +12.1Mi +281%
[ = ] 0 .debug_info +11.6Mi +5.3%
+389% +4.57Mi .text +4.57Mi +389%
[ = ] 0 .debug_line +3.83Mi +41%
[ = ] 0 .gdb_index +2.85Mi +7.9%
[ = ] 0 .debug_str +2.41Mi +2.7%
[ = ] 0 .debug_ranges +1.89Mi +239%
[ = ] 0 .strtab +975Ki +154%
+76% +856Ki .rodata +856Ki +76%
[ = ] 0 .debug_frame +522Ki +244%
[ = ] 0 .debug_abbrev +464Ki +7.8%
[ = ] 0 .symtab +451Ki +131%
+249% +239Ki .rela.dyn +239Ki +249%
+322% +121Ki .data.rel.ro +121Ki +322%
+12% +25.2Ki .bss 0 [ = ]
+41% +7.49Ki .eh_frame +7.49Ki +41%
+79% +6.30Ki .data +6.30Ki +79%
+14% +4.29Ki [15 Others] +4.95Ki +12%
+73% +2.42Ki .eh_frame_hdr +2.42Ki +73%
-13.8% -492 [LOAD [RW]] +1.70Ki +27e2%
+18% +1.11Ki google_malloc +1.11Ki +18%
+216% +5.80Mi TOTAL +42.9Mi +12%
`du`ing the object files in each build directory shows no meaningful changes between old/new profiles (the best we got was a single object file getting a whopping 72K larger), so I assume this is going to be confined to ThinLTO. Looking more into that now.
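(For reference, one way to do that per-object comparison; a sketch assuming GNU find/join and that both trees share the same obj/ layout:)

# Sketch: list per-object size deltas between the two build dirs, largest first.
( cd Linux && find obj -name '*.o' -printf '%s %p\n' | sort -k2 ) > /tmp/new.o.sizes
( cd LinuxOldProf && find obj -name '*.o' -printf '%s %p\n' | sort -k2 ) > /tmp/old.o.sizes
join -j2 /tmp/old.o.sizes /tmp/new.o.sizes | awk '$3 != $2 { print $3 - $2, $1 }' | sort -rn | head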
Dec 18
(BTW, if you think we should stop the roller and revert until we better understand this issue, I'm happy to. I don't know how critical binary size is for this one component, and we're far enough from a branch that I'm not sure what our policy is for how quickly we want to fix upticks like this. As alluded to above, the roller may also just roll in a profile that makes this vanish. It still warrants an investigation, because +216% binary size seems like a ton, but ...)
Dec 19
The bottom line is we don't want to ship M-73 to users with a size regression. So you probably do not have to stop the roller, but please resolve this issue before M-73 branches.
Dec 19
Will do -- thanks!
Dec 21
Update: this size increase appears to be enabled by more aggressive inlining prompted by our AFDO profiles. It also only happens if -fsanitize=cfi-icall is enabled (cfi-vcall isn't needed to reproduce it). I've noted that tons of the symbols we're bringing in are from places like Skia, libxml, etc., none of which were previously linked statically into this binary.

I originally suspected that we were speculatively inlining the wrong thing at one place, but disabling ICP doesn't help our repro. Instead, this only happens if the direct call inlining in SampleProfile is enabled. Oddly, disabling all hotness/coldness bonuses inside of the inliner does not make this go away.

Currently using name-based "bisection" to cut down on the number of things we inline. The steps after that will depend on how many things we inline after filtering out all of the trivially OK inlining decisions. :) I'm out all next week, but this'll be at the top of my to-do list when I'm back in the first week of January.
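(For anyone who wants to poke at the "filter by name" idea without patching the compiler: AFDO profiles round-trip through llvm-profdata's text format, so records can be dropped by function name and the build re-pointed at the filtered profile. A rough sketch; the awk filter and file names are purely illustrative:)

# Sketch: dump the profile to text, drop top-level records whose name matches a
# filter (indented lines belong to the preceding record, so they're skipped too),
# then re-encode to a binary sample profile.
llvm-profdata merge -sample -text afdo.prof -o afdo.txt
awk '/^[^ ]/ { skip = ($0 ~ /SyncChannel/) } !skip' afdo.txt > afdo.filtered.txt
llvm-profdata merge -sample afdo.filtered.txt -o afdo.filtered.prof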
Dec 22
obj/ipc/ipc/ipc_sync_channel.o is what enables this. Three inlining decisions in that TU, all of which are inlined into `base::internal::Invoker<base::internal::BindState<void (IPC::SyncChannel::*)(base::WaitableEvent*), base::internal::UnretainedWrapper<IPC::SyncChannel> >, void (base::WaitableEvent*)>::RunOnce(base::internal::BindStateBase*, base::WaitableEvent*)` (Invoker::RunOnce). These three functions are:

1. void base::internal::Invoker<base::internal::BindState<void (IPC::SyncChannel::*)(base::WaitableEvent*), base::internal::UnretainedWrapper<IPC::SyncChannel> >, void (base::WaitableEvent*)>::RunImpl<void (IPC::SyncChannel::*)(base::WaitableEvent*), std::__1::tuple<base::internal::UnretainedWrapper<IPC::SyncChannel> >, 0ul>(void (IPC::SyncChannel::*&&)(base::WaitableEvent*), std::__1::tuple<base::internal::UnretainedWrapper<IPC::SyncChannel> >&&, std::__1::integer_sequence<unsigned long, 0ul>, base::WaitableEvent*&&) (Invoker::RunImpl)

2. void base::internal::InvokeHelper<false, void>::MakeItSo<void (IPC::SyncChannel::*)(base::WaitableEvent*), IPC::SyncChannel*, base::WaitableEvent*>(void (IPC::SyncChannel::*&&)(base::WaitableEvent*), IPC::SyncChannel*&&, base::WaitableEvent*&&) (InvokeHelper::MakeItSo)

3. base::internal::Invoker<base::internal::BindState<void (IPC::SyncChannel::*)(base::WaitableEvent*), base::internal::UnretainedWrapper<IPC::SyncChannel> >, void (base::WaitableEvent*)>::RunOnce(base::internal::BindStateBase*, base::WaitableEvent*) (Invoker::RunOnce, but with different template args than the caller)

Disabling any one of these inlining decisions makes the regression go away. Will continue investigating when I'm back. Happy Holidays!
Jan 7
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/af31b7d5a7c5c831c65a418ae305a4aad520ceea

commit af31b7d5a7c5c831c65a418ae305a4aad520ceea
Author: George Burgess IV <gbiv@chromium.org>
Date: Mon Jan 07 22:39:11 2019

ipc: disable AFDO on Linux

To work around a 200% (!) binary size increase caused by AFDO in nacl_helper, locally disable it.

This may have a small performance impact, though it's likely that any perf delta will be negligible.

Bug: 916130
Change-Id: Idf37c813215a8d5a19431499b671f7579b573277
Reviewed-on: https://chromium-review.googlesource.com/c/1396565
Commit-Queue: George Burgess <gbiv@chromium.org>
Reviewed-by: Ken Rockot <rockot@google.com>
Cr-Commit-Position: refs/heads/master@{#620506}

[modify] https://crrev.com/af31b7d5a7c5c831c65a418ae305a4aad520ceea/ipc/BUILD.gn
Jan 7
It appears that later profile rolls (e.g. r618661) have fixed and then rebroken this a few times. For now, I've put ^ in place, which should fix this entirely, assuming ipc/ is the only problem area. If that isn't a complete fix, then we know that there are more holes to plug. :) I plan to keep digging for the root cause later this week.
Today (9 hours ago)
As an update, it appears there are multiple places in LLVM in which we try to do some sort of speculative inlining of potential targets for indirect calls. When we see this substantial size regression, one of the candidates for indirect call promotion is gpu::raster::RasterImplementation::PaintOpSerializer::Serialize(cc::PaintOp const*, cc::PaintOp::SerializeOptions const&). Because Skia is big in the gpu-y world, it makes sense that Skia symbols would pop up if we assumed this code is now somehow reachable.

Were this solely a regression in nacl_helper, I'd say 'mystery solved', but somehow this indirect call promotion is causing us to pull 5.5MB of text into Chrome itself, which is odd to me, since I'd expect a profile of Chrome to only mention code that hasn't been stripped out of Chrome by the linker. I'll look more. On the bright side, the workaround in r620506 appears to be WAI, at least. :)