New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 681991 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug

Blocked on:
issue 681694

Blocking:
issue 482401



Sign in to add a comment

Investigate code size bloat caused by oilpan heap compaction (180kb)

Project Member Reported by agrieve@chromium.org, Jan 17 2017

Issue description

This change increase libchrome.so by 180kb on Android:
https://codereview.chromium.org/2531973002

Let's figure out why!
 
Blockedon: 681694
Attaching a diff of nm output.

1. Built with & without change.

2. Ran: nm --size-sort --reverse-sort --demangle --synthetic lib.unstripped.after/libchrome.so | perl -p -e 's/clone.*?\]/clone]/g' | perl -p -e 's/constprop.*?\]/constprop]/g'

3. Ran diff
nm-diff.txt
1.6 MB View Download
The difference report by nm is 238908, but maybe this is because it's run on unstripped .so:

$ grep '^>' nm-diff.txt | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
654172
$ grep '^<' nm-diff.txt | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
415264

Here are some specific size growths from the major code added:

HeapCompact:
$ grep '^>' nm-diff.txt | grep 'HeapCompact' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
2190

Changes to HeapPage.cpp:
$ grep '^>' nm-diff.txt | grep -E 'NormalPageArena|arenaSize|sweepAndCompact|freeListSize' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
678

SparseHeapBitmap:
$ grep '^>' nm-diff.txt | grep -E 'SparseHeapBitmap' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
456


Size bloat from changes to templated classes:

LinkedHashSet => 3.5k bigger:
$ grep '^>' nm-diff.txt | grep -E 'LinkedHashSet' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
5108
$ grep '^<' nm-diff.txt | grep -E 'LinkedHashSet' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
1572


Vector => 28k bigger:
$ grep '^<' nm-diff.txt | grep -E 'Vector' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
43362
$ grep '^>' nm-diff.txt | grep -E 'Vector' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
72182


InlinedGlobalMarkingVisitor => 170k bigger:
$ grep '^>' nm-diff.txt | grep -E 'InlinedGlobalMarkingVisitor' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
401006
$ grep '^<' nm-diff.txt | grep -E 'InlinedGlobalMarkingVisitor' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
223794


MarkingVisitor => < 1kb:
$ grep '^>' nm-diff.txt | grep -E 'registerMovingObject' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
690


You can see all symbols related to InlinedGlobalMarkingVisitor via:
grep InlinedGlobalMarkingVisitor nm-diff.txt | less

to get a feel for why it's so bloaty (massive template explosion).



Comment 3 by sigbjo...@opera.com, Jan 18 2017

Cc: sigbjo...@opera.com
Components: Blink>MemoryAllocator>GarbageCollection
Interesting findings (thanks!), need to get a handle on why.

Adding a Component for tracking.

Comment 4 by sigbjo...@opera.com, Jan 18 2017

The CL in question extended InlinedGlobalMarkingVisitor with a const field (an enum, MarkingMode) and some inlined methods on its derived classes. There doesn't appear to be too many inlinings of these. MarkingMode was also extended with an enum tag, which means that MarkingVisitor<> will be instantianted extra for this new value. I don't think that explains it.

We've subsequently simplified the visitor implementation, reducing the use of templates -- last change in the area being https://codereview.chromium.org/2625363002 (r443444).
r443444 looks to have actually made libchrome.so marginally bigger:
https://chromeperf.appspot.com/report?sid=cfc29eed1238fd38fb5e6cf83bdba6c619be621b606e03e5dfc2e99db14c418b&rev=443444

Here's a count of symbols with InlinedGlobalMarkingVisitor in their type:
$ grep InlinedGlobalMarkingVisitor nm.after | grep -v 'non-virtual thunk' | wc -l
3446

Here's the count minus ones that just have it as a parameter type:
grep InlinedGlobalMarkingVisitor nm.after | grep -v 'non-virtual thunk' | perl -p -e 's/\(.*?\)//g' | grep InlinedGlobalMarkingVisitor | wc -l
1260

And here is the list of the top 10 largest symbols:
00000720 t void blink::MediaControls::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
00000530 t void blink::HTMLMediaElement::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
00000524 t void blink::Document::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
00000470 t void blink::ElementRareData::traceAfterDispatchImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
00000470 t void blink::LocalDOMWindow::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
00000418 t void blink::LocalFrame::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
00000410 t void blink::Page::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
000003d4 t void blink::WebDevToolsAgentImpl::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
000003cc t void blink::StyleEngine::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
0000038c t void blink::EventHandler::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)


The biggest one here is MediaControls::traceImpl<>, which actually expands into 4 symbols:

grep MediaControls:.*trace nm.after | grep -v 'non-virtual thunk' | perl -p -e 's/clone.*?\]/clone]/g'
00000720 t void blink::MediaControls::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor)
000004e8 t void blink::MediaControls::traceImpl<blink::Visitor*>(blink::Visitor*)
00000010 t blink::MediaControls::trace(blink::InlinedGlobalMarkingVisitor)
00000004 t blink::MediaControls::trace(blink::Visitor*)

The definition of the function is here:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/shadow/MediaControls.cpp?sq=package:chromium&l=951

pasted:
DEFINE_TRACE(MediaControls) {
  visitor->trace(m_mediaElement);
  visitor->trace(m_panel);
  visitor->trace(m_overlayPlayButton);
  visitor->trace(m_overlayEnclosure);
  visitor->trace(m_playButton);
  visitor->trace(m_currentTimeDisplay);
  visitor->trace(m_timeline);
  visitor->trace(m_muteButton);
  visitor->trace(m_volumeSlider);
  visitor->trace(m_toggleClosedCaptionsButton);
  visitor->trace(m_fullscreenButton);
  visitor->trace(m_downloadButton);
  visitor->trace(m_durationDisplay);
  visitor->trace(m_enclosure);
  visitor->trace(m_textTrackList);
  visitor->trace(m_overflowMenu);
  visitor->trace(m_overflowList);
  visitor->trace(m_castButton);
  visitor->trace(m_overlayCastButton);
  visitor->trace(m_mediaEventListener);
  visitor->trace(m_windowEventListener);
  visitor->trace(m_orientationLockDelegate);
  HTMLDivElement::trace(visitor);
}

The method is comprised of 24 function calls, yet takes 0x720+0x4e8=3080 bytes.

Also notable is that there appear to be almost 1500 explicit definitions of trace():
https://cs.chromium.org/search/?q=DEFINE_%5Cw*TRACE+-file:%5C.h


So, ideas for code-size wins here (in order of least crazy to most crazy):
a) convert all DECLARE_TRACE to DECLARE_VIRTUAL_TRACE (and likewise for DEFINE_INLINE_TRACE / DEFINE_INLINE_VIRTUAL_TRACE)
  - theory: this will reduce the size of each traceImpl
b) combine traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) and blink::MediaControls::traceImpl<blink::Visitor*>(blink::Visitor*) into a single symbol
  - theory: this half then number of traceImpl
c) Come up with some "clever" way to not need to explicitly define 1500 traceImpl definitions.
  - theory: this will eliminate most traceImpl overhead entirely

Example idea for c) (which I'm pretty sure is bad):
- Have all oilpan objects layout their data members such that all traceables are at the front. Have them implement a virtual method that returns the count of traceables (where hopefully there's some default count that works for a reasonable number of them). traceImpl could then trace them all by walking memory (sanity checking that addresses are null/-1/in the oilpan heap).

Disclaimer: I'm not a C++ expert, and this is basically the first c++ bloat investigation that I'm doing, so I'm just learning as I go and may be wrong about many things :P.

Comment 6 by sigbjo...@opera.com, Jan 18 2017

Just considering the growth, I notice that blink::MediaControls::traceImpl<..>(blink::InlinedGlobalMarkingVisitor) grew in size from 0x42c to 0x720, and EventHandler::traceImpl<..>(..) from 0x228 to 0x38c. For MediaControls (w/ 24 trace() calls), that's about 800 bytes extra - what could that be due to?

Tracing needs to be fast to keep down GC pause times, which is why we have the InlinedGlobalMarkingVisitor special visitor case alongside the standard one. Trying to come up with a new design that improves space&time behavior for the tracing of oilpan objects, might be worth doing, but that would be a larger, longer-term project - agree?
I'd guess the size is due to the implementations of these calling a lot of inlined methods (the symbol sizes include all inlining that happen). I'd be curious to see the new sizes after all DEFINE_INLINE_* are removed.

Reducing code size increases locality at the expense of more jumps, so it would certainly be worth measuring whether redesigning would be faster / slower.

I filed  bug 681694  to be able to investigate template bloat more easily, but just doing a one-off for trace():

$ grep 'Visitor' nm.after | grep trace | cut -c1-8 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
825444

This leads me to believe that the current tracing design takes up 825kb!

I'll defer to you how much time / effort it would take to change the design to take up less space though. This is my first time looking at the code.

Comment 8 by sigbjo...@opera.com, Jan 19 2017

Trying to reproduce w/ linux64-release, taking a while.. but re: #4, it'd be worthwhile to check ToT's traceImpl sizes.
Blocking: 482401
With linux64-release, i only see a 66 byte increase for blink::MediaControls::traceImpl<..>(blink::InlinedGlobalMarkingVisitor) with r438125; EventHandler's traceImpl() increases with 37 bytes.

That's considerably less than the corresponding 756 and 356 bytes increases reported above (cf. #6)
what are the toolchain details for this android builder?
The biggest difference is that it's still on gcc rather than clang ( bug 481675 ).

The easiest way to see what's going on is to build with "ninja -v" to show all the flags that are used.
thanks, and this is with is_official_build=true, i guess (?)

Attached is the code generated for linux64-release for the MediaControls tracing code - nothing too bad going on there (but room for improvement, i notice.) You wouldn't be able to do same for the build you already have? Too much is being inlined, would be the first guess.

(I don't have access to anything like goma, so it'll take me a while to do this cross-compile at this end, but could try tomorrow.)
mediacontrols-traceImpl-x86.asm
25.7 KB View Download
What command did you run to create the disassembly?
"objdump -dC libblink_core.so" and then manually snip out the relevant portion out of that (rather large) dump.
Attached for the android build (at tip-of-tree).
Note - I used a non-component build with is_official_build.


MediaControls.trace.arm.asm.txt
88.8 KB View Download
re #8, just reran the command from #7 on tip-of-tree and got:
$ grep 'Visitor' nm.after | grep trace | cut -c1-8 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc
878542
Thanks, will look closer later, but it inlines  https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/heap/TraceTraits.h?q=AdjustAndMark&sq=package:chromium&l=51 for each trace() , which the above x86 version doesn't.
Filed  issue 683019  for looking at our tracing method code generation.
Have a look build; trying to determine what triggers g++ to start inlining these methods. We can add explicit annotations to prevent it, if it comes to that.
Putting a NOINLINE on the static method referred to in #18, takes care of the size regression. I rather doubt it will hurt performance, but I have no data to back up that claim :)

I still think it is something we want to avoid doing, as constraining the range of code optimizations is preferably avoided..without a clear reason. i.e., I haven't yet understood what puts g++ (4.9) over the inlining edge here. Will continue investigating, still some things to try & experiment with.

https://codereview.chromium.org/2642933005/ has a code simplification in this area; a worthwhile change, without it solving this one.
Forgot to add, this doesn't reproduce w/ clang for this target.
Cc: h...@chromium.org thakis@chromium.org
For g++-4.9, this reproduces with -Os (optimize for size), but not with -O2 nor -O3. So this looks like an optimization bundle selection generating suboptimal code on the gcc side. Can't reproduce it w/ x86, just what's bundled for android,

GNU C (GCC) version 4.9.x 20150123 (prerelease) (arm-linux-androideabi)

I've tried to selectively enable various optimizations (for size, and others) listed on https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html along with -Os, to no avail. It ends up being too eager to inline (see above), and generates far too much code for these trace methods.

Anyone know a good person to check with wrt gcc optimization quality etc? I'm currently at a loss.
I'd suggest just adding the NOINLINE with a comment and move on. While it does constrain the compiler from inlining, that's exactly what we want anyways. Add a comment that points to this bug and that should be sufficient explanation.
Final update for today.. :) Tuning down -finline-limit to around 20 (from 64) is effective in disabling inlining for the method mentioned in #18. Which avoids the problem -- the problem being that the inlining machinery will inline that mark() method without taking the resulting growth in code size at the call site into account.

N calls to trace() (each of which inlines to a null-check + mark() call) => N inlinings of that mark() method => too much code.
Re: #24, that's where it is heading & making it conditional on __GNUC__
Project Member

Comment 27 by bugdroid1@chromium.org, Jan 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/715f6ff05807d86ba55be0614c999a823a32f850

commit 715f6ff05807d86ba55be0614c999a823a32f850
Author: sigbjornf <sigbjornf@opera.com>
Date: Sat Jan 21 01:38:09 2017

Have VisitorHelper<> handle moving object registration.

As the registration of objects is only done by a compacting
GC, there isn't any need to be indirect about registering
them - remove a layer of Visitor virtual methods and have
the VisitorHelper<> do the registration directly.

R=haraken
BUG= 681991 

Review-Url: https://codereview.chromium.org/2642933005
Cr-Commit-Position: refs/heads/master@{#445245}

[modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/HeapTest.cpp
[modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/MarkingVisitor.h
[modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/MarkingVisitorImpl.h
[modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/SparseHeapBitmap.cpp
[modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/Visitor.h

Project Member

Comment 28 by bugdroid1@chromium.org, Jan 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9fa52ef0da5a0a6e03a15c4c72d8021567070127

commit 9fa52ef0da5a0a6e03a15c4c72d8021567070127
Author: sigbjornf <sigbjornf@opera.com>
Date: Sat Jan 21 14:26:38 2017

Disable g++ inlining of eager-tracing mark() method.

Versions of g++ (with -Os) are over-eager about inlining the mark()
method that's used for all non-mixin Oilpan objects, resulting in
a code size increase that's unwanted (for Android official builds.)
Other compilers and g++ (with -O2/-O3) are more discriminate about
inlining the method, with no comparable code size increase delta.

Tuning the compiler's optimization option set to avoid the problem
hasn't proven successful, so bluntly address the problem by disabling
inlining for the method.

R=haraken
BUG= 681991 

Review-Url: https://codereview.chromium.org/2643403003
Cr-Commit-Position: refs/heads/master@{#445288}

[modify] https://crrev.com/9fa52ef0da5a0a6e03a15c4c72d8021567070127/third_party/WebKit/Source/platform/heap/TraceTraits.h

Given that the size increase happens with -Os, I checked how -O3 fared for code size. About ~1G larger (libcontent_shell_content_view.so) than -Os; -O2 isn't much better.

From this end, no further work planned, but will watch out for any Android perf regressions. If anyone's run into this kind of inlining anomaly w/ g++ elsewhere, pointer appreciated.
Status: Fixed (was: Started)
The no-inline saved 140kb. 

https://chromeperf.appspot.com/report?sid=cfc29eed1238fd38fb5e6cf83bdba6c619be621b606e03e5dfc2e99db14c418b&rev=445288

Happy to mark this particular regression as fixed :). I think there might be more savings to be had in the tracing code / DOM bindings in general though. That's an exploration for another day.
 Issue 683019  is the one for that.

Sign in to add a comment