Investigate code size bloat caused by oilpan heap compaction (180kb) |
|||||
Issue descriptionThis change increase libchrome.so by 180kb on Android: https://codereview.chromium.org/2531973002 Let's figure out why!
,
Jan 17 2017
The difference report by nm is 238908, but maybe this is because it's run on unstripped .so: $ grep '^>' nm-diff.txt | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 654172 $ grep '^<' nm-diff.txt | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 415264 Here are some specific size growths from the major code added: HeapCompact: $ grep '^>' nm-diff.txt | grep 'HeapCompact' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 2190 Changes to HeapPage.cpp: $ grep '^>' nm-diff.txt | grep -E 'NormalPageArena|arenaSize|sweepAndCompact|freeListSize' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 678 SparseHeapBitmap: $ grep '^>' nm-diff.txt | grep -E 'SparseHeapBitmap' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 456 Size bloat from changes to templated classes: LinkedHashSet => 3.5k bigger: $ grep '^>' nm-diff.txt | grep -E 'LinkedHashSet' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 5108 $ grep '^<' nm-diff.txt | grep -E 'LinkedHashSet' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 1572 Vector => 28k bigger: $ grep '^<' nm-diff.txt | grep -E 'Vector' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 43362 $ grep '^>' nm-diff.txt | grep -E 'Vector' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 72182 InlinedGlobalMarkingVisitor => 170k bigger: $ grep '^>' nm-diff.txt | grep -E 'InlinedGlobalMarkingVisitor' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 401006 $ grep '^<' nm-diff.txt | grep -E 'InlinedGlobalMarkingVisitor' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 223794 MarkingVisitor => < 1kb: $ grep '^>' nm-diff.txt | grep -E 'registerMovingObject' | cut -c3-10 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 690 You can see all symbols related to InlinedGlobalMarkingVisitor via: grep InlinedGlobalMarkingVisitor nm-diff.txt | less to get a feel for why it's so bloaty (massive template explosion).
,
Jan 18 2017
Interesting findings (thanks!), need to get a handle on why. Adding a Component for tracking.
,
Jan 18 2017
The CL in question extended InlinedGlobalMarkingVisitor with a const field (an enum, MarkingMode) and some inlined methods on its derived classes. There doesn't appear to be too many inlinings of these. MarkingMode was also extended with an enum tag, which means that MarkingVisitor<> will be instantianted extra for this new value. I don't think that explains it. We've subsequently simplified the visitor implementation, reducing the use of templates -- last change in the area being https://codereview.chromium.org/2625363002 (r443444).
,
Jan 18 2017
r443444 looks to have actually made libchrome.so marginally bigger: https://chromeperf.appspot.com/report?sid=cfc29eed1238fd38fb5e6cf83bdba6c619be621b606e03e5dfc2e99db14c418b&rev=443444 Here's a count of symbols with InlinedGlobalMarkingVisitor in their type: $ grep InlinedGlobalMarkingVisitor nm.after | grep -v 'non-virtual thunk' | wc -l 3446 Here's the count minus ones that just have it as a parameter type: grep InlinedGlobalMarkingVisitor nm.after | grep -v 'non-virtual thunk' | perl -p -e 's/\(.*?\)//g' | grep InlinedGlobalMarkingVisitor | wc -l 1260 And here is the list of the top 10 largest symbols: 00000720 t void blink::MediaControls::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 00000530 t void blink::HTMLMediaElement::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 00000524 t void blink::Document::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 00000470 t void blink::ElementRareData::traceAfterDispatchImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 00000470 t void blink::LocalDOMWindow::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 00000418 t void blink::LocalFrame::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 00000410 t void blink::Page::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 000003d4 t void blink::WebDevToolsAgentImpl::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 000003cc t void blink::StyleEngine::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 0000038c t void blink::EventHandler::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) The biggest one here is MediaControls::traceImpl<>, which actually expands into 4 symbols: grep MediaControls:.*trace nm.after | grep -v 'non-virtual thunk' | perl -p -e 's/clone.*?\]/clone]/g' 00000720 t void blink::MediaControls::traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) 000004e8 t void blink::MediaControls::traceImpl<blink::Visitor*>(blink::Visitor*) 00000010 t blink::MediaControls::trace(blink::InlinedGlobalMarkingVisitor) 00000004 t blink::MediaControls::trace(blink::Visitor*) The definition of the function is here: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/shadow/MediaControls.cpp?sq=package:chromium&l=951 pasted: DEFINE_TRACE(MediaControls) { visitor->trace(m_mediaElement); visitor->trace(m_panel); visitor->trace(m_overlayPlayButton); visitor->trace(m_overlayEnclosure); visitor->trace(m_playButton); visitor->trace(m_currentTimeDisplay); visitor->trace(m_timeline); visitor->trace(m_muteButton); visitor->trace(m_volumeSlider); visitor->trace(m_toggleClosedCaptionsButton); visitor->trace(m_fullscreenButton); visitor->trace(m_downloadButton); visitor->trace(m_durationDisplay); visitor->trace(m_enclosure); visitor->trace(m_textTrackList); visitor->trace(m_overflowMenu); visitor->trace(m_overflowList); visitor->trace(m_castButton); visitor->trace(m_overlayCastButton); visitor->trace(m_mediaEventListener); visitor->trace(m_windowEventListener); visitor->trace(m_orientationLockDelegate); HTMLDivElement::trace(visitor); } The method is comprised of 24 function calls, yet takes 0x720+0x4e8=3080 bytes. Also notable is that there appear to be almost 1500 explicit definitions of trace(): https://cs.chromium.org/search/?q=DEFINE_%5Cw*TRACE+-file:%5C.h So, ideas for code-size wins here (in order of least crazy to most crazy): a) convert all DECLARE_TRACE to DECLARE_VIRTUAL_TRACE (and likewise for DEFINE_INLINE_TRACE / DEFINE_INLINE_VIRTUAL_TRACE) - theory: this will reduce the size of each traceImpl b) combine traceImpl<blink::InlinedGlobalMarkingVisitor>(blink::InlinedGlobalMarkingVisitor) and blink::MediaControls::traceImpl<blink::Visitor*>(blink::Visitor*) into a single symbol - theory: this half then number of traceImpl c) Come up with some "clever" way to not need to explicitly define 1500 traceImpl definitions. - theory: this will eliminate most traceImpl overhead entirely Example idea for c) (which I'm pretty sure is bad): - Have all oilpan objects layout their data members such that all traceables are at the front. Have them implement a virtual method that returns the count of traceables (where hopefully there's some default count that works for a reasonable number of them). traceImpl could then trace them all by walking memory (sanity checking that addresses are null/-1/in the oilpan heap). Disclaimer: I'm not a C++ expert, and this is basically the first c++ bloat investigation that I'm doing, so I'm just learning as I go and may be wrong about many things :P.
,
Jan 18 2017
Just considering the growth, I notice that blink::MediaControls::traceImpl<..>(blink::InlinedGlobalMarkingVisitor) grew in size from 0x42c to 0x720, and EventHandler::traceImpl<..>(..) from 0x228 to 0x38c. For MediaControls (w/ 24 trace() calls), that's about 800 bytes extra - what could that be due to? Tracing needs to be fast to keep down GC pause times, which is why we have the InlinedGlobalMarkingVisitor special visitor case alongside the standard one. Trying to come up with a new design that improves space&time behavior for the tracing of oilpan objects, might be worth doing, but that would be a larger, longer-term project - agree?
,
Jan 19 2017
I'd guess the size is due to the implementations of these calling a lot of inlined methods (the symbol sizes include all inlining that happen). I'd be curious to see the new sizes after all DEFINE_INLINE_* are removed. Reducing code size increases locality at the expense of more jumps, so it would certainly be worth measuring whether redesigning would be faster / slower. I filed bug 681694 to be able to investigate template bloat more easily, but just doing a one-off for trace(): $ grep 'Visitor' nm.after | grep trace | cut -c1-8 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 825444 This leads me to believe that the current tracing design takes up 825kb! I'll defer to you how much time / effort it would take to change the design to take up less space though. This is my first time looking at the code.
,
Jan 19 2017
Trying to reproduce w/ linux64-release, taking a while.. but re: #4, it'd be worthwhile to check ToT's traceImpl sizes.
,
Jan 19 2017
,
Jan 19 2017
With linux64-release, i only see a 66 byte increase for blink::MediaControls::traceImpl<..>(blink::InlinedGlobalMarkingVisitor) with r438125; EventHandler's traceImpl() increases with 37 bytes. That's considerably less than the corresponding 756 and 356 bytes increases reported above (cf. #6)
,
Jan 19 2017
what are the toolchain details for this android builder?
,
Jan 19 2017
The biggest difference is that it's still on gcc rather than clang ( bug 481675 ). The easiest way to see what's going on is to build with "ninja -v" to show all the flags that are used.
,
Jan 19 2017
thanks, and this is with is_official_build=true, i guess (?) Attached is the code generated for linux64-release for the MediaControls tracing code - nothing too bad going on there (but room for improvement, i notice.) You wouldn't be able to do same for the build you already have? Too much is being inlined, would be the first guess. (I don't have access to anything like goma, so it'll take me a while to do this cross-compile at this end, but could try tomorrow.)
,
Jan 19 2017
What command did you run to create the disassembly?
,
Jan 19 2017
"objdump -dC libblink_core.so" and then manually snip out the relevant portion out of that (rather large) dump.
,
Jan 19 2017
Attached for the android build (at tip-of-tree). Note - I used a non-component build with is_official_build.
,
Jan 19 2017
re #8, just reran the command from #7 on tip-of-tree and got: $ grep 'Visitor' nm.after | grep trace | cut -c1-8 | paste -sd+ - | tr 'a-z' 'A-Z' | cat <(echo "ibase=16;") - | bc 878542
,
Jan 19 2017
Thanks, will look closer later, but it inlines https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/heap/TraceTraits.h?q=AdjustAndMark&sq=package:chromium&l=51 for each trace() , which the above x86 version doesn't.
,
Jan 20 2017
Filed issue 683019 for looking at our tracing method code generation.
,
Jan 20 2017
Have a look build; trying to determine what triggers g++ to start inlining these methods. We can add explicit annotations to prevent it, if it comes to that.
,
Jan 20 2017
Putting a NOINLINE on the static method referred to in #18, takes care of the size regression. I rather doubt it will hurt performance, but I have no data to back up that claim :) I still think it is something we want to avoid doing, as constraining the range of code optimizations is preferably avoided..without a clear reason. i.e., I haven't yet understood what puts g++ (4.9) over the inlining edge here. Will continue investigating, still some things to try & experiment with. https://codereview.chromium.org/2642933005/ has a code simplification in this area; a worthwhile change, without it solving this one.
,
Jan 20 2017
Forgot to add, this doesn't reproduce w/ clang for this target.
,
Jan 20 2017
For g++-4.9, this reproduces with -Os (optimize for size), but not with -O2 nor -O3. So this looks like an optimization bundle selection generating suboptimal code on the gcc side. Can't reproduce it w/ x86, just what's bundled for android, GNU C (GCC) version 4.9.x 20150123 (prerelease) (arm-linux-androideabi) I've tried to selectively enable various optimizations (for size, and others) listed on https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html along with -Os, to no avail. It ends up being too eager to inline (see above), and generates far too much code for these trace methods. Anyone know a good person to check with wrt gcc optimization quality etc? I'm currently at a loss.
,
Jan 20 2017
I'd suggest just adding the NOINLINE with a comment and move on. While it does constrain the compiler from inlining, that's exactly what we want anyways. Add a comment that points to this bug and that should be sufficient explanation.
,
Jan 20 2017
Final update for today.. :) Tuning down -finline-limit to around 20 (from 64) is effective in disabling inlining for the method mentioned in #18. Which avoids the problem -- the problem being that the inlining machinery will inline that mark() method without taking the resulting growth in code size at the call site into account. N calls to trace() (each of which inlines to a null-check + mark() call) => N inlinings of that mark() method => too much code.
,
Jan 20 2017
Re: #24, that's where it is heading & making it conditional on __GNUC__
,
Jan 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/715f6ff05807d86ba55be0614c999a823a32f850 commit 715f6ff05807d86ba55be0614c999a823a32f850 Author: sigbjornf <sigbjornf@opera.com> Date: Sat Jan 21 01:38:09 2017 Have VisitorHelper<> handle moving object registration. As the registration of objects is only done by a compacting GC, there isn't any need to be indirect about registering them - remove a layer of Visitor virtual methods and have the VisitorHelper<> do the registration directly. R=haraken BUG= 681991 Review-Url: https://codereview.chromium.org/2642933005 Cr-Commit-Position: refs/heads/master@{#445245} [modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/HeapTest.cpp [modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/MarkingVisitor.h [modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/MarkingVisitorImpl.h [modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/SparseHeapBitmap.cpp [modify] https://crrev.com/715f6ff05807d86ba55be0614c999a823a32f850/third_party/WebKit/Source/platform/heap/Visitor.h
,
Jan 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9fa52ef0da5a0a6e03a15c4c72d8021567070127 commit 9fa52ef0da5a0a6e03a15c4c72d8021567070127 Author: sigbjornf <sigbjornf@opera.com> Date: Sat Jan 21 14:26:38 2017 Disable g++ inlining of eager-tracing mark() method. Versions of g++ (with -Os) are over-eager about inlining the mark() method that's used for all non-mixin Oilpan objects, resulting in a code size increase that's unwanted (for Android official builds.) Other compilers and g++ (with -O2/-O3) are more discriminate about inlining the method, with no comparable code size increase delta. Tuning the compiler's optimization option set to avoid the problem hasn't proven successful, so bluntly address the problem by disabling inlining for the method. R=haraken BUG= 681991 Review-Url: https://codereview.chromium.org/2643403003 Cr-Commit-Position: refs/heads/master@{#445288} [modify] https://crrev.com/9fa52ef0da5a0a6e03a15c4c72d8021567070127/third_party/WebKit/Source/platform/heap/TraceTraits.h
,
Jan 21 2017
Given that the size increase happens with -Os, I checked how -O3 fared for code size. About ~1G larger (libcontent_shell_content_view.so) than -Os; -O2 isn't much better. From this end, no further work planned, but will watch out for any Android perf regressions. If anyone's run into this kind of inlining anomaly w/ g++ elsewhere, pointer appreciated.
,
Jan 23 2017
The no-inline saved 140kb. https://chromeperf.appspot.com/report?sid=cfc29eed1238fd38fb5e6cf83bdba6c619be621b606e03e5dfc2e99db14c418b&rev=445288 Happy to mark this particular regression as fixed :). I think there might be more savings to be had in the tracing code / DOM bindings in general though. That's an exploration for another day.
,
Jan 23 2017
Issue 683019 is the one for that. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by agrieve@chromium.org
, Jan 17 20171.6 MB
1.6 MB View Download