Issue metadata
Sign in to add a comment
|
0.1% regression in resource_sizes (MonochromePublic.apk) at 577387:577387 |
||||||||||||||||||
Issue description53343 byte binary size increase due to the following V8 roll. This will require a bisect to pick out the offending V8 commits. commit: d9eb39755b89dccb176e3b8a9503cbf26106e2b9 author v8-ci-autoroll-builder <v8-ci-autoroll-builder@chops-service-accounts.iam.gserviceaccount.com> Tue Jul 24 01:16:13 2018 committer Commit Bot <commit-bot@chromium.org> Tue Jul 24 01:16:13 2018 Update V8 to version 7.0.2. Summary of changes available at: https://chromium.googlesource.com/v8/v8/+log/d7b61abe..f0bf9a40 Please follow these instructions for assigning/CC'ing issues: https://github.com/v8/v8/wiki/Triaging%20issues
,
Jul 30
Assigning to v8-ci-autoroll-builder@chops-service-accounts.iam.gserviceaccount.com because this is the only CL in range: Update V8 to version 7.0.2. Summary of changes available at: https://chromium.googlesource.com/v8/v8/+log/d7b61abe..f0bf9a40 Please follow these instructions for assigning/CC'ing issues: https://github.com/v8/v8/wiki/Triaging%20issues Please close rolling in case of a roll revert: https://v8-roll.appspot.com/ This only works with a Google account. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;luci.chromium.try:android_optional_gpu_tests_rel TBR=hablich@chromium.org,v8-waterfall-sheriff@grotations.appspotmail.com Change-Id: Ic7e97a0961ba28c3c69ea9fb051e32023f78577d Reviewed-on: https://chromium-review.googlesource.com/1146776 Reviewed-by: V8 Autoroller <v8-ci-autoroll-builder@chops-service-accounts.iam.gserviceaccount.com> Commit-Queue: V8 Autoroller <v8-ci-autoroll-builder@chops-service-accounts.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#577387}
,
Jul 30
Outer stepped run across the full set of V8 changes in that roll:
+20418 bytes MonochromePublic.apk_Specifics normalized apk size for range: 8043f283f66478b0e8bbf536483962c4a5248978..bd4387dc73b460642cf4aedd50074009047c8553
+8385 bytes MonochromePublic.apk_Specifics normalized apk size for range: 23cb219fb2a30f345ba83cc59fcff4a16f620088..b6f7ea580595f98b89fc47c50f9ccfbbd3b9c448
+8313 bytes MonochromePublic.apk_Specifics normalized apk size for range: 48e5ef55637628ec0f734443068ed21b7efecd96..e8c5a51c3bd3a24be239ddb81bcf111c23af829a
+4179 bytes MonochromePublic.apk_Specifics normalized apk size for range: e8c5a51c3bd3a24be239ddb81bcf111c23af829a..8043f283f66478b0e8bbf536483962c4a5248978
+4095 bytes MonochromePublic.apk_Specifics normalized apk size for range: af0451d96b474c1b4ae32f92b546e842c6d87273..84efdf02491b686d705eecb8c0ab23997f2b29e5
+4 bytes MonochromePublic.apk_Specifics normalized apk size for range: 84efdf02491b686d705eecb8c0ab23997f2b29e5..54723da77150699df42c9fc21f47d4013b52cd7f
+1 bytes MonochromePublic.apk_Specifics normalized apk size for range: b6f7ea580595f98b89fc47c50f9ccfbbd3b9c448..1f82fb540141b07e9cb197f1246c1b5b71240802
+0 bytes MonochromePublic.apk_Specifics normalized apk size for range: a796715eb5d3dce15ddc16b54e0311f5261b7136..48e5ef55637628ec0f734443068ed21b7efecd96
+0 bytes MonochromePublic.apk_Specifics normalized apk size for range: 54723da77150699df42c9fc21f47d4013b52cd7f..23cb219fb2a30f345ba83cc59fcff4a16f620088
-1 bytes MonochromePublic.apk_Specifics normalized apk size for range: bd4387dc73b460642cf4aedd50074009047c8553..af0451d96b474c1b4ae32f92b546e842c6d87273
,
Jul 30
+20483 bytes MonochromePublic.apk_Specifics normalized apk size for range: ac0c19b623cdf30c42c5893379a05d555cc2722c..c941f11abddf57844e9807503df1877f6f60c94a commit c941f11abddf57844e9807503df1877f6f60c94a Author: Leszek Swirski <leszeks@chromium.org> Date: Wed Jul 18 16:11:31 2018 +0100 [sfi] Remove SFI function identifier field Remove the function identifier field from SharedFunctionInfo. This field would store one of a) the function's inferred name, b) the "builtin function id", or c) debug info. We remove these in turn: a) The function's inferred name is available on the ScopeInfo, so like the start/end position we read it off either the ScopeInfo (for compiled functions) or the UncompiledData (for uncompiled functions). As a side-effect, now both UncompiledData and its subclass, UncompiledDataWithPreparsedScope, contain a pointer field. To keep BodyDescriptors manageable, we introduce a SubclassBodyDescriptor which effectively appends two BodyDescriptors together. b) The builtin function id is < 255, so we can steal a byte from expected no. of properies (also <255) and store these together. Eventually we want to get rid of this field and use the builtin ID, but this is pending JS builtin removal. As a side-effect, BuiltinFunctionId becomes an enum class (for better storage size guarantees). c) The debug info can hang off anything (since it stores the field it replaces), so we can attach it to the script field instead. This saves a word on compiled function (uncompiled functions unfortunately still have to store it in UncompiledData). Bug: chromium:818642 Change-Id: I8b4b3a070f0fe328aafcaeac58842d144d12d996 Reviewed-on: https://chromium-review.googlesource.com/1138328 Reviewed-by: Yang Guo <yangguo@chromium.org> Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Reviewed-by: Benedikt Meurer <bmeurer@chromium.org> Reviewed-by: Camillo Bruni <cbruni@chromium.org> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> Commit-Queue: Leszek Swirski <leszeks@chromium.org> Cr-Commit-Position: refs/heads/master@{#54543}
,
Jul 30
******************************Resource Sizes Diff****************************** For an explanation of these metrics, see: https://chromium.googlesource.com/chromium/src/+/master/docs/speed/binary_size/metrics.md#Metrics-for-Android Specifics: +20,483 bytes normalized apk size +20,480 bytes main lib size InstallSize: +20,483 bytes APK size +20,483.0 bytes Estimated installed size InstallBreakdown (+19,439 bytes): -1,044 bytes V8 Snapshots size +20,480 bytes Native code size +3 bytes Package metadata size
,
Jul 30
leszeks@, I did a bisect across the changes in the V8 roll mentioned above, and the largest contributor to binary size appears to be your CL. We investigate increases 16 KB or larger. Debugging size regressions is documented at: https://chromium.googlesource.com/chromium/src/+/master/docs/speed/apk_size_regressions.md#Debugging-Apk-Size-Increase Could you please comment on whether this size increase is expected? If so, could you offer an explanation?
,
Jul 31
Hm, this isn't expected at all -- maybe a small code size increase, but nothing of this magnitude. 20,480 is a suspiciously round number, could this be going over the edge of a page or something?
,
Jul 31
+rmcilroy FYI
,
Jul 31
The page comment make sense, but aren't the pages 4K? Presumably this is still a big delta, even if rounded up to the nearest 4K. Apologies that I don't know exactly. Do you have a way of inspecting binary size within the V8 lib directly? I've seen that there are V8 binary size graphs available... it'd be good to know if you can measure the impact of the change outside a full-Chrome build, as a sanity check.
,
Aug 1
The v8 binary size graphs are here: https://chromeperf.appspot.com/report?sid=9b798f1f5414befee07dd0dee28b3ff618e167cd846c2be1a1d42694578ce768&start_rev=53669&end_rev=54745 It looks like there was a ~50KB increase in libv8.so around that CL, although unfortunately there is a big gap between r54373 and r54553 so it's not clear exactly which CL caused it.
,
Aug 1
Locally (on x64) I see hardly any increase in code size: before: 15263584 out/x64.release.clean/libv8.so after: 15263680 out/x64.release.clean/libv8.so
,
Aug 1
Oh, actually, that's on the reland. On the original, linked here, there is a big code size difference: before: 15258552 out/x64.release.clean/libv8.so after: 15262744 out/x64.release.clean/libv8.so
,
Aug 1
Running through a few more CLs (the CL before the original "remove function literal id" patch, all the way to the reland), shows the size fluctuating by what looks like about 4K (so, a page). What's weird is that this is x64, while the regression is on ARM (I assume). It's strange that the same CL(s?) should hit the a size threshold on two difference architectures. feb20872c3 [turbofan] Add ToNumber for strings and oddballs 15258552 1d4a1172f5 [sfi] Remove SFI function literal id field 15262744 e3b4ffa9ba [tools] Improve function event logging and parse processor 15262744 58578584d6 Revert "[sfi] Remove SFI function literal id field" 15258552 941d5f960e [turbofan] Remove optimization for Cons strings 15258552 087cc34788 [fuzzer] Fix timeout in v8_script_parser_fuzzer due to unnecessary long inputs. 15258552 8a011b57d8 [explicit isolates] Remove nearly all GetIsolates in api.cc 15263464 70aacc2e5f [wasm] Make WasmCompilationUnit independent of Isolate. 15263416 46a93c9e60 [turbofan] Brokerize JSCreateLowering::ReduceJSCreateGeneratorObject. 15263584 414b841b54 [explicit isolates] Make IsDereferenceAllowed true for RO_SPACE 15263584 5dee5ade75 [sfi] Remove SFI function literal id field (reland^2) 15263680
,
Aug 1
Hmm, in comment #12, that's still only a 4K increase. How are you doing the measurements?
From my diagnose-bloat run, here's the breakdown of the top contributing symbol size changes for that commit. Do the changes look plausible to you?
Index | Running Total | Section@Address | Δ PSS (Δ size_without_padding) | Path
------------------------------------------------------------
~ 0) 12364 (60.4%) t@0xe76ba8 12364 (18720->31084) v8/src/heap/concurrent-marking.cc
v8::internal::ConcurrentMarking::Run
~ 1) 19416 (94.8%) t@0xea4d08 7052 (3968->11020) v8/src/heap/mark-compact.cc
v8::internal::MarkCompactCollector::ProcessMarkingWorklistInternal<>
- 2) 17788 (86.8%) t@0x0 -1628 (1628->0) v8/src/heap/concurrent-marking.cc
v8::internal::ConcurrentMarkingVisitor::VisitLeftTrimmableArray<>
~ 3) 16744 (81.7%) o@0x0 -1044 (1433964->1432920) v8/snapshot_blob_32.bin
assets/snapshot_blob_32.bin
~ 4) 17736 (86.6%) t@0xebb218 992 (6540->7532) v8/src/heap/scavenger.cc
v8::internal::Scavenger::Process
~ 5) 18460 (90.1%) t@0xf32bac 724 (724->1448) v8/src/objects.cc
v8::internal::SharedFunctionInfo::InitFromFunctionLiteral
,
Aug 1
I'm re-running a clean comparison on just the one commit in question, to make sure I get the same result.
,
Aug 1
I was doing the measurements with du on a local x64 release build, so barely comparable aside from correlation. Concurrent marking should be unrelated to this commit, unless there's something weird going on with instance types... +ulan, any thoughts?
,
Aug 1
Confirmed. $ tools/binary_size/diagnose_bloat.py c941f11abddf57844e9807503df1877f6f60c94a --subrepo v8 .... Specifics: +20,483 bytes normalized apk size +20,480 bytes main lib size agrieve@ just took a peek and mentioned that this is likely in-lining related (it's happened before).
,
Aug 1
Ah, I was looking at the wrong CL here, in the above calculations -- two very similar commit messages :$
,
Aug 1
Well, for that one I see a 12K increase locally on x64: ac0c19b623 [liveedit] Use start position in function lookup 15294256 out/x64.release.clean/libv8.so c941f11abd [sfi] Remove SFI function identifier field 15306848 out/x64.release.clean/libv8.so So, this CL added an instance type with non-trivial visitor, which means adding another case to the visit switch called by ConcurrentMarking::Run. I could buy that being an inlining change, yeah, not sure how actionable it is though.
,
Aug 1
Yes, diffing -Rpass=inline I get: 3698a3699,3733 > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal24ConcurrentMarkingVisitor11ShouldVisitEPNS0_10HeapObjectE inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=205 (threshold=250) [-Rpass=inline] > TYPED_VISITOR_ID_LIST(VISIT) > ^ > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal24ConcurrentMarkingVisitor25AllowDefaultJSObjectVisitEv inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=-35 (threshold=375) [-Rpass=inline] > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal19FixedBodyDescriptorILi8ELi16ELi32EE6SizeOfEPNS0_3MapEPNS0_10HeapObjectE inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=-40 (threshold=487) [-Rpass=inline] > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal19FixedBodyDescriptorILi8ELi16ELi32EE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEiPT_ inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=50 (threshold=325) [-Rpass=inline] > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE21ShouldVisitMapPointerEv inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=always [-Rpass=inline] > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE15VisitMapPointerEPNS0_10HeapObjectEPS5_ inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=always [-Rpass=inline] > ../../src/heap/objects-visiting-inl.h:93:1: remark: _ZN2v88internal10HeapObject8map_slotEv inlined into _ZN2v88internal11HeapVisitorIiNS0_24ConcurrentMarkingVisitorEE40VisitUncompiledDataWithoutPreParsedScopeEPNS0_3MapEPNS0_35UncompiledDataWithoutPreParsedScopeE with cost=-30 (threshold=487) [-Rpass=inline] > ../../src/heap/concurrent-marking.cc:89:12: remark: _ZN2v88internal32UncompiledDataWithPreParsedScope4castEPNS0_6ObjectE inlined into _ZN2v88internal24ConcurrentMarkingVisitor4CastINS0_32UncompiledDataWithPreParsedScopeEEEPT_PNS0_10HeapObjectE with cost=always [-Rpass=inline] > return T::cast(object); > ^ > In file included from ../../src/heap/concurrent-marking.cc:13: > In file included from ../../src/heap/heap-inl.h:19: > In file included from ../../src/heap/incremental-marking-inl.h:8: > In file included from ../../src/heap/incremental-marking.h:11: > In file included from ../../src/heap/mark-compact.h:12: > In file included from ../../src/heap/objects-visiting.h:10: > ../../src/objects-body-descriptors.h:145:12: remark: _ZN2v88internal19FixedBodyDescriptorILi32ELi40ELi40EE6SizeOfEPNS0_3MapEPNS0_10HeapObjectE inlined into _ZN2v88internal22SubclassBodyDescriptorINS0_19FixedBodyDescriptorILi8ELi16ELi32EEENS2_ILi32ELi40ELi40EEEE6SizeOfEPNS0_3MapEPNS0_10HeapObjectE with cost=-40 (threshold=487) [-Rpass=inline] > return ChildBodyDescriptor::SizeOf(map, object); > ^ > ../../src/objects-body-descriptors.h:75:5: remark: _ZN2v88internal18BodyDescriptorBase15IteratePointersINS0_24ConcurrentMarkingVisitorEEEvPNS0_10HeapObjectEiiPT_ inlined into _ZN2v88internal19FixedBodyDescriptorILi32ELi40ELi40EE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEPT_ with cost=55 (threshold=325) [-Rpass=inline] > IteratePointers(obj, start_offset, end_offset, v); > ^ > ../../src/objects-body-descriptors.h:81:5: remark: _ZN2v88internal19FixedBodyDescriptorILi32ELi40ELi40EE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEPT_ inlined into _ZN2v88internal19FixedBodyDescriptorILi32ELi40ELi40EE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEiPT_ with cost=55 (threshold=325) [-Rpass=inline] > IterateBody(map, obj, v); > ^ > ../../src/objects-body-descriptors.h:139:5: remark: _ZN2v88internal19FixedBodyDescriptorILi8ELi16ELi32EE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEiPT_ inlined into _ZN2v88internal22SubclassBodyDescriptorINS0_19FixedBodyDescriptorILi8ELi16ELi32EEENS2_ILi32ELi40ELi40EEEE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEiPT_ with cost=50 (threshold=325) [-Rpass=inline] > ParentBodyDescriptor::IterateBody(map, obj, object_size, v); > ^ > ../../src/objects-body-descriptors.h:140:5: remark: _ZN2v88internal19FixedBodyDescriptorILi32ELi40ELi40EE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEiPT_ inlined into _ZN2v88internal22SubclassBodyDescriptorINS0_19FixedBodyDescriptorILi8ELi16ELi32EEENS2_ILi32ELi40ELi40EEEE11IterateBodyINS0_24ConcurrentMarkingVisitorEEEvPNS0_3MapEPNS0_10HeapObjectEiPT_ with cost=50 (threshold=325) [-Rpass=inline] > ChildBodyDescriptor::IterateBody(map, obj, object_size, v); > ^ > In file included from ../../src/heap/concurrent-marking.cc:15: > In file included from ../../src/heap/mark-compact-inl.h:10:
,
Aug 1
FWIW, I tried reordering the fields of UncompiledData & UncompiledDataWithPreParsedScope so that pointer fields are contiguous and there is only one iteration over UncompiledDataWithPreParsedScope pointers in ConcurrentMarkingVisitor, but it made exactly zero difference to binary size (probably because the loops end up unrolled anyway). Ulan, does it sound reasonable that adding one object visitor would increase concurrent marking size by 12K? Feels like a lot.
,
Aug 1
+mlippautz has experience with this kind of thing as well I think.
,
Aug 1
As far as I can remember, changing the whole infrastructure for marking was in the ballpark of ~50k (instance based vs static). So 12k seems a lot for a single instance type. Also, we've had people change instance types a lot these days and afaik we never got regressions that big. That CL also introduced the subclassing body descriptor, right? We probably have 0 aliasing with this one. Still it should not make up for 12k as it is only used with one type, right?
,
Aug 2
I tried getting rid of that, by moving the pointer fields of UncompiledData and UncompiledDataWithPreParsedScope together, and it didn't help. Which, to be fair, makes sense since there would be no aliasing - OTOH the body descriptor for UncompiledDataWithPreParsedScope existed before this CL, also for pointer fields somewhere in the middle of the object so also unlikely to alias.
Current:
________________
| | <-- UncompiledData::kStartOfPointerFieldsOffset
{ | UncompiledData |
{ | pointer fields |
| ------------ | <-- UncompiledData::kEndOfPointerFieldsOffset
| UncompiledData |
| int fields |
|----------------| <-- UDwPPS::kStartOfPointerFieldsOffset
{ | UDwPPS |
{ | pointer fields |
|----------------| <-- UDwPPS::kEndOfPointerFieldsOffset
| UDwPPS |
| int fields |
|________________|
Attempted fix, with no improvement
________________
| |
| UncompiledData |
| int fields |
| ------------ | <-- UncompiledData::kStartOfPointerFieldsOffset
{ | UncompiledData |
{ | pointer fields |
{ |----------------| <-- UncompiledData::kSize
{ | UDwPPS |
{ | pointer fields |
|----------------| <-- UDwPPS::kEndOfPointerFieldsOffset
| UDwPPS |
| int fields |
|________________|
,
Aug 2
So, I took a look at the (x64) disassembly of v8::internal::ConcurrentMarking::Run, and it barely increased in size, 12408 vs 12860 bytes on my machine. That makes more sense given the CL, but still doesn't explain the binary size difference.
,
Aug 2
Just curious: x64 vs ARM - sounds like you're assuming the size differences should correlate. Aren't we talking about a different architecture, compiler and potential optimizations? Seems like inlining could be done very differently.
,
Aug 2
Certainly, but I do observe a similar magnitude binary size increase on x64, and that's easier for me to debug. It also calls into question the idea that this is a heuristics or alignment difference, since I wouldn't expect those to stay correlated across architectures.
,
Aug 2
Ok, I can finally reproduce this in disassembly, by disassembling an is_official_build=true binary. Still x64.
,
Aug 3
Right, so, my working hypothesis here is actually insufficient inlining. It seems that a bunch of Visits and FixedBodyDescriptor::IterateBody are not being inlined, and I think that if they were inlined, they would share code more heavily. We might be able to fix this by marking a few more things as V8_INLINE, though that's fragile of course.
,
Aug 6
I am back from vacation. The 12K increase for one instance type is not expected.
Could it be that SubclassBodyDescriptor affects inlining for FixedBodyDescriptor?
SubclassBodyDescriptor<
UncompiledData::BodyDescriptor,
FixedBodyDescriptor<kStartOfPointerFieldsOffset,
kEndOfPointerFieldsOffset, kSize>>
Leszek, did you try defining the new body descriptor from scratch (i.e. without using SubclassBodyDescriptor) ?
,
Aug 6
Yes, see #24.
,
Aug 6
I tried V8_INLINE-ing the IterateBody/InteratePointer(s) methods in BodyDescriptor, and the single-line VisitFoo methods in ConcurrentMarkingVisitor, but that backfired and increased binary size by about 220K (not a typo!) |
|||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Jul 30