Issue metadata
Sign in to add a comment
|
60 kb regression in resource_sizes (MonochromePublic.apk) at 481091:481091 |
||||||||||||||||||||
Issue descriptionCaused by “Make RefCountedThreadSafeBase::AddRef and Release available for inlining” Commit: 5e3bdec771619fd558e176dafe393bfe1690078a Link to size graph: https://chromeperf.appspot.com/report?sid=a097e74b1aa288511afb4cb616efe0f95ba4d347ad61d5e835072f23450938ba&num_points=10&rev=481091 Debugging size regressions is documented at: https://chromium.googlesource.com/chromium/src/+/master/docs/speed/apk_size_regressions.md#Debugging-Apk-Size-Increase
,
Jun 22 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8976066738990104720
,
Jun 22 2017
Growth is from native library size. Partial diff:
Index | Running Total | Section@Address | Δ PSS (Δ size_without_padding) | Path
------------------------------------------------------------
~ 0) -8362 (-13.5%) t@Group -8362 (0->0) {{no path}}
** symbol gaps (count=9)
~ 1) -7774 (-12.6%) t@0x41d2e4 +588 (2584->3172) cc/tiles/tile_manager.cc
cc::TileManager::ScheduleTasks
~ 2) -7318 (-11.8%) t@0x40c66c +456 (888->1344) cc/tiles/gpu_image_decode_cache.cc
cc::GpuImageDecodeCache::GetTaskForImageAndRefInternal
~ 3) -7518 (-12.2%) t@0x2348fac -200 (2012->1812) chrome/browser/io_thread.cc
IOThread::IOThread
~ 4) -7342 (-11.9%) t@0x166f6c4 +176 (384->560) gpu/ipc/service/gpu_channel.cc
gpu::GpuChannel::GpuChannel
~ 5) -7178 (-11.6%) t@0xd85a08 +164 (1996->2160) content/browser/renderer_host/render_process_host_impl.cc
content::RenderProcessHostImpl::CreateMessageFilters
~ 6) -7022 (-11.4%) t@0x410c78 +156 (672->828) cc/tiles/image_controller.cc
cc::ImageController::ProcessNextImageDecodeOnWorkerThread
~ 7) -6866 (-11.1%) t@0x7cb4fc +156 (684->840) ui/gfx/image/image.cc
gfx::Image::As1xPNGBytes const
~ 8) -6714 (-10.9%) t@0x41714c +152 (1272->1424) cc/tiles/software_image_decode_cache.cc
cc::SoftwareImageDecodeCache::GetTaskForImageAndRefInternal
~ 9) -6570 (-10.6%) t@0x428da0 +144 (168->312) cc/trees/layer_tree_host.cc
cc::LayerTreeHost::InitializeThreaded
~ 10) -6426 (-10.4%) t@0xde12be +144 (84->228) content/browser/shared_worker/worker_storage_partition.cc
content::WorkerStoragePartition::WorkerStoragePartition
~ 11) -6284 (-10.2%) t@Group +142 (98->240) cc/trees/task_runner_provider.cc
cc::TaskRunnerProvider::~TaskRunnerProvider (count=2)
~ 12) -6148 (-9.9%) t@0x240d8dc +136 (576->712) chrome/browser/android/preferences/website_preference_bridge.cc
Java_org_chromium_chrome_browser_preferences_website_WebsitePreferenceBridge_nativeClearCookieData
~ 13) -6020 (-9.7%) t@0xde75f8 +128 (764->892) content/browser/storage_partition_impl.cc
content::StoragePartitionImpl::DataDeletionHelper::ClearDataOnUIThread
- 14) -6144 (-9.9%) t@0x0 -124 (124->0) cc/tiles/tile_manager.cc
cc::InsertNodeForTask
- 15) -6266 (-10.1%) t@0x0 -122 (122->0) cc/tiles/image_controller.cc
cc::ImageController::ImageDecodeRequest::operator=
~ 16) -6144 (-9.9%) t@0x160dce2 +122 (16->138) gpu/{{shared}}/3
scoped_refptr::~scoped_refptr
~ 17) -6024 (-9.7%) t@0xde6d38 +120 (1444->1564) content/browser/storage_partition_impl.cc
content::StoragePartitionImpl::Create
~ 18) -5914 (-9.6%) t@Group +110 (98->208) base/files/file_descriptor_watcher_posix.cc
base::FileDescriptorWatcher::Controller::Watcher::~Watcher (count=2)
~ 19) -5804 (-9.4%) t@Group +110 (98->208) base/task_scheduler/scheduler_worker.cc
base::internal::SchedulerWorker::Thread::~Thread (count=2)
~ 20) -5694 (-9.2%) t@0x41f35c +110 (234->344) cc/tiles/tile_manager.cc
cc::InsertNodeForDecodeTask
~ 21) -5586 (-9.0%) t@0x35d02c +108 (480->588) base/synchronization/waitable_event_watcher_posix.cc
base::WaitableEventWatcher::StartWatching
- 22) -5692 (-9.2%) t@0x0 -106 (106->0) cc/trees/layer_tree_host.cc
cc::TaskRunnerProvider::Create
~ 23) -5588 (-9.0%) t@0x379dd4 +104 (912->1016) base/trace_event/memory_dump_manager.cc
base::trace_event::MemoryDumpManager::SetupForTracing
~ 24) -5484 (-8.9%) t@0xd2c124 +104 (1168->1272) content/browser/loader/resource_dispatcher_host_impl.cc
content::ResourceDispatcherHostImpl::BeginNavigationRequest
- 25) -5586 (-9.0%) t@0x0 -102 (102->0) gpu/command_buffer/service/sync_point_manager.cc
base::DefaultRefCountedThreadSafeTraits::Destruct
~ 26) -5484 (-8.9%) t@Group +102 (102->204) cc/layers/picture_layer.cc
cc::PictureLayer::~PictureLayer (count=2)
~ 27) -5383 (-8.7%) t@0xa937d4 +101 (118->160) components/policy/core/common/schema.cc
std::__ndk1::vector::__swap_out_circular_buffer (num_aliases=2->1)
~ 28) -5283 (-8.5%) t@0xc5d308 +100 (568->668) content/browser/byte_stream.cc
content::CreateByteStream
~ 29) -5183 (-8.4%) t@0xd86278 +100 (1788->1888) content/browser/renderer_host/render_process_host_impl.cc
content::RenderProcessHostImpl::RegisterMojoInterfaces
~ 30) -5279 (-8.5%) t@0x2352390 -96 (1376->1280) chrome/browser/metrics/chrome_metrics_service_client.cc
ChromeMetricsServiceClient::RegisterMetricsServiceProviders
~ 31) -5183 (-8.4%) t@Group +96 (1026->1122) base/message_loop/message_loop.cc
base::MessageLoop::~MessageLoop (count=2)
~ 32) -5087 (-8.2%) t@0x160d40e +96 (538->634) gpu/command_buffer/service/sync_point_manager.cc
gpu::SyncPointOrderData::BeginProcessingOrderNumber
~ 33) -4991 (-8.1%) t@0x160d8bc +96 (536->632) gpu/command_buffer/service/sync_point_manager.cc
You can see more details if you run "tools/binary_size/diagnose_bloat.py 5e3bdec771619fd558e176dafe393bfe1690078a -v --cloud".
From looking at the comments on the CL it seems that this size change was probably unexpected.
hans@ - Is this change necessary? Is there anything we can do to reduce the size jump on Android?
,
Jun 22 2017
> hans@ - Is this change necessary? Is there anything we can do to reduce the size jump on Android? The change resulted in a size decrease on other platforms. Let me investigate why it's causing growth on Android/ARM..
,
Jun 23 2017
It looks like the machine code for atomic decrement (and probably increment too) on ARM and Thumb is a larger than on X86.
Short example:
bool ref_dec(volatile int *ptr) {
return __atomic_fetch_add(ptr, -1, __ATOMIC_SEQ_CST) - 1 != 0;
}
I suppose we could #ifdef for ARM and out-line the implementation..
,
Jun 30 2017
60 kb is a pretty large regression for Android but I'm not sure if we want to start recommending arch specific implementations for this amount of growth.. agrieve@ - thoughts on c#8?
,
Jun 30 2017
I agree, unless this makes ARM measurably faster, it's not worth 60kb.
,
Jun 30 2017
On x86, the change _reduces_ size. The question is: Should we have different code paths on x86 and arm here?
,
Jun 30 2017
Perhaps we could define the method body in a macro to ensure it doesn't differ between .h and .cc? If we don't want to have different paths, then I do not think this change should go in.
,
Jun 30 2017
Hans, you said that it looks like the combination of all your perf work amounted to a 30ms startup reduction on Android, right? ...yeah, query here: Before: https://uma.googleplex.com/p/chrome/histograms/?endDate=latest&dayCount=28&histograms=PageLoad.PaintTiming.NavigationToFirstContentfulPaint&fixupData=true&showMax=true&analysis=0.1%200.2%200.3%200.4%200.5%200.6%200.7%200.75%200.8%200.9%200.95%200.98%200.99%200.995&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C2%2Call_version%2Ceq%2C61.0.3136.4%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial After: https://uma.googleplex.com/p/chrome/histograms/?endDate=latest&dayCount=28&histograms=PageLoad.PaintTiming.NavigationToFirstContentfulPaint&fixupData=true&showMax=true&analysis=0.1%200.2%200.3%200.4%200.5%200.6%200.7%200.75%200.8%200.9%200.95%200.98%200.99%200.995&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C2%2Call_version%2Ceq%2C61.0.3142.0%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial Looks like 40ms at 50%ile, 100ms at 90%ile, 700ms at 99%ile. The macro sounds like a good idea, I guess we could do that and see how much of the perf we lose on Android (there were a couple changes like this)? If we do lose most of it again: agrieve, what's the startup-ms-binary-size-conversion factor? Is 1s faster startup worth 1.5MB of size?
,
Jun 30 2017
Wow, great work on the start-up time front! I don't think any of the other changes appeared as size alerts, so it's possible that this one in particular isn't necessary (on arm). I'd say that if this change was even responsible for 10ms at 50%, then that would justify keeping it.
,
Jun 30 2017
I looked at the generated code, and while it's very small for X86 (that's why we want it inline), there doesn't seem to be a way to generate a really small code sequence for ARM, so perhaps it makes sense to have it out-of-line there. I'm working on a patch to see what it would look like.
,
Jun 30 2017
Taking a stab here, using an inline method instead of macro: https://codereview.chromium.org/2961413003/
,
Jun 30 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/79eb7ebaabeb87fd76da56f35de0ab5776806a15 commit 79eb7ebaabeb87fd76da56f35de0ab5776806a15 Author: hans <hans@chromium.org> Date: Fri Jun 30 18:14:48 2017 Outline RefCountedThreadSafeBase::AddRef and Release on non-X86 They were made inline in #481091, which caused a size drop on X86 and size increase on ARM. The code sequence on ARM is larger than on X86, and probably not worth inlining. BUG= 728324 , 736015 Review-Url: https://codereview.chromium.org/2961413003 Cr-Commit-Position: refs/heads/master@{#483759} [modify] https://crrev.com/79eb7ebaabeb87fd76da56f35de0ab5776806a15/base/memory/ref_counted.cc [modify] https://crrev.com/79eb7ebaabeb87fd76da56f35de0ab5776806a15/base/memory/ref_counted.h
,
Jun 30 2017
Graph here: https://chromeperf.appspot.com/report?sid=99caa6c5389f77e13b562afa5703ad8fc1dfd9784effeb8760626be0c71e498b&start_rev=483726&end_rev=483779 - 60 kb drop as expected! Thanks for fixing :).
,
Jun 30 2017
Let's keep this open for another two weeks so we remember to look at startup perf with next week's canary.
,
Jul 17 2017
The outlining patch in #17 landed in 61.0.3146.0 according to omahaproxy. The next canary push was based on 3149.0. 3145.0: https://uma.googleplex.com/p/chrome/histograms/?endDate=latest&dayCount=28&histograms=PageLoad.PaintTiming.NavigationToFirstContentfulPaint&fixupData=true&showMax=true&analysis=0.1%200.2%200.3%200.4%200.5%200.6%200.7%200.75%200.8%200.9%200.95%200.98%200.99%200.995&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Call_version%2Ceq%2C61.0.3145.0%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial Median was 591.1 3149.0: https://uma.googleplex.com/p/chrome/histograms/?endDate=latest&dayCount=28&histograms=PageLoad.PaintTiming.NavigationToFirstContentfulPaint&fixupData=true&showMax=true&analysis=0.1%200.2%200.3%200.4%200.5%200.6%200.7%200.75%200.8%200.9%200.95%200.98%200.99%200.995&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Call_version%2Ceq%2C61.0.3149.0%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial Median was 600.8. Maybe some of that was due to the outlining, but it could also be just noise.
,
Jul 17 2017
Closing as I don't think there's more to do here. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by estevenson@chromium.org
, Jun 22 2017