New issue
Advanced search Search tips

Issue 631396 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 633137
Owner:
Closed: Aug 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

6.5% regression in dromaeo.domcoremodify at 407372:407372

Project Member Reported by primiano@chromium.org, Jul 26 2016

Issue description

Assigning to test owner (yukishiino).
This one doesn't seem realistic to me. The only CL in range is brucedawson's https://codereview.chromium.org/2172033002, which, from what I see, affects only component builds  (source-set vs static-libraries). Should be a noop on perf builds.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=631396

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICguoe4hgoM


Bot(s) for this bug's original alert(s):

android-nexus6
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jul 26 2016

Cc: brucedaw...@chromium.org
Owner: brucedaw...@chromium.org

=== Auto-CCing suspected CL author brucedawson@chromium.org ===

Hi brucedawson@chromium.org, the bisect results pointed to your CL below as possibly
causing a regression. Please have a look at this info and see whether
your CL be related.


===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Reland of Shrink gn's chrome.dll - now smaller than gyp's (patchset #1 id:1 of https://codereview.chromium.org/2173453004/ )
Author  : brucedawson
Commit description:
  
Reason for revert (reland):
Problem is understood, mostly. Needed to slightly update the patch to fix it. Blink split counts were copied from gyp.

Original issue's description (with indents removed):
More work to shrink gn's chrome.dll

The three largest globals that were present in gn's chrome.dll but not in gyp's chrome.dll were eliminated by using /verbose linker output to track the object files that pulled them in and then conditionally changing source_set targets to static_library targets. Specifically:

unigram_table, in compact_enc_det.obj
- Referenced by TextResourceDecoder.obj from //third_party/WebKit/Source/core:html - some other source_set targets in this file were also modified

gpu::ApplyFramebufferAttachmentCMAAINTELResourceManager::cmaa_frag_s1_ and cmaa_frag_s2_, in gles2_cmd_apply_framebuffer_attachment_cmaa_intel.obj from //gpu/command_buffer/service:service_sources
- Referenced by gpu_command_buffer_stub.obj from //gpu/ipc/service:ipc_service_sources
- Referenced by gpu_video_decode_accelerator.obj from //media/gpu/ipc/service:service
- Referenced by gpu_child_thread.obj from //content/gpu:gpu_sources
- Referenced by gpu_video_decode_accelerator_factory.obj from //content/public/gpu:gpu_sources

As of R406709 this shrinks gn's 32-bit official chrome.dll file size from 38,907,904 bytes to 37,571,584 bytes - an unexpected 1,336,320 byte savings, mostly from the .text section. There is also ~67,000 bytes of memory-only savings in the zero-init part of the .data section.

At the same revision gyp's 32-bit official chrome.dll file size is 37,843,456 bytes - 271,872 bytes *larger* than the gn version.

There are still globals that are present in gn's chrome.dll but not gyp's chrome.dll, so the optimization technique can still be applied some more, but the priority is much lower now that gn is winning.

This is a follow-on to crrev.com/2163823002.

TBR=brettw@chromium.org,achuith@chromium.org
BUG= 624274 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2172033002
Cr-Commit-Position: refs/heads/master@{#407372}
Commit  : 6e53a2e16dc26dec41494b02da83a1221b72e7c4
Date    : Sat Jul 23 19:37:25 2016


===== TESTED REVISIONS =====
Revision         Mean     Std Dev   N  Good?
chromium@407371  83.3516  0.854636  5  good
chromium@407372  76.1947  0.287466  5  bad    <--

Bisect job ran on: android_nexus6_perf_bisect
Bug ID: 631396

Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --also-run-disabled-tests dromaeo.domcoremodify
Test Metric: dom/dom
Relative Change: 8.59%
Score: 99.9

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus6_perf_bisect/builds/2338
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9006084840545577248


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5866983027900416

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!
Aha, this sounds then a dupe of  Issue 631421 . My comment there was:
-----
Hmm Bruce, your https://codereview.chromium.org/2172033002 seems the only possible candidate in the list.
Which is weird, the only side effect I can see there is switching gpu and few other targets from source_set to static_libraries. Any idea on how could that affect the performance of an Android official build (which is a static_build).
Let's wait for bisect to confirm, I'd not expect static_library vs source_set to make a perf difference in non-component builds. hmm
-----
Cc: primiano@chromium.org
 Issue 631421  has been merged into this issue.
The algorithms that a linker uses to search a list of .obj files and a static library are different. With .obj files it pulls in everything and then discards what it can prove is not needed, with .lib files it only pulls in what is referenced (and then discards what it can prove is not needed).

That's it.

This means that changing from .obj files (source_set) to static_library could easily change the layout of code, and could easily shrink code (by not pulling in code that isn't strictly needed) but shouldn't have other performance effects.

The code-shrinking effect was seen quite dramatically on Windows, by this change.

The only way that a significant performance change *should* be possible is if there is an ODR violation somewhere and this change causes a different variant of a function to be pulled in, or if there is some sort of binary layout optimization that this change causes to be skipped.

That is, random binary layout changes could make a performance difference, but 6.5% seems excessive. Careful binary layout changes can make a 6.5% difference, so maybe some such step is now being skipped?

Note that I am in training right now so I will be responding slowly, and I have no Android development experience.
Pretty sure this is your change, Bruce. The revision range is pretty small, and everything except yours is a test change (unrelated to this test).

https://chromium.googlesource.com/chromium/src/+log/4eaf2aebd0b16710e3495d3448bcdbe3dab4a78d%5E..6e53a2e16dc26dec41494b02da83a1221b72e7c4?pretty=fuller
Perf sheriff ping: reminder to follow up on possible performance issues
Mergedinto: 633137
Status: Duplicate (was: Assigned)
This was discussed and investigated in great detail in  bug 631421  and we confirmed, to our satisfaction, that the change caused enough binary rearrangement that it caused some benchmarks to slow down due to some branches now covering longer distances.

This change did not just cause regressions - it also caused some benchmarks to run faster. The change basically shuffled the code (from one "random" order to another "random" order). We currently don't have a sane or systematic way of dealing with this.

 Bug 633137  is a request for a way of ordering our binaries more intelligently so that we don't get random regressions like this one and so that we'll get optimum performance across all of the benchmarks and real scenarios, instead of just randomly good/bad performance.

Sign in to add a comment