New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 617732 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug-Regression



Sign in to add a comment

72.2% regression in load_library_perf_tests at 396984:397015

Project Member Reported by xhw...@chromium.org, Jun 6 2016

Issue description

This happens for both libclearkeycdm.so and libclearkeycdmadapter.so. Widevine CDM and adapter are not affected.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=617732

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICgnNrztwoM


Bot(s) for this bug's original alert(s):

linux-release
Bisect has been kicked off, still waiting for the result.
Project Member

Comment 3 by sheriffbot@chromium.org, Jul 6 2016

Labels: -M-53 M-54 MovedFrom-53
Moving this nonessential bug to the next milestone.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 5 by 42576172...@developer.gserviceaccount.com, Jul 12 2016

Cc: p...@chromium.org
Owner: p...@chromium.org

=== Auto-CCing suspected CL author pcc@chromium.org ===

Hi pcc@chromium.org, the bisect results pointed to your CL below as possibly
causing a regression. Please have a look at this info and see whether
your CL be related.


===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Restore the -fvisibility=hidden flag for tcmalloc when using allocator shim.
Author  : pcc
Commit description:
  
Hidden visibility was initially disabled for tcmalloc in
https://codereview.chromium.org/343037 as a workaround for
a lack of symbol visibility annotations in the source. These
annotations were finally added as part of the new allocator shim
(see //base/allocator/allocator_shim_override_glibc_weak_symbols.h
and //base/allocator/allocator_shim_internals.h for the definition of
SHIM_ALWAYS_EXPORT). Hence it is no longer necessary to compile tcmalloc
with default visibility when using the allocator shim.

This fixes an issue where some of tcmalloc's internal classes were being
compiled with default visibility in the implementation and with hidden
visibility in clients. This is technically an ODR violation, but not normally
a significant one. The ODR violation was exposed by an upcoming LLVM change
[1] that enables the relative C++ ABI for (approximately) classes with
hidden visibility.

[1] http://reviews.llvm.org/D20749

BUG=589384
R=primiano@chromium.org, thakis@chromium.org

Review-Url: https://codereview.chromium.org/2019183002
Cr-Commit-Position: refs/heads/master@{#397015}
Commit  : 1110ac7a5c525ab81105a8cf7be2f7b59d669657
Date    : Wed Jun 01 01:28:00 2016


===== TESTED REVISIONS =====
Revision         Mean      Std Dev     N  Good?
chromium@396983  0.318875  0.050702    8  good
chromium@396999  39.012    109.527     8  good
chromium@397007  0.2938    0.0147207   5  good
chromium@397011  0.2902    0.00990959  5  good
chromium@397013  0.2862    0.00356371  5  good
chromium@397014  0.2938    0.00798123  5  good
chromium@397015  0.512375  0.0389136   8  bad    <--

Bisect job ran on: linux_perf_bisect
Bug ID: 617732

Test Command: ./src/out/Release/load_library_perf_tests --single-process-tests
Test Metric: time_to_load_library/libclearkeycdm.so
Relative Change: 62.19%
Score: 99.9

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6592
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9007320664687832064


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=4971924711538688

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!
Perf sheriff ping: pcc@, does it seem to make sense that that CL could have affected this performance test?
Cc: sullivan@chromium.org
pcc@ can you please provide an update on this issue?

/cc sullivan@ any idea who owns load_library_perf_test.cc  
xhwang owns this test.

Comment 9 by p...@chromium.org, Oct 17 2016

Sorry about the delay, I will take a look at this today.

Comment 10 by p...@chromium.org, Oct 17 2016

Cc: primiano@chromium.org
Okay, I've looked into this, and I've been able to reproduce the regression locally. This is what I think is going on here:

- We link an additional, unnecessary copy of tcmalloc into each of the libclearkeycdm*.so DSOs. This was also the case prior to my change.
- Each copy of tcmalloc tries to initialise itself in the TCMallocGuard constructor [0] via a static variable named module_enter_exit_hook [1].
- The tcmalloc initialisation code is guarded by a static variable named tcmallocguard_refcount [2].
- Prior to my change, the dynamic loader overrode the definition of the TCMallocGuard constructor in the DSOs with the one in the executable. This meant that each call to the constructor used the same copy of tcmallocguard_refcount.
- After my change, the visibility of the TCMallocGuard constructor became hidden, so each DSO started using its own copy of the constructor and therefore its own copy of tcmallocguard_refcount.
- This led to the tcmalloc initialisation code being run for each DSO. The time spent in the initialisation code likely caused the perf regression.

I think we should fix the underlying problem here by removing the unnecessary copies of tcmalloc in each of these DSOs. The DSOs will continue to use tcmalloc for allocation because the main executable's definition of malloc will override libc's. I haven't looked into how hard this would be from a build system perspective though.

An easier workaround may be to give tcmallocguard_refcount default visibility. I made this change locally:

diff --git a/third_party/tcmalloc/chromium/src/tcmalloc.cc b/third_party/tcmalloc/chromium/src/tcmalloc.cc
index 4ea7cdc..b031e50 100644
--- a/third_party/tcmalloc/chromium/src/tcmalloc.cc
+++ b/third_party/tcmalloc/chromium/src/tcmalloc.cc
@@ -920,7 +920,8 @@ class TCMallocImplementation : public MallocExtension {
 // well for STL).
 //
 // The destructor prints stats when the program exits.
-static int tcmallocguard_refcount = 0;  // no lock needed: runs before main()
+__attribute__((visibility("default"))) int tcmallocguard_refcount =
+    0;  // no lock needed: runs before main()
 TCMallocGuard::TCMallocGuard() {
   if (tcmallocguard_refcount++ == 0) {
 #ifdef HAVE_TLS    // this is true if the cc/ld/libc combo support TLS


but it didn't appear to fix the regression entirely. My local perf measurements for libclearkeycdm.so are:
- trunk: ~1.4ms
- trunk with https://codereview.chromium.org/2019183002 reverted: ~0.6ms
- trunk with default visibility change: ~1.0ms
So it may be that there are more static initialisers lurking in the tcmalloc code.

Adding primiano@, who has been looking at allocator issues.

[0] https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/tcmalloc.cc?q=module_enter_exit_hook&sq=package:chromium&dr=CSs&l=924
[1] https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/tcmalloc.cc?q=module_enter_exit_hook&sq=package:chromium&l=959&dr=CSs
[2] https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/tcmalloc.cc?q=module_enter_exit_hook&sq=package:chromium&dr=CSs&l=925

Comment 11 by p...@chromium.org, Oct 17 2016

FWIW, if I hack my libclearkeycdm.so.rsp file to remove all references to files in base/allocator/, the link completes successfully, file size decreases by ~600KB and test execution time decreases to ~0.5ms.
Cc: wfh@chromium.org
"extra copy of tcmalloc? Who is extra copy of tcmalloc? My name is workaround for MSVS2013 shim" :D

Joking aside, thanks ppc@ for the excellent analysis and also for the double-check in #11.
Yes I agree the resolution here is just removing that dependency to //base from clearkeycdm. 
Also the current situation is a bit scary because it means that now cdmadapter is using its own copy of tcmalloc with its own heap. Which means that if some allocation crosses the shared lib boundary (alloc on one side, free on the other) it will boom!

The only reason why that is there, according to crrev.com/1616793003 commit message, is because of the way the old windows shim (for msvs2013) was working. Essentially in the old way everything that indirectly (indirectly: through a dependency which crossed a shared-library boundary) depended on base was getting the dep against libcrt.lib (or whatever is the name on win) removed, but wasn't getting a replacement .
all this shouldn't be needed anymore after wfh@'s work on the shim for MSVS2015, so that can be just ripped out.

Longer description on https://codereview.chromium.org/2429773002/, which is also the fix to all this.
primiano - is your comment #12 a justification for this regression? After the CL reference landed, do you have any additional concerns?
Owner: primiano@chromium.org
Status: Started (was: Assigned)
I completely forgot about this.
https://codereview.chromium.org/2429773002/ should fix this regression. Is not landed yet, I didn't have time to address comments about tests.
If somebody wants to pick it up I'll be happy. Otherwise I'll get to this at some point.
Friendly Perf-Sheriff ping, primiano@ Looks like your CL is still not landed, do you have any update here?

Comment 16 by wfh@chromium.org, Dec 13 2016

Labels: OS-Linux
Cc: krasin@chromium.org
Hi Primiano,

the exact thing you have predicted in #12 just happened:
https://build.chromium.org/p/chromium.fyi/builders/ThinLTO%20Linux%20ToT/builds/1283
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FThinLTO_Linux_ToT%2F1283%2F%2B%2Frecipes%2Fsteps%2Faura_unittests%2F0%2Flogs%2FGestureRecognizerTest.EagerGestureDetection%2F0

When ThinLTO is enabled, the linker makes some decisions (most likely, completely valid) which lead to a string allocated in one page heap and deallocated in another. And the thing goes boom.

Would it be a good idea to resurrect the patch from #14?
Alternatively (providing for the completeness of the options we have), we can revert

https://codereview.chromium.org/2697123007/
and
https://codereview.chromium.org/2720703004/

as those two CLs are the triggers for the kaboom. But my feel that landing https://codereview.chromium.org/2429773002/ is a better way to handle this.
I have also verified that removing the dependency on //base from third_party/mesa, fixes the issue.

I have created a minimal CL that should fix the buildbot, let's submit it now, unless there are any real objections: https://codereview.chromium.org/2736613005/
Cc: erikc...@chromium.org
Hmm I'd be quite against reverting the CLs in #18. The author there is not doing anything bad (other than indirectly triggering an unfortunate situation).
I think we should just fix the original problem and resurrect crrev.com/2429773002 . Sorry that fell off my plate and forgot about that (I think twice or so).
I'd be definitely up for rebasing that, the only problem is that I am completely out of bandwidth this week because of the *other* perf. If you or somebody else could help with that would be great. That CL has already most LGTMs, I think there was only some concern from wfh@ whether that would have regressed security check in the third party libs. 
Just LGTM-d crrev.com/2429773002, thanks for picking that up. Is that enough to unblock your bot or is clearkeycdm also causing problems? 
I *hope* that this CL will be enough for unblock the bot. I am submitting the CL as TBR=kbr and we'll see how it goes.

Thank you for taking a look and LGTM'ing it.
Project Member

Comment 23 by bugdroid1@chromium.org, Mar 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/efcd0c614acac68022267a2eb1af6519353906fb

commit efcd0c614acac68022267a2eb1af6519353906fb
Author: krasin <krasin@chromium.org>
Date: Tue Mar 07 22:15:11 2017

osmesa: drop workaround for VS2013 allocator shim

This is a partial revert of crrev.com/1616793003 .

crrev.com/1616793003 introduced a workaround to the clearkeycdmadapter
and osmesa libs to deal with a breakage on windows. The breakage was
because in the days of MSVS2013 and gyp the win shim was working by:
1. removing the dependency to libcrt from any target that would depend
   directly or indirectly on //base.
2. giving them a replacement shim which did redefine allocator syms.

However, all this was working only in the case that the target
was ending up linking //base, which is the case for *almost* any target
in chrome. It was known to break if we had a case of a target depending
on //base only via a shared library dependency (so depending on, but not
linking, //base).
In such case only 1 would happen but not 2, leading to a build failure
on win, as highlighted in the commit message of crrev.com/1616793003 .
We estimated this to be very rare in the codebase, and effectively did
affect only osmesa and cdmadaptr, so we added a workaround there.

Now that both gyp and MSVS2013 are no more, we can go back to the
more sensible case where only the root executable (chrome, or whatever
else executable linking base) is the only thing defining the allocator
symbols. No workaround should be needed anymore.

This only removes the shim from osmesa, but not clearkeycdmadapter.

BUG= 617732 
TBR=kbr@chromium.org

Review-Url: https://codereview.chromium.org/2736613005
Cr-Commit-Position: refs/heads/master@{#455245}

[modify] https://crrev.com/efcd0c614acac68022267a2eb1af6519353906fb/third_party/mesa/BUILD.gn

So, the ThinLTO ToT buildbot has become more green after I sent the fix:
https://build.chromium.org/p/chromium.fyi/builders/ThinLTO%20Linux%20ToT/builds/1285

Sadly, some larger tests still fail due to the clearkeycdmadapter issue. :(

Comment 25 by p...@chromium.org, Mar 15 2017

Sent https://codereview.chromium.org/2748083004 with a fix for the LTO issue.
Project Member

Comment 26 by bugdroid1@chromium.org, Mar 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5ba86e8e4101bf5cd483c0cc6da5264db3bc2464

commit 5ba86e8e4101bf5cd483c0cc6da5264db3bc2464
Author: pcc <pcc@chromium.org>
Date: Wed Mar 15 16:33:38 2017

Set noinline attribute on exported shim layer functions.

If an exported symbol is linked into a DSO, it may be preempted by a
definition in the main executable. If this happens to an allocator symbol, it
will mean that the DSO will use the main executable's allocator. This is
normally relatively harmless -- regular allocations should all use the same
allocator, but if the DSO tries to hook the allocator it will not see any
allocations.

However, if LLVM LTO is enabled, the compiler may inline the shim layer
symbols into callers. The end result is that allocator calls in DSOs may use
either the main executable's allocator or the DSO's allocator, depending on
whether the call was inlined. This is arguably a bug in LLVM caused by its
somewhat irregular handling of symbol interposition (see llvm.org/PR23501).
To work around the bug we use noinline to prevent the symbols from being
inlined.

In the long run we probably want to avoid linking the allocator bits into
DSOs altogether. This will save a little space and stop giving DSOs the false
impression that they can hook the allocator.

BUG= 617732 
R=primiano@chromium.org

Review-Url: https://codereview.chromium.org/2748083004
Cr-Commit-Position: refs/heads/master@{#457103}

[modify] https://crrev.com/5ba86e8e4101bf5cd483c0cc6da5264db3bc2464/base/allocator/allocator_shim_internals.h
[modify] https://crrev.com/5ba86e8e4101bf5cd483c0cc6da5264db3bc2464/base/allocator/allocator_shim_override_glibc_weak_symbols.h

Is this still being worked on? Shouldn't it be assigned to pcc if so?
Status: Fixed (was: Started)
I think that #23 and #26 worked around the problem (thanks to krasin and pcc) and there is nothing left here, unless I am missing something (possible).
Please reopen if that is the case.

Sign in to add a comment