Issue metadata
Sign in to add a comment
|
72.2% regression in load_library_perf_tests at 396984:397015 |
||||||||||||||||||||
Issue descriptionThis happens for both libclearkeycdm.so and libclearkeycdmadapter.so. Widevine CDM and adapter are not affected.
,
Jun 24 2016
Bisect has been kicked off, still waiting for the result.
,
Jul 6 2016
Moving this nonessential bug to the next milestone. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 12 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9007320664687832064
,
Jul 12 2016
=== Auto-CCing suspected CL author pcc@chromium.org === Hi pcc@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related. ===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : Restore the -fvisibility=hidden flag for tcmalloc when using allocator shim. Author : pcc Commit description: Hidden visibility was initially disabled for tcmalloc in https://codereview.chromium.org/343037 as a workaround for a lack of symbol visibility annotations in the source. These annotations were finally added as part of the new allocator shim (see //base/allocator/allocator_shim_override_glibc_weak_symbols.h and //base/allocator/allocator_shim_internals.h for the definition of SHIM_ALWAYS_EXPORT). Hence it is no longer necessary to compile tcmalloc with default visibility when using the allocator shim. This fixes an issue where some of tcmalloc's internal classes were being compiled with default visibility in the implementation and with hidden visibility in clients. This is technically an ODR violation, but not normally a significant one. The ODR violation was exposed by an upcoming LLVM change [1] that enables the relative C++ ABI for (approximately) classes with hidden visibility. [1] http://reviews.llvm.org/D20749 BUG=589384 R=primiano@chromium.org, thakis@chromium.org Review-Url: https://codereview.chromium.org/2019183002 Cr-Commit-Position: refs/heads/master@{#397015} Commit : 1110ac7a5c525ab81105a8cf7be2f7b59d669657 Date : Wed Jun 01 01:28:00 2016 ===== TESTED REVISIONS ===== Revision Mean Std Dev N Good? chromium@396983 0.318875 0.050702 8 good chromium@396999 39.012 109.527 8 good chromium@397007 0.2938 0.0147207 5 good chromium@397011 0.2902 0.00990959 5 good chromium@397013 0.2862 0.00356371 5 good chromium@397014 0.2938 0.00798123 5 good chromium@397015 0.512375 0.0389136 8 bad <-- Bisect job ran on: linux_perf_bisect Bug ID: 617732 Test Command: ./src/out/Release/load_library_perf_tests --single-process-tests Test Metric: time_to_load_library/libclearkeycdm.so Relative Change: 62.19% Score: 99.9 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6592 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9007320664687832064 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=4971924711538688 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 17 2016
Perf sheriff ping: pcc@, does it seem to make sense that that CL could have affected this performance test?
,
Oct 5 2016
pcc@ can you please provide an update on this issue? /cc sullivan@ any idea who owns load_library_perf_test.cc
,
Oct 17 2016
xhwang owns this test.
,
Oct 17 2016
Sorry about the delay, I will take a look at this today.
,
Oct 17 2016
Okay, I've looked into this, and I've been able to reproduce the regression locally. This is what I think is going on here:
- We link an additional, unnecessary copy of tcmalloc into each of the libclearkeycdm*.so DSOs. This was also the case prior to my change.
- Each copy of tcmalloc tries to initialise itself in the TCMallocGuard constructor [0] via a static variable named module_enter_exit_hook [1].
- The tcmalloc initialisation code is guarded by a static variable named tcmallocguard_refcount [2].
- Prior to my change, the dynamic loader overrode the definition of the TCMallocGuard constructor in the DSOs with the one in the executable. This meant that each call to the constructor used the same copy of tcmallocguard_refcount.
- After my change, the visibility of the TCMallocGuard constructor became hidden, so each DSO started using its own copy of the constructor and therefore its own copy of tcmallocguard_refcount.
- This led to the tcmalloc initialisation code being run for each DSO. The time spent in the initialisation code likely caused the perf regression.
I think we should fix the underlying problem here by removing the unnecessary copies of tcmalloc in each of these DSOs. The DSOs will continue to use tcmalloc for allocation because the main executable's definition of malloc will override libc's. I haven't looked into how hard this would be from a build system perspective though.
An easier workaround may be to give tcmallocguard_refcount default visibility. I made this change locally:
diff --git a/third_party/tcmalloc/chromium/src/tcmalloc.cc b/third_party/tcmalloc/chromium/src/tcmalloc.cc
index 4ea7cdc..b031e50 100644
--- a/third_party/tcmalloc/chromium/src/tcmalloc.cc
+++ b/third_party/tcmalloc/chromium/src/tcmalloc.cc
@@ -920,7 +920,8 @@ class TCMallocImplementation : public MallocExtension {
// well for STL).
//
// The destructor prints stats when the program exits.
-static int tcmallocguard_refcount = 0; // no lock needed: runs before main()
+__attribute__((visibility("default"))) int tcmallocguard_refcount =
+ 0; // no lock needed: runs before main()
TCMallocGuard::TCMallocGuard() {
if (tcmallocguard_refcount++ == 0) {
#ifdef HAVE_TLS // this is true if the cc/ld/libc combo support TLS
but it didn't appear to fix the regression entirely. My local perf measurements for libclearkeycdm.so are:
- trunk: ~1.4ms
- trunk with https://codereview.chromium.org/2019183002 reverted: ~0.6ms
- trunk with default visibility change: ~1.0ms
So it may be that there are more static initialisers lurking in the tcmalloc code.
Adding primiano@, who has been looking at allocator issues.
[0] https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/tcmalloc.cc?q=module_enter_exit_hook&sq=package:chromium&dr=CSs&l=924
[1] https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/tcmalloc.cc?q=module_enter_exit_hook&sq=package:chromium&l=959&dr=CSs
[2] https://cs.chromium.org/chromium/src/third_party/tcmalloc/chromium/src/tcmalloc.cc?q=module_enter_exit_hook&sq=package:chromium&dr=CSs&l=925
,
Oct 17 2016
FWIW, if I hack my libclearkeycdm.so.rsp file to remove all references to files in base/allocator/, the link completes successfully, file size decreases by ~600KB and test execution time decreases to ~0.5ms.
,
Oct 18 2016
"extra copy of tcmalloc? Who is extra copy of tcmalloc? My name is workaround for MSVS2013 shim" :D Joking aside, thanks ppc@ for the excellent analysis and also for the double-check in #11. Yes I agree the resolution here is just removing that dependency to //base from clearkeycdm. Also the current situation is a bit scary because it means that now cdmadapter is using its own copy of tcmalloc with its own heap. Which means that if some allocation crosses the shared lib boundary (alloc on one side, free on the other) it will boom! The only reason why that is there, according to crrev.com/1616793003 commit message, is because of the way the old windows shim (for msvs2013) was working. Essentially in the old way everything that indirectly (indirectly: through a dependency which crossed a shared-library boundary) depended on base was getting the dep against libcrt.lib (or whatever is the name on win) removed, but wasn't getting a replacement . all this shouldn't be needed anymore after wfh@'s work on the shim for MSVS2015, so that can be just ripped out. Longer description on https://codereview.chromium.org/2429773002/, which is also the fix to all this.
,
Dec 7 2016
primiano - is your comment #12 a justification for this regression? After the CL reference landed, do you have any additional concerns?
,
Dec 7 2016
I completely forgot about this. https://codereview.chromium.org/2429773002/ should fix this regression. Is not landed yet, I didn't have time to address comments about tests. If somebody wants to pick it up I'll be happy. Otherwise I'll get to this at some point.
,
Dec 13 2016
Friendly Perf-Sheriff ping, primiano@ Looks like your CL is still not landed, do you have any update here?
,
Dec 13 2016
,
Mar 7 2017
Hi Primiano, the exact thing you have predicted in #12 just happened: https://build.chromium.org/p/chromium.fyi/builders/ThinLTO%20Linux%20ToT/builds/1283 https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FThinLTO_Linux_ToT%2F1283%2F%2B%2Frecipes%2Fsteps%2Faura_unittests%2F0%2Flogs%2FGestureRecognizerTest.EagerGestureDetection%2F0 When ThinLTO is enabled, the linker makes some decisions (most likely, completely valid) which lead to a string allocated in one page heap and deallocated in another. And the thing goes boom. Would it be a good idea to resurrect the patch from #14?
,
Mar 7 2017
Alternatively (providing for the completeness of the options we have), we can revert https://codereview.chromium.org/2697123007/ and https://codereview.chromium.org/2720703004/ as those two CLs are the triggers for the kaboom. But my feel that landing https://codereview.chromium.org/2429773002/ is a better way to handle this.
,
Mar 7 2017
I have also verified that removing the dependency on //base from third_party/mesa, fixes the issue. I have created a minimal CL that should fix the buildbot, let's submit it now, unless there are any real objections: https://codereview.chromium.org/2736613005/
,
Mar 7 2017
Hmm I'd be quite against reverting the CLs in #18. The author there is not doing anything bad (other than indirectly triggering an unfortunate situation). I think we should just fix the original problem and resurrect crrev.com/2429773002 . Sorry that fell off my plate and forgot about that (I think twice or so). I'd be definitely up for rebasing that, the only problem is that I am completely out of bandwidth this week because of the *other* perf. If you or somebody else could help with that would be great. That CL has already most LGTMs, I think there was only some concern from wfh@ whether that would have regressed security check in the third party libs.
,
Mar 7 2017
Just LGTM-d crrev.com/2429773002, thanks for picking that up. Is that enough to unblock your bot or is clearkeycdm also causing problems?
,
Mar 7 2017
I *hope* that this CL will be enough for unblock the bot. I am submitting the CL as TBR=kbr and we'll see how it goes. Thank you for taking a look and LGTM'ing it.
,
Mar 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/efcd0c614acac68022267a2eb1af6519353906fb commit efcd0c614acac68022267a2eb1af6519353906fb Author: krasin <krasin@chromium.org> Date: Tue Mar 07 22:15:11 2017 osmesa: drop workaround for VS2013 allocator shim This is a partial revert of crrev.com/1616793003 . crrev.com/1616793003 introduced a workaround to the clearkeycdmadapter and osmesa libs to deal with a breakage on windows. The breakage was because in the days of MSVS2013 and gyp the win shim was working by: 1. removing the dependency to libcrt from any target that would depend directly or indirectly on //base. 2. giving them a replacement shim which did redefine allocator syms. However, all this was working only in the case that the target was ending up linking //base, which is the case for *almost* any target in chrome. It was known to break if we had a case of a target depending on //base only via a shared library dependency (so depending on, but not linking, //base). In such case only 1 would happen but not 2, leading to a build failure on win, as highlighted in the commit message of crrev.com/1616793003 . We estimated this to be very rare in the codebase, and effectively did affect only osmesa and cdmadaptr, so we added a workaround there. Now that both gyp and MSVS2013 are no more, we can go back to the more sensible case where only the root executable (chrome, or whatever else executable linking base) is the only thing defining the allocator symbols. No workaround should be needed anymore. This only removes the shim from osmesa, but not clearkeycdmadapter. BUG= 617732 TBR=kbr@chromium.org Review-Url: https://codereview.chromium.org/2736613005 Cr-Commit-Position: refs/heads/master@{#455245} [modify] https://crrev.com/efcd0c614acac68022267a2eb1af6519353906fb/third_party/mesa/BUILD.gn
,
Mar 8 2017
So, the ThinLTO ToT buildbot has become more green after I sent the fix: https://build.chromium.org/p/chromium.fyi/builders/ThinLTO%20Linux%20ToT/builds/1285 Sadly, some larger tests still fail due to the clearkeycdmadapter issue. :(
,
Mar 15 2017
Sent https://codereview.chromium.org/2748083004 with a fix for the LTO issue.
,
Mar 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5ba86e8e4101bf5cd483c0cc6da5264db3bc2464 commit 5ba86e8e4101bf5cd483c0cc6da5264db3bc2464 Author: pcc <pcc@chromium.org> Date: Wed Mar 15 16:33:38 2017 Set noinline attribute on exported shim layer functions. If an exported symbol is linked into a DSO, it may be preempted by a definition in the main executable. If this happens to an allocator symbol, it will mean that the DSO will use the main executable's allocator. This is normally relatively harmless -- regular allocations should all use the same allocator, but if the DSO tries to hook the allocator it will not see any allocations. However, if LLVM LTO is enabled, the compiler may inline the shim layer symbols into callers. The end result is that allocator calls in DSOs may use either the main executable's allocator or the DSO's allocator, depending on whether the call was inlined. This is arguably a bug in LLVM caused by its somewhat irregular handling of symbol interposition (see llvm.org/PR23501). To work around the bug we use noinline to prevent the symbols from being inlined. In the long run we probably want to avoid linking the allocator bits into DSOs altogether. This will save a little space and stop giving DSOs the false impression that they can hook the allocator. BUG= 617732 R=primiano@chromium.org Review-Url: https://codereview.chromium.org/2748083004 Cr-Commit-Position: refs/heads/master@{#457103} [modify] https://crrev.com/5ba86e8e4101bf5cd483c0cc6da5264db3bc2464/base/allocator/allocator_shim_internals.h [modify] https://crrev.com/5ba86e8e4101bf5cd483c0cc6da5264db3bc2464/base/allocator/allocator_shim_override_glibc_weak_symbols.h
,
Aug 16 2017
Is this still being worked on? Shouldn't it be assigned to pcc if so?
,
Aug 24 2017
I think that #23 and #26 worked around the problem (thanks to krasin and pcc) and there is nothing left here, unless I am missing something (possible). Please reopen if that is the case. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by xhw...@chromium.org
, Jun 6 2016