Issue metadata
Sign in to add a comment
|
15.9% regression in blink_perf.{canvas|parser} at 384321 - 384621 |
||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Apr 4 2016
,
Apr 4 2016
,
Apr 4 2016
=== Auto-CCing suspected CL author esprehn@chromium.org === Hi esprehn@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related. ===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : WTF::ThreadSpecific should use base::ThreadLocalStorage. Author : esprehn Commit description: Let's use the base abstraction to allocate the TLS. It's way less code. Review URL: https://codereview.chromium.org/1849023002 Cr-Commit-Position: refs/heads/master@{#384650} Commit : 56641fdb889c6b5d27997465b622d7432c0ba0b8 Date : Fri Apr 01 19:01:26 2016 ===== TESTED REVISIONS ===== Revision Mean Value Std. Dev. Num Values Good? chromium@384639 33.417577 0.09856 5 good chromium@384646 33.452531 0.033532 5 good chromium@384649 33.292346 0.206809 5 good chromium@384650 29.75967 0.064093 5 bad <- chromium@384651 29.629061 0.227256 4 bad chromium@384652 29.821315 0.134139 5 bad chromium@384665 30.038369 0.099122 5 bad Bisect job ran on: linux_perf_bisect Bug ID: 600271 Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --also-run-disabled-tests blink_perf.css Test Metric: SelectorCountScaling/SelectorCountScaling Relative Change: 10.11% Score: 99.9 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6368 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9016324266525394080 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=600271 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Apr 4 2016
===== BISECT JOB RESULTS ===== Status: completed === Bisection aborted === The bisect was aborted because The metric values for the initial "good" and "bad" revisions do not represent a clear regression. Please contact the the team (see below) if you believe this is in error. === Warnings === The following warnings were raised by the bisect job: * Bisect failed to reproduce the regression with enough confidence. ===== TESTED REVISIONS ===== Revision Mean Value Std. Dev. Num Values Good? chromium@384311 132.470249 8.926027 18 good chromium@384648 141.956762 17.609913 18 bad Bisect job ran on: android_nexus5_perf_bisect Bug ID: 600271 Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --also-run-disabled-tests blink_perf.canvas Test Metric: putImageData/putImageData Relative Change: 5.38% Score: 0 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus5_perf_bisect/builds/3555 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9016321568622720480 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=600271 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Apr 4 2016
There's a v8 roll and a skia roll in there, when looking at the regressions that's looking pretty likely. This needs another bisect.
,
Apr 4 2016
,
Apr 4 2016
I kicked off a few more bisects.
,
Apr 4 2016
===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : WTF::ThreadSpecific should use base::ThreadLocalStorage. Author : esprehn Commit description: Let's use the base abstraction to allocate the TLS. It's way less code. Review URL: https://codereview.chromium.org/1849023002 Cr-Commit-Position: refs/heads/master@{#384650} Commit : 56641fdb889c6b5d27997465b622d7432c0ba0b8 Date : Fri Apr 01 19:01:26 2016 ===== TESTED REVISIONS ===== Revision Mean Value Std. Dev. Num Values Good? chromium@384639 7.5468 0.476513 5 good chromium@384648 7.6562 0.581595 5 good chromium@384649 7.3844 0.416415 5 good chromium@384650 10.9736 0.728737 5 bad <- chromium@384651 10.7542 0.883665 5 bad chromium@384653 10.7615 0.830193 4 bad chromium@384657 10.4022 0.751494 5 bad chromium@384674 10.297 0.558605 5 bad Bisect job ran on: linux_perf_bisect Bug ID: 600271 Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --also-run-disabled-tests blink_perf.shadow_dom Test Metric: DistributionWithMultipleShadowRoots/DistributionWithMultipleShadowRoots Relative Change: 36.44% Score: 99.9 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/linux_perf_bisect/builds/6372 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9016275075056859584 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=600271 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Apr 4 2016
I reverted my patch in https://codereview.chromium.org/1856793002 Lets see if this recovers. It's hard for me to imagine how using Chromium's TLS abstractions would be slower for linux and android, but not for Mac or Windows, and why that would impact these few benchmarks. Maybe it made Oilpan slower?
,
Apr 4 2016
sullivan@ How do I know what bots run what tests? The perf dashboard seems to list lots of machines, but the tests only run on a couple, ex. putImageData seems to only run on Android now? :/
,
Apr 5 2016
The bots seem to be recovering now that my revert has landed. I'm studying the difference between Chromium TLS and Blink TLS: - Blink uses the pthread API directly on OS X, Android and Linux. - Blink uses a single TLS value on Windows, with a 256 slot array to store values. - Chromium always uses a 256 slot array, on every platform. - Chromium has base::subtle::Atomic32 accesses all over the ThreadLocalStorage code. Both for the slot in each ThreadLocalStorage instance, and for the initialized_ variable which most code, and blink, never uses. Indeed the Chromium TLS stuff is going to be slower if it needs to do atomic math all the time, and Blink uses this for everything from the AtomicString table, to the Oilpan ThreadState. I'm not sure what to do here. The atomics were added recently to fix races too, it's a shame Chromium doesn't have micro benchmarks: https://codereview.chromium.org/1222123002
,
Apr 5 2016
,
Apr 5 2016
,
Apr 5 2016
,
Apr 5 2016
Does perf look better without https://codereview.chromium.org/1222123002 ?
,
Apr 5 2016
Can someone from the base/ team look into what's going on here?
,
Apr 5 2016
Release stores and acquire loads would be slower than non-atomic accesses on ARM but equal speed on x86. Is that the pattern you saw?
,
Apr 5 2016
initialized() appears to only be there for tests, with two possible exceptions? 1. https://code.google.com/p/chromium/codesearch#chromium/src/components/url_formatter/url_formatter.cc&rcl=1459797464&l=382 2. https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/heap_profiler_allocation_context_tracker.cc&rcl=1459797464&l=70 Which both guard calls to Initialize(), maybe those could store the initialize externally?
,
Apr 5 2016
The good news is that reverting my patch is a consistent 17% difference on my Z620 on ShadowDOM/DistributionWithMultipleShadowRoots. I tried just commenting out the initialize_ code and that made no difference. The performance difference must be somewhere else. Could someone who works on base look at that benchmark with and without my patch?
,
Apr 5 2016
Are the Linux perf bots affected by crbug.com/592903 ?
,
Apr 5 2016
Unless the android results were a lie, this affected android which uses libc++ also.
,
Apr 5 2016
It looks like android did not recover when linux did from the revert though: https://chromeperf.appspot.com/report?sid=1c27837a287b6f655d5d36247d96f65f2337184d7fda38b5d03ae00950645ba3&rev=384648 So maybe this is crbug.com/592903 and the android regression is unrelated.
,
Apr 5 2016
I'd like to split the (android) canvas regression to another bug, but I don't seem to have permissions on the perf dashboard to do so.
,
Apr 5 2016
I concur, please break-out the the nexus5 canvas regression into a separate issue, and kick off a bisect job for that specific regression. You can assign the new issue to me for further triage. Seems I don't have permissions to do this either.
,
Apr 5 2016
I split the canvas regression into bug 600786 . Note that there is no ref build for that regression and it only occurred on one bot, so it could be noise. Bisect kicked off. Also note that after the monorail migration, dashboard switched to having users log in with chromium.org accounts; that might be the reason people are having problems with permissions.
,
Apr 5 2016
The slowness is from ThreadLocalStorage::StaticSlot::Get(), it does base::subtle::NoBarrier_Load for every single Get(), and blink didn't need to do that. How can we change base's ThreadLocalStorage to not do atomic ops for every single Get() ?
,
Apr 5 2016
How fast are mutexes/locks these days? The alternative approach here is to perform TLS initialization very early in the startup process (e.g. no threads running around), guaranteeing that the TLS key is set and avoiding the need to NoBarrier_Load it.
,
Apr 5 2016
Indeed, Blink has initialize/shutdown to avoid having to do things like this at runtime. Another alternative is to not use the single shared TLS value with 256 slot array except on Windows. Blink did that and used pthread_key_create directly for all other platforms avoiding the need for any globals except on windows.
,
Apr 5 2016
My understanding is that we also needed this scheme for Android, which is TLS constrained. For Windows, the minimum guaranteed is 64, but can go up to 1088 (don't know which versions offhand).
,
Apr 5 2016
Blink didn't special case Android before, it had two modes, Windows and POSIX. http://mobilepearls.com/labs/native-android-api/#pthreads claims Android supports 59 slots, I wonder how close we are to that in practice. Anyway, it seems like maybe we do want the array, but want separate initialization of the shared slot to avoid the atomics?
,
Apr 5 2016
In either case, I've been looking at this from the implementation side off and on after I stumbled onto two different implementations of the Chrome TLS system: ThreadLocalStorage and ThreadLocal[Pointer|Boolean]. Oddly enough, ThreadLocal[Pointer|Boolean] uses ThreadLocalStorage only on Android. Native implementations are used for Windows and POSIX. ( http://crbug.com/588824 ) Having the system TLS slot set at process initialization makes a lot of sense to me.
,
Apr 5 2016
Comment #27 is issue 592903 , which causes NoBarrier_Load to generate 0: 0f ae f0 mfence 3: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 9 <test_atomic_func()+0x9> 9: 0f ae f0 mfence instead of the 0: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 6 <test_racy_func()+0x6> it ought to generate. The problem isn't that there are atomic operations, it's that they're implemented incredibly badly in the gcc-4.6 standard library that we're using on Linux.
,
Apr 5 2016
On x86 (with a functioning compiler and sysroot), NoBarrier_Load results in just a load. No barriers or lock instructions. I haven't looked at the code on ARM to see what it does.
,
Apr 6 2016
Navigated to the Graphs and noticed that the alerts are getting recovered. Could some on please look into the issue and update. Note: Removing Needs Bisect label as of now, please feel free to add if required. Thank You.
,
Apr 6 2016
,
Apr 6 2016
This also regressed two Android bots according to the dashboard: https://chromeperf.appspot.com/report?sid=92d3e9794bd0b21013448e3ddd1ba11bc74cd20e6908864248c7244f77e64d8c&rev=384621 https://chromeperf.appspot.com/report?sid=8b9d507a960ea2ba3e7e319f44f39cab32f09d06df2028a3aa365a903a2af904&rev=384633 We could fork those and run the bisect again, but I why do we think atomics on ARM are also "free" like x86?
,
Apr 6 2016
#37 *relaxed* load/store are cheap on all platforms. Details: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
,
Apr 6 2016
The first android bot recovered but I don't see the related revert for this bug in that run? The second doesn't look like a regression on the graph.
,
May 6 2016
This is fixed, but we can't use base::ThreadLocalStorage until the underlying cause is fixed.
,
Jun 23 2016
Renaming Blink>Architecture to Blink>Internals |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by toyoshim@chromium.org
, Apr 4 2016