Issue metadata
Sign in to add a comment
|
19.5% regression in blink_perf.dom at 402628:402670 |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Jun 30 2016
=== Auto-CCing suspected CL author esprehn@chromium.org === Hi esprehn@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related. ===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : Use a Vector for the buffer in StringBuilder. Author : esprehn Commit description: The string overallocation strategy of StringBuilder doesn't really make sense since we need to create a copy at the end now that StringImpl::reallocate was removed by https://codereview.chromium.org/1390033003 We probably should add back an optimization that forcibly sets the length of the String and uses overallocation to avoid doing a string copy in every StringBuilder, but until we decide to do that lets switch to using a Vector for the buffer since it can handle growing the buffer automatically for us and makes the code in StringBuilder much simpler. Review-Url: https://codereview.chromium.org/2046353002 Cr-Commit-Position: refs/heads/master@{#402638} Commit : 23e9098e1bfaa078916ed64a4c6b6f6bd7b00d1b Date : Wed Jun 29 01:52:00 2016 ===== TESTED REVISIONS ===== Revision Mean Std Dev N Good? chromium@402627 2.31891 0.0215305 5 good chromium@402633 2.32174 0.0151118 5 good chromium@402636 2.31078 0.0194788 5 good chromium@402637 2.32138 0.016998 5 good chromium@402638 1.87615 0.0138534 5 bad <-- chromium@402649 1.86675 0.0125134 5 bad chromium@402670 1.85971 0.0246205 5 bad Bisect job ran on: android_one_perf_bisect Bug ID: 624642 Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --also-run-disabled-tests blink_perf.dom Test Metric: textarea-dom/textarea-dom Relative Change: 19.80% Score: 99.9 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_one_perf_bisect/builds/1381 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9008469678261742080 Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=6439152078290944 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Jun 30 2016
jbroman@ Any ideas here? This textarea parsing text is very confusing, on desktop it spends the majority of the time in these replace() calls: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/HTMLTextAreaElement.cpp?q=HTMLtextAreaEle&sq=package:chromium&l=391 On mobile it seems to be very sensitive to what I do in the StringBuilder, though it's not clear why. Your patch to use memchr seems to have improved this benchmark on Android One if you look backwards in time in that graph.
,
Jun 30 2016
This textarea parsing *test*
,
Jun 30 2016
It seems all blink_perf.dom tests have been disabled on desktop for 3 months, sadness: https://bugs.chromium.org/p/chromium/issues/detail?id=601666 So looking at this in Instruments on my Mac I see: Running Time Self (ms) Symbol Name 116218.0ms 100.0% 299.0 blink::NodeV8Internal::appendChildMethodCallbackForMainWorld 115334.0ms 99.2% 44.0 blink::ContainerNode::appendChild ... 73432.0ms 63.1% 196.0 blink::HTMLTextAreaElement::setValueCommon 63541.0ms 54.6% 56.0 WTF::StringImpl::replace(WTF::StringImpl*, WTF::StringImpl*) 63485.0ms 54.6% 63485.0 WTF::StringImpl::find(WTF::StringImpl*, unsigned int) ... 40212.0ms 34.6% 5356.0 blink::HTMLTextAreaElement::defaultValue() const 26812.0ms 23.0% 20405.0 WTF::StringBuilder::append(unsigned char const*, unsigned int) 4859.0ms 4.1% 3924.0 WTF::Vector<unsigned char, 16ul, WTF::PartitionAllocator>::reserveCapacity(unsigned long) ... 4058.0ms 3.4% 53.0 WTF::StringBuilder::toString() 3996.0ms 3.4% 12.0 WTF::String::String(unsigned char const*, unsigned int) 3977.0ms 3.4% 31.0 WTF::StringImpl::create(unsigned char const*, unsigned int) ... 3492.0ms 3.0% 3492.0 WTF::StringBuilder::append(WTF::StringView const&) We're spending the vast majority of the time in: StringBuilder::append 23% StringImpl::find 54.6% this is because the test is appending this super massive string into a textarea. Fwiw we're 2x faster than Safari at this benchmark, and we're 25x faster than Firefox. Even with this regression we're also 25% faster than Stable 51 on my desktop. I wonder if we should revert this or just eat the regression on the microbenchmark for now. The profile makes me wonder if Vector<LChar>::append() is somehow much worse than the memcpy we were doing before... :/
,
Jun 30 2016
Hmm, 20% regression sounds a bit too large to me though. But if you think we're just gaming micro-benchmarks, I'm okay with suppressing the regression.
,
Jun 30 2016
So I figured out what's going on here, apparently VectorTraits<LChar>::canCopyWithMemcpy == false, so Vector is actually looping to append bytes instead of using memcpy. That seems super bad, the parser would be doing the same thing, actually in general it seems the memcpy optimization in WTF is busted. // This fails to compile now. static_assert(VectorTraits<int>::canCopyWithMemcpy, "sadness"); static_assert(VectorTraits<LChar>::canCopyWithMemcpy, "sadness"); this is apparently a bug in the IsAssignable type trait in WTF: static_assert(IsAssignable<int, int&>::value, "sadness"); We really need to fix this before M53 branches, we're not using the memcpy optimization in any of the vectors of primitive values, which are used in performance critical things all over the engine.
,
Jun 30 2016
Looks like this was regressed here: https://chromium.googlesource.com/chromium/src/+/d7225e41c5557ef72e5ca439fcf8287be37e7c0f we just didn't have good perf coverage for it?
,
Jun 30 2016
Interestingly, that revision corresponds to the point the metrics in question were *improved* from the same level as we are now after this regression. Maybe textarea-dom is pretty sensitive to whether the data is memcpy'ed or not.
,
Jun 30 2016
I'm skeptical that manually looping is beating memcpy in the general case. Testing locally memcpy certainly improves that benchmark by 20% on my MacBook which means it improves things like textContent on the web too. I'm not sure what metric you see improved by not using memcpy? Before my recent patch that benchmark didn't use Vector at all. We should definitely fix the TypeTraits to handle primitive values correctly again. :)
,
Jun 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/757e4b7379983a8760fe8f91e40bb3b34d64bde4 commit 757e4b7379983a8760fe8f91e40bb3b34d64bde4 Author: yutak <yutak@chromium.org> Date: Thu Jun 30 09:47:17 2016 WTF::Vector: Fix primitive types identified as not safe to memcpy. The culprit was the implementation of IsAssignable<> traits. Specifically, "std::declval<T>() = X" is invalid if T is a primitive type, because the assignment of primitives is not an overloaded operator and the compilability of this statement for a primitive T is different from that for a non-primitive T. The fix is to simply change the left-hand side of the assignment to "std::declval<T&>". BUG= 624642 Review-Url: https://codereview.chromium.org/2110183005 Cr-Commit-Position: refs/heads/master@{#403131} [modify] https://crrev.com/757e4b7379983a8760fe8f91e40bb3b34d64bde4/third_party/WebKit/Source/wtf/TypeTraits.h [modify] https://crrev.com/757e4b7379983a8760fe8f91e40bb3b34d64bde4/third_party/WebKit/Source/wtf/VectorTest.cpp
,
Jun 30 2016
#3: Seems you've already figured this bug out, but there are a few benchmarks that are mostly or all string operations (find, append, etc.). The memchr fix helped especially on 32-bit ARM, where the compiler was very sensitive to compiling the naive loop badly. (And many platforms have a nicely optimized memchr, just as many have an optimized memcpy that's faster than a bytewise loop.)
,
Jul 1 2016
This looks recovered to me, perhaps with a very tiny regression still. I have another patch that should get another small percentage back here too: https://codereview.chromium.org/2106283004
,
Jul 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/464918ccc62d3a70a976b3c38b5ef6ce6e552f31 commit 464918ccc62d3a70a976b3c38b5ef6ce6e552f31 Author: esprehn <esprehn@chromium.org> Date: Fri Jul 01 04:00:32 2016 Allocate space in StringBuilder for m_string and the new bytes. StringBuilder has an optimization to just store the first String object you append to avoid a copy in cases where you only append one String and then call toString(), for example if there's onlyone Text child and you do element.textContent. When appending the second string we then allocate a Vector buffer and copy the first string (m_string) into it. We weren't allocating extra space for the bytes we were about to append in the second string though, which meant we always did an extra copy immediately following the the buffer creation: String first; String second; StringBuilder builder; // No copy, just stores a pointer to first. builder.append(first); // 1. Creates a Vector // 2. Copies first into it. // 3. Appends second to the Vector which causes a Vector resize. builder.append(second); We solve this by calling reserveInitialCapacity with m_length + the number of bytes we're about to append when we created the buffer. We also inflate the number of bytes to be at least the inline capacity size so that doing: builder.append(first); builder.append('1'); builder.append('2'); doesn't require an immediate resize for appending '2' by ensuring we always overallocate the Vector by at least the inline capacity size so even when appending a String you get 16 chars of extra space to insert something else. BUG= 624642 Review-Url: https://codereview.chromium.org/2106283004 Cr-Commit-Position: refs/heads/master@{#403407} [modify] https://crrev.com/464918ccc62d3a70a976b3c38b5ef6ce6e552f31/third_party/WebKit/Source/wtf/text/StringBuilder.cpp [modify] https://crrev.com/464918ccc62d3a70a976b3c38b5ef6ce6e552f31/third_party/WebKit/Source/wtf/text/StringBuilder.h
,
Jul 2 2016
This looks mostly recovered now, though I do see a very small regression on some of the bots. It's possible the Vector based StringBuilder is slower than the StringImpl based one? :/ Fwiw I can make this benchmark 125% faster with a StringImpl::find optimization: https://codereview.chromium.org/2116533005 We can also choose the right size for the StringBuilder inside HTMLTextAreaElement like we do in Element::textFromChildren which would avoid all of the copies. But that doesn't really address that I made StringBuilder slower... but using a Vector also simplifies the code a ton. Perhaps we should ignore this regression until we see StringBuilder show up in benchmarks elsewhere?
,
Jul 11 2016
Marking WontFix, as per #15. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by sullivan@chromium.org
, Jun 30 2016