New issue
Advanced search Search tips

Issue 624642 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jul 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression



Sign in to add a comment

19.5% regression in blink_perf.dom at 402628:402670

Project Member Reported by sullivan@chromium.org, Jun 30 2016

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=624642

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgICg4o-itQsM


Bot(s) for this bug's original alert(s):

android-one
Project Member

Comment 2 by 42576172...@developer.gserviceaccount.com, Jun 30 2016

Cc: esprehn@chromium.org
Owner: esprehn@chromium.org

=== Auto-CCing suspected CL author esprehn@chromium.org ===

Hi esprehn@chromium.org, the bisect results pointed to your CL below as possibly
causing a regression. Please have a look at this info and see whether
your CL be related.


===== BISECT JOB RESULTS =====
Status: completed


===== SUSPECTED CL(s) =====
Subject : Use a Vector for the buffer in StringBuilder.
Author  : esprehn
Commit description:
  
The string overallocation strategy of StringBuilder doesn't really make sense
since we need to create a copy at the end now that StringImpl::reallocate was
removed by https://codereview.chromium.org/1390033003

We probably should add back an optimization that forcibly sets the length of the
String and uses overallocation to avoid doing a string copy in every StringBuilder,
but until we decide to do that lets switch to using a Vector for the buffer since
it can handle growing the buffer automatically for us and makes the code in
StringBuilder much simpler.

Review-Url: https://codereview.chromium.org/2046353002
Cr-Commit-Position: refs/heads/master@{#402638}
Commit  : 23e9098e1bfaa078916ed64a4c6b6f6bd7b00d1b
Date    : Wed Jun 29 01:52:00 2016


===== TESTED REVISIONS =====
Revision         Mean     Std Dev    N  Good?
chromium@402627  2.31891  0.0215305  5  good
chromium@402633  2.32174  0.0151118  5  good
chromium@402636  2.31078  0.0194788  5  good
chromium@402637  2.32138  0.016998   5  good
chromium@402638  1.87615  0.0138534  5  bad    <--
chromium@402649  1.86675  0.0125134  5  bad
chromium@402670  1.85971  0.0246205  5  bad

Bisect job ran on: android_one_perf_bisect
Bug ID: 624642

Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --also-run-disabled-tests blink_perf.dom
Test Metric: textarea-dom/textarea-dom
Relative Change: 19.80%
Score: 99.9

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_one_perf_bisect/builds/1381
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9008469678261742080


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6439152078290944

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!
Cc: haraken@chromium.org hajimehoshi@chromium.org jbroman@chromium.org
jbroman@ Any ideas here? This textarea parsing text is very confusing, on desktop it spends the majority of the time in these replace() calls:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/html/HTMLTextAreaElement.cpp?q=HTMLtextAreaEle&sq=package:chromium&l=391

On mobile it seems to be very sensitive to what I do in the StringBuilder, though it's not clear why. Your patch to use memchr seems to have improved this benchmark on Android One if you look backwards in time in that graph.


This textarea parsing *test*
It seems all blink_perf.dom tests have been disabled on desktop for 3 months, sadness:
https://bugs.chromium.org/p/chromium/issues/detail?id=601666

So looking at this in Instruments on my Mac I see:

Running Time    Self (ms)       Symbol Name
116218.0ms  100.0%  299.0       blink::NodeV8Internal::appendChildMethodCallbackForMainWorld
115334.0ms   99.2%  44.0         blink::ContainerNode::appendChild
...
73432.0ms   63.1%   196.0            blink::HTMLTextAreaElement::setValueCommon
63541.0ms   54.6%   56.0              WTF::StringImpl::replace(WTF::StringImpl*, WTF::StringImpl*)
63485.0ms   54.6%   63485.0            WTF::StringImpl::find(WTF::StringImpl*, unsigned int)
...
40212.0ms   34.6%   5356.0           blink::HTMLTextAreaElement::defaultValue() const
26812.0ms   23.0%   20405.0           WTF::StringBuilder::append(unsigned char const*, unsigned int)
4859.0ms    4.1%    3924.0             WTF::Vector<unsigned char, 16ul, WTF::PartitionAllocator>::reserveCapacity(unsigned long)
...
4058.0ms    3.4%    53.0              WTF::StringBuilder::toString()
3996.0ms    3.4%    12.0               WTF::String::String(unsigned char const*, unsigned int)
3977.0ms    3.4%    31.0                WTF::StringImpl::create(unsigned char const*, unsigned int)
...
3492.0ms    3.0%    3492.0            WTF::StringBuilder::append(WTF::StringView const&)

We're spending the vast majority of the time in:

StringBuilder::append 23%
StringImpl::find 54.6%

this is because the test is appending this super massive string into a textarea. Fwiw we're 2x faster than Safari at this benchmark, and we're 25x faster than Firefox. Even with this regression we're also 25% faster than Stable 51 on my desktop.

I wonder if we should revert this or just eat the regression on the microbenchmark for now. The profile makes me wonder if Vector<LChar>::append() is somehow much worse than the memcpy we were doing before... :/
Hmm, 20% regression sounds a bit too large to me though. But if you think we're just gaming micro-benchmarks, I'm okay with suppressing the regression.


Labels: -Pri-2 ReleaseBlock-Beta Pri-1
Owner: yutak@chromium.org
So I figured out what's going on here, apparently VectorTraits<LChar>::canCopyWithMemcpy == false, so Vector is actually looping to append bytes instead of using memcpy. That seems super bad, the parser would be doing the same thing, actually in general it seems the memcpy optimization in WTF is busted.

// This fails to compile now.
static_assert(VectorTraits<int>::canCopyWithMemcpy, "sadness");
static_assert(VectorTraits<LChar>::canCopyWithMemcpy, "sadness");

this is apparently a bug in the IsAssignable type trait in WTF:

static_assert(IsAssignable<int, int&>::value, "sadness");

We really need to fix this before M53 branches, we're not using the memcpy optimization in any of the vectors of primitive values, which are used in performance critical things all over the engine.
Looks like this was regressed here:

https://chromium.googlesource.com/chromium/src/+/d7225e41c5557ef72e5ca439fcf8287be37e7c0f

we just didn't have good perf coverage for it?

Comment 9 by yutak@chromium.org, Jun 30 2016

Interestingly, that revision corresponds to the point the metrics in question
were *improved* from the same level as we are now after this regression.

Maybe textarea-dom is pretty sensitive to whether the data is memcpy'ed or not.
I'm skeptical that manually looping is beating memcpy in the general case.
Testing locally memcpy certainly improves that benchmark by 20% on my
MacBook which means it improves things like textContent on the web too. I'm
not sure what metric you see improved by not using memcpy? Before my recent
patch that benchmark didn't use Vector at all.

We should definitely fix the TypeTraits to handle primitive values
correctly again. :)
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/757e4b7379983a8760fe8f91e40bb3b34d64bde4

commit 757e4b7379983a8760fe8f91e40bb3b34d64bde4
Author: yutak <yutak@chromium.org>
Date: Thu Jun 30 09:47:17 2016

WTF::Vector: Fix primitive types identified as not safe to memcpy.

The culprit was the implementation of IsAssignable<> traits.

Specifically, "std::declval<T>() = X" is invalid if T is a primitive
type, because the assignment of primitives is not an overloaded
operator and the compilability of this statement for a primitive T is
different from that for a non-primitive T.

The fix is to simply change the left-hand side of the assignment to
"std::declval<T&>".

BUG= 624642 

Review-Url: https://codereview.chromium.org/2110183005
Cr-Commit-Position: refs/heads/master@{#403131}

[modify] https://crrev.com/757e4b7379983a8760fe8f91e40bb3b34d64bde4/third_party/WebKit/Source/wtf/TypeTraits.h
[modify] https://crrev.com/757e4b7379983a8760fe8f91e40bb3b34d64bde4/third_party/WebKit/Source/wtf/VectorTest.cpp

#3: Seems you've already figured this bug out, but there are a few benchmarks that are mostly or all string operations (find, append, etc.).

The memchr fix helped especially on 32-bit ARM, where the compiler was very sensitive to compiling the naive loop badly. (And many platforms have a nicely optimized memchr, just as many have an optimized memcpy that's faster than a bytewise loop.)
This looks recovered to me, perhaps with a very tiny regression still. I have another patch that should get another small percentage back here too: 
https://codereview.chromium.org/2106283004
Project Member

Comment 14 by bugdroid1@chromium.org, Jul 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/464918ccc62d3a70a976b3c38b5ef6ce6e552f31

commit 464918ccc62d3a70a976b3c38b5ef6ce6e552f31
Author: esprehn <esprehn@chromium.org>
Date: Fri Jul 01 04:00:32 2016

Allocate space in StringBuilder for m_string and the new bytes.

StringBuilder has an optimization to just store the first String object
you append to avoid a copy in cases where you only append one String
and then call toString(), for example if there's onlyone Text child
and you do element.textContent.

When appending the second string we then allocate a Vector buffer and
copy the first string (m_string) into it. We weren't allocating extra
space for the bytes we were about to append in the second string
though, which meant we always did an extra copy immediately following
the the buffer creation:

String first;
String second;

StringBuilder builder;
// No copy, just stores a pointer to first.
builder.append(first);
// 1. Creates a Vector
// 2. Copies first into it.
// 3. Appends second to the Vector which causes a Vector resize.
builder.append(second);

We solve this by calling reserveInitialCapacity with m_length + the
number of bytes we're about to append when we created the buffer. We
also inflate the number of bytes to be at least the inline capacity
size so that doing:

builder.append(first);
builder.append('1');
builder.append('2');

doesn't require an immediate resize for appending '2' by ensuring we
always overallocate the Vector by at least the inline capacity size so
even when appending a String you get 16 chars of extra space to insert
something else.

BUG= 624642 

Review-Url: https://codereview.chromium.org/2106283004
Cr-Commit-Position: refs/heads/master@{#403407}

[modify] https://crrev.com/464918ccc62d3a70a976b3c38b5ef6ce6e552f31/third_party/WebKit/Source/wtf/text/StringBuilder.cpp
[modify] https://crrev.com/464918ccc62d3a70a976b3c38b5ef6ce6e552f31/third_party/WebKit/Source/wtf/text/StringBuilder.h

This looks mostly recovered now, though I do see a very small regression on
some of the bots. It's possible the Vector based StringBuilder is slower
than the StringImpl based one? :/

Fwiw I can make this benchmark 125% faster with a StringImpl::find
optimization:
https://codereview.chromium.org/2116533005

We can also choose the right size for the StringBuilder inside
HTMLTextAreaElement like we do in Element::textFromChildren which would
avoid all of the copies.

But that doesn't really address that I made StringBuilder slower... but
using a Vector also simplifies the code a ton. Perhaps we should ignore
this regression until we see StringBuilder show up in benchmarks elsewhere?
Status: WontFix (was: Assigned)
Marking WontFix, as per #15.

Sign in to add a comment