New issue
Advanced search Search tips

Issue 606093 link

Starred by 7 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 595492



Sign in to add a comment

String conversion costs between v8 and Blink are very high

Project Member Reported by esprehn@chromium.org, Apr 23 2016

Issue description

Google Chrome	52.0.2714.0 (Official Build) canary (64-bit)
Revision	9aefaad4599f95998436597e00dc07bb3d2a130e-refs/heads/master@{#388647}
OS	Mac OS X 
Blink	537.36 (@9aefaad4599f95998436597e00dc07bb3d2a130e)

ex. of a micro benchmark setting .transform = "translate(1px, 2px)" many thousands of times which is what the Animometer benchmark is doing. While I don't know if real pages are setting transforms thousands of times, they are passing strings across the C++ binding layer very often. It's notable that t

Samples   Total %    Symbol Name
-------------------------------------------
64362     100.0%    blink::V8CSSStyleDeclaration::namedPropertySetterCustom
46967     72.9%       blink::AbstractPropertySetCSSStyleDeclaration::setPropertyInternal
45405     70.5%         blink::MutableStylePropertySet::setProperty
44239     68.7%           blink::CSSParser::parseValue
38582     59.9%             blink::CSSParserImpl::parseValue
11692     18.1%       WTF::String blink::v8StringToWebCoreString<WTF::String>
3969      6.1%          v8::String::WriteOneByte
3294      5.1%            void v8::internal::String::WriteToFlat<unsigned char>
2683      4.1%          v8::String::MakeExternal
1811      2.8%            v8::internal::String::MakeExternal
473       0.7%              OSMemoryBarrier
464       0.7%            v8::internal::Heap::CreateFillerObjectAt
1356      2.1%        WTF::StringImpl::createUninitialized
1146      1.7%        WTF::Partitions::fastMalloc

Note that the single string conversion is 18% of the time to set the transform.

(this is with a patch applied that improves the speed of the CSS parser quite a bit: https://codereview.chromium.org/1920583002 without that patch the string conversion is 9%, which is still high, but not 18% high. :)
 

Comment 1 by jochen@chromium.org, Apr 25 2016

Cc: -jochen@chromium.org haraken@chromium.org joc...@chromium.orgm
I cases where we don't expect the string to be a long living thing shared between v8 and blink, we shouldn't externalize it, but just create a copy

Comment 2 by jochen@chromium.org, Apr 25 2016

Cc: -joc...@chromium.orgm jochen@chromium.org
i.e. using v8::String::Utf8Value to access the raw bytes on v8's heap
Cc: peria@chromium.org yukishiino@chromium.org
But the V8-side string can be gone at an arbitrary timing (when a GC is triggered), right?

I don't think we always need to create an externalized string, but we always need to create a string copy.

Comment 4 by jochen@chromium.org, Apr 25 2016

you can't store the pointer, yes, but as long as you have the v8::String::Utf8Value you can access it.

if you want to pass around the pointer, you can strcpy it - at least you don't have to overhead of externalizing it then.
> you can't store the pointer, yes, but as long as you have the v8::String::Utf8Value you can access it.

Ah, this sounds nice. If the externalization overhead really matters in the wild, we can consider introducing another String/AtomicString type that wraps v8::String::Utf8Value. It will be an amount of work but significantly speed up the V8 string => Blink string conversion.



Comment 6 by jochen@chromium.org, Apr 25 2016

note that v8::String::Utf8Value is a Local<> to the string, so you can't store it on the heap.

But e.g. for .transform = "whatever", I don't think we need to pass the value around, just create a new regular WTFString and copy the raw bytes over in the bindings
Cc: adamk@chromium.org
I was discussing this with adamk@ and haraken@ and we realized that the benefits of string externalization are when there's a string in JS that's being used repeatedly to pass through the bindings.

For example:

// here the property name "transform", and the literal string "translateZ(0)"
// should be eternalized since we're using them a bunch and they're interned
// inside v8 anyway.
for (let element of lotsOfElements) {
  eleemnt.style.transform = "translateZ(0)";
}


for (let element of lotsOfElements) {
  // here we should NOT externalize the rope on the RHS since every iteration
  // is vending a fresh string anyway. This is what Animometer and such are
  // doing.
  element.style.transform = "translateZ(" + getFoo() + ")";
}

I think a good heuristic for this might be to add a getter to v8:String that checks if the string is in the NewSpace, and then only externalize strings that are not in the NewSpace (interned strings, ones that have survived a GC already, etc.)

Then in blink we'd do:

if (string->HasBeenAroundAWhile())
  ExternalizeString(string);

to avoid the cost.
Labels: -Pri-3 Pri-1
Owner: peria@chromium.org
Assigning to peria@.

This may affect (i.e., improve) performance of Animometer, so let me raise the priority.

Comment 9 by peria@chromium.org, May 30 2016

First of all, V8 seems to avoid externalization of short strings on new heap.
https://code.google.com/p/chromium/codesearch#chromium/src/v8/src/api.cc&l=6013&cl=GROK&gsn=CanMakeExternal

and I experimented a perf test with making CanMakeExternal() to return false if the strings is in new space.
https://1ede9c8b4261401ce30e527057727688b46ce0d9.googledrive.com/host/0B4QUVw-AB8wPVXpWVzZ5T1hObGc/no_externalize.html
("before" figures tot, "after" figures not externalize strings on new heaps)
(On Ubuntu 14.04, Z620, run 20 times per page per build.)

mean_frame_time of Animometer is ~41ms, which does not meets 60fps, is what we want to reduce with this change, but it looks not be changed.
I'd just not externalize those strings ever (as described in comment #2)

Comment 11 by peria@chromium.org, May 30 2016

I updated the file to follow Jochen's suggestion.
https://1ede9c8b4261401ce30e527057727688b46ce0d9.googledrive.com/host/0B4QUVw-AB8wPVXpWVzZ5T1hObGc/no_externalize.html
Now we have "tot", "new_heap" and "all" columns.  "before" was renamed to "tot", "after" was renamed to "new_heap", and "all" was newly added.

"all" figures the score without any externalization, and frame_time of Animometer on "all" scored ~9% better than "tot".
It maybe valuable to investigate in what situation we avoid externalizations.
thanks for doing the numbers

I think we should only externalize large strings, or strings we know that we'll share.

For most setters that should not be the case.
Is the 9% improvement a real one? As far as I see the performance graph, it looks like a noise.

As discussed offline, let's measure a result for setAttribute()s and Elliott's parsing benchmark as well.

If you want to look at Animometer you need to run with Fixed Complexity and choose a complexity that makes the script take a while. For example you might choose 800 on your  Z620. Then you want to look at traces to see if the script execution time goes down with your patch applied.
That's running the focus test (see the test-name param), you want to go to:
http://rawgit.com/WebKit/webkit/master/PerformanceTests/Animometer/developer.html

1) Check "Fixed Complexity" on the right hand side.
2) Expand the "Animometer" section on the left side.
3) Check the "Leaves" test.
4) An input will appear next to the test name, enter your desired complexity there.

You can then choose to make the test run for longer than 20 seconds so you can get a better trace. That's the box on the top right.

Then click Run.
Status: Started (was: Untriaged)
Thank you for the steps.

As far as I tested it on my MacOSX, (Leaves, Fixed Complexity 800, 60sec)
- I got similar results (43-44FPS) on 3 types of Chromium(*).
- Stable Chrome 51 (29FPS) runs faster than Safari (17FPS+/-31%), and Chromium 53 (44FPS) runs faster than Chrome 51.
(*) "tot", "new_heap", and "all" in #11

I'm wondering if this benchmark is important for the optimization of string conversions.
Are my experiments' results consistent with Elliots's ones?  Or is there any other reason to focus on this benchmark?
You shouldn't be getting any FPS numbers, you want to look specifically at a large complexity cost and then look at the script running time in a trace and also in a profiler (like perf or Instruments.app). The benchmark is bottle necked on other things, so making the script 20% faster probably won't improve us much, but does demonstrate where the string costs are adding up.

How long does the script take in a real time on your machine? How long does it take after this change? How much time is spent in string conversions when looking at a profiler? How much after this change?

Don't worry about the benchmark score or about FPS.
I'd also expect the GC pauses to go down.
Re #17
Hmm, I feel Animometer is not a good test to work for for this issue, because of several reasons;
1) as you said, the string conversion, what we're interested in, is not the bottleneck of the test
2) string conversions take very short time in the test.  they took <1sec in a 60sec run in my experiment
3) we need other tools to see the metric
4) our future changes can affect on time of V8GC and we need to see its difference, but I think WebGL, which is the main focus of the test, can make larger effects on it.

I think it much better to pick out a part of string conversions from the benchmark, isn't it?
Or do you have any numbers which can be seen only in this test?
BTW, I measured time to run first three tests in [1] with 4 different settings.
I did not concern situations where each V8String is converted into WTFString or AtomicString.

*Settings
A) tot
B) do not externalize any V8Strings
C) do not externalize V8Strings in new heaps
D) do not externalize short(<v8::i::ExternalString::kShortSize) V8Strings

   create / setAttr / appendChild
A) 255ms  / 1459ms  / 672ms
B) 337ms  / 1299ms  / 787ms
C) 250ms  / 1242ms  / 671ms
D) 345ms  / 1665ms  / 777ms

with this simple experiment, I feel the heuristic suggested in #7 is good to work for.
(we also need to take #12 into consideration.)


[1] https://1ede9c8b4261401ce30e527057727688b46ce0d9.googledrive.com/host/0B4QUVw-AB8wPVXpWVzZ5T1hObGc/create_element.html
Yeah, Elliott's suggestion in #7 makes sense to me, if jochen@ is fine with it.

FWIW, v8::i::ExternalString::kShortSize would be too short a threshold to experiment with jochen's suggestion (externalize only large strings) though.

sgtm
MakeExternal morphs a sequential string into an external string. A normal external string has two payload fields: the raw address of the resource object, and a cache to the external backing store.

If the sequential string is very short, it only has space to morph into an external string with only one payload field, so we drop the cache. Without the cache, JIT code in V8 cannot access the string content, because it has to make a virtual method call on the resource object.

The reason the check against v8::i::ExternalString::kShortSize is in place is to avoid this. Removing that check is a trade-off. Making short strings external would make repeated access by the embedder faster, but access by V8 slower.
Thank you for the explanation.
I got the reason why CanMakeExternal() checks the size with kShortSize, and I agree that removing it can be a trade-off.

Then I'd like to know how much merit V8 gets with special handles of short strings, and how likely it is used in the real world.
note that external strings put additional pressure on GC.

What about certain APIs just not even attempting to externalize strings <1k? We could start with all setters for css attributes
> What about certain APIs just not even attempting to externalize strings <1k? We could start with all setters for css attributes

That was what I made in PS1 of the CL being reviewed.  I think we don't need a threshold for the length in the new API.
No, the CL also looked at whether the string is in newspace and stuff. I wouldn't even bother to check this.
I don't want to do that. Our bindings need to be generic and fast by default, you shouldn't need to choose the right string type or annotate calls. Let's start with the new space optimization.
Re: #28
Ah, right.  It did not skip other checks. I'll make another method.

Re: #29
Currently, no V8 API provides if an V8 object is in new/old heap.  We have to
update CanMakeExternal() or make a new API to check strings' generation.
exposing gc state isn't really generic or fast by default as well.

Just not externalizing small strings at all is also generic and doesn't require any kind of annotation.
Hmm, I thought the new space optimization should be done only in setAttribute().
If it can be available everywhere, I don't need to create a new method which exposes inside of V8.
Using the GC state matches the correct semantics of the external string optimization, short strings doesn't. This isn't just about the string copy, it's also about the AtomicString, and 

For example:

setAttribute("class", ...);

We need to atomize "class" which means a hash table lookup.

.style.transform = "none";

the "transform" here also needs to be looked up in a hash table. Note that WTF::Strings cache the hash, so repeated lookups in tables are fast. We also want to externalize the string "none", since it's common and a literal. If this had been the result of a concat like:

.style.transform = "n" + thing;

where thing is "one", we don't want to externalize since we're going to get a fresh string in the bindings every time.

If we copy short strings, we're imposing a WTF::String allocation, a copy, a hash computation and a hash table lookup for AtomicStrings. That's really bad for short literals.

Safari doesn't need to do any of those things, the flattened representation of a JSC String is a WTF::String, so there's no copy, the hash is cached always, and the AtomicString lookup propagates into the JSC strings. Callers also don't need to think about "is this string used much" in the bindings, which isn't a question you can reasonably answer without using the lifetime of the object.

A string that's used repeatedly needs to be externalized, a string that's used once doesn't. How do you know if a string has been around a while? The only way I can come up with is using the GC state of it.

Yeah, I agree with Elliott.

If the V8 team doesn't want to expose isInOldSpace() because it exposes GC states, would it be an option to rename it to isLongLiving() or something? We don't care if it's in the old space or not, what we care is if the string is long-lived or not.



Comment 35 by peria@chromium.org, Jun 10 2016

following up my comment;

#32:
> If it can be available everywhere, I don't need to create a new method which exposes inside of V8.
If so, what we need to do is to remove checking the length of strings in v8::String::CanMakeExternal().  It already has a check of the generation.
v8 also has a hash for its strings. we should make v8 and blink use the same hash function for its strings, then we can avoid recomputing it

Comment 37 by peria@chromium.org, Jun 29 2016

Blocking: 595492
Ping on this.  Did we stall?
Owner: jbroman@chromium.org
Is this still a high-priority performance issue?

- If yes, we should revisit this.

- If not sure, I'd prefer investing our efforts on exporting RuntimeCallStats to the binding layer, identifying the most important performance issues in real-world websites and fix them (rather than looking at this particular micro-benchmark).


Given we already do a generation check (and thus shouldn't wind up externalizing new-generation strings), and internalized strings (like literals) should end up being externalized pretty early on, I'm a little surprised to hear this is that hot, but I can look into it at some point.

#38: Are you seeing this show up a lot in some particular scenario, or just generally curious?

Comment 41 by peria@chromium.org, May 18 2017

Cc: esprehn@chromium.org pdr@chromium.org
 Issue 346154  has been merged into this issue.
> Given we already do a generation check (and thus shouldn't wind up externalizing new-generation strings)

Would you help me understand where the check is done?

v8::String::CanMakeExternal

Comment 44 by peria@chromium.org, May 18 2017

v8::String::CanMakeExternal() does, and it is called in V8StringToWebCoreString().
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/bindings/StringResource.cpp?type=cs&q=V8StringToWebCoreString&l=83


Sign in to add a comment