Issue 5144

Starred by 39 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
HW: ----
NextAction: ----
OS: ----
Priority: 2
Type: FeatureRequest




Poor performance in the V8 C++ API functions responsible for processing strings.

Reported by c...@cmunt.demon.co.uk, Jun 22 2016

Issue description

Version: <5.0.71.52>
OS: <Linux>
Architecture: <x64>

What steps will reproduce the problem?
1. See the attached report for code samples and benchmark results.

 
v8PerformanceIssue.txt
6.2 KB
Cc: bmeu...@chromium.org verwa...@chromium.org
Components: Runtime
Labels: -Type-Bug Performance Type-FeatureRequest
Status: Available (was: Untriaged)
This lack of performance will have a significant impact on Node.js adoption by the Veterans Administration (through EWD & eHMP).
Labels: Priority-2
Cc: yangguo@chromium.org
Owner: jochen@chromium.org
Cc: jochen@chromium.org
Owner: ----
Here are some things you can try:

- Don't use UTF-8 for marshalling; use either Latin-1 or UTF-16 (depending on v8::String::IsOneByte()).
- v8::String::Value is more efficient than WriteUtf8. If you want a Latin-1 version of that API, I'm happy to add one.
- To pass strings to V8 efficiently, use external strings. In your benchmark you're also repeatedly reading the same string; if that can happen in practice (e.g. the same query string being passed in repeatedly), externalize those strings as well to make reading them faster. (See the sketch below.)

If you have a complete repro case for the benchmark, I'm happy to run it with a profiler, but please consider implementing the recommendations above first.
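
For readers following along, here is a minimal sketch of what these suggestions might look like in a native extension (V8 5.x-era calls; PassStringToDb and the CopyToDb* calls are illustrative names only, not part of the V8 API or of any existing extension):

   // Minimal sketch of the suggestions above (V8 5.x-era API).
   // PassStringToDb and the CopyToDb* calls are illustrative only.
   #include <v8.h>
   #include <vector>

   using namespace v8;

   void PassStringToDb(Isolate* isolate, Local<String> str) {
     if (str->IsOneByte()) {
       // Latin-1 path: one byte per character, no UTF-8 transcoding.
       std::vector<uint8_t> buf(str->Length());
       str->WriteOneByte(buf.data(), 0, static_cast<int>(buf.size()),
                         String::NO_NULL_TERMINATION);
       // CopyToDbLatin1(buf.data(), buf.size());   // hypothetical DB call
     } else {
       // Two-byte path: String::Value exposes the UTF-16 data directly.
       String::Value utf16(str);  // takes (isolate, str) on newer V8
       // CopyToDbUtf16(*utf16, utf16.length());    // hypothetical DB call
     }
   }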
The open source group I'm involved in is interested in getting this issue fixed, so I have posted a related job on upwork.com in case anyone involved in this project is interested:
https://www.upwork.com/ab/applicants/887227294493417472/job-details

Thanks in advance for your interest and help, and for making contact to explore how we could progress.
Regards,
Tony
Can we get this issue changed to `priority=1`? 
It has more stars than many current `priority=1` issues.

Kind Regards,
Noel da Costa
www.arc2.co.za

Comment 8 by kaich...@gmail.com, Apr 15 2018

Yes, it would be good to get this fixed, please.
This bug is slowing adoption of a V8 backend on several large projects.

jb

Comment 10 by i...@bnoordhuis.nl, Apr 17 2018

@9: That's not useful feedback.  Can you elaborate on what specifically is holding you back, and why?
Converting to and from UTF-8 is going to be slower than almost all other options, even if we optimized the hell out of it, so I would be very curious about the use cases as well.
@yangguo: Does the OP's original v8PerformanceIssue.txt attachment not address the use case sufficiently? If not, please advise as to what is missing.
@10: The database implementation described in the OP's original v8PerformanceIssue.txt attachment explains the problem with specifics.

In summary:  

This issue causes a bottleneck for the MUMPS database language and its implementations. This class of DB comprises high-performance hierarchical data solutions (like InterSystems Caché) that have been around since the 1960s, but they have received a new lease of life thanks to Node.js API integrations, opening development up to a new generation of coders. As described in the OP's attachment, the bottleneck in the V8 engine means that DB interactions are 3-4 times slower than the native implementation.

While this is still performant compared to many databases, it's not what this community is after. For the gains to be made (in industries including medical, IoT, blockchain and fintech), we would like to see this bottleneck addressed.

Without this performance gain it's harder to motivate these businesses to adopt the V8 back-end for production. At this stage everyone interested is kind of waiting with bated breath, hoping this will eventually be resolved. There's even a bounty for it, posted by someone above!

I see. The issue is that if you have to convert between UTF8 and Latin1/UC16 when passing strings between DB and V8, performance is going to suffer since that operation is O(n). We could probably make things go a bit faster, and I'm actually addressing some of these in issue 6780, but I don't think you can expect a 3-4x improvement.

Do you think it's feasible to change the DB scheme to use Latin1/UC16 instead of UTF8?
Just to add some more background perspective. There are two main implementations of the class of database we're dealing with in this use case: InterSystems Caché (https://www.intersystems.com/products/cache/) and YottaDB (https://yottadb.com). Both are heavily used as embedded databases in healthcare, banking and financial services, as well as in many other sectors.

Whilst in the past, both databases were accessed using a built-in language (dating from the 60s), they now have high-performance C call-in APIs, allowing the databases to be accessible from modern mainstream languages.  JavaScript / Node.js is a particularly attractive language for these databases, because the database is schema-free and JSON maps almost exactly onto the core database structures, allowing, in effect, the implementation of persistent JSON storage, and/or Document Storage database capabilities.

Practitioners of these databases have high performance expectations: using the native, built-in language, even on standard commodity hardware, insertion rates of over 1 million name/value pairs per second are commonplace, with read access rates even higher. The C interfaces provided by these databases potentially offer equivalent levels of performance from mainstream languages.

However, when accessed from Node.js via the V8 API, those insertion rates drop to around 150,000-200,000 per second. Respectable, yes, by comparison with other databases, but no comparison to what is potentially possible from these databases.

With JavaScript being such a perfect fit as a modern alternative to the old (many would say out-dated) native built-in language for these databases, you can perhaps understand the keen desire to resolve the V8 bottleneck.  For the sectors in which these databases are used, there will be huge benefits.

Owner: yangguo@chromium.org
Status: Assigned (was: Available)
I'm the OP (with a new email address) and I guess it's time for me to update this thread.  Many thanks to all who have contributed ideas and have otherwise supported this post.

I note that the discussion has become somewhat side-tracked by the observation that the benchmark (as I left it) is using UTF8 encoding.  It is certainly true that there is a cost in using this scheme.  However, if you replace ...

   str->WriteUtf8(buffer)
      ... with str->WriteOneByte(buffer)

... and ...

   String::NewFromUtf8(isolate, buffer)
      ... with String::NewFromOneByte(isolate, (uint8_t *) a)

... there is a small, expected improvement in performance, but not by much.
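
For concreteness, the full one-byte round trip described by those replacements might look roughly like this (again V8 5.x-era calls, matching the original fragments; the RoundTrip name and the fixed 256-byte buffer are illustrative only):

   // Rough sketch of the one-byte round trip described above.
   // RoundTrip and the 256-byte buffer are illustrative only.
   #include <v8.h>

   using namespace v8;

   Local<String> RoundTrip(Isolate* isolate, Local<String> str) {
     uint8_t buffer[256];
     // JS string -> C buffer (Latin-1), replacing str->WriteUtf8(buffer).
     int len = str->WriteOneByte(buffer, 0, sizeof(buffer) - 1);
     buffer[len] = '\0';
     // C buffer -> new JS string, replacing String::NewFromUtf8(isolate, buffer).
     return String::NewFromOneByte(isolate, buffer, NewStringType::kNormal, len)
         .ToLocalChecked();
   }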

The point I'm trying to make is that, for the server-centric context of Node.js, the performance of string management is *generally* slow compared to other environments.

Take the following simple benchmarks ...

First setting new integers in a loop ...

   function test_integers() {
     var max = 100000000;
     var start_time = new Date().getTime();
     var vx;
     for (var n = 0; n < max; n ++) {
       vx = n + n;
     }
     var diff = Math.abs(start_time - new Date().getTime());
     console.log("diff: " + diff + " secs: " + (diff / 1000));
   }


Second, setting new strings in a loop ...

   function test_strings() {
     var max = 100000000;
     var start_time = new Date().getTime();
     var vx;
     for (var n = 0; n < max; n ++) {
       vx = n + "n";
     }
     var diff = Math.abs(start_time - new Date().getTime());
     console.log("diff: " + diff + " secs: " + (diff / 1000));
   }


I appreciate that there's absolutely no expectation that setting strings in a loop will be anywhere near as fast as setting integers.  It's the performance *differential* I'm trying to demonstrate here.

Here are the ratios I see (similar for both a CentOS 7 VM and Windows 10 VM).

   Strings: 18.2s vs Integers: 0.3s (a difference of 60.67x)

For InterSystems Cache' script running on the same system I see results of the following order:

   Strings: 4.5s vs Integers: 1.7s (a difference of 2.65x)

So while JavaScript beats Caché at setting integers, Caché is much faster when it comes to setting strings. Now, I'm not trying to advocate a 'gold standard' here, but simply drawing attention to the observation that string management in V8 is particularly costly relative to the rest of the environment.

Others have mentioned that the InterSystems Caché database can be used as an 'embedded database' with Node.js (I wrote the driver for this). In medical applications (a key market for Caché), where string-based data is pretty ubiquitous, the performance of passing textual data between the application environment (JavaScript) and the database (Caché) stands out as an area crying out for improvement.

In terms of things I've tried, I have found that using V8/Node.js buffers to pass data between JS and the database is, not surprisingly, faster.  Some posters have suggested using 'External Strings'.  I didn't have much luck with this approach as it's not clear how you would create an 'external string' in the JS environment - which is the starting point for insertions to the database.  In a C++ extension, I found that pretty much any string I created returned false for 'str->CanMakeExternal()'.  I will probably revisit this at some point as the idea underpinning externals seems to be worth exploring - and is conceptually similar to some of the approaches that I've taken in creating my own V8 builds for research purposes.
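
For anyone who wants to experiment with the external-string idea from the C++ side, here is a minimal sketch assuming a one-byte (Latin-1) payload; DbRowResource and WrapDbRow are illustrative names, and whether externalization actually pays off will depend on string length and lifetime:

   // Minimal sketch of an external one-byte string resource.
   // DbRowResource / WrapDbRow are illustrative names only.
   #include <v8.h>
   #include <string>
   #include <utility>

   using namespace v8;

   class DbRowResource : public String::ExternalOneByteStringResource {
    public:
     explicit DbRowResource(std::string data) : data_(std::move(data)) {}
     const char* data() const override { return data_.data(); }
     size_t length() const override { return data_.size(); }
    private:
     std::string data_;  // must stay alive until V8 calls Dispose()
   };

   Local<String> WrapDbRow(Isolate* isolate, std::string row) {
     // V8 references the C++ buffer instead of copying it, and frees the
     // resource via Dispose() when the string is garbage collected.
     return String::NewExternalOneByte(isolate,
                                       new DbRowResource(std::move(row)))
         .ToLocalChecked();
   }

Note that V8 may decline to externalize very short strings (CanMakeExternal() can return false for them), which could explain some of the false results mentioned above.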

Finally, my feeling is that the V8 engine is generally optimized as a client-centric engine rather than a server-side one.  I guess its heritage as the JS engine for Chrome is the reason for this.  However, as uptake of the V8 engine on the server increases, and the ecosystem of embedded extensions increases, I'm pretty sure others will be noticing this same 'performance spike' as they try to optimize their own server-side JS applications.

Many thanks as ever for reading this!


Thanks for this detailed explanation! Given the tests you provided in #0, we can probably find some low-hanging fruit in the medium term.
yangguo@chromium.org - has there been any progress on this issue?

Unfortunately I haven't been able to spend much time here. I poked around with this a bit and made some small improvements wrt flattening.
