
Issue 5338

Starred by 10 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Apr 2018
Cc:
Components:
HW: All
NextAction: ----
OS: All
Priority: 2
Type: FeatureRequest




Isolate::ThreadDataTable O(n) lookup is using 65% of our Node.js server's CPU time

Reported by ken...@corp.sandstorm.io, Sep 1 2016

Issue description

Version: 4.5.103.37 (node.js 4.5.0)
OS: Linux
Architecture: x64

Isolate::ThreadDataTable keeps track of thread-local data for each "thread" (seemingly green threads, not OS threads). The data is stored in a linked list.

The functions Isolate::FindOrAllocatePerThreadDataForThisThread() and Isolate::FindPerThreadDataForThisThread(), in particular, perform lookups on this table.

The popular `fibers` npm module implements "fibers" (cooperative green threads). Each fiber switch calls one or both of the above methods, so each switch is O(n) in the number of fibers, and a workload driving n fibers is likely O(n^2) overall.

In practice, we have observed our servers pegging the CPU at 100%, with 65% of that time spent in the inner loop of this linked-list lookup. I have attached screenshots from linux-perf showing this. This problem is causing serious issues for us in production.

Does it make sense to switch to a better data structure here?
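To make the cost concrete, here is a minimal, hypothetical model of the two data-structure choices (illustrative names only, not V8's actual internals): a linked-list table must walk every earlier entry on each lookup, while a hash map is amortized O(1).

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for V8's per-thread key and per-thread data.
using ThreadId = uint64_t;
struct PerThreadData { ThreadId id; PerThreadData* next; };

// Linked-list table, as in the original ThreadDataTable: O(n) per lookup.
struct ListTable {
  PerThreadData* head = nullptr;
  PerThreadData* Lookup(ThreadId id) {
    for (PerThreadData* p = head; p != nullptr; p = p->next)
      if (p->id == id) return p;  // walks every entry ahead of the match
    return nullptr;
  }
  void Insert(PerThreadData* d) { d->next = head; head = d; }
};

// Hash-map table, as proposed: amortized O(1) per lookup.
struct MapTable {
  std::unordered_map<ThreadId, PerThreadData*> map;
  PerThreadData* Lookup(ThreadId id) {
    auto it = map.find(id);
    return it == map.end() ? nullptr : it->second;
  }
  void Insert(PerThreadData* d) { map[d->id] = d; }
};
```

With n fibers each triggering a lookup per switch, the list version does O(n) work per switch and O(n^2) across a round of switches, matching the behavior reported above.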
 
Attachments:
Screenshot from 2016-08-31 19-23-41.png (502 KB)
Screenshot from 2016-08-31 19-24-22.png (213 KB)
Screenshot from 2016-08-31 19-23-11.png (158 KB)
Per Issue 3777, it seems that v8 actually never removes items from this table, even when the thread is destroyed, which probably explains why we have so many entries. Apparently a workaround is to increase the node-fibers pool size.
I suppose this issue should be closed since we probably wouldn't have a problem if not for  Issue 3777 , which itself is marked WAI because v8 doesn't care about this use case (however popular it may be).
On further investigation we've determined that our application is in fact creating thousands of concurrent fibers in short spikes, and this is sufficient to cause severe performance problems even if we keep them all in a pool for reuse. Although creating bursts of concurrent fibers is probably a performance problem that needs fixing regardless, it's unfortunate that v8's use of a linked list here transforms a momentary hiccup into a long-lived outage.

Any willingness to switch to a hashtable? Seems like it should be a low-impact change.

Comment 4 by adamk@chromium.org, Sep 1 2016

Cc: fran...@chromium.org ofrobots@google.com
Labels: Hotlist-NodeJS

Comment 5 by adamk@chromium.org, Sep 1 2016

Cc: jochen@chromium.org
Status: Available (was: Untriaged)
Update: We've discovered that our issue was caused by our framework (Meteor) choosing to monkey-patch Promise.then() so that it always runs callbacks in fibers. :(

The monkey-patching had a serious bug fixed here: https://github.com/meteor/promise/pull/11

This solves our fiber spikes for now.

However, it still seems like a good idea to switch to a hashtable here. Had this been a hashtable, we would have seen much less collateral damage. We've prepared a patch to this effect for our own use:

https://github.com/sandstorm-io/node/commit/40d20d91951896d2698fe161e43e232cd1860c8f

Would be great to see something like this applied upstream.

Comment 8 by aml...@gmail.com, Oct 1 2016

I saw this issue via the LWN article.

Why is this data structure global in the first place?  As far as I can tell, the only material use case is the call to FindOrAllocatePerThreadDataForThisThread() in Isolate::Enter().  But FindOrAllocatePerThreadDataForThisThread() is doing something very simple: it's looking up a value keyed by the tuple (isolate, thread), and it's a non-static member of the Isolate class itself!

This suggests two different restructurings.  The table could become a member of the Isolate class (so it would be keyed only by the thread id), in which case it would be much smaller and wouldn't force global synchronization between non-interacting Isolates.  Or it could become per thread, which would avoid needing synchronization entirely, but that might make tearing down an Isolate rather more complicated.
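The first restructuring, making the table a member of the Isolate so it is keyed only by the thread id, might look roughly like this sketch (hypothetical names and simplified ownership, not V8's real classes):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using ThreadId = uint64_t;
struct PerThreadData { /* ... per-thread state ... */ };

// Hypothetical Isolate owning its own per-thread-data table, keyed only by
// thread id. A global table would instead need a composite
// (Isolate*, ThreadId) key and a single lock shared by all isolates.
class Isolate {
 public:
  PerThreadData* FindOrAllocatePerThreadData(ThreadId tid) {
    auto it = table_.find(tid);
    if (it != table_.end()) return it->second;  // existing entry: O(1)
    PerThreadData* data = new PerThreadData();
    table_[tid] = data;
    return data;
  }
  ~Isolate() {
    // Tearing down the isolate frees all of its per-thread entries at once.
    for (auto& entry : table_) delete entry.second;
  }
 private:
  std::unordered_map<ThreadId, PerThreadData*> table_;
};
```

A per-Isolate table keeps each map small and avoids cross-isolate synchronization, though (as the next comment notes) it would not by itself help a single-isolate Node.js process.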
@amluto - FWIW, in the Node.js case there is only one Isolate, so having a table per-Isolate probably doesn't fix our problem.

Comment 10 by i...@zenhack.net, Nov 28 2016

Any interest from upstream in merging the patch that Kenton linked? Right now sandstorm needs a patched version, which is causing build headaches for me trying to contribute -- I'd like to be able to just use my distro's package.
Labels: Priority-2
Hello. Me again, at a different company, working on a completely different project.

I ran into this problem again!

The new project creates lots and lots of isolates per process (whereas the old project had lots of fibers but only one isolate). Hence, for the new project, having a separate table per-Isolate would make lots of sense, as it would reduce lock contention.

Is there some subtle reason it's not done that way?
Cc: -jochen@chromium.org jarin@chromium.org yangguo@chromium.org
Components: Runtime
Labels: Performance HW-All OS-All
Hello again!

It seems other people are running into this problem: https://github.com/meteor/meteor/issues/9796

I wonder if this performance problem is actually very widespread, but isn't well-understood. Under normal profiling, it just looks like an across-the-board performance problem. You have to use a lower-level profiler like Linux `perf` to see the real issue. Maybe people are seeing a lot of slowness but blaming it on JavaScript being slow in general, and throwing more servers at it...
As @kenton just mentioned, this issue is likely causing problems for many more users than are actually aware that it is one of the factors contributing to their slowness.

@V8 team is there any reason NOT to implement the patch as suggested by @kenton?
> is there any reason NOT to implement the patch as suggested by @kenton?

FWIW, I've attached the version of the patch that I've been maintaining for Cloudflare Workers, which is more correct than the older one I wrote for Sandstorm/Node (the Sandstorm one didn't account for multiple isolates, since Node doesn't use multiple isolates).

With that said, I suspect this patch is not the way the V8 team would prefer to write this code. I don't know enough about V8 internals to know how to do this "right".

(That said, if the V8 people actually do like my patch as-is, let me know and I'll figure out how to formally submit it.)
Attachment: v8-thread-data-hashtable.patch (4.8 KB)
kenton, you can follow this guide: http://dev.chromium.org/developers/contributing-code

Work from a clone of v8 and you will be able to submit the patch.

Comment 19 by bugdroid1@chromium.org, Apr 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/b49206ded97c4eaac7c273ce004d840a0185d40e

commit b49206ded97c4eaac7c273ce004d840a0185d40e
Author: Kenton Varda <kenton@cloudflare.com>
Date: Tue Apr 24 10:01:26 2018

ThreadDataTable: Change global linked list to per-Isolate hash map.

For use cases with a large number of threads or a large number of isolates (or
both), ThreadDataTable can be a major performance bottleneck due to O(n)
lookup time of the linked list. Switching to a hash map reduces this to O(1).

Example 1: Sandstorm.io, a Node.js app that utilizes "fibers", was observed
spending the majority of CPU time iterating over the ThreadDataTable.
See: https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story

Example 2: Cloudflare's Workers engine, a high-multi-tenancy web server
framework built on V8 (but not Node), creates large numbers of threads and
isolates per-process. It saw a 34x improvement in throughput when we applied
this patch.

Cloudflare has been using a patch in production since the Workers launch which
replaces the linked list with a hash map -- but still global.

This commit builds on that but goes further and creates a separate hash map
and mutex for each isolate, with the table being a member of the Isolate
class. This avoids any globals and should reduce lock contention.

Bug:  v8:5338 
Change-Id: If0d11509afb2e043b888c376e36d3463db931b47
Reviewed-on: https://chromium-review.googlesource.com/1014407
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52753}
[modify] https://crrev.com/b49206ded97c4eaac7c273ce004d840a0185d40e/src/isolate.cc
[modify] https://crrev.com/b49206ded97c4eaac7c273ce004d840a0185d40e/src/isolate.h
[modify] https://crrev.com/b49206ded97c4eaac7c273ce004d840a0185d40e/src/v8.cc
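The landed design described in the commit message, a separate hash map and mutex per isolate, can be sketched roughly as follows (a hypothetical simplification, not the actual V8 source):

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

using ThreadId = uint64_t;
struct PerThreadData { /* ... per-thread state ... */ };

// Per-isolate table: the mutex only serializes threads entering the SAME
// isolate, so unrelated isolates no longer contend on any global lock,
// and every operation on the map itself is amortized O(1).
class ThreadDataTable {
 public:
  PerThreadData* Lookup(ThreadId tid) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = map_.find(tid);
    return it == map_.end() ? nullptr : it->second;
  }
  void Insert(ThreadId tid, PerThreadData* data) {
    std::lock_guard<std::mutex> guard(mutex_);
    map_[tid] = data;
  }
  void Remove(ThreadId tid) {  // cheap removal on thread teardown
    std::lock_guard<std::mutex> guard(mutex_);
    map_.erase(tid);
  }
 private:
  std::mutex mutex_;
  std::unordered_map<ThreadId, PerThreadData*> map_;
};
```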

Labels: Merge-Request-6.6 Merge-Request-6.7
As per user request on the Node.js side (https://github.com/nodejs/node/issues/20083), requesting merge to V8 6.6, 6.7.
Status: Fixed (was: Available)
Switching state to 'fixed' so that it shows up on the merge review list. 
Labels: -Type-Bug -Merge-Request-6.6 -Merge-Request-6.7 Merge-Rejected-6.6 Merge-Rejected-6.7 Type-FeatureRequest
Given that this is a feature request, I am not comfortable merging it to a release branch.
Considering the performance issue the current implementation causes, I wouldn't call this a "feature request" but rather a bug fix.
I'm not sure how "feature request" is defined for V8, but I would intuitively describe this as a bug fix, not a feature request. Seemingly a lot of Node users (especially people using Meteor.js) are hitting this bug leading to production problems. (Admittedly Chromium is probably not affected, though.)
Cc: hablich@chromium.org
IC. Is this a regression?
No. It's a thing that never worked well.
IC. In that case 6.7 sounds okay, but 6.6 is on stable already so that would be uncomfortable.
Labels: -Merge-Rejected-6.7 Merge-Approved-6.7

Comment 29 by bugdroid1@chromium.org, May 16 2018

Labels: merge-merged-6.7
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/b6531503bf06e13ceffb5579e935171e0bcd0ea7

commit b6531503bf06e13ceffb5579e935171e0bcd0ea7
Author: Ali Ijaz Sheikh <ofrobots@google.com>
Date: Wed May 16 19:21:41 2018

Merged: ThreadDataTable: Change global linked list to per-Isolate hash map.

Revision: b49206ded97c4eaac7c273ce004d840a0185d40e

BUG= v8:5338 
LOG=N
NOTRY=true
NOPRESUBMIT=true
NOTREECHECKS=true
R=hablich@chromium.org, yangguo@chromium.org

Change-Id: I060f28ffe7a0141b005eab6919eed66056f555e8
Reviewed-on: https://chromium-review.googlesource.com/1058046
Reviewed-by: Michael Hablich <hablich@chromium.org>
Commit-Queue: Ali Ijaz Sheikh <ofrobots@google.com>
Cr-Commit-Position: refs/branch-heads/6.7@{#69}
Cr-Branched-From: 8457e810efd34381448d51d93f50079cf1f6a812-refs/heads/6.7.288@{#2}
Cr-Branched-From: e921be5c4f2c6407936bde750992dedbf47c1016-refs/heads/master@{#52547}
[modify] https://crrev.com/b6531503bf06e13ceffb5579e935171e0bcd0ea7/src/isolate.cc
[modify] https://crrev.com/b6531503bf06e13ceffb5579e935171e0bcd0ea7/src/isolate.h
[modify] https://crrev.com/b6531503bf06e13ceffb5579e935171e0bcd0ea7/src/v8.cc
