
Issue 639948

Starred by 17 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug




Reduce CPU usage when many (200 or more) PeerConnections are run simultaneously

Reported by igor.kro...@gmail.com, Aug 22 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2835.0 Safari/537.36

Steps to reproduce the problem:
1. Go to https://kroitor.github.io/p2p/
2. Open chrome://webrtc-internals in a parallel tab
3. Watch the number of WebRTC connections grow
4. Depending on CPU/memory, noticeable lags will appear after about two hundred connections or so.

What is the expected behavior?
The JavaScript application itself does the following:
1. It creates a pool of 20 'nodes' and assigns a 'central' node
2. Each node opens a data channel to each other node by sending SDP offer/answer through the 'central' node.
3. All nodes become interconnected one at a time - a total of 20 nodes, where each node is connected to every other node. Having 20 * 20 links equals 400 connections in a single tab.

What went wrong?
After opening several hundred WebRTC data channels, a noticeable lag appears and starts to accumulate, even though the application is idle during that time - it does nothing except hold a lot of open connections (no data transmission takes place; the channels are open and idle). The Google Chrome Helper process starts eating 200% CPU.
That is very strange, as if the browser were continuously polling all of the open connections. I cannot find any other reasonable explanation for the lags, as there is a clear, direct dependency between the number of open connections and the time it takes just to handle all of that internally. This is even more troubling considering battery life on mobile devices. Is there really a design limitation on the number of open WebRTC connections in a single JavaScript application? What is wrong with WebRTC scalability (I suppose having a thousand open UDP sockets is not a big deal with async IO APIs in modern OSes)?

Did this work before? N/A 

Chrome version: 54.0.2835.0  Channel: canary
OS Version: OS X 10.11.6
Flash Version: Shockwave Flash 23.0 r0
 
Components: Blink>WebRTC
Labels: -OS-Mac OS-All
I uploaded a simpler test for easier recreation of the problem...
The steps to reproduce are the same:
1. Go to https://kroitor.github.io/p2p/, open the JS console to watch peers being created and exchanging their SDP
2. Open chrome://webrtc-internals in a parallel tab
3. Watch the number of established WebRTC connections grow, one pair after another
4. Depending on CPU/memory, noticeable lags will appear after about two hundred connections or so.

The application now creates 200 pairs of RTCPeerConnections (400 total), one pair every 1000 ms, and interlinks them with their SDP descriptions.
After about 200 connections noticeable lags appear and CPU usage goes through the roof. At 400 connections the browser almost freezes and FPS drops (how is GPU/rendering even connected to networking?).
I've tested it on another more powerful machine running Windows...

Chrome Canary 54.0.2836.0
Windows 8.1 (6.3) 64 bit
Intel i7 4770k @ 3.5GHz / 16GB RAM

It does not use as much CPU as the OS X version does. In fact, it is idle. And that's exactly how it should be, because nobody is sending anything: 400 connections get established and that's it. After establishing those 400 connections, the app is idle.

So, I suppose, this problem is Mac-related or related to how network IO is handled programmatically on OS X.

What's also interesting is that opening chrome://webrtc-internals with 400 established WebRTC connections raises CPU usage from idle to about 40% and keeps it there...
Screenshot (270).png (608 KB)
Screenshot (271).png (618 KB)
Cc: hta@chromium.org
Thanks igor.kroitor@ for the detailed report and all the data. It sounds reasonable to me that such high numbers of data channels and connections would put a high load on CPU and memory - after all, there's more to it than just a socket for each.

That said, I think it would be interesting to explore use cases like this and make the APIs perform well for them too. We've been talking about setting up similar high-load tests in the past, but the more normal use cases have been prioritized so far.

hta: What's your view on this? 
What I am trying to implement is a serverless DHT which would work from within a browser tab without having to connect to a central party. It's a distributed p2p system which does not use any signalling server at all. I am currently only using data channels (no media streaming). 

Unfortunately, the architecture of WebRTC is not really as p2p-friendly as it claims to be. I would say it is friendly, but only as long as both peers trust an intermediary (STUN/TURN) between them. Even when using a UDP transport, the WebRTC framework requires an 'open and established connection' on both sides. That is really a requirement of a TCP transport, while UDP is datagram-based, meaning you can send directly to anyone without actually having to 'connect'. But WebRTC over UDP still requires you to establish a connection. This kills any ability for apps to implement multicasting: you cannot send and receive on a single UDP port without overloading the system with connection primitives you don't need.

Being able to receive UDP datagrams on a single public UDP port is actually the key principle that allows modern distributed network architectures like Chord / KAD / Kademlia / etc. to exist... All of them utilize a routing table of up to several thousand peers, which does not cost much in terms of CPU or memory. But having a thousand UDP address/port pairs is not the same as having a thousand open WebRTC one-to-one connections. Direct UDP sending and receiving enables discoverability of peers on the internet in distributed decentralized networks. The only technical problem with discoverability is that you have to pinhole the NAT to be reachable from the public internet.

I use several public STUN servers merely to obtain my public (dynamic) IP. What is very uncomfortable about the WebRTC API is that it requires a user to 'establish a connection'. What is even more uncomfortable is that I have to use SDP offers/answers. You have to negotiate the offer/answer directly with each peer (which involves exchanging SDP and some signalling) before you are able to send or receive anything; you cannot just start sending to a known UDP port. This is a one-to-one design: it forbids any multicasting and true distributed architectures, or at least makes decentralized p2p concepts very hard to implement (managing all those connections and negotiations) with this kinda-'p2p' WebRTC API.

So I am trying to implement a connection pool and relay the offers/answers to new peers through already connected ones. This is how a peer in a distributed network usually joins in principle - by crawling the nodes of the network one by one and gradually getting to know the surrounding network better. Although I still have to use that freaking STUN mechanism to initiate the join process! It is definitely NOT friendly in terms of decentralised p2p systems.
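
In case it helps to picture it, here is a rough sketch of the relaying idea (illustration only - connectViaRelay, relayChannel and the STUN URL are made-up names; error handling and candidate trickling are omitted):

//-----------------------------------------------------------------------------

// A peer that already has an open data channel to some 'relay' peer can use
// that channel as its signalling path for connecting to a further peer.
function connectViaRelay (relayChannel, onNewChannel) {

    var pc = new webkitRTCPeerConnection ({
        iceServers: [{ urls: [ 'stun:stun.example.org' ] }] // placeholder STUN server
    })

    var channel = pc.createDataChannel ('data')
    channel.onopen = function () { onNewChannel (channel) }

    // once ICE gathering completes, forward the offer over the already-open channel
    pc.onicecandidate = function (event) {
        if (event.candidate) return // wait for the final (null) candidate
        relayChannel.send (JSON.stringify ({ type: 'offer', sdp: pc.localDescription }))
    }

    // apply the answer when the relay peer forwards it back to us
    relayChannel.onmessage = function (event) {
        var message = JSON.parse (event.data)
        if (message.type === 'answer')
            pc.setRemoteDescription (new RTCSessionDescription (message.sdp))
    }

    pc.createOffer ().then (function (offer) { return pc.setLocalDescription (offer) })
}

//-----------------------------------------------------------------------------
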
What's even worse, the WebRTC API requires starting several threads per RTCPeerConnection, plus a socket/port pair for each of them, even when using the UDP transport. This is a very high overhead for UDP communication, and especially for mobile devices (which are very important, I think). A more reasonable approach would be to use a single UDP socket for the UDP transport and multiple sockets for TCP (as is usually done the Berkeley sockets way), or to make that configurable, or at least controllable from userland.

Instead, WebRTC opens a separate socket/port and a bunch of management threads for each UDP data channel, which seems like overkill in itself. In terms of scaling, this almost does not scale at all, limiting the WebRTC technology to online calls (with questionable QoS). It could be much more useful if it gave users more freedom to set up the underlying networking on their own.

Comment 7 by hta@chromium.org, Aug 23 2016

Opening UDP ports is not a very expensive operation. I think we need to look elsewhere to find the problem.

Note: The reason you need public STUN servers (and in some cases TURN servers) is to get through NAT boxes. You can connect to a system with public addresses without these servers.

Given that the current Internet doesn't make every endpoint reachable by default, "establish a connection" is needed in order to communicate in all cases except when one of the endpoints has a public IP address. Since peer-to-peer was a high-priority design goal for WebRTC, we chose to require "establish a connection" in order to communicate.

There is more information available in the WEBRTC core documents, such as draft-ietf-rtcweb-overview and draft-ietf-rtcweb-security-arch. I advise reading them in order to understand the design goals.


Cc: tkonch...@chromium.org
Labels: Needs-Feedback
Tested the same on Mac 10.11.6, Chrome version 54.0.2837.0, with the steps in comment #2 - the application created 200 pairs of RTCPeerConnections and was then idle, yet CPU usage kept increasing, as shown in the screenshot.


igor.kroitor@, could you please confirm if this is the issue you are facing?

Screen Shot 2016-08-24 at 12.13.11 PM.png (704 KB)
tkonch...
I've just updated to the most recent 54.0.2838 on Mac OS 10.11.6
And here is what I get:

Screen Shot 2016-08-24 at 11.38.53.png (875 KB)
tkonch... Yes, this seems to be the same issue (only applicable to Mac).
Ok, to help confirm this issue I've uploaded an even simpler test. It is plain ES5, with no dependencies, less than 50 lines of JS code.

The steps to reproduce are the same, except that the URL is a little different:
1. Go to https://kroitor.github.io/p2p/test.html, open the JS console to watch peers being connected
2. Open chrome://webrtc-internals in a parallel tab
3. Watch the number of established WebRTC connections grow, one pair after another
4. Depending on CPU/memory, noticeable lags will appear after about two hundred connections or so.


The source code for the test is VERY simple:

//-----------------------------------------------------------------------------

var i = 0
var interval

interval = setInterval (tick, 1000)

function tick () { return (i++ < 200) ? pair () : clearInterval (interval) }

function pair () {

    var peer1 = new webkitRTCPeerConnection ({
        iceServers: [{ urls: [ 'stun:stun.ideasip.com', 'stun:stun.schlund.de' ] }]
    })

    peer1.onicecandidate = function (event) {

        if (event.candidate) return; // do nothing, wait for more candidates
            
        var peer2 = new webkitRTCPeerConnection ({
            iceServers: [{ urls: [ 'stun:stun.ideasip.com', 'stun:stun.schlund.de' ] }]
        })

        peer2.onicecandidate = function (event) {
            if (event.candidate) return; // do nothing, wait for more candidates
            peer1.setRemoteDescription (peer2.localDescription)
        }

        peer2.onnegotiationneeded = function () {
            peer2.setRemoteDescription (peer1.localDescription).then (function () {
                return peer2.createAnswer ().then (function (answer) {
                    return peer2.setLocalDescription (answer)
                })    
            })
        }

        var channel2 = peer2.createDataChannel ('data')
    }

    peer1.onnegotiationneeded = function () {
        return peer1.createOffer ().then (function (offer) {
            return peer1.setLocalDescription (offer)
        })
    }

    peer1.ondatachannel = function (event) {
        var channel = event.channel
        channel.onopen = function () { if (channel.readyState === 'open') { console.log (i + ' connected') }}
    }

    var channel1 = peer1.createDataChannel ('data')
}

//-----------------------------------------------------------------------------

As you can see, 200 pairs of connections are created here and nothing else is done; after those 200 pairs connect, the app should be idle because it does not send or receive anything. But it still eats all the CPU (and there are also about a thousand threads in the Chrome Helper processes, several threads per connection).

Comment 12 by hta@chromium.org, Aug 25 2016

Thanks for investing the time to make this very simple demo of the issue!
(can't look at it immediately, but want to make sure I say thank you!)

Comment 13 by hta@chromium.org, Aug 25 2016

Cc: tommi@chromium.org jansson@chromium.org
Christoffer, can you take a look at this? It seems like an excellent scaling test.
Copying Tommi since it's a scaling issue.

I also retried this short test multiple times on my Windows system with Canary 54.0.2838.2 and, unfortunately, I have to say that the Windows build is also subject to this issue; it was not very obvious before because my Windows setup is much faster than my Mac laptop.
I also found an interesting blog entry (along with many others). It's from another developer researching the same topic, who came to the same conclusions after all:
http://blog.daviddias.me/2014/12/20/webrtc-ring

This is what he says at the end of his research on implementing a distributed (true p2p) infrastructure over WebRTC:

> It was observed that opening 25 browser tab instances, creating an individual node in each, which requires 2 WebRTC data channels, one for the predecessor and another for the successor, caused the browser to become incredibly slow and unresponsive, this has led me to understand that even though there were several things competing for machine resources, implementing a full Chord algorithm might be unpractical, since it requires 160 connections, or so called fingers, to another peers, which would result in ~320 data channels per each node.

> One of the questions I’m currently presented is to identify if there would be the consequences from reducing the number of bits available per nodeID (currently 160) and therefore, reducing the number of fingers needed to implement a version more close to the Chord routing algorithm, or if there is a way to adjust the size of the address space depending on the service needs, creating smaller rings that are easier to manipulate and propagate messages.

The guy eventually came up with the following reduced implementation:
http://blog.daviddias.me/2015/03/22/enter-webrtc-explorer

But he notes that there are design limitations imposed by the poor scaling of the WebRTC connection architecture.

I am currently a member of a team researching the possibility of laying a DHT infrastructure over the WebRTC API. We have enough experience in c/c++/objc/js/html/http/networking/tcp/ip/nat/unix/etc to study the implementation of WebRTC in Chromium and to propose changes to its internals.

A brief overview of the source code revealed that WebRTC creates several threads per connection in the peerconnection / peerconnectionfactory code. This can only scale up to some limited number of threads (usually the number of CPU cores multiplied by some low integer constant). It is a serious implementation fault in terms of the scaling of the networking subsystem: starting two or three threads per connection primitive is very expensive, because it scales only far enough to allow a very limited number of voice/video/intensive data channels and no more. Practically speaking, that means you can have a p2p chat of 10-15 people, but you cannot afford having 100-150 people online with you (on a single line) without a server, which is weird, at least these days. And if you are transmitting data very frequently - say, playing an online game with a team of friends, or having a conference with your colleagues, or whatever your use case might be - you are limited to even lower numbers of connections if you want any computational power and battery left for userland tasks.

This kind of implementation, which spawns threads for separate long-living tasks, is an indication that scalability was not really taken into account when the source code was created. And I can understand the dev team, because the concept of mass p2p decentralisation is very far from the concept of direct voice / video calls between two users. These issues can be fixed by redesigning the peer connection management algorithm. A common solution to this class of problems is to create a pool of worker threads shared across the pool of connections, instead of creating multiple threads per connection. This fix does not require any changes to the API or the W3C draft; it would only change the implementation internals, seamlessly. It could be a lightweight fix that corrects the problem and lets the browser remain responsive even after a thousand or two RTCPeerConnections. WebRTC would become much more practical and scalable. This is not infinite scaling, but at least it raises the connection limit high enough for most practical needs.
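
To be concrete, the pattern I have in mind is just an ordinary fixed-size worker pool with a task queue. A rough sketch in plain JS, purely for illustration (the real fix would of course live in Chromium's C++ internals; WorkerPool and the worker script URL are made-up names):

//-----------------------------------------------------------------------------

// A fixed number of workers is created up front and shared by all tasks,
// instead of spawning new threads for every connection.
function WorkerPool (size, scriptUrl) {
    this.idleWorkers = []
    this.pendingTasks = []
    for (var i = 0; i < size; i++)
        this.idleWorkers.push (new Worker (scriptUrl))
}

WorkerPool.prototype.run = function (task, onResult) {
    if (this.idleWorkers.length === 0) {
        this.pendingTasks.push ([ task, onResult ]) // queue until a worker frees up
        return
    }
    var pool = this
    var worker = this.idleWorkers.pop ()
    worker.onmessage = function (event) {
        onResult (event.data)
        pool.idleWorkers.push (worker)              // recycle the worker
        if (pool.pendingTasks.length > 0) {         // drain the queue
            var next = pool.pendingTasks.shift ()
            pool.run (next[0], next[1])
        }
    }
    worker.postMessage (task)
}

//-----------------------------------------------------------------------------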

But the 'implementation fix' is not the main issue here. The main issue is requiring a user to 'establish a connection', which in itself implies the need for connection management. Although UDP hole punching does not require a connection, WebRTC still does. After a NAT mapping is established, anyone can send to the mapped public port and that transmission will be forwarded through. So a connection is not really needed for UDP (a connectionless datagram transport), even to traverse NAT. But the design of WebRTC does require one.
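
For contrast, here is roughly what plain UDP hole punching looks like outside the browser - a Node.js sketch, for illustration only, since browsers deliberately do not expose this (the peer address and port are placeholders):

//-----------------------------------------------------------------------------

var dgram = require ('dgram')
var socket = dgram.createSocket ('udp4')

socket.bind (0, function () {
    // a single outbound datagram is enough to create the NAT mapping...
    socket.send (Buffer.from ('ping'), 9999, 'peer.example.org')
})

// ...after which datagrams the peer sends back to the mapped address/port
// simply arrive - there is no 'connection' to establish or manage
socket.on ('message', function (message, rinfo) {
    console.log ('got', message.toString (), 'from', rinfo.address + ':' + rinfo.port)
})

//-----------------------------------------------------------------------------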

This requirement hits mobile devices really hard. Holding even 10 people on the same 'line' on a mobile device would be a pain with such a CPU-demanding connection management strategy, and I don't think that is affordable nowadays, even in philosophical terms. Mobile devices are the base for p2p - they should be, at least. A p2p design which requires this much connection management is really inefficient, costly and anti-mobile: it hurts the battery, power management and speed. That really damages the adoption of WebRTC and restricts the growth of the mobile userbase/appbase using WebRTC as a technology as a whole.

It's not really a peer-to-peer design if it is only practical for one-to-one calls (roughly speaking). I agree that calling is very important, and yes, I know that the design goals of WebRTC lean more towards media streaming than towards scalability / multiuser aspects or the decentralisation aspects of p2p. But this design decision hurts other very interesting and much-wanted use cases of true distributed peer-to-peer communication, such as data infrastructures over virtual networks, free VPNs, DHTs, torrent-like systems, crypto-currencies, decentralised CDNs, conference calls, multicam/multiuser video, public streaming via the browser, public contract evaluation, etc., etc...

So the current design leaves a user with occasional one-to-one calls. Maybe a chess game. Or a direct messenger. Or a newsfeed. That's it - nothing conceptually new comes out of direct one-to-one connections, since there is nothing here that hasn't already been implemented in some form or another, while the modern web is all about many people communicating and collaborating directly with each other, simultaneously, in small and big groups. Instead of restricting the design to one-to-one cases, WebRTC could by its existence have unified the distributed p2p space with the web space, effectively revolutionizing the concept of the web by making it truly decentralised and multi-peer-to-peer for the first time in the history of humanity, which by itself opens possibilities for solving global problems that previously seemed impossible and that could change our entire world! ))

So after all, I think at least one of the two should be done:

1. An implementation-level fix which revises the current code responsible for peer connection management and the threading policy. This does not require any change to the API or to the WebRTC W3C spec draft, and it should not require a big effort to recode that part - a thread pool is a very common and well-known concept, reimplemented a million times. It doesn't even have to be fixed exactly that way, though; any strategy that addresses scaling at least somehow would do, and the current implementation does not really address this issue at all. The code of peerconnection and its factory is OK and would not require any change; it would only need a thread management strategy that enqueues and reuses threads in a more resource-efficient way, with mobile devices and power consumption in mind.

2. A design-level fix which could really lead to the web of tomorrow with only a slight change to the draft spec, giving the user more freedom over the UDP transport (just a tiny bit of change would dramatically improve the adoption of WebRTC).

In my opinion, both of these steps should be taken, as they don't contradict (in fact, they complement) each other. Please forgive me if this is not the kind of bug discussion/reporting you're used to.
I think this bug needs to be more clear about what it's asking for.

Is it asking to be able to run 200-400 simultaneous PeerConnections?

Is it asking to be able to run WebRTC without ICE (the "connectivity management")?

Is it asking to be able to run WebRTC in 25 different tabs simultaneously?


By the way, when I run that test code on Linux, it uses about 300% CPU for 400 PeerConnections. 

Comment 17 by hta@chromium.org, Aug 30 2016

I read the most important point as being able to run 200 PeerConnections simultaneously. The other points seem to have been "raised in passing".

pthatcher: The most important issue is scalability, of course. Modern p2p infrastructures demand more than a hundred connections, and that's a usual and regular thing, actually. A limit of several dozen connections is not reasonable at all in the modern world; this kind of approach ('why would you ever need more?') is somewhat 'prehistoric'. Bill Gates once said that nobody would ever need more than 640KB of memory... But we all know what scalability means.

There's also a thorough explanation above of what the bug is about. It is about building a true serverless p2p network of browsers, which requires a lot of connections among them (several hundred). As a web developer you have to use the WebRTC API, which has these scalability problems; all the alternative low-level networking APIs are now deprecated in Chrome, so devs only have WebRTC left. A real p2p network can be built on top of WebRTC either by having the WebRTC scalability problems corrected, or by changing the WebRTC standard and giving developers more freedom over the low-level socket API. This problem can be solved either way - by supporting hundreds of connections or by changing the standard. But it's better to do both, I think.
Summary: Reduce CPU usage when many (200 or more) PeerConnections are run simultaneously (was: WebRTC scalability issues)
OK, I have changed the bug title to match
Labels: -Needs-Feedback
Status: Available (was: Unconfirmed)
This bug does not affect Firefox.

That is to say Firefox returns to idle after creating hundreds of connections, as expected.
I refined the test to factor out generating hundreds of unique crypto certs; also, logging large amounts of text to the console is itself problematic in Chrome.

Each peer connection is wasting 1-2% CPU, which is unacceptably broken for a dozen peers, let alone hundreds.
rtc_stress_test.html (1.9 KB)
Thanks, acmesqua...
Firefox is indeed not affected by this. I've run both tests (mine and yours) on Firefox, and I can confirm that Firefox really does return to idle, no problem.
So, it is a Chrome/ium-specific bug.

Owner: guidou@chromium.org
Status: Assigned (was: Available)
Hello! Is there any progress on this? It's been a year now.
No progress yet, but it's in our queue.

Comment 27 by tommi@chromium.org, Aug 20 2017

One thing that has already landed is that a relatively low limit has been put on the number of threads allowed for logging. Previously it was a thread per PeerConnection instance, but we don't allocate those after reaching the threshold.
tommi: That's great to know! Hope to see this fixed soon as well! Most people don't yet realise how much they need it really ;)
Status: Available (was: Assigned)
Cc: guidou@chromium.org hbos@chromium.org
Components: -Blink>WebRTC Blink>WebRTC>PeerConnection
Owner: ----
