WebSockets receiving in chunks at sub-20ms latency
Reported by vans...@gmail.com, Feb 14 2017
Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36

Steps to reproduce the problem:
1. Feed a stream of data through WebSockets with a 16.7 ms delay between frames (60 fps).
2. Notice how the frames sometimes trigger the receive completion in chunks of 2-3 messages, 0-1 ms apart.
3. Now do more things on the main thread, such as sending input, decoding, or GL buffer swaps. Notice how the receive-message completion now gets called in chunks of 5-8 messages with 0-1 ms between them.

What is the expected behavior?
The WebSocket frames should be processed as they arrive, with minimal overhead and delay.

What went wrong?
WebSocket connections do not seem to be given anything close to realtime priority. Some kind of head-of-line blocking happens sporadically and randomly, destroying the realtime properties and making WebSockets unsuitable for low-latency use.

Did this work before? N/A
Does this work in other browsers? N/A
Chrome version: 55.0.2883.75  Channel: n/a
OS Version: Debian Sid
Flash Version:
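For step 1, a minimal sender that paces frames at roughly 60 fps could look like the sketch below. This is only an illustration using the Node.js `ws` package, not the reporter's actual setup; the port and payload size are placeholders.

```js
// Minimal pacing server: send one small binary frame every ~16.7 ms.
const WebSocket = require("ws");                    // npm package "ws"
const wss = new WebSocket.Server({ port: 8080 });   // placeholder port

wss.on("connection", (socket) => {
  const frame = Buffer.alloc(4000);                 // placeholder ~4 KB payload
  const timer = setInterval(() => {
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(frame);                           // one frame per tick, ~60 fps
    }
  }, 16.7);
  socket.on("close", () => clearInterval(timer));
});
```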
Feb 16 2017
If the main thread is busy the onmessage callback will be delayed. If the messages are larger than 128KB then there may be further delays until all parts have been received. I cannot explain the behaviour when the main thread is not busy. Are the server and client on the same network? What happens in other browsers? There is no such thing as realtime priority in the browser. There is only best-effort.
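This is easy to see with an artificial main-thread stall: messages that arrive while a long synchronous task runs are queued, and their onmessage callbacks then fire back-to-back once it finishes. A trivial illustration (not taken from the reporter's code):

```js
// Artificially block the main thread for 100 ms once a second.
// Any WebSocket messages arriving during the stall are queued, and their
// onmessage callbacks then run back-to-back, 0-1 ms apart.
function blockMainThread(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* busy-wait */ }
}
setInterval(() => blockMainThread(100), 1000);
```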
Feb 16 2017
I am not sure whether the main thread is busy. This is in a NaCl plugin, nothing on the main thread is allowed to block, and I am using callbacks everywhere. I am not sure how the main thread works here, whether the processing still happens on the main thread or another thread takes care of it. I tried moving the WebSocket connection to another thread with a blocking receive, but I got the same result. The messages are around 3000-4000 bytes, with some larger messages that could be just over 128 KB. I will repeat the test enforcing a smaller size. This test was done on localhost with a 64 KB MTU. A Wireshark capture confirmed that the WebSocket frames leave the localhost server about 17 ms apart. I can try to create a more solid test case if required. It would probably be a server+client app: the server streams 4 KB WS frames 17 ms apart, and the client is a NaCl plugin that receives asynchronously.
Feb 20 2017
Thanks for the details. Is the client a smartphone? If it's a PC, we would appreciate it if you could run the Wireshark capture on the receiver side as well.
Feb 21 2017
This cannot be triaged from the TE end; adding the "TE-NeedsTriageHelp" label for further triage.
Mar 2 2017
I created a test environment for this and I am testing four cases:

Case #1: sending 100 WS frames of size 1.
Case #2: sending 100 WS frames of size 1300.
Case #3: sending 1 WS frame of size 130000.
Case #4: sending 1 WS frame of size 65000.

All tests were done on localhost with MTU 65536, every tick interval is 17 milliseconds, and each test ran for 30 seconds. Nagle's algorithm was disabled on the socket. A failing case is when time_ws_frame - last_ws_frame > 20. Both the server and the client measure this (a sketch of the client-side measurement follows below).

Case #1. Fails: 5 on server, 33 on client. min 21, max 91, mean 40, median 26
Case #2. Fails: 3 on server, 31 on client. min 21, max 77, mean 39, median 35
Case #3. Fails: 1 on server, 14 on client. min 21, max 30, mean 24, median 24
Case #4. Fails: 5 on server, 14 on client. min 21, max 29, mean 26, median 26

These cases show that sending many smaller frames produces more stutter than a few large frames. This test was also done while the browser was mostly idle; if the Chromium tab process starts doing more work, such as decoding video and handling input/mouse events, the stutter becomes much more prominent. Let me know if this test is satisfactory or whether there are tweaks I need to consider. Attached are the index.html and the C websocket server; deploy.sh has instructions for how to build it.
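The client-side check amounts to the following (a rough sketch of the approach, not a copy of the attached index.html; the URL is a placeholder):

```js
// Flag a message as "failing" when more than 20 ms passed since the
// previous one, and summarise the late gaps at the end of the 30 s run.
const ws = new WebSocket("ws://localhost:8080");     // placeholder URL
ws.binaryType = "arraybuffer";

const lateGaps = [];
let last = performance.now();

ws.onmessage = () => {
  const now = performance.now();
  const gap = now - last;
  last = now;
  if (gap > 20) lateGaps.push(gap);                  // failing case
};

setTimeout(() => {
  if (lateGaps.length === 0) {
    console.log("no failing frames");
    return;
  }
  lateGaps.sort((a, b) => a - b);
  const mean = lateGaps.reduce((s, g) => s + g, 0) / lateGaps.length;
  const median = lateGaps[Math.floor(lateGaps.length / 2)];
  console.log(`fails: ${lateGaps.length}, min: ${lateGaps[0]}, ` +
              `max: ${lateGaps[lateGaps.length - 1]}, ` +
              `mean: ${mean.toFixed(1)}, median: ${median}`);
}, 30000);
```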
Mar 3 2017
Assigning to myself. I will try to look at it next week.
Apr 18 2017
Thank you for the test case. It was very useful. Sorry for the delay in looking at this. Using the test code, I was able to reproduce it in Chrome, and also in Firefox and Safari. I am going to investigate it further, but it may get interrupted by other work again.
Apr 19 2017
I ported part of server.c to pywebsocket and adapted it to perform the described tests. I did the same to index.html, renamed to many-frame-sender.html. I didn't make any special effort to make these results stable, so they should be taken as very rough.

Results from Chrome 57:
Case 100 x 1. Fails: 0 on server, 33 on client. min 21, max 132, mean 52.9, median 45
Case 100 x 1300. Fails: 0 on server, 35 on client. min 21, max 150, mean 52.5, median 40
Case 1 x 130000. Fails: 0 on server, 4 on client. min 21, max 29, mean 23.0, median 21
Case 1 x 65000. Fails: 0 on server, 3 on client. min 34, max 119, mean 69.0, median 54

Results from Firefox 52:
Case 100 x 1. Fails: 0 on server, 16 on client. min 22, max 82, mean 43.0, median 42
Case 100 x 1300. Fails: 0 on server, 10 on client. min 21, max 118, mean 61.2, median 64
Case 1 x 130000. Fails: 0 on server, 17 on client. min 22, max 93, mean 41.8, median 37
Case 1 x 65000. Fails: 0 on server, 9 on client. min 22, max 40, mean 30.1, median 28

Here are the results after switching to binaryType = "arraybuffer". The per-message overhead for Blob is high with Chrome WebSockets.

Chrome (arraybuffer):
Case 100 x 1. Fails: 0 on server, 0 on client. min 0, max 0, mean 0, median 0
Case 100 x 1300. Fails: 0 on server, 1 on client. min 33, max 33, mean 33.0, median 33
Case 1 x 130000. Fails: 0 on server, 17 on client. min 21, max 25, mean 21.6, median 21
Case 1 x 65000. Fails: 0 on server, 11 on client. min 21, max 38, mean 24.7, median 22

Firefox (arraybuffer):
Case 100 x 1. Fails: 0 on server, 20 on client. min 21, max 73, mean 38.0, median 39
Case 100 x 1300. Fails: 0 on server, 10 on client. min 22, max 39, mean 30.7, median 30
Case 1 x 130000. Fails: 0 on server, 12 on client. min 23, max 70, mean 41.7, median 45
Case 1 x 65000. Fails: 0 on server, 9 on client. min 28, max 216, mean 64.9, median 37

Note: the statistics only summarise the messages that were late, where late is defined as >0.02 s since the previous message. Since 98% of messages arrive on time, the statistics do not represent the distribution of messages as a whole.

My interpretation is that Chrome does better than Firefox except when there are many messages with the default binaryType "blob". It appears that if you try to receive messages at 60 Hz, any browser will have some jank once every 2 or 3 seconds.
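For context, the only client-side difference between the two sets of numbers is the receive type, e.g. (URL is a placeholder):

```js
const ws = new WebSocket("ws://localhost:8080");   // placeholder URL

// The default binaryType is "blob", which materialises a Blob object per
// message. Switching to "arraybuffer" avoids that per-message overhead.
ws.binaryType = "arraybuffer";

ws.onmessage = (event) => {
  const bytes = new Uint8Array(event.data);        // event.data is an ArrayBuffer
  // ... handle the frame ...
};
```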
Apr 20 2017
Thanks for this. In my use case I noticed it when using the NaCl WebSocket implementation. Not sure if it's relevant to Chromium, but NaCl does not expose a way to choose the binary type (arraybuffer vs. blob) on the WebSocket. I want to confirm the difference between the NaCl and JS WebSocket with arraybuffer set. If that is the case, I also want to try a chunked/open HTTP request. I wonder if that stutter is just the GC kicking in and there is nothing we can do about it. Though with NaCl we avoid the JavaScript GC, and we should be interoperating directly with Chromium's WebSocket implementation. My first guess at the problem was the Chromium event loop; my second would be GC. Any thoughts? If you think a fix is in sight I could try to get some traces, starting from recv and then on critical functions as the WebSocket frame travels to the onmessage/NaCl functions. I would have some time in the next week or two to test all this.
Apr 20 2017
I have checked, and NaCl always uses the ArrayBuffer type. NaCl avoids the JavaScript GC, but the DOMArrayBuffer objects that are created still need to be collected by the Oilpan Blink GC. There's some other overhead incurred by NaCl: at least three extra copies of the message, and an extra process hop. Based on the difference between ArrayBuffer and Blob, GC seems the most likely culprit. HTTP in NaCl has probably seen more optimization work than WebSocket, so it may well be less janky to use a hanging GET request for high-frequency, large transmissions.
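In JavaScript terms, such a hanging GET could look roughly like the sketch below; the endpoint and handler are placeholders, and a NaCl client would go through its URL loading interface instead:

```js
// Sketch of a "hanging GET": the server keeps one response open and keeps
// streaming chunks; the client reads each chunk as it arrives.
async function readStream() {
  const response = await fetch("/stream");         // placeholder endpoint
  const reader = response.body.getReader();
  for (;;) {
    const { value, done } = await reader.read();   // value is a Uint8Array chunk
    if (done) break;
    handleChunk(value);                            // hypothetical handler
  }
}
```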
Jan 15
We are experiencing something similar. I'm not sure if it helps, but this quick-and-dirty experiment reveals a similar problem: https://github.com/WonderMediaProductions/MotionJpegLatencyTest

When running the experiment on a fast PC without any network, one would expect very low latency between the render-image requests sent by the client and the image response delivered as an arraybuffer over WebSockets by the server. Indeed, running this with Microsoft Edge on my Windows 10 PC, hardly any latency is noticeable. Firefox has a bit of latency. In the canvas timeline at the top, one can observe that the render requests (gray bars) are processed by the server immediately after the requests are sent by the client (green bars).

However, when using Google Chrome (I tested 71.0.3578.98 and 73.0.3672.0), latency is higher, and strange things show up in the timeline. Notice how the server receives the render requests (gray bars) **way too late**, even though these are very small WebSocket messages. It is as if Chrome is queuing the messages. Note that when the server does not transmit the image, no delay is visible in the timeline, as if Chrome's WebSocket is not full-duplex.
Jan 15
#15 This might also be related to issue 456476. If so, then moving the WebSocket to a Worker thread will probably help.
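A minimal sketch of that workaround (file names and the URL are placeholders): the WebSocket lives in a dedicated worker so its onmessage callbacks are not queued behind main-thread work, and received buffers are transferred to the page.

```js
// main.js: spawn the worker that owns the WebSocket.
const worker = new Worker("ws-worker.js");          // placeholder file name
worker.onmessage = (e) => {
  // e.data is the transferred ArrayBuffer forwarded by the worker.
  handleFrame(e.data);                              // hypothetical handler
};

// ws-worker.js: receive frames off the main thread.
const ws = new WebSocket("ws://localhost:8080");    // placeholder URL
ws.binaryType = "arraybuffer";
ws.onmessage = (event) => {
  // Transfer the buffer to the page without copying it.
  postMessage(event.data, [event.data]);
};
```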
Jan 15
#16 Oh dear, I should have profiled the experiment, since Chrome seems to decode the JPEG on the main UI thread. I naively assumed that image decoding in browsers was always done on a dedicated thread. Instead of moving the WebSocket to a worker, I moved the image decoding to a pool of workers using createImageBitmap. Now the delay between request send and receive is indeed much smaller and the latency is more stable, but after a while Chrome crashes with "Aw, Snap!". So something fishy is still going on. Most likely I'm allocating too many bitmaps; I will try to use a pool of those and see if that helps. Moving the WebSocket to a worker might help too. Thanks for the tip!
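For reference, a rough sketch of that pattern (a hypothetical decode worker, not the experiment's actual code):

```js
// decode-worker.js: decode JPEG bytes off the main UI thread.
onmessage = async (e) => {
  // e.data is the ArrayBuffer received over the WebSocket.
  const blob = new Blob([e.data], { type: "image/jpeg" });
  const bitmap = await createImageBitmap(blob);
  // ImageBitmap is transferable, so this hands it back without a copy.
  postMessage(bitmap, [bitmap]);
};
```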
Jan 15
#17 Explicitly closing the transferred image bitmap solves this. It is nevertheless strange that the images are not getting garbage collected. Thanks for the feedback.
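In other words (a sketch, assuming the decode worker from the previous comment and a canvas element on the page):

```js
const canvas = document.querySelector("canvas");    // assumes a <canvas> element
const ctx = canvas.getContext("2d");

decodeWorker.onmessage = (e) => {
  const bitmap = e.data;                            // transferred ImageBitmap
  ctx.drawImage(bitmap, 0, 0);
  // Release the backing pixel memory immediately instead of waiting for GC;
  // without this the page eventually ran out of memory ("Aw, Snap!").
  bitmap.close();
};
```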