Issue 618412

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug




WebRTC video can be delayed 5-10 seconds on delayed/latent network

Reported by dmcletc...@gmail.com, Jun 8 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36

Steps to reproduce the problem:
1. Run a network bandwidth restricting tool (I'm using wanem) or run on a slow network (1 Mbit/s or less).
2. Place calls browser to browser using apprtc.
3. Observe that sometimes there is a lot of delay on the video stream.

What is the expected behavior?
I think that the design is that the video quality should degrade, either in frame rate or resolution, but it should still be displayed in a timely manner.

What went wrong?
Once the video is delayed more than a few seconds, it's perceived as broken.  I have seen over 10 seconds of delay in testing.  I think that the video data is present & ready to play - but chrome is intentionally adding seconds of delay to the stream - which is incorrect.

Did this work before? N/A 

Chrome version: 51.0.2704.84  Channel: stable
OS Version: 10.0
Flash Version: Shockwave Flash 21.0 r0

I'm using the tool (wanem) to simulate & reproduce issues that we are experiencing.

I have tried looking into this a bit.  I think that the video delay is added intentionally.  I think that the video delay is calculated using the audio delay, which is calculated from the audio jitter buffer.

Could the audio jitter buffer be too big?  Is there a way to limit it so that it can't grow beyond 75 or 100 ms so that it will act like an IP phone? (just as a workaround)

I attached a screenshot of the internals screen showing googTargetDelayMs holding steady around 3,000 ms.
 
Capture.PNG
196 KB View Download
Components: Blink>WebRTC>Network
Cc: juberti@chromium.org
Cc: mflodman@chromium.org
Owner: holmer@chromium.org
Status: Assigned (was: Unconfirmed)
Can you share the wanem settings that you used?
TL;DR: I'm 90% sure that the only setting I used for the above screen grab was a bandwidth restriction of 700 kbps.

(It may have been 1000 kbps. I'm pretty sure that I did not have any settings for delay/jitter/loss at that time. I had been enabling delay at 20 ms, jitter at 20 ms, and loss at 2% to try to narrow down what was causing the video delay, but I'm pretty sure that it was only a bandwidth restriction at the time of this screen grab. Sorry for the long story.)

This morning I just set the bandwidth to 1000 kbps, and on one end of the apprtc call the target delay spiked to 3,000ms and on the other end it's plateauing around 750-1,000 ms.  I'll attach a screen capture of this one since I'm 100% sure of the wanem settings.

My setup for testing is to use wanem as a one-armed router, hosted in VirtualBox on my desktop. So it might have added jitter and/or delay because of the environment. Here's some terrible ASCII art:

                    .---------.
              .---> | Desktop |
              |     |         |
              |     |.-------.|
.--------.    '---> || vbox  ||
| laptop | <------> || wanem ||
'--------'          |'-------'| 
                    '---------'

where Chrome is running on both the laptop and the desktop.

Let me know if there's any more info that I can provide, or if you want me to push it back up the hill & test something again.  I can try with canary if you want.

Again - sorry for the long story - I don't want to miss something that might be important.
Capture.PNG
253 KB View Download
Thanks for reporting!

When reproducing this, could you also capture an rtc event log (http://www.tokbox.com/blog/how-to-get-a-webrtc-diagnostic-recording-from-chrome-49/)? That would provide us with more information.
Here is a capture from another call, same setup with wanem at 1,000 kbps. The delay is up over 4 seconds. The audio is not as delayed as the video (they are not in sync). This call has been up for about 45 minutes. The ICE state disconnected/reconnected many times (maybe 20 times) but has not failed.
Capture2.PNG
248 KB View Download
Sure - I'll make another call with an event log.
It looks like this call started out at 3,000 ms of delay...
webrtc_internals_dump (1).zip
20.8 KB Download
Capture3.PNG
179 KB View Download

Comment 9 by holmer@chromium.org, Jun 10 2016

The attached dump is actually not an rtc event log, but a webrtc internals dump. Still useful, but not as useful as an rtc event log. See the link in #5 for instructions on how to record an rtc event log.
Looking at the webrtc internals dump (using https://fippo.github.io/webrtc-dump-importer/ to visualize it), I can see the following:

- The inactive candidates are reporting 3s RTT, which is just their way of saying that they don't have an estimated RTT.

- The true RTT is pretty high from 4 - 14 seconds into the call, but then it recovers and we're able to keep it low for the rest of the call.

- The video jitter buffer seems to suffer from the high RTT, and takes a long time to recover. This is where we can improve.

If we have an rtc event log we should be able to easily debug this issue.
I'm having a problem with the event recording.
If I:
1. go to chrome://webrtc-internals/
2. select "Enable diagnostic packet and event recording"
3. select a new file in the pop-up
4. make a call with apprtc
5. wait a few minutes
6. hang up the apprtc call
then I do not get any output file. I've tried with a new file and with an existing file. Sorry to be so thick ... I just don't see the data.

I repeated the process & have a chrome_debug.log - maybe you can see why the event log didn't get recorded?  I'll try again by running chrome as an administrator, and maybe with canary.  Maybe it's just a file permissions thing.
chrome_debug.zip
153 KB Download
If I enable both the audio diagnostic recording & the diagnostic packet and event recording, then I get 2 files:
audio_debug.7764.aec_dump.3 and audio_debug.7764.source_input.3.wav

So I don't think that it's a permissions thing ... I think that something is wrong with the event recording.
I have an event log.  I have to start the call, and then select "Enable diagnostic packet and event recording".  Maybe I am not reading the directions correctly... 
Sorry for all of the comments on the ticket.  Thanks for working on this - we really appreciate all that you guys are doing.
Capture.PNG
212 KB View Download
event_log.1948.event_log.zip
758 KB Download
Cc: holmer@chromium.org
Owner: terelius@chromium.org
terelius, I was unable to plot the contents of the event log in #13. Could you look into what might be wrong?
Components: -Blink>WebRTC>Network Blink>WebRTC>Video
The event log starts with the bytes 0a 00 which signifies a length delimited protobuf event of length 0. This would correspond to an empty event with no type, no timestamp and no data. This triggers an assertion when parsing the file, since type and timestamp are required fields in all events.
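
To illustrate how the two bytes 0a 00 are read, here is a minimal sketch of decoding a length-delimited protobuf record header. The function names are hypothetical and this is not the parser used by the WebRTC tools; it only follows the standard protobuf wire encoding (tag = field number << 3 | wire type, followed by a varint length).

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


def read_record_header(buf: bytes, pos: int = 0) -> tuple[int, int, int, int]:
    """Return (field_number, wire_type, payload_length, payload_start)."""
    tag, pos = read_varint(buf, pos)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 2:  # 2 is the length-delimited wire type
        raise ValueError(f"expected length-delimited record, got wire type {wire_type}")
    length, pos = read_varint(buf, pos)
    return field_number, wire_type, length, pos


# 0x0a is field 1 with wire type 2; 0x00 is a payload length of zero.
print(read_record_header(bytes([0x0A, 0x00])))  # (1, 2, 0, 2)
```

A zero-length payload leaves no room for the required type and timestamp fields, which is consistent with the assertion described above.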

The log file looks reasonable apart from the first two bytes, so I'll attach the output of tail -c +3 event_log.1948.event_log.2
event_log.1948.event_log.2.repaired
2.1 MB Download
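The repair above (tail -c +3) drops the first two bytes unconditionally. A slightly more cautious sketch, assuming the only damage is a single leading empty event, strips the prefix only when it is actually present. The function names and file paths here are hypothetical.

```python
from pathlib import Path

# Tag 0x0a (field 1, length-delimited) followed by a payload length of 0.
EMPTY_EVENT = b"\x0a\x00"


def strip_leading_empty_event(data: bytes) -> bytes:
    """Drop one leading empty event if present; otherwise return data unchanged."""
    if data.startswith(EMPTY_EVENT):
        return data[len(EMPTY_EVENT):]
    return data


def repair_log(src: str, dst: str) -> bool:
    """Write a repaired copy of src to dst; return True if bytes were removed."""
    original = Path(src).read_bytes()
    repaired = strip_leading_empty_event(original)
    Path(dst).write_bytes(repaired)
    return len(repaired) != len(original)
```

For example, repair_log("event_log.1948.event_log.2", "event_log.1948.event_log.2.repaired") would produce the same output as the tail command for this particular file, and would leave an undamaged log untouched.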
The latency (relative to the first packet) is very interesting. It seems to "oscillate" with increasing amplitude and a period of ~15 seconds until the peak 2750 ms latency. Given this packet delay, it is not surprising that the jitter buffer size and target delay increase to ~3 seconds. I would have expected the BWE to back off faster/more though.
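
For readers unfamiliar with "latency relative to the first packet", here is a minimal sketch of one common way to compute it from per-packet send and arrival times. This is an assumption about the definition, not the plotting tool's actual code; the function name and the sample timestamps are made up for illustration.

```python
def relative_latency_ms(send_ms: list[float], arrival_ms: list[float]) -> list[float]:
    """Extra one-way delay of each packet compared with the first packet.

    Sender and receiver clocks need not be synchronized, because only
    differences against the first packet are used.
    """
    if len(send_ms) != len(arrival_ms):
        raise ValueError("send and arrival lists must have the same length")
    if not send_ms:
        return []
    send0, arrival0 = send_ms[0], arrival_ms[0]
    return [(a - arrival0) - (s - send0) for s, a in zip(send_ms, arrival_ms)]


# Hypothetical timestamps: packets sent every 20 ms, arriving with varying delay.
send = [0, 20, 40, 60]
arrival = [100, 130, 190, 170]
print(relative_latency_ms(send, arrival))  # [0, 10, 50, 10]
```

A value that keeps growing means packets are spending longer and longer in a queue somewhere on the path, which is the pattern described above.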

dmcletchie, do you know if there is a maximum queue size in wanem? Is it reproducible on a real network, or does it only happen with the emulator?
latency.png
47.7 KB View Download
inc_bitrate.png
70.3 KB View Download
Thanks for working on this. I'm not 100% sure how wanem works. It does have a configuration value for the packet limit. This defaults to 1000, and I left it at that. This is what the documentation says:
"Most network devices like routers and switches maintain a queue for each interface to forward packets. When the queue exceeds it limit packets are dropped. The entire netword can be considered as one network device with one such queue. This is what the “Limit” input field in the GUI emulates. The limit is specified in number of packets. If the output forwarding queue for the selected interface exceeds this limit then WANem drops packets. By default it will be set to 1000 bytes. If you would not want to limit packets this way then set this to a large number say 99999999. "  (their typo on netword)

So - I think that yes - there is a maximum queue size, and it's 1000. 
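
As a rough back-of-the-envelope check of what a 1000-packet queue means at this link speed, here is a small sketch. The 1200-byte packet size is an assumption (video RTP packets are often around that size), and whether WANem's queue actually fills this way has not been confirmed in this thread.

```python
def max_queue_delay_s(queue_packets: int, packet_bytes: int, link_kbps: float) -> float:
    """Time to drain a full queue at the given link rate, in seconds."""
    queued_bits = queue_packets * packet_bytes * 8
    return queued_bits / (link_kbps * 1000)


def packets_for_delay(delay_s: float, packet_bytes: int, link_kbps: float) -> float:
    """Number of queued packets that corresponds to a given queuing delay."""
    return delay_s * link_kbps * 1000 / (packet_bytes * 8)


# Assumed packet size of 1200 bytes on the 1000 kbps emulated link.
print(max_queue_delay_s(1000, 1200, 1000))  # 9.6
print(packets_for_delay(3.0, 1200, 1000))   # 312.5
```

Under these assumptions a full queue would hold several seconds of traffic, so multi-second packet delays are possible before the emulator drops anything.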

I have seen / received reports of delayed video streams on real networks.  The networks are usually wi-fi, since our use case involves tablets.  That's why I started playing with wanem, to try to recreate/ reproduce / debug the issue.  I'm not 100% sure if it is the same issue though - that's tough to say for sure.  I do capture & save some statistics in log files & I have seen the target delay over 1000 ms on a wi-fi LAN network.

I really don't understand why a jitter buffer can grow to 3 seconds, though. For real-time communications, isn't it better to drop the latent data? If there were a secret parameter, I'd probably want to set the max jitter buffer to 100 or 200 ms.

Do you know another way to emulate restricted bandwidth for webrtc?  I don't think that the network throttling in the chrome developer tools works with webrtc. 

Again - thanks a ton for working on this!
I just found & used a tool called netlimiter.  I restricted the bandwidth to 1000 Kbps (they use bytes, so it's 125 KBps).  I believe that I had the same results as earlier.  Here's another screen grab of the internals, and an event log (I had the same problem with the event log - I have to place the call, and then start it.  I hope that does not mess things up.)
I hope that this helps.
--Doug
event_log.5944.event_log.2
590 KB Download
Capture.PNG
171 KB View Download
Thanks for the report.

Starting the log after you start the call is fine, and it will capture the entire call including the beginning as long as you start the log within ~10 seconds. I still need to remove the first two bytes from the file though, and I don't understand why.
Cc: philipel@chromium.org
Concerning the size of the jitter buffer: To ensure smooth playback despite packet loss, the jitter buffer needs to store the packets for up to one RTT. (The RTT is used to send a NACK and then wait for the retransmitted packet). Even under optimal conditions it takes light about 300 ms to travel to the opposite point on the earth and back again, so capping the jitter buffer to 200 ms would not work in general.
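
As a rough check of the propagation figure, here is a small sketch of the ideal round trip to the opposite point on the earth. The fiber refractive index of 1.47 is an assumed typical value, and the path is assumed to follow a great circle. The ideal figures come out somewhat lower than the ~300 ms quoted above, but real routes are longer than a great circle and add switching and queuing delay, so the conclusion about a 200 ms cap is unchanged.

```python
EARTH_CIRCUMFERENCE_KM = 40_075.0   # equatorial circumference
SPEED_OF_LIGHT_KM_S = 299_792.458   # in vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumption: typical optical fiber


def antipodal_rtt_ms(refractive_index: float = 1.0) -> float:
    """Ideal round-trip time to the antipode along a great circle.

    Going to the antipode and back covers two half-circumferences,
    i.e. one full circumference.
    """
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return EARTH_CIRCUMFERENCE_KM / speed_km_s * 1000


print(round(antipodal_rtt_ms(), 1))                        # 133.7
print(round(antipodal_rtt_ms(FIBER_REFRACTIVE_INDEX), 1))  # 196.5
```

Since a NACK-based retransmission needs about one RTT to arrive, even the ideal fiber figure already uses up nearly all of a 200 ms buffer.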

In your case, the packets are actually delayed for 3 seconds on the network, so the large target delay seems to be working as intended.

As far as I can tell, there are two problems here:
1) The sender seems to send at too high a bitrate despite the increasing latency. (The packets also arrive in a very bursty pattern, but that might be caused by the network emulator.) I will continue to look into this.
2) Once the target delay has increased, it decreases very slowly. Philip, do you know if the target delay could be reduced faster when the RTT is low?
I've been able to reproduce a similar oscillating latency, but with a much lower link capacity. I will continue to investigate this next week.
