New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 719980 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

NACK'd packet retransmission is off by one

Reported by gbrownew...@gmail.com, May 9 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.96 Safari/537.36

Steps to reproduce the problem:
1. Establish a session with VP8, nack, rtx
2. capture traffic long enough, and on a connection that drops packets occasionally, so that nacks are sent.
3. observe that the wrong packet is RTX'd in response to each NACK. Typically off by one, but sometimes RTX packets come in batches of very old packets, but always triggered by a NACK.

What is the expected behavior?
The correct packet is RTX'd in response to a NACK.

What went wrong?
NACK a missed packet. call it packet A.
Later, NACK another missed packet, call it packet B.
Packet A is RTX'd in response to packet B NACK.
NACK packet C.
packet B is RTX'd in response to packet C NACK.
RTX'd packets are always off by one.

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 58.0.3029.96  Channel: stable
OS Version: 10.0
Flash Version: 

Some examples of timings of the NACKS and RTX packets. I let the stream run for a very long time in order to capture these...
[03:29:19.668]-->NACK 0x682BF27B/0xC2E472BD pid:14801 blp:0x0000
[03:29:19.683]<--RTX  0xC2E472BD/0x00000000 osn:2604
[03:50:16.718]-->NACK 0x682BF27B/0xC2E472BD pid:33680 blp:0x0000
[03:50:16.734]<--RTX  0xC2E472BD/0x00000000 osn:14801
[04:07:16.268]-->NACK 0x682BF27B/0xC2E472BD pid:48994 blp:0x0000
[04:07:16.268]<--RTX  0xC2E472BD/0x00000000 osn:33680
[06:48:45.357]-->NACK 0x682BF27B/0xC2E472BD pid:9197 blp:0x0000
[06:48:45.357]<--RTX  0xC2E472BD/0x00000000 osn:48994
[07:26:54.098]-->NACK 0x682BF27B/0xC2E472BD pid:31690 blp:0x0000
[07:26:54.098]<--RTX  0xC2E472BD/0x00000000 osn:9197
[07:40:20.147]-->NACK 0x682BF27B/0xC2E472BD pid:19759 blp:0x0000
[07:40:20.147]<--RTX  0xC2E472BD/0x00000000 osn:31690
[07:48:05.852]-->NACK 0x682BF27B/0xC2E472BD pid:50018 blp:0x0000
[07:48:05.867]<--RTX  0xC2E472BD/0x00000000 osn:19759
[08:43:05.095]-->NACK 0x682BF27B/0xC2E472BD pid:23467 blp:0x0000
[08:43:05.111]<--RTX  0xC2E472BD/0x00000000 osn:50018
 
Labels: Needs-Triage-M58
Components: -Blink>WebRTC Blink>WebRTC>Network
Did some more testing with a native client build, couple days old on a much worse connection. This was about 1 minute of stream...
[15:13:39.945]-->NACK 0xA483ACD0/0x6C76C25C pid:64670 blp:0x0001
[15:13:39.961]<--RTX  0x6C76C25C/0x00000000 osn:64406
[15:13:57.633]-->NACK 0xA483ACD0/0x6C76C25C pid:483 blp:0x0001
[15:13:57.758]-->NACK 0xA483ACD0/0x6C76C25C pid:498 blp:0x0000
[15:13:58.242]<--RTX  0x6C76C25C/0x00000000 osn:64670
[15:13:58.242]<--RTX  0x6C76C25C/0x00000000 osn:64671
[15:13:58.258]<--RTX  0x6C76C25C/0x00000000 osn:483
[15:13:58.258]<--RTX  0x6C76C25C/0x00000000 osn:484
[15:14:14.742]-->NACK 0xA483ACD0/0x6C76C25C pid:1834 blp:0x0000
[15:14:14.836]<--RTX  0x6C76C25C/0x00000000 osn:498
[15:14:16.305]-->NACK 0xA483ACD0/0x6C76C25C pid:1974 blp:0x0001
[15:14:16.320]<--RTX  0x6C76C25C/0x00000000 osn:1834
[15:14:30.742]-->NACK 0xA483ACD0/0x6C76C25C pid:3095 blp:0x0000
[15:14:30.758]<--RTX  0x6C76C25C/0x00000000 osn:1974
[15:14:30.758]<--RTX  0x6C76C25C/0x00000000 osn:1975

Labels: TE-NeedsTriageHelp
This issue seems to be out of TE-scope. Hence, adding TE-NeedsTriageHelp for further investigation.

Thanks...!!
Ping for triage.
Cc: danilchap@chromium.org holmer@chromium.org
Components: -Blink>WebRTC>Network Blink>WebRTC>Video
Since this deals with the RTP level, changing component to "Video" and CCing holmer@/danilchap@.
Owner: holmer@chromium.org
Status: Assigned (was: Unconfirmed)
holmer, please assign as appropriate.
Tried to reproduce it with video_loopback built from M58 - run it over night with 5% packet loss. Nack response seems still correct in the morning.
Looking through code (between nack received and response rtx packet sent back) can't guess what could cause described behaviour.

Are all nack responses off by 1 after some time or just some?

I'll test again right away.
Yes, its still happening, but perhaps its not accurate to say it is simply "off by one." Rather, a NACK is triggering a RTX of something, but not the NACK'd packet. Here is another chunk from a log for a stream that was experiencing big bursts of loss.

Here is an example of how the retransmissions are way behind and being triggered by later NACKs..

[12:51:27.986]-->NACK 0x6A47A222/0xB9E47647 pid:3888 blp:0x0001
[12:51:30.314]-->NACK 0x6A47A222/0xB9E47647 pid:3923 blp:0x0000
[12:51:30.314]-->NACK 0x6A47A222/0xB9E47647 pid:3925 blp:0x0000
[12:51:30.392]<--RTX  0xB9E47647/0x00000000 osn:3888

3888 was NACK'd at 12:51:27, no RTX. About 3 seconds later, packets 3923 and 3925 were NACK'd and that seems to have triggered the RTX of 3888.

A half second later, some more packets are NACK'd, but no RTX...

[12:51:30.830]-->NACK 0x6A47A222/0xB9E47647 pid:3928 blp:0x000F
[12:51:30.830]-->NACK 0x6A47A222/0xB9E47647 pid:3934 blp:0x0007

...then 20 seconds later 13506 is NACK'd and that appears to trigger the RTX of 3923, which was NACK'd back at 12:15:30.314 (35 seconds ago).

[12:54:50.472]-->NACK 0x6A47A222/0xB9E47647 pid:13506 blp:0x0000
[12:54:50.488]<--RTX  0xB9E47647/0x00000000 osn:3923

6 seconds later 13811-13814 are NACK'd and that appears to trigger the RTX of  13506, which was NACK'd almost 6 seconds ago.

[12:54:56.113]-->NACK 0x6A47A222/0xB9E47647 pid:13811 blp:0x0003
[12:54:56.129]<--RTX  0xB9E47647/0x00000000 osn:13506

These logs are from a session with Chrome (60.0.3112.90)

With almost no loss, just an occasional packet here and there, the behavior was quite strange.

60771, 39095 and 58928 were all NACK'd with no retransmissions, then when 46821 is NACK'd over 30 minutes after 60771, retransmissions of 60771, 39095 and 58928 arrive.

[17:24:21.823]-->NACK 0xE92C3ABD/0x955AC1D2 pid:60771 blp:0x0000
[17:41:47.778]-->NACK 0xE92C3ABD/0x955AC1D2 pid:39095 blp:0x0000
[17:43:13.841]-->NACK 0xE92C3ABD/0x955AC1D2 pid:58928 blp:0x0000
[17:56:35.968]-->NACK 0xE92C3ABD/0x955AC1D2 pid:46821 blp:0x0000
[17:56:36.015]<--RTX  0x955AC1D2/0x00000000 osn:60771
[17:56:36.015]<--RTX  0x955AC1D2/0x00000000 osn:39095
[17:56:36.015]<--RTX  0x955AC1D2/0x00000000 osn:58928

25 minutes later, 57336 is NACK'd and Chrome retransmits 46821, which was NACK'd 25 minutes ago.

[18:21:06.284]-->NACK 0xE92C3ABD/0x955AC1D2 pid:57336 blp:0x0000
[18:21:06.284]<--RTX  0x955AC1D2/0x00000000 osn:46821

20 minutes later another NACK triggers an RTX of another old packet.

[18:40:34.115]-->NACK 0xE92C3ABD/0x955AC1D2 pid:63820 blp:0x0000
[18:40:34.130]<--RTX  0x955AC1D2/0x00000000 osn:57336

Is it possible that your test had far too high of loss such that you are just getting so many RTX packets, it not obvious that NACKs are triggering RTX of the wrong packets?
Also, to answer danilchap's question, all RTX packets are "off", usually by 1 packet like the examples above, except for the cases where it seems to "save" several and then RTX them all at once.
Closer inspection revealed that the strange behavior only occurred when the NACKs were sent too long after the packets should have been received. In some cases as much as 250ms was passing before a NACK was sent. Changing that part so that they are NACKd much sooner eliminates the problem completely.

[20:31:06.112]-->NACK 0xAA7A5584/0xFDDC2E69 pid:52612 blp:0x0007
[20:31:06.127]<--RTX  0xFDDC2E69/0x00000000 osn:52612
[20:31:06.127]<--RTX  0xFDDC2E69/0x00000000 osn:52613
[20:31:06.127]<--RTX  0xFDDC2E69/0x00000000 osn:52614
[20:31:06.127]<--RTX  0xFDDC2E69/0x00000000 osn:52615

[20:31:10.346]-->NACK 0xAA7A5584/0xFDDC2E69 pid:53485 blp:0x0000
[20:31:10.361]<--RTX  0xFDDC2E69/0x00000000 osn:53485

[20:31:35.815]-->NACK 0xAA7A5584/0xFDDC2E69 pid:59015 blp:0x0003
[20:31:35.830]<--RTX  0xFDDC2E69/0x00000000 osn:59015
[20:31:35.830]<--RTX  0xFDDC2E69/0x00000000 osn:59016
[20:31:35.830]<--RTX  0xFDDC2E69/0x00000000 osn:59017

[20:33:56.518]-->NACK 0xAA7A5584/0xFDDC2E69 pid:24007 blp:0x0000
[20:33:56.534]<--RTX  0xFDDC2E69/0x00000000 osn:24007

[20:34:09.674]-->NACK 0xAA7A5584/0xFDDC2E69 pid:26850 blp:0x0000
[20:34:09.690]<--RTX  0xFDDC2E69/0x00000000 osn:26850

Still looking at this trying to figure out what is going on. On occasion, I see this behavior with the first few NACKs, so far every time I've captured this, it was just like below with the 4th NACK triggering retransmission of all 4 previously NACKd packets. But, that pattern of 4 may be coincidence as I've only capture this 3 times since the changes to send the NACKs much sooner.

[09:43:44.629]-->NACK 0xF28581A0/0x2B4D0F26 pid:39323 blp:0x0000
[09:51:40.912]-->NACK 0xF28581A0/0x2B4D0F26 pid:8481  blp:0x0000
[09:52:28.584]-->NACK 0xF28581A0/0x2B4D0F26 pid:11861 blp:0x0000
[09:58:15.522]-->NACK 0xF28581A0/0x2B4D0F26 pid:37045 blp:0x0000
[09:58:15.538]<--RTX  0x2B4D0F26/0x00000000 osn:39323
[09:58:15.538]<--RTX  0x2B4D0F26/0x00000000 osn:8481 
[09:58:15.538]<--RTX  0x2B4D0F26/0x00000000 osn:11861
[09:58:15.538]<--RTX  0x2B4D0F26/0x00000000 osn:37045

After this occurs, every subsequent NACK/RTX is correct. This does not happen every time. Sometimes the stream is correct all the way through.
After fixing the problem with the NACKs being sent way to late, the next problem was that the RTX stream was on "probation" until a minimum number of packets were received. This is, of course, not appropriate for the RTX stream.

Thanks for the assistance! All is working as expected now.
Does this mean that there's nothing to fix here and that the bug can be closed?
Yes, it should be closed. I think there is nothing to fix since the strange behavior can only be triggered if NACKs are sent too long after the packet should have arrived, which is not useful anyway.
Status: WontFix (was: Assigned)

Sign in to add a comment