NACK'd packet retransmission is off by one
Reported by
gbrownew...@gmail.com,
May 9 2017
|
||||||
Issue descriptionUserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.96 Safari/537.36 Steps to reproduce the problem: 1. Establish a session with VP8, nack, rtx 2. capture traffic long enough, and on a connection that drops packets occasionally, so that nacks are sent. 3. observe that the wrong packet is RTX'd in response to each NACK. Typically off by one, but sometimes RTX packets come in batches of very old packets, but always triggered by a NACK. What is the expected behavior? The correct packet is RTX'd in response to a NACK. What went wrong? NACK a missed packet. call it packet A. Later, NACK another missed packet, call it packet B. Packet A is RTX'd in response to packet B NACK. NACK packet C. packet B is RTX'd in response to packet C NACK. RTX'd packets are always off by one. Did this work before? N/A Does this work in other browsers? Yes Chrome version: 58.0.3029.96 Channel: stable OS Version: 10.0 Flash Version: Some examples of timings of the NACKS and RTX packets. I let the stream run for a very long time in order to capture these... [03:29:19.668]-->NACK 0x682BF27B/0xC2E472BD pid:14801 blp:0x0000 [03:29:19.683]<--RTX 0xC2E472BD/0x00000000 osn:2604 [03:50:16.718]-->NACK 0x682BF27B/0xC2E472BD pid:33680 blp:0x0000 [03:50:16.734]<--RTX 0xC2E472BD/0x00000000 osn:14801 [04:07:16.268]-->NACK 0x682BF27B/0xC2E472BD pid:48994 blp:0x0000 [04:07:16.268]<--RTX 0xC2E472BD/0x00000000 osn:33680 [06:48:45.357]-->NACK 0x682BF27B/0xC2E472BD pid:9197 blp:0x0000 [06:48:45.357]<--RTX 0xC2E472BD/0x00000000 osn:48994 [07:26:54.098]-->NACK 0x682BF27B/0xC2E472BD pid:31690 blp:0x0000 [07:26:54.098]<--RTX 0xC2E472BD/0x00000000 osn:9197 [07:40:20.147]-->NACK 0x682BF27B/0xC2E472BD pid:19759 blp:0x0000 [07:40:20.147]<--RTX 0xC2E472BD/0x00000000 osn:31690 [07:48:05.852]-->NACK 0x682BF27B/0xC2E472BD pid:50018 blp:0x0000 [07:48:05.867]<--RTX 0xC2E472BD/0x00000000 osn:19759 [08:43:05.095]-->NACK 0x682BF27B/0xC2E472BD pid:23467 blp:0x0000 [08:43:05.111]<--RTX 0xC2E472BD/0x00000000 osn:50018
,
May 9 2017
,
May 11 2017
Did some more testing with a native client build, couple days old on a much worse connection. This was about 1 minute of stream... [15:13:39.945]-->NACK 0xA483ACD0/0x6C76C25C pid:64670 blp:0x0001 [15:13:39.961]<--RTX 0x6C76C25C/0x00000000 osn:64406 [15:13:57.633]-->NACK 0xA483ACD0/0x6C76C25C pid:483 blp:0x0001 [15:13:57.758]-->NACK 0xA483ACD0/0x6C76C25C pid:498 blp:0x0000 [15:13:58.242]<--RTX 0x6C76C25C/0x00000000 osn:64670 [15:13:58.242]<--RTX 0x6C76C25C/0x00000000 osn:64671 [15:13:58.258]<--RTX 0x6C76C25C/0x00000000 osn:483 [15:13:58.258]<--RTX 0x6C76C25C/0x00000000 osn:484 [15:14:14.742]-->NACK 0xA483ACD0/0x6C76C25C pid:1834 blp:0x0000 [15:14:14.836]<--RTX 0x6C76C25C/0x00000000 osn:498 [15:14:16.305]-->NACK 0xA483ACD0/0x6C76C25C pid:1974 blp:0x0001 [15:14:16.320]<--RTX 0x6C76C25C/0x00000000 osn:1834 [15:14:30.742]-->NACK 0xA483ACD0/0x6C76C25C pid:3095 blp:0x0000 [15:14:30.758]<--RTX 0x6C76C25C/0x00000000 osn:1974 [15:14:30.758]<--RTX 0x6C76C25C/0x00000000 osn:1975
,
May 15 2017
This issue seems to be out of TE-scope. Hence, adding TE-NeedsTriageHelp for further investigation. Thanks...!!
,
Jun 20 2017
Ping for triage.
,
Jun 21 2017
Since this deals with the RTP level, changing component to "Video" and CCing holmer@/danilchap@.
,
Jun 21 2017
holmer, please assign as appropriate.
,
Aug 2 2017
Tried to reproduce it with video_loopback built from M58 - run it over night with 5% packet loss. Nack response seems still correct in the morning. Looking through code (between nack received and response rtx packet sent back) can't guess what could cause described behaviour. Are all nack responses off by 1 after some time or just some?
,
Aug 4 2017
I'll test again right away.
,
Aug 4 2017
Yes, its still happening, but perhaps its not accurate to say it is simply "off by one." Rather, a NACK is triggering a RTX of something, but not the NACK'd packet. Here is another chunk from a log for a stream that was experiencing big bursts of loss. Here is an example of how the retransmissions are way behind and being triggered by later NACKs.. [12:51:27.986]-->NACK 0x6A47A222/0xB9E47647 pid:3888 blp:0x0001 [12:51:30.314]-->NACK 0x6A47A222/0xB9E47647 pid:3923 blp:0x0000 [12:51:30.314]-->NACK 0x6A47A222/0xB9E47647 pid:3925 blp:0x0000 [12:51:30.392]<--RTX 0xB9E47647/0x00000000 osn:3888 3888 was NACK'd at 12:51:27, no RTX. About 3 seconds later, packets 3923 and 3925 were NACK'd and that seems to have triggered the RTX of 3888. A half second later, some more packets are NACK'd, but no RTX... [12:51:30.830]-->NACK 0x6A47A222/0xB9E47647 pid:3928 blp:0x000F [12:51:30.830]-->NACK 0x6A47A222/0xB9E47647 pid:3934 blp:0x0007 ...then 20 seconds later 13506 is NACK'd and that appears to trigger the RTX of 3923, which was NACK'd back at 12:15:30.314 (35 seconds ago). [12:54:50.472]-->NACK 0x6A47A222/0xB9E47647 pid:13506 blp:0x0000 [12:54:50.488]<--RTX 0xB9E47647/0x00000000 osn:3923 6 seconds later 13811-13814 are NACK'd and that appears to trigger the RTX of 13506, which was NACK'd almost 6 seconds ago. [12:54:56.113]-->NACK 0x6A47A222/0xB9E47647 pid:13811 blp:0x0003 [12:54:56.129]<--RTX 0xB9E47647/0x00000000 osn:13506
,
Aug 5 2017
These logs are from a session with Chrome (60.0.3112.90) With almost no loss, just an occasional packet here and there, the behavior was quite strange. 60771, 39095 and 58928 were all NACK'd with no retransmissions, then when 46821 is NACK'd over 30 minutes after 60771, retransmissions of 60771, 39095 and 58928 arrive. [17:24:21.823]-->NACK 0xE92C3ABD/0x955AC1D2 pid:60771 blp:0x0000 [17:41:47.778]-->NACK 0xE92C3ABD/0x955AC1D2 pid:39095 blp:0x0000 [17:43:13.841]-->NACK 0xE92C3ABD/0x955AC1D2 pid:58928 blp:0x0000 [17:56:35.968]-->NACK 0xE92C3ABD/0x955AC1D2 pid:46821 blp:0x0000 [17:56:36.015]<--RTX 0x955AC1D2/0x00000000 osn:60771 [17:56:36.015]<--RTX 0x955AC1D2/0x00000000 osn:39095 [17:56:36.015]<--RTX 0x955AC1D2/0x00000000 osn:58928 25 minutes later, 57336 is NACK'd and Chrome retransmits 46821, which was NACK'd 25 minutes ago. [18:21:06.284]-->NACK 0xE92C3ABD/0x955AC1D2 pid:57336 blp:0x0000 [18:21:06.284]<--RTX 0x955AC1D2/0x00000000 osn:46821 20 minutes later another NACK triggers an RTX of another old packet. [18:40:34.115]-->NACK 0xE92C3ABD/0x955AC1D2 pid:63820 blp:0x0000 [18:40:34.130]<--RTX 0x955AC1D2/0x00000000 osn:57336 Is it possible that your test had far too high of loss such that you are just getting so many RTX packets, it not obvious that NACKs are triggering RTX of the wrong packets?
,
Aug 5 2017
Also, to answer danilchap's question, all RTX packets are "off", usually by 1 packet like the examples above, except for the cases where it seems to "save" several and then RTX them all at once.
,
Aug 8 2017
Closer inspection revealed that the strange behavior only occurred when the NACKs were sent too long after the packets should have been received. In some cases as much as 250ms was passing before a NACK was sent. Changing that part so that they are NACKd much sooner eliminates the problem completely. [20:31:06.112]-->NACK 0xAA7A5584/0xFDDC2E69 pid:52612 blp:0x0007 [20:31:06.127]<--RTX 0xFDDC2E69/0x00000000 osn:52612 [20:31:06.127]<--RTX 0xFDDC2E69/0x00000000 osn:52613 [20:31:06.127]<--RTX 0xFDDC2E69/0x00000000 osn:52614 [20:31:06.127]<--RTX 0xFDDC2E69/0x00000000 osn:52615 [20:31:10.346]-->NACK 0xAA7A5584/0xFDDC2E69 pid:53485 blp:0x0000 [20:31:10.361]<--RTX 0xFDDC2E69/0x00000000 osn:53485 [20:31:35.815]-->NACK 0xAA7A5584/0xFDDC2E69 pid:59015 blp:0x0003 [20:31:35.830]<--RTX 0xFDDC2E69/0x00000000 osn:59015 [20:31:35.830]<--RTX 0xFDDC2E69/0x00000000 osn:59016 [20:31:35.830]<--RTX 0xFDDC2E69/0x00000000 osn:59017 [20:33:56.518]-->NACK 0xAA7A5584/0xFDDC2E69 pid:24007 blp:0x0000 [20:33:56.534]<--RTX 0xFDDC2E69/0x00000000 osn:24007 [20:34:09.674]-->NACK 0xAA7A5584/0xFDDC2E69 pid:26850 blp:0x0000 [20:34:09.690]<--RTX 0xFDDC2E69/0x00000000 osn:26850
,
Aug 8 2017
Still looking at this trying to figure out what is going on. On occasion, I see this behavior with the first few NACKs, so far every time I've captured this, it was just like below with the 4th NACK triggering retransmission of all 4 previously NACKd packets. But, that pattern of 4 may be coincidence as I've only capture this 3 times since the changes to send the NACKs much sooner. [09:43:44.629]-->NACK 0xF28581A0/0x2B4D0F26 pid:39323 blp:0x0000 [09:51:40.912]-->NACK 0xF28581A0/0x2B4D0F26 pid:8481 blp:0x0000 [09:52:28.584]-->NACK 0xF28581A0/0x2B4D0F26 pid:11861 blp:0x0000 [09:58:15.522]-->NACK 0xF28581A0/0x2B4D0F26 pid:37045 blp:0x0000 [09:58:15.538]<--RTX 0x2B4D0F26/0x00000000 osn:39323 [09:58:15.538]<--RTX 0x2B4D0F26/0x00000000 osn:8481 [09:58:15.538]<--RTX 0x2B4D0F26/0x00000000 osn:11861 [09:58:15.538]<--RTX 0x2B4D0F26/0x00000000 osn:37045 After this occurs, every subsequent NACK/RTX is correct. This does not happen every time. Sometimes the stream is correct all the way through.
,
Aug 8 2017
After fixing the problem with the NACKs being sent way to late, the next problem was that the RTX stream was on "probation" until a minimum number of packets were received. This is, of course, not appropriate for the RTX stream. Thanks for the assistance! All is working as expected now.
,
Aug 9 2017
Does this mean that there's nothing to fix here and that the bug can be closed?
,
Aug 9 2017
Yes, it should be closed. I think there is nothing to fix since the strange behavior can only be triggered if NACKs are sent too long after the packet should have arrived, which is not useful anyway.
,
Aug 9 2017
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by manoranj...@chromium.org
, May 9 2017