Issue 754748

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Chrome crashing frequently with WebRTC sessions

Reported by gbrownew...@gmail.com, Aug 11 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. Start a WebRTC session using VP8 video and Opus audio.
2. Eventually Chrome crashes, sometimes 10 seconds in, sometimes after 10 minutes. It seems to happen sooner if there is some packet loss.

What is the expected behavior?
Chrome doesn't crash

What went wrong?
Some machines get these crashes and others never do. Those that crash will crash quite frequently. All crashes show similar entries at the end of the debug log...

[42688:5708:0811/102015.306:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=1
[42688:5708:0811/102015.326:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=2
[42688:5708:0811/102015.347:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=3
[42688:5708:0811/102015.367:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=4
[42688:5708:0811/102015.387:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=5
[42688:5708:0811/102015.407:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=6
[42688:5708:0811/102015.421:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=7
[42688:5708:0811/102015.421:WARNING:audio_sync_reader.cc(185)] ASR: No room in socket buffer.: The pipe is being closed. (0xE8)
[42688:5708:0811/102015.421:WARNING:audio_sync_reader.cc(202)] AudioSyncReader::Read timed out, audio glitch count=8
[42688:23408:0811/102015.429:INFO:user_input_monitor_win.cc(157)] RegisterRawInputDevices() failed for RIDEV_REMOVE: The parameter is incorrect. (0x57)

Did this work before? N/A 

Is it a problem with Flash or HTML5? HTML5

Does this work in other browsers? No
 Chrome 59 win and mac, Firefox 54 win and mac

Chrome version: 60.0.3112.90  Channel: stable
OS Version: 10.0
Flash Version: 

Contents of chrome://gpu:
 
Components: Blink>WebRTC
Labels: Needs-Triage-M60
gbrownewell@, thank you for the report. Do you have any sample test case to reproduce this?
Sir, please forgive my ignorance. What is a "test case"? I will provide anything I can. I can provide pcaps of the session, though the media will be SRTP-encrypted and so not likely to be helpful. I can also provide hex dumps of the entire sessions or of the media streams after decryption. I can provide anything you need.

For context, Chrome is one end of a session with our MCU. Video stream is libvpx encoding and audio stream is libopus encoding. I am in the process of moving everything to the newest versions of both libraries. The current system displaying the problem is using versions of those 2 libraries dated March 12, 1017.

Can you confirm that the crash is likely related to the audio stream being sent to Chrome as the logs seem to suggest?

Opus is negotiated in the SDP, with this offer from Chrome...

a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1

and answer...

a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=20;useinbandfec=1


Forcing PCMU seems to eliminate the problem, but PCMU is not of acceptable quality. I really must have Opus; nothing else is good enough.
2017, of course, not 1017 :)

Comment 4 by guidou@chromium.org, Aug 14 2017

Components: -Internals>Media -Blink>WebRTC Blink>WebRTC>Network Blink>WebRTC>Video
Can you provide crash IDs for the problematic sessions?
You can find them in chrome://crashes

Comment 5 by guidou@chromium.org, Aug 14 2017

Components: -Blink>WebRTC>Video -Blink>WebRTC>Network Blink>WebRTC

Comment 6 by guidou@chromium.org, Aug 14 2017

Labels: Needs-Feedback
These 2 are from 60.0.3112.90 running on a macbook air. Captured Friday but uploaded today.

98b46cc5-1917-4a6f-a5b9-859032090251
fb22325e-0316-4dea-92d6-c61e1d38a08a

I will gather the others quickly.

Comment 8 by sheriffbot@chromium.org, Aug 14 2017

Cc: guidou@chromium.org
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "guidou@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 9 by guidou@chromium.org, Aug 14 2017

Labels: Needs-Feedback
Those crash IDs look like local crash IDs.
Please send the crash IDs after they are reported as uploaded in chrome://crashes
The first two are the two I posted a few minutes ago. I see they get another ID after uploading so I included them again with the uploaded crash ID.

macbook air 60.0.3112.90
4596ecd1da92f427 (Local Crash ID: 98b46cc5-1917-4a6f-a5b9-859032090251)
3ea212111b2a9844 (Local Crash ID: fb22325e-0316-4dea-92d6-c61e1d38a08a)

win10 60.0.3112.90
b364dec74f5153db (de919c9c-47ae-403e-9c1e-d4e59b09a68c)
dd52025d0ca1c145 (0f932296-c5a3-4b61-a031-64bfd7afb3b1)
bac2e74df40bd5e9 (b74f1c66-5797-418f-bc5b-976d88246a9e)
e729ccc9d68f9c29 (1e2c7bb3-550c-463c-9f26-9b1eb5e1697e)
dff2273bf172d3c1 (985c91af-5810-4dc8-8fdb-2a8885b2d837)
7d8733e7fca9e774 (241064a2-9450-45d9-b53c-6bc600b1e5de)

I have several more on the way. Do you need just the uploaded crash IDs, or both values?

Comment 11 by sheriffbot@chromium.org, Aug 14 2017

Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "guidou@chromium.org" to the cc list and removing "Needs-Feedback" label.

gbrownewell@: we need only the uploaded crash IDs.
Another from mac 60.0.3112.90
0f13de102ada78ae
Cc: ste...@webtrc.org
Labels: -Type-Bug -Pri-2 M-60 Pri-1 Type-Bug-Regression
Owner: holmer@chromium.org
Status: Assigned (was: Unconfirmed)
Stefan, can you take a look at this?

All the crashes reported by gbrownewell@ point to something related to NTP/time issues, with webrtc::RemoteNtpTimeEstimator::Estimate

Since the problems apparently started with Chrome 60, the most suspect CL is
https://codereview.webrtc.org/2963133003/
In the comment above I meant "with webrtc::RemoteNtpTimeEstimator::Estimate appearing frequently in the reports".
I've been recreating this as often as possible, trying to find a pattern. While it is certainly inconsistent, the crash seems to happen much more often right after the user begins talking after a period of silence.

Also, I can upload more crash reports and IDs if that is useful. Should I keep generating them and provide the IDs here?
I think new reports will look similar to the existing ones, and the ones you have already uploaded point to a specific part of the code. 
Maybe it's better to wait to see if holmer@ needs more feedback from you.
Guidou, what makes you think it started with M60? I seem to be able to find crashes at least as early as M58. I don't find many examples, though, either in M60 or earlier.

gbrownewell, you seem to be able to consistently reproduce this. Could you share your repro steps? It may be possible to record an unencrypted pcap if you start Chrome with the flag --disable-webrtc-encryption. We might be able to use that to reproduce the problem, but it could also be a race condition causing this.
Note that --disable-webrtc-encryption only works on Chrome Canary and Dev channels.
Labels: Needs-Feedback
holmer@: what makes me think it started with M60 is that I understood gbrownewell to mean that it is easy to reproduce the crash on 60 but not on 59 (or Firefox).

gbrownewell@, can you confirm this?

Sirs, I am being told that it has been happening since version 58, though I do not have access to any of the crash information for those incidents. Also, we have seen Firefox crash rarely, but I can't confirm it's related.

I can reproduce it frequently because there are a couple of machines here that are prone to it: one is a new MacBook Pro and the other is a Win10 laptop. The vast majority of machines never experience the issue. Strangely, the Win10 machine is a model we have dozens of, and the others don't seem affected to the same degree; their users report that they may have seen "Aw, Snap!" once or twice in the last 6 months.

I will work on getting that pcap.
gbrownewell@: To rule out that crashes are more frequent in M60, and to rule out https://codereview.webrtc.org/2963133003/ as the culprit, can you reproduce the crashes in M59 on the problematic machines as easily as with M60?

Labels: -Pri-1 -Type-Bug-Regression Pri-2 Type-Bug
Owner: ----
Status: Untriaged (was: Assigned)
gbrownewell@: we have found several other similar crash reports from 59 and 58, so no need for you to try to reproduce with older versions.
Marking this bug as Untriaged again while we continue to try to find the cause.

Also, changing priority to 2 since it is not a regression in 60, but at least as old as 58.
Labels: -Needs-Feedback
Owner: holmer@chromium.org
Status: Assigned (was: Untriaged)
holmer@ will continue the investigation since he is more familiar with that part of the code.

Comment 25 by wonz...@gmail.com, Aug 18 2017

"Aw, Snap!" pops up when an MMS is received while playing a streaming video. Android platform.
Getting an unencrypted packet capture is proving quite challenging, primarily because I do not have full-time access to the problematic machines. I expect to have one of them in my possession soon.

Any good news to report?
This issue shows the OS as "Windows", but I can assure you it is happening on Mac; in fact, it may be more prevalent on Mac. I have 1 Windows machine that is problematic, but now that I have the whole company on the lookout for these, it seems to be worse on Mac.

Here are 3 crashes from today, minutes apart, from one of the problematic Macs...
0241b8c77acd933d
c98a897875074bb2
77a3014ba17541a5


Another set of crashes from the same session on a different Mac. I'm being told that one of the participants had over 5% packet loss on audio and video streams.

30ffb499165a9154
9953924fc5e37d50
14a1a327cd0298ff

Don't know if that's significant, but these two crashed 6 times in about 1 minute during that period of high packet loss.
Labels: Restrict-View-SecurityTeam
Cc: nisse@chromium.org
Cc: tommi@chromium.org
Cc: eladalon@chromium.org

Comment 33 by nisse@chromium.org, Aug 24 2017

Hi, to debug this, we need to be able to reproduce it. Could you provide one or more of the following: a packet dump, an rtc_event_log, or a client test account so that we can receive streams from the MCU?

Comment 34 by nisse@chromium.org, Aug 24 2017

Cc: solenberg@chromium.org
I can set up a test account this morning. It will not take long. Once complete, how can I provide you the details to access it? I see this issue is now restricted, is it safe to post the information here?

Comment 36 by nisse@chromium.org, Aug 24 2017

Please send the account details by mail to holmer@google.com. This issue may be made public some time after the bugs are fixed.

In the meantime, we've found that one of the recorded crashes included a stack trace clearly identifying an infinite recursion. Fix at https://codereview.webrtc.org/3004553002/. The bug is triggered if UlpFEC is used to protect media packets including their RED encapsulation (when Chrome generates FEC packets, FEC is applied to the media packets as they appear *before* RED encapsulation).
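To make the failure mode concrete, here is a toy Python model of the recursion and of the fix that was later committed for this bug (setting the `returned` flag on a recovered packet before it is handed back to the receive path, rather than after). It is not the code in ulpfec_receiver_impl.cc: `ToyFecReceiver`, `RecoveredPacket`, `recursion_detected`, the payload-type numbers and the `MAX_DEPTH` guard are all hypothetical, chosen only to show why a recovered packet that carries a RED header can keep re-entering FEC processing when the flag is set too late.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical payload-type numbers for this toy model only.
RED_PT = 116
VP8_PT = 100
MAX_DEPTH = 50  # toy guard: raise instead of exhausting the real call stack


@dataclass
class RecoveredPacket:
    seq: int
    payload_type: int
    returned: bool = False  # has this packet already been handed back once?


@dataclass
class ToyFecReceiver:
    set_returned_early: bool
    recovered: List[RecoveredPacket] = field(default_factory=list)
    delivered: List[int] = field(default_factory=list)
    depth: int = 0

    def process_recovered(self) -> None:
        """Hand every not-yet-returned recovered packet back to the receive path."""
        for pkt in self.recovered:
            if pkt.returned:
                continue
            if self.set_returned_early:
                pkt.returned = True  # fixed order: mark before re-entering
                self.on_recovered_packet(pkt)
            else:
                self.on_recovered_packet(pkt)
                pkt.returned = True  # buggy order: mark only after re-entering

    def on_recovered_packet(self, pkt: RecoveredPacket) -> None:
        self.depth += 1
        if self.depth > MAX_DEPTH:
            raise RecursionError("recovered RED packet re-entered FEC processing")
        if pkt.payload_type == RED_PT:
            # RED decapsulation feeds the packet back into the FEC receiver,
            # which scans its list of recovered packets again.
            self.process_recovered()
        else:
            self.delivered.append(pkt.seq)
        self.depth -= 1


def recursion_detected(set_returned_early: bool) -> bool:
    """Run one recovered RED packet through the toy receiver."""
    rx = ToyFecReceiver(set_returned_early)
    rx.recovered.append(RecoveredPacket(seq=7, payload_type=RED_PT))
    try:
        rx.process_recovered()
    except RecursionError:
        return True
    return False


print(recursion_detected(False))  # True: flag set too late
print(recursion_detected(True))   # False: flag set before re-entry
```

With the flag set first, the nested scan skips the packet it is already handling, so the recursion stops after one level. The model says nothing about whether the recovered bytes are meaningful; that is the separate question discussed in the next comment.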

Comment 38 by bugdroid1@chromium.org, Aug 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/external/webrtc.git/+/41476e014c8364adc15b90238d54a8aef91d7f56

commit 41476e014c8364adc15b90238d54a8aef91d7f56
Author: nisse <nisse@webrtc.org>
Date: Fri Aug 25 16:08:44 2017

When Ulpfec recovers a packet, set |returned| flag earlier.

This avoids infinite recursion in case the recovered packet carries a
RED header.

BUG= chromium:754748 

Review-Url: https://codereview.webrtc.org/3004553002
Cr-Commit-Position: refs/heads/master@{#19525}

[modify] https://crrev.com/41476e014c8364adc15b90238d54a8aef91d7f56/webrtc/modules/rtp_rtcp/source/ulpfec_receiver_impl.cc

Cc: huib@chromium.org blum@chromium.org

Comment 40 by nisse@chromium.org, Aug 28 2017

I was trying to write a testcase for FEC on top of RED, when I realized that maybe that's not possible. Consider the receive side. When we get a media packet, we want to pass it to the FEC machinery because it may be part of a FEC block. But then we'd need to know if the FEC machinery should see the RED packet or the decapsulated media packet, because if we don't do it in exactly the same way as on the sending side, reconstructed packets will get garbled.

Looking at the spec, RFC 5109, sections 10.3 (an example) and 14.2 (more normative, but it's unclear whether it really describes our case) seem to be the ones that give some clues about how FEC is supposed to work. The latter says

   The FEC MUST protect only
   the main codec, with the payload of FEC engine coming from virtual
   RTP packets created from the main codec data.

I don't find any definition of "virtual RTP packet", but my best guess is that it means media packets without the RED encapsulation.

Do you agree? If so, fixing the processing in your MCU to do FEC before RED is essential to get FEC to work correctly.
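Why both ends must apply FEC to exactly the same bytes can be seen in a small Python model. It assumes the simplest XOR parity scheme (one FEC payload protecting three equal-length packets; payloads only, with no header or length recovery, so it is far simpler than full RFC 5109) and a RED packet carrying only a primary block, whose RFC 2198 header is a single byte (F bit 0, 7-bit block payload type). The function names and the payload-type value are hypothetical. If sender and receiver XOR the same "virtual" media packets, the lost packet comes back intact; if the sender protected the RED-encapsulated form while the receiver XORs the decapsulated media, the "recovered" bytes are garbage.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """One FEC payload protecting all packets: their bytewise XOR."""
    return reduce(xor_bytes, packets, b"")


def recover_missing(parity: bytes, received: List[bytes]) -> bytes:
    """Rebuild the single missing packet by XOR-ing the parity with what arrived."""
    return reduce(xor_bytes, received, parity)


def red_encapsulate(block_pt: int, payload: bytes) -> bytes:
    """RED with only a primary block: one header byte (F=0, 7-bit PT), then data."""
    return bytes([block_pt & 0x7F]) + payload


MEDIA_PT = 100  # hypothetical VP8 payload type
media = [b"AAAA", b"BBBB", b"CCCC"]  # packet 1 is "lost" below

# Consistent: both sides XOR the un-encapsulated media packets.
ok = recover_missing(make_parity(media), [media[0], media[2]])

# Inconsistent: the sender protected RED-encapsulated packets, but a
# receiver XORs the decapsulated media it actually received.
red = [red_encapsulate(MEDIA_PT, m) for m in media]
garbled = recover_missing(make_parity(red), [media[0], media[2]])

print(ok == media[1])       # True
print(garbled == media[1])  # False
```

The toy cannot tell which convention is "right"; it only shows that a mismatch silently produces corrupt packets, which is why the convention has to be fixed by the spec rather than guessed by each side.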

When my fix from Friday gets into Chrome canary (hopefully by tomorrow), it would be interesting to try to use it to connect to the version of your MCU used when the problem was discovered. From my current understanding, I would expect Chrome to not crash, but display somewhat garbled media because the packets supposedly recovered via FEC are corrupt.

On the Chrome side, we obviously need to not crash on any network input. We should probably have handling of recovered packets bypass RED decapsulation, so that any recovered packet which happens to carry the RED payload type is treated like any other packet with unknown payload type, and never sent back into the FEC machinery.
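The handling proposed in the previous paragraph can be sketched as a routing decision. This is a minimal Python sketch of the idea, not Chrome's receive path: `Route`, `route_packet` and the payload-type numbers are hypothetical (only 111 for Opus comes from the SDP earlier in this thread). A RED packet that arrived from the network is decapsulated and offered to FEC; a recovered packet that happens to carry the RED payload type is handled like any other unknown payload type, so it can never loop back into the FEC machinery.

```python
from enum import Enum, auto
from typing import AbstractSet


class Route(Enum):
    DECODE = auto()        # known media payload type: pass to the depacketizer
    RED_THEN_FEC = auto()  # RED packet from the network: decapsulate, offer to FEC
    UNKNOWN_PT = auto()    # handled like any other unknown payload type


def route_packet(payload_type: int, recovered: bool, red_pt: int,
                 media_pts: AbstractSet[int]) -> Route:
    """Pick a route for one RTP packet; recovered packets never re-enter RED/FEC."""
    if payload_type == red_pt:
        # Only packets that arrived from the network get RED decapsulation.
        return Route.UNKNOWN_PT if recovered else Route.RED_THEN_FEC
    if payload_type in media_pts:
        return Route.DECODE
    return Route.UNKNOWN_PT


# Hypothetical payload-type mapping for the example.
RED_PT, VP8_PT, OPUS_PT = 116, 100, 111
MEDIA_PTS = frozenset({VP8_PT, OPUS_PT})

print(route_packet(RED_PT, False, RED_PT, MEDIA_PTS))  # Route.RED_THEN_FEC
print(route_packet(RED_PT, True, RED_PT, MEDIA_PTS))   # Route.UNKNOWN_PT
print(route_packet(OPUS_PT, True, RED_PT, MEDIA_PTS))  # Route.DECODE
```

Keying the decision on the `recovered` bit, rather than on a per-packet flag that must be set at the right moment, removes the ordering hazard entirely.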
I agree our MCU needed to change. Now that I have more clarity on how it should have been done, it makes perfect sense. I think your understanding of RFC 5109 is correct.

I have completed the work but I am still able to test the previous MCU build to confirm that the upcoming Chrome canary handles the stream without crashing. I understand that this testing is only to confirm that Chrome does not crash given this flawed input.
The latest Chrome Canary has the fix. Can you help verify it?
I will attempt to verify later tonight. I'll report my findings here.
I was able to devote just over an hour to the testing. No crashes! The video becomes badly damaged but is eventually repaired, though it often takes quite a while for the browser to request a key frame. I understand that behavior is expected. I recreated the condition that previously caused the crash many times, and Chrome did not crash.
Thanks a lot for the testing!
Labels: -Restrict-View-SecurityTeam
Status: Fixed (was: Assigned)
