Reference time calculation in WebAudioMediaStreamSource::DeliverRebufferedAudio() assumes 10 ms intervals between calls
Issue description

WebAudio MediaStream Destination rendering works like this: AudioDestination::render() renders buffers of the output device buffer size. To do so, it pulls 128-frame buffers out of the WebAudio graph as many times as required to fill the output buffer. The 128-frame buffers it pulls out of the graph are also pushed into WebAudioMediaStreamSource, where they are rebuffered into 10 ms buffers and then pushed to the media stream consumer.

Reversed call stack (media::AudioDeviceThread::ThreadMain() drives the execution):

------- 10 ms buffer (may be e.g. 160 frames)
content::PepperMediaStreamAudioTrackHost::AudioSink::OnData(media::AudioBus const&, base::TimeTicks)
content::MediaStreamAudioDeliverer<content::MediaStreamAudioSink>::OnData(media::AudioBus const&, base::TimeTicks)
content::MediaStreamAudioTrack::OnData(media::AudioBus const&, base::TimeTicks)
content::MediaStreamAudioDeliverer<content::MediaStreamAudioTrack>::OnData(media::AudioBus const&, base::TimeTicks)
content::MediaStreamAudioSource::DeliverDataToTracks(media::AudioBus const&, base::TimeTicks)
content::WebAudioMediaStreamSource::DeliverRebufferedAudio(media::AudioBus const&, int)
------- 128-frame buffer
media::AudioPushFifo::Push(media::AudioBus const&)
content::WebAudioMediaStreamSource::consumeAudio(blink::WebVector<float const*> const&, unsigned long)
blink::ConsumerWrapper::consumeAudio(blink::AudioBus*, unsigned long)
blink::MediaStreamSource::consumeAudio(blink::AudioBus*, unsigned long)
blink::MediaStreamAudioDestinationHandler::process(unsigned long)
blink::AudioHandler::processIfNecessary(unsigned long)
blink::DeferredTaskHandler::processAutomaticPullNodes(unsigned long)
blink::AudioDestinationHandler::render(blink::AudioBus*, blink::AudioBus*, unsigned long, blink::AudioIOPosition const&)
------- Output device buffer (may be e.g. 512 frames)
blink::AudioDestination::render(blink::WebVector<float*> const&, unsigned long, double, double, unsigned long)
content::RendererWebAudioDeviceImpl::Render(base::TimeDelta, base::TimeTicks, int, media::AudioBus*) + 688
media::AudioOutputDevice::AudioThreadCallback::Process(unsigned int)
media::AudioDeviceThread::ThreadMain()

The problem: the reference time calculation in WebAudioMediaStreamSource::DeliverRebufferedAudio() assumes that DeliverRebufferedAudio() is called at approximately 10 ms intervals: the frame delay offset is calculated from |current_reference_time_| obtained on the last push. But it is not, if the output buffer size is larger than 10 ms: in that case DeliverRebufferedAudio() is called several times back to back, with a larger interval between the bursts of calls. The difference between two adjacent |current_reference_time_| values therefore varies from several microseconds to tens of milliseconds. This results in a non-increasing sequence of reference time stamps, which makes the media stream consumers unhappy.

Overall, the consumers of WebAudio as MediaStream are not guaranteed to receive audio buffers at 10 ms intervals. In most cases it does not really show up, though, because the output buffer size is usually close to or smaller than 10 ms at common sample rates.
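The failure mode is easy to reproduce in a toy model. The following standalone sketch is not Chromium code; `simulate` and all buffer sizes and timings are hypothetical. It mimics the arithmetic described above: each push snapshots a clock and subtracts the FIFO frame delay to get a reference timestamp. With a 1024-frame output buffer at 48 kHz (about 21 ms), one render burst makes eight back-to-back 128-frame pushes, two 10 ms buffers are emitted within the same burst, and the second timestamp regresses:

```python
# Toy model (NOT Chromium code) of the reference-time arithmetic:
# reference_time = clock snapshot at push time - FIFO frame delay.
SAMPLE_RATE = 48000   # assumed sample rate
FIFO_SIZE = 480       # 10 ms at 48 kHz
PUSH = 128            # WebAudio render quantum

def simulate(bursts, pushes_per_burst, burst_interval_ms, intra_push_ms):
    """Return the reference timestamps (ms) attached to each 10 ms buffer."""
    timestamps = []
    fifo = 0
    for b in range(bursts):
        now = b * burst_interval_ms          # bursts paced by the device buffer
        for _ in range(pushes_per_burst):
            now += intra_push_ms             # pushes within a burst are ~back-to-back
            fifo += PUSH
            while fifo >= FIFO_SIZE:
                # Frame delay grows with the audio queued in the FIFO,
                # but "now" barely advances within a burst.
                delay_ms = fifo / SAMPLE_RATE * 1000.0
                timestamps.append(now - delay_ms)
                fifo -= FIFO_SIZE
    return timestamps

# 1024-frame output buffer: 8 pushes per burst, bursts ~21.3 ms apart.
ts = simulate(bursts=4, pushes_per_burst=8, burst_interval_ms=21.333,
              intra_push_ms=0.05)
print("reference timestamps (ms):", ts)
print("monotonic:", all(b > a for a, b in zip(ts, ts[1:])))
```

With an output buffer close to 10 ms instead (e.g. 4 pushes per burst, bursts ~10.7 ms apart), each burst emits at most one 10 ms buffer and the regression does not appear, which matches the observation that the problem usually stays hidden at common buffer sizes.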
Mar 14 2017
Apr 11 2017
Relevant e-mail discussion on this issue, between me and olka@... My copy-and-paste from Gmail didn't preserve the "who said what" formatting, but it's pretty easy to follow along regardless.

--------------------------------------------

Sounds like we're on the same page, except for whether the reference timestamps are so inaccurate that we would consider the current impl broken. CIL...

--------------------------------------------

As I mentioned above, the call stack is driven by the AudioOutputDevice -> WebAudioDeviceImpl -> AudioDestination call, which requests a buffer of the platform output buffer size. While pulling the graph to fill that buffer, AudioDestination also pushes 128-frame buffers into WebAudioMediaStreamSource, which rebuffers the audio to 10 ms using this fifo: https://cs.chromium.org/chromium/src/content/renderer/media/webaudio_media_stream_source.h?l=67. Audio format is not propagated further than this fifo.

The audio format *is* propagated after this point: MediaStreamSource/Track/Sink provide this information in GetAudioParameters(), so all downstream consumers have it: https://cs.chromium.org/chromium/src/content/renderer/media/media_stream_audio_source.h?rcl=82d4b66fd8df31637f3a9cdbb4ada8a00b6b3781&l=90

Did you mean to imply that consumers should be aware that the audio has been rebuffered from 128 to 441 (or 480, or whatever 10 ms is)?

I mean that the audio has been rebuffered twice: from the audio device buffer size to 128 frames, and then to 10 ms. And WebAudioMediaStreamSource does not take that into account while doing the reference time stamp calculation here: https://cs.chromium.org/chromium/src/content/renderer/media/webaudio_media_stream_source.cc?type=cs&q=WebAudioMediaStreamSource::consumeAudio&l=104.

Well, sort of "yes" and sort of "no." The main problem here is my flawed "pragmatic heuristic": we are snapshotting the reference clock at the wrong place in the code. That's not taking into account the first of the two rebufferings.
However, the code on L104 *does* take into account the 128-->441 rebuffering. It works backwards, under the assumption that |current_reference_time_| is associated with one sample past the audio being pushed into the FIFO. It uses |frame_delay| to determine what the reference time should have been for the first sample coming out of the FIFO (into the 10 ms buffer). That said, at best, this is all an approximation, and it should be fixed to achieve better accuracy (hence my TODO comment).

--------------------------------------------

BTW--I'm thinking we can just remove this rebuffering FIFO. It was there for legacy reasons: the MediaStream code used to have a hard-coded 10 ms assumption everywhere. Since my refactoring, that's no longer a requirement, since the WebRTC adapter sink will rebuffer audio to 10 ms if needed.

--------------------------------------------

So the buffers are 10 ms, but they do not necessarily come at uniform 10 ms intervals (though their average is still 10 ms). If two calls to WebAudioMediaStreamSource::DeliverRebufferedAudio() are issued with a 1 ms interval, we may get a reference time stamp which precedes the previously calculated one, because the frame delay in milliseconds is subtracted from the time the call is issued at, which is only 1 ms later than the previous call.

Yes, I agree that's possible and probably happens often in some situations (my graph from an earlier e-mail showed this). My TODO comment in the code (where TimeTicks::Now() is called) is basically saying, "sampling the clock here is a big source of inaccuracy." However, this shouldn't break end-to-end functionality: reference timestamps are approximations. They are flawed in the short term, but average out to an "A/V synchronization goal" in the longer term.
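The 1 ms scenario described above can be put into numbers. This is an illustrative sketch only; `reference_time` and all values are hypothetical, not Chromium code. A second buffer delivered only 1 ms after the first, but carrying a larger frame delay because more audio was queued in the FIFO during the burst, gets an earlier reference timestamp:

```python
# Illustrative only: reference time = clock snapshot minus FIFO frame delay.
SAMPLE_RATE = 44100  # assumed sample rate

def reference_time(now_ms, frame_delay_frames):
    """Work backwards from the clock snapshot taken at push time."""
    return now_ms - frame_delay_frames / SAMPLE_RATE * 1000.0

# A 10 ms buffer emitted at t = 100 ms with 441 frames (10 ms) of delay:
first = reference_time(now_ms=100.0, frame_delay_frames=441)   # ~90 ms
# A second buffer only 1 ms later, but with 882 frames (20 ms) queued
# during the burst: its timestamp lands ~9 ms BEFORE the first one.
second = reference_time(now_ms=101.0, frame_delay_frames=882)  # ~81 ms
print(first, second, second < first)
```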
As I've said before, if downstream consumers need precise, exact-to-the-frame, monotonically increasing timestamps, they should be tracking the number of samples consumed and dividing by the sampling rate to get a precise timestamp. It's only when the reference timestamp skews too far from the "media timestamp" that a consumer might need to take action (e.g., detecting gaps/skips in the media, and/or stretching the signal to compensate and re-sync to video).

That said, I *do* agree that the reference timestamps can and should be made more accurate. It seems they should be generated upstream of WebAudioMediaStreamSource, inside the Blink WebAudio impl, as that's where the realtime thread lives that paces and generates the audio signal. However, I don't agree that the current impl is totally broken. It's just inaccurate, and may or may not be harming performance downstream (e.g., false detection of gaps/skips, and/or premature signal stretching causing a "wah" effect at the listener).

It's also worth mentioning that the problem may be worse than you think: on Windows systems on hardware older than 6 years, the resolution of the TimeTicks clock could sometimes be as poor as 15.6 ms! ;-)

So, I think we should be treating this as a performance issue, not a broken-impl issue. And there is an obvious improvement to be made: basically, address all my TimeTicks::Now() TODOs in content/renderer/media. We should be able to make the reference timestamps monotonically increasing everywhere. Does this sound right to you?

I haven't said it's *totally* broken. It does not work correctly in certain situations, which I described :) Non-increasing reference time stamps do not look like a performance problem to me: it's a contract violation, isn't it?

If we were talking about the media timestamps, it would definitely be a contract violation to go backwards. For reference timestamps, I'm not sure whether it's a contract violation.
I would say that, ideally, they should be monotonic. A good answer to this question would be found in my RTP book (it's a very thorough academic-style summary of the RTP protocol, which introduces all the "multiple clock" and "synchronization" concepts I've been talking about)...but somebody borrowed it and I can't find it right now. ;-) BTW--Your team should have a copy, if you don't already: https://www.amazon.com/RTP-Audio-Internet-Colin-Perkins/dp/0672322498/ref=mt_hardcover?_encoding=UTF8&me= There might be a newer edition...
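For reference, the sample-counting approach suggested in the discussion above (track frames consumed and divide by the sample rate) can be sketched like this; `SampleCountingClock` is a hypothetical helper for illustration, not an existing Chromium class:

```python
# Hypothetical consumer-side helper (not Chromium code): derive media
# timestamps by counting frames consumed, so they are monotonic by
# construction regardless of wall-clock jitter in buffer delivery.
class SampleCountingClock:
    def __init__(self, sample_rate, start_time_seconds=0.0):
        self.sample_rate = sample_rate
        self.start = start_time_seconds
        self.frames_consumed = 0

    def on_data(self, num_frames):
        # Timestamp (seconds) of the first frame of this buffer.
        ts = self.start + self.frames_consumed / self.sample_rate
        self.frames_consumed += num_frames
        return ts

clock = SampleCountingClock(44100)
# Buffers may arrive with jittery wall-clock spacing; the derived
# timestamps still advance by exactly one buffer duration each time.
stamps = [clock.on_data(441) for _ in range(5)]
print(stamps)
```

Timestamps derived this way drift from the reference clock only if the stream has gaps or the device clock skews, which is exactly when a consumer would need to re-sync anyway.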
Apr 11 2017
Marking available...
Apr 12 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Apr 13 2018
Apr 23 2018
[Re-triage] olka@ and miu@, is this still relevant? Can you close, assign, or set back to available please?
May 4 2018
Comment 1 by bugdroid1@chromium.org, Mar 13 2017