Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Issue 310838 webgl_conformance_tests became flaky on Win7 Debug (NVIDIA)
Starred by 1 user Project Member Reported by kbr@chromium.org, Oct 23 2013 Back to list
Status: Fixed
Owner: dalecur...@chromium.org
Closed: Jan 2014
Cc: tommi@chromium.org, bajones@chromium.org, dalecur...@chromium.org, ccameron@chromium.org, henrika@chromium.org, zmo@chromium.org
Components:
OS: Windows
Pri: 2
Type: Bug

Blocking:
issue 313834


Sign in to add a comment
Version: TOT
OS: Windows 7

On October 22 around 22:57 Pacific time the webgl_conformance_tests started being flaky on the Win7 Debug (NVIDIA) bot. See:

http://build.chromium.org/p/chromium.gpu/builders/Win7%20Debug%20%28NVIDIA%29?numbuilds=100

The first build failing in this way was:

http://build.chromium.org/p/chromium.gpu/builders/Win7%20Debug%20%28NVIDIA%29/builds/12575

Unfortunately, the failure is not only flaky, but the blamelist isn't quite correct (being worked on separately). However, analysis of recent builds comes up with the following approximate regression range:

http://build.chromium.org/f/chromium/perf/dashboard/ui/changelog.html?url=/trunk&range=230323%3A230341

which contains in particular:

http://src.chromium.org/viewvc/chrome?revision=230334&view=revision

and *all* of the failing builds contain errors coming from audio_low_latency_output_win.cc.

I suspect that CL and plan to revert it to see whether it clears up the flakiness.

P1 because this is a regression causing unreliability in these tests.

 
Comment 1 by kbr@chromium.org, Oct 23 2013
dalecurtis@ points out the following CLs as possibly suspect as well:

https://codereview.chromium.org/26069003/
https://codereview.chromium.org/27181005/

Owner: henrika@chromium.org
I think none of these are the actual culprits, they've just exposed a lurking problem.

Assigning to our WASAPI expert henrika@ to investigate.  I'll also take a look.

Errors:
[3124:4024:1023/081611:ERROR:audio_low_latency_output_win.cc(334)] Failed to prepare endpoint buffers with silence.

[4108:2732:1022/233636:ERROR:audio_low_latency_output_win.cc(558)] Failed to use rendering audio buffer: 8889000b
[4108:2732:1022/233636:ERROR:audio_low_latency_output_win.cc(488)] WASAPI rendering failed.: The operation completed successfully.

The new AudioOutputDispatcher changes make it more likely that we'll reuse a stream before it's Close()'d. I suspect there's a bug where Stop() isn't completely resetting the state, so when the next proxy gets the old stream and tries to start it up again, things go bad.
Comment 3 by kbr@chromium.org, Oct 23 2013
It may well be the case that there's a preexisting problem but it isn't workable to have new sources of flakiness entering the tree. I've reverted r230334 in https://codereview.chromium.org/38533002 while this is being investigated.

Project Member Comment 4 by bugdroid1@chromium.org, Oct 23 2013
------------------------------------------------------------------------
r230543 | kbr@chromium.org | 2013-10-23T23:05:27.432390Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/media/audio/audio_output_proxy_unittest.cc?r1=230543&r2=230542&pathrev=230543
   M http://src.chromium.org/viewvc/chrome/trunk/src/media/audio/audio_output_dispatcher_impl.cc?r1=230543&r2=230542&pathrev=230543
   M http://src.chromium.org/viewvc/chrome/trunk/src/media/audio/audio_output_dispatcher.h?r1=230543&r2=230542&pathrev=230543
   M http://src.chromium.org/viewvc/chrome/trunk/src/media/audio/audio_output_dispatcher_impl.h?r1=230543&r2=230542&pathrev=230543

Revert 230334 "Improve and simplify AudioOutputDispatcher."

Likely cause of flakiness on Win7 Debug (NVIDIA) bot.

BUG= 310838 
> Improve and simplify AudioOutputDispatcher.
> 
> AudioOutputDispatcher does a number of things which no longer make
> sense in a world where we don't create tons of physical streams:
> 
>     - Streams are not immediately recycled but instead inserted
>     into a paused queue.  This was done to improve cycle time
>     when physical streams were rapidly paused and played.
> 
>     - If any stream was paused previously and then closed, every
>     new stream opened will open two streams.  Again this was to
>     improve cycle time for paused and played streams.
> 
> For HTML5, this has been handled by AudioRendererMixer for some time.
> WebRTC, WebAudio, and PPAPI generally don't start and stop streams
> repeatedly in a way that this code is beneficial.
> 
> As such, all of that code has been removed.  As a hedge, one stream
> is left open after Close() until the close timer fires.  This allows
> cycle time to be improved in cases of interleaved Start/Stop cycles.
> 
> I've additionally cleaned up the code to use std::algorithm where
> applicable, removed unnecessary WeakPtr setup, and cleaned up the
> unittests.
> 
> BUG=none
> TEST=audio plays, unittests pass.
> 
> Review URL: https://codereview.chromium.org/27605002

TBR=dalecurtis@chromium.org

Review URL: https://codereview.chromium.org/38533002
------------------------------------------------------------------------
The error its throwing is: AUDCLNT_E_BUFFER_OPERATION_PENDING presumably both errors are due to the same thing.

Henrik, should FillRenderEndpointBufferWithSilence() be executed before the render thread is started up?

It also looks like we're not resetting audio_samples_render_event_ between Stop/Start sessions. So my hunch is the event is active after Stop() and as soon as the render thread is started, it calls GetBuffer() which races with FillRenderEndpointBufferWithSilence(). 
Labels: -Pri-1 Pri-2
Since reverted, Pri-2
Cc: henrika@chromium.org
Owner: tommi@chromium.org
Dale, I tried to come up with a new unit test (https://codereview.chromium.org/39813002/) which would verify your theory:

"The new AudioOutputDispatcher changes make it more likely that we'll reuse a stream before it's Close()'d. I suspect there's a bug where Stop() isn't completely resetting the state, so when the next proxy gets the old stream and tries to start it up again, things go bad."

But I was not able to provoke any issues.

Both your ideas:

- FillRenderEndpointBufferWithSilence() before starting thread and
- Resetting audio_samples_render_event_ between Stop/Start sessions

seems like good ideas but I am currently unable to say: -"Yes, it will solve the issues" since I can't reproduce the problem locally. Guess I would have to reinstall your failing patch and take it from there.

The _real_ WASAPI expert tommi@ has stepped up and volunteered to crack this one while I am OOO and I am convinced that he will fix it.
The way your test is structured, the event will always end in the unsignaled state.  Probably the easiest way to verify this is to just manually signal the event during Start() and see if the symptoms match.
Will do so tomorrow and provide feedback.
Was not able to trigger issue by starting with a set event.

AUDCLNT_E_BUFFER_OPERATION_PENDING  means "Buffer cannot be accessed because a stream reset is in progress" and we do call IAudioClient::Reset() in Stop(). But, if it would have failed:

// Flush all pending data and reset the audio clock stream position to 0.
hr = audio_client_->Reset();
if (FAILED(hr)) {
    LOG_GETLASTERROR(ERROR)
        << "Failed to reset streaming: " << std::hex << hr;
   callback->OnError(this);
}

a log is added and OnError is called so it does not seem to be the case here.

One could perhaps call Reset in a while loop while it returns AUDCLNT_E_BUFFER_OPERATION_PENDING to ensure that Reset har _really_ finalized.
Blocking: chromium:313834
The AOD changes I made remove the cycle time between streams, so it's possible stream reuse is contributing to the issue.

tommi@, had a chance to look at this?
Comment 13 by tommi@chromium.org, Nov 13 2013
Cc: -henrika@chromium.org tommi@chromium.org
Owner: henrika@chromium.org
No, sorry I haven't.  Henrik can you take a look at this?
Sure, I can do that tomorrow when I am in the office.

Dale, in the mean time, can you add some more feedback on what you think triggers this? Can I somehow reactive your parts that triggered the problem? It might be difficult to solve "analytically" otherwise and I would like to be able to cause the same behavior from a unit test in media if possible.
It seems likely its related to stream reuse where none previously existed. The old dispatcher moved streams into a waiting room (see pause_delay_ in AudioOutputDispatcherImpl) before reusing them. My CL removes the pause delay (since Stop() should be synchronous) and additionally keeps one stream open until the close timer fires (see close_delay_) for better reuse. 

You should be able to patch https://codereview.chromium.org/27605002/ in and run the conformance tests in debug mode to trigger the issue.
Tried the following tests:

python.exe content\test\gpu\run_gpu_test webgl_conformance --show-stdout --output-format=gtest --browser=debug -v --webgl-conformance-version=1.0.1 

with filter for the failing test:

python.exe content\test\gpu\run_gpu_test webgl_conformance --show-stdout --output-format=gtest --browser=debug -v --webgl-conformance-version=1.0.1 --page-filter=tex-image-and-sub-
image-2d-with-video-rgba4444.html

after patching in

https://codereview.chromium.org/27605002 (had to use 3-way merge plus some manual fixes)

but was not able to reproduce. I can verify that the the test does use WASAPIAudioOutputStream by adding a CHECK in the ctor. However, I am unable to force a debug output like what we see on the bots:

[ RUN      ] WebglConformance.conformance_textures_tex_image_and_sub_image_2d_with_video_rgba4444
[4108:5004:1022/233635:INFO:CONSOLE(33)] "Harness injected.", source:  (33)
[4108:2732:1022/233636:ERROR:audio_low_latency_output_win.cc(558)] Failed to use rendering audio buffer: 8889000b
[4108:2732:1022/233636:ERROR:audio_low_latency_output_win.cc(488)] WASAPI rendering failed.: The operation completed successfully.

even if I modify the code and add DLOG(ERROR) in places I know that we reach. 
Also created more complicated unit tests in media but failed to reproduce that way as well.

dalecurtis@: I have no good ideas on what to improve here and I would like to be able to guarantee that it works. If not, we might have to revert your patch again.

Would it be possible for you to (1) see if you can reproduce on your side, and (2) figure out how to see the DLOGs from Chrome in the test output. If that was possible I might be able to get more clues.
Or do you prefer if I try to make some changes that just might help so you can try your patch again even it might mean that your CL must be reverted again?
Comment 20 Deleted
Found a way to add more logging (see below). I was not able to find anything unusual in the calling sequence for the failing test but I assume that it might be better to check the sequence(s) in the test(s) before: 

D:\src\chromium\src>python.exe content\test\gpu\run_gpu_test webgl_conformance --show-stdout --output-format=gtest --browser=debug --extra-browser-args="--enable-logging --vmodule=*/media/*=1" -v --webgl-conformance-version=1.0.1 --page-fil
age-and-sub-image-2d-with-video-rgba4444.html

[3448:5172:1114/140912:VERBOSE1:audio_renderer_host.cc(309)] AudioRendererHost@225E5550::OnCreateStream(stream_id=1, render_view_id=1, session_id=0)
[8692:9024:1114/140912:VERBOSE1:gpu_video_decoder.cc(152)] (Re)initializing GVD with config: codec: 6 format: 1 profile: 11 coded size: [80,128] visible rect: [0,0,80,128]
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(651)] channel_layout   : 3
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(652)] sample_rate      : 44100
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(653)] bits_per_sample  : 16
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(654)] frames_per_buffer: 441
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(85)] WASAPIAudioOutputStream::WASAPIAudioOutputStream()
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(651)] channel_layout   : 3
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(652)] sample_rate      : 44100
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(653)] bits_per_sample  : 16
[3448:5916:1114/140912:VERBOSE1:core_audio_util_win.cc(654)] frames_per_buffer: 441
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(134)] Number of bytes per audio frame  : 4
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(135)] Number of audio frames per packet: 441
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(136)] Number of bytes per packet       : 1764
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(137)] Number of milliseconds per packet: 10
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(154)] WASAPIAudioOutputStream::Open()
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(339)] SetVolume(volume=1)
[3448:5916:1114/140912:VERBOSE1:audio_low_latency_output_win.cc(242)] WASAPIAudioOutputStream::Start()
[3448:7332:1114/140912:ERROR:audio_low_latency_output_win.cc(499)] =====
[3448:7332:1114/140912:ERROR:audio_low_latency_output_win.cc(499)] =====
[3448:7332:1114/140912:ERROR:audio_low_latency_output_win.cc(499)] =====
Sorry didn't have time to look at this today.  If you have ideas on what should be cleaned up or done better, that'd be great.  I can always retry the patch at a later date.

Aside from the potential thread safety issues, I'm not sure what else to try either.
Project Member Comment 23 by bugdroid1@chromium.org, Nov 28 2013
------------------------------------------------------------------------
r237694 | dalecurtis@chromium.org | 2013-11-28T05:51:57.853952Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/media/audio/win/audio_low_latency_output_win.cc?r1=237694&r2=237693&pathrev=237694
   M http://src.chromium.org/viewvc/chrome/trunk/src/media/audio/win/audio_low_latency_output_win.h?r1=237694&r2=237693&pathrev=237694

Don't start WASAPI render thread until setup completes.

Creating the render thread before setting up the |audio_client_|
may lead to race conditions when repeatedly starting and stopping
a stream.

With this patch, nothing touches |audio_client_| until IAudioClient
triggers the render event for data.

As a bonus, it fixes a destruction issue where |audio_clock| was
being destructed on the wrong thread (render vs audio), see docs:

http://msdn.microsoft.com/en-us/library/windows/desktop/dd370873.aspx

BUG= 310838 
TEST=none

Review URL: https://codereview.chromium.org/89663004
------------------------------------------------------------------------
The original CL is relanding tonight (hopefully).  I'll check the bots tomorrow to see if any flakiness has cropped up.
Cc: henrika@chromium.org
Owner: dalecur...@chromium.org
Assigning to dalecurtis@ for further input.
Status: Fixed
The original change has been relanded and didn't break the bots, so closing.
Sign in to add a comment