New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 633016 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug



Sign in to add a comment

Layout Test http/tests/media/media-source/mediasource-errors.html is flaky (again)

Project Member Reported by tapted@chromium.org, Aug 1 2016

Issue description

Started in 

https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Win10/builds/13024

then flakes and makes the build red about 3/4 of the time.

Only vaguely related change is r408830
Changed by	alokp@chromium.org
Changed at	Fri 29 Jul 2016 18:18:01
Revision	12386d605430eaec8d7b5da7d7e3816eea3aa2b8

Make PipelineImpl state change tasks consistent.

Review-Url: https://codereview.chromium.org/2091893003
Cr-Commit-Position: refs/heads/master@{#408830}
Changed files

media/base/pipeline_impl.cc
media/base/pipeline_impl.h
media/base/pipeline_impl_unittest.cc


Has been flaky in the past -  Issue 488184 

e.g. in https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Win10/builds/13027 in succeeded on retries

Output is like


20:27:12.854 7472 worker/1 http/tests/media/media-source/mediasource-errors.html output stderr lines:
20:27:12.854 7472   [5724:2736:0729/202712:1379296:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 273
20:27:12.854 7472   [5192:4676:0729/202712:1379671:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open
20:27:12.854 7472   [5192:4676:0729/202712:1379687:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed
20:27:12.854 7472   [5192:4676:0729/202712:1379687:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open
20:27:12.854 7472   [5724:2736:0729/202712:1379687:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561
20:27:12.854 7472   [5724:2736:0729/202712:1379687:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11
20:27:12.854 7472   [5192:4676:0729/202712:1379687:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested decode error on eos
20:27:12.854 7472   [5724:2736:0729/202712:1379703:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561
20:27:12.854 7472   [5724:2736:0729/202712:1379703:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11
20:27:12.854 7472   [5192:4676:0729/202712:1379703:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed
20:27:12.855 7472   [5724:2736:0729/202712:1379703:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561
20:27:12.855 7472   [5724:2736:0729/202712:1379703:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11
20:27:12.855 7472   [5192:4676:0729/202712:1379703:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested network error on eos
20:27:12.918 6592 [3/489] http/tests/media/media-source/mediasource-errors.html failed unexpectedly (asserts failed)
20:27:12.914 7472 worker/1 http/tests/media/media-source/mediasource-errors.html failed:
20:27:12.915 7472 worker/1  asserts failed


but a passing run isn't much different:


20:27:51.312 7812 worker/0 http/tests/media/media-source/mediasource-errors.html output stderr lines:
20:27:51.312 7812   [7256:5968:0729/202750:1417890:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 273
20:27:51.312 7812   [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open
20:27:51.312 7812   [7256:5968:0729/202751:1418156:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561
20:27:51.312 7812   [7256:5968:0729/202751:1418156:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11
20:27:51.312 7812   [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open
20:27:51.312 7812   [7256:5968:0729/202751:1418156:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561
20:27:51.312 7812   [7256:5968:0729/202751:1418156:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11
20:27:51.312 7812   [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed
20:27:51.312 7812   [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed
20:27:51.312 7812   [7256:5968:0729/202751:1418171:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561
20:27:51.313 7812   [7256:5968:0729/202751:1418171:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11
20:27:51.313 7812   [7840:4800:0729/202751:1418171:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested decode error on eos
20:27:51.313 7812   [7840:4800:0729/202751:1418171:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested network error on eos
20:27:51.317 7812 worker/0 http/tests/media/media-source/mediasource-errors.html passed
20:27:51.317 7812 worker/0 killing secondary driver
 
Project Member

Comment 1 by bugdroid1@chromium.org, Aug 1 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b62c782b8664ef8c7e270bfb728fb729dd56153b

commit b62c782b8664ef8c7e270bfb728fb729dd56153b
Author: tapted <tapted@chromium.org>
Date: Mon Aug 01 01:30:25 2016

Mark LayoutTest mediasource-errors.html FLAKY

BUG= 633016 
TBR=alokp@chromium.org
NOTRY=true

Review-Url: https://codereview.chromium.org/2197003002
Cr-Commit-Position: refs/heads/master@{#408898}

[modify] https://crrev.com/b62c782b8664ef8c7e270bfb728fb729dd56153b/third_party/WebKit/LayoutTests/TestExpectations

Labels: MSEscrubbed
alokp@ - can you comment on your change as a potential cause for this flake? If you don't suspect please assign to me.

Comment 3 by alokp@chromium.org, Aug 22 2016

Yeah back in office today. Will investigate this week.

Comment 4 by alokp@chromium.org, Aug 22 2016

Status: Started (was: Assigned)

Comment 5 by alokp@chromium.org, Aug 25 2016

Cc: chcunningham@chromium.org sande...@chromium.org servolk@chromium.org dalecur...@chromium.org
The flakiness is indeed caused by r408830, where we started using SerialRunner to run the sequence of tasks to start the pipeline. ChunkDemuxer posts init_cb to the media thread. When init_cb gets run, it posts ReportMetadata. If ChunkDemuxer posts an error before init_cb gets run, we shutdown the pipeline and never report the metadata. There is a race between ChunkDemuxer::ReportError_Locked, which runs on the main thread and init_cb on the media thread.

There are couple of ways to fix this:
1. The most simple way is to not post init_cb. But I feel it is hacky and fragile. 
https://codereview.chromium.org/2273273002/

2. Another more involved way is to let Demuxer report metadata. This seems much more robust to me but requires changes much more code and changes to DemuxerHost interface.
https://codereview.chromium.org/2272293002/

I am not too familiar with Demuxer code. So I would appreciate any feedback on which approach to pursue.
Cc: xhw...@chromium.org
I think either is okay. There are several other instances where we have to be careful not to BindToCurrentLoop to avoid double trampolines. +xhwang for opinions. So long as a test can be added it seems okay to me.

The second approach uses AutoUnlock which is always a bit of a smell to me.

Comment 7 by alokp@chromium.org, Aug 25 2016

In the first patch, init_cb will be run on main thread (SerialRunner will post the actual task to the media thread though). We BindToCurrentLoop all other callbacks in chunk_demuxer.cc, so this will be different from others - just FYI.

AutoUnlock is easy to remove by posting the task to media thread, but again rest of the file calls demuxer_host_ on the main thread. I do not know if this is desired or we should change it to call demuxer_host_ on media thread everywhere. FWIW I have a TODO in pipeline_impl.cc to clarify threading for DemuxerHost:
https://cs.chromium.org/chromium/src/media/base/pipeline_impl.cc?l=491

Comment 8 by alokp@chromium.org, Aug 29 2016

xhwang@: Opinion on which of the two options in comment #5 to pursue?

Comment 9 by alokp@chromium.org, Sep 12 2016

xhwang@: ping :)
Sorry for the delay!

It seems to me the cause is that callbacks are posted but DemuxerHost calls are not posted. This is okay for most cases but can be tricky here. In particular, if error happens, we don't wait or care about pending callbacks (including the init_cb) and will start the destruction process. This is similar to the case where the demuxer initialization takes long time and we start to destroy the pipeline, which is actually a legit case.

Now, in our pipeline, we only report metadata after demuxer initialization succeeds [1], which still makes sense to me. So if error happens before the demuxer init_cb gets a chance to run, we won't report the metadata.

So what does the test expect? Is it expecting the metadata report even though we know error will happen? Are we violating any spec text?

[1] https://cs.chromium.org/chromium/src/media/base/pipeline_impl.cc?rcl=0&l=244

Comment 11 by alokp@chromium.org, Sep 12 2016

The test expects metadata to be reported if initialization segment is valid, which I think is reasonable. The error happens *after* demuxer has successfully initialized and posted init_cb. It just did not get a chance to run.
Do you know when/how the "error" happens? Typically how we deal with this in media/ is that initialization errors is only reported through the init_cb, not the host call (or an error_cb). Then the pipeline will not ask the demuxer (or decoder) to do any further work (hence no further error) before the init_cb is received. Will that model work here?

If not, either solution works for me, but in either case, we should clarify the expectations of our API, otherwise the code will be harder to understand in the future.


Comment 13 by alokp@chromium.org, Sep 12 2016

This is *not* an initialization error. This is a demuxer error which happens *after* initialization, but gets reported earlier than init_cb because of reasons explained in comment #5.

I think we should go with solution #1 since it requires minimal change. It is also correct because it executes the callback directly instead of posting it, which is consistent with how demuxer error is reported (DemuxerHost::OnDemuxerError).
So from the pipeline's perspective, we do start demuxing before init_cb is received by the pipeline. This is how it's different from the rest of the pipeline.

Solution #1 is good for now with some comments and preferably a test. In the future, we should tighten the demuxer API a bit more so we don't surprise people.

Comment 15 by alokp@chromium.org, Sep 12 2016

You bring up a good point. Is there a Demuxer API that pipeline uses to "start" demuxing? Is it Demuxer::Seek or DemuxerStream::Read?

May be we do not need DemuxerHost::OnDemuxerError? All demuxer errors can be reported by one of the above APIs?
I don't think there's an API for the pipeline to start demuxing. For the case of ChunkDemuxer, it seems the demuxer will start demuxing by itself (given there's data appended).

There's definitely some improvement we can do to clean up these APIs. Could you please file a bug on that? For now, we can just go with #1 if you also agree.

Comment 17 by alokp@chromium.org, Sep 16 2016

OK. Please see new patch with updated tests:
https://codereview.chromium.org/2273273002/
Cc: posciak@chromium.org
@#16 demuxing means multiple things: parsing input media into some notion of a buffered timeline, and feeding such buffers out of the timeline when requested.
The app is in control of at least appending the info necessary to reach normal "init_cb_" PIPELINE_OK execution. The pipeline and renderers control when reads are requested of the demuxer.

What cleanup of the "start demuxing" notion, at least in case of MSE implementation, would you suggest?
Project Member

Comment 20 by bugdroid1@chromium.org, Oct 6 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0c4e92a05559f716367fdef9928ae6df0110ff65

commit 0c4e92a05559f716367fdef9928ae6df0110ff65
Author: alokp <alokp@chromium.org>
Date: Thu Oct 06 07:06:47 2016

Directly call ChunkDemuxer::Initialize completion callback.

ChunkDemuxer posts init_cb while calls DemuxerHost::OnDemuxerError
directly, which creates a race between the two. In some cases if
there is an error right after a successful intialization, the error
will be reported before init_cb has a chance to run which violates
upstream expectations.

BUG= 633016 

Review-Url: https://codereview.chromium.org/2273273002
Cr-Commit-Position: refs/heads/master@{#423463}

[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.cc
[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.h
[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer_unittest.cc
[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/third_party/WebKit/LayoutTests/TestExpectations

Status: Fixed (was: Started)
Flakiness dashboard indicates that this is fixed post r423463.
Project Member

Comment 22 by bugdroid1@chromium.org, Oct 27 2016

Labels: merge-merged-2840
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0c4e92a05559f716367fdef9928ae6df0110ff65

commit 0c4e92a05559f716367fdef9928ae6df0110ff65
Author: alokp <alokp@chromium.org>
Date: Thu Oct 06 07:06:47 2016

Directly call ChunkDemuxer::Initialize completion callback.

ChunkDemuxer posts init_cb while calls DemuxerHost::OnDemuxerError
directly, which creates a race between the two. In some cases if
there is an error right after a successful intialization, the error
will be reported before init_cb has a chance to run which violates
upstream expectations.

BUG= 633016 

Review-Url: https://codereview.chromium.org/2273273002
Cr-Commit-Position: refs/heads/master@{#423463}

[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.cc
[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.h
[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer_unittest.cc
[modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/third_party/WebKit/LayoutTests/TestExpectations

Cc: -posciak@chromium.org bustamante@chromium.org
Vs the bot in #22, I *don't* see this having merged to branch M54/2840. I only see "mark the broken test as flaky" commit in #1 in the M54/2840 branch.

bustamante@: FYI - I assume this doesn't make the M54 merge bar, right? How did that merge-merged-2840 label get assigned incorrectly?
I am not sure how merge-merged-2840 got attached to this bug. I did not intend to merge the fix.

Comment 26 by dimu@google.com, Nov 4 2016

Labels: -merge-merged-2840
[Automated comment] removing mislabelled merge-merged-2840

Sign in to add a comment