Layout Test http/tests/media/media-source/mediasource-errors.html is flaky (again) |
|||||||||
Issue descriptionStarted in https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Win10/builds/13024 then flakes and makes the build red about 3/4 of the time. Only vaguely related change is r408830 Changed by alokp@chromium.org Changed at Fri 29 Jul 2016 18:18:01 Revision 12386d605430eaec8d7b5da7d7e3816eea3aa2b8 Make PipelineImpl state change tasks consistent. Review-Url: https://codereview.chromium.org/2091893003 Cr-Commit-Position: refs/heads/master@{#408830} Changed files media/base/pipeline_impl.cc media/base/pipeline_impl.h media/base/pipeline_impl_unittest.cc Has been flaky in the past - Issue 488184 e.g. in https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Win10/builds/13027 in succeeded on retries Output is like 20:27:12.854 7472 worker/1 http/tests/media/media-source/mediasource-errors.html output stderr lines: 20:27:12.854 7472 [5724:2736:0729/202712:1379296:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 273 20:27:12.854 7472 [5192:4676:0729/202712:1379671:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open 20:27:12.854 7472 [5192:4676:0729/202712:1379687:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed 20:27:12.854 7472 [5192:4676:0729/202712:1379687:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open 20:27:12.854 7472 [5724:2736:0729/202712:1379687:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561 20:27:12.854 7472 [5724:2736:0729/202712:1379687:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11 20:27:12.854 7472 [5192:4676:0729/202712:1379687:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested decode error on eos 20:27:12.854 7472 [5724:2736:0729/202712:1379703:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561 20:27:12.854 7472 [5724:2736:0729/202712:1379703:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11 20:27:12.854 7472 [5192:4676:0729/202712:1379703:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed 20:27:12.855 7472 [5724:2736:0729/202712:1379703:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561 20:27:12.855 7472 [5724:2736:0729/202712:1379703:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11 20:27:12.855 7472 [5192:4676:0729/202712:1379703:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested network error on eos 20:27:12.918 6592 [3/489] http/tests/media/media-source/mediasource-errors.html failed unexpectedly (asserts failed) 20:27:12.914 7472 worker/1 http/tests/media/media-source/mediasource-errors.html failed: 20:27:12.915 7472 worker/1 asserts failed but a passing run isn't much different: 20:27:51.312 7812 worker/0 http/tests/media/media-source/mediasource-errors.html output stderr lines: 20:27:51.312 7812 [7256:5968:0729/202750:1417890:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 273 20:27:51.312 7812 [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open 20:27:51.312 7812 [7256:5968:0729/202751:1418156:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561 20:27:51.312 7812 [7256:5968:0729/202751:1418156:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11 20:27:51.312 7812 [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR demuxer: could not open 20:27:51.312 7812 [7256:5968:0729/202751:1418156:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561 20:27:51.312 7812 [7256:5968:0729/202751:1418156:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11 20:27:51.312 7812 [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed 20:27:51.312 7812 [7840:4800:0729/202751:1418156:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: append failed 20:27:51.312 7812 [7256:5968:0729/202751:1418171:ERROR:mf_helpers.cc(12)] Error in dxva_video_decode_accelerator_win.cc on line 561 20:27:51.313 7812 [7256:5968:0729/202751:1418171:ERROR:gpu_video_decode_accelerator.cc(365)] HW video decode not available for profile 11 20:27:51.313 7812 [7840:4800:0729/202751:1418171:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested decode error on eos 20:27:51.313 7812 [7840:4800:0729/202751:1418171:ERROR:render_media_log.cc(25)] MediaEvent: PIPELINE_ERROR chunk demuxer: application requested network error on eos 20:27:51.317 7812 worker/0 http/tests/media/media-source/mediasource-errors.html passed 20:27:51.317 7812 worker/0 killing secondary driver
,
Aug 18 2016
alokp@ - can you comment on your change as a potential cause for this flake? If you don't suspect please assign to me.
,
Aug 22 2016
Yeah back in office today. Will investigate this week.
,
Aug 22 2016
,
Aug 25 2016
The flakiness is indeed caused by r408830, where we started using SerialRunner to run the sequence of tasks to start the pipeline. ChunkDemuxer posts init_cb to the media thread. When init_cb gets run, it posts ReportMetadata. If ChunkDemuxer posts an error before init_cb gets run, we shutdown the pipeline and never report the metadata. There is a race between ChunkDemuxer::ReportError_Locked, which runs on the main thread and init_cb on the media thread. There are couple of ways to fix this: 1. The most simple way is to not post init_cb. But I feel it is hacky and fragile. https://codereview.chromium.org/2273273002/ 2. Another more involved way is to let Demuxer report metadata. This seems much more robust to me but requires changes much more code and changes to DemuxerHost interface. https://codereview.chromium.org/2272293002/ I am not too familiar with Demuxer code. So I would appreciate any feedback on which approach to pursue.
,
Aug 25 2016
I think either is okay. There are several other instances where we have to be careful not to BindToCurrentLoop to avoid double trampolines. +xhwang for opinions. So long as a test can be added it seems okay to me. The second approach uses AutoUnlock which is always a bit of a smell to me.
,
Aug 25 2016
In the first patch, init_cb will be run on main thread (SerialRunner will post the actual task to the media thread though). We BindToCurrentLoop all other callbacks in chunk_demuxer.cc, so this will be different from others - just FYI. AutoUnlock is easy to remove by posting the task to media thread, but again rest of the file calls demuxer_host_ on the main thread. I do not know if this is desired or we should change it to call demuxer_host_ on media thread everywhere. FWIW I have a TODO in pipeline_impl.cc to clarify threading for DemuxerHost: https://cs.chromium.org/chromium/src/media/base/pipeline_impl.cc?l=491
,
Aug 29 2016
xhwang@: Opinion on which of the two options in comment #5 to pursue?
,
Sep 12 2016
xhwang@: ping :)
,
Sep 12 2016
Sorry for the delay! It seems to me the cause is that callbacks are posted but DemuxerHost calls are not posted. This is okay for most cases but can be tricky here. In particular, if error happens, we don't wait or care about pending callbacks (including the init_cb) and will start the destruction process. This is similar to the case where the demuxer initialization takes long time and we start to destroy the pipeline, which is actually a legit case. Now, in our pipeline, we only report metadata after demuxer initialization succeeds [1], which still makes sense to me. So if error happens before the demuxer init_cb gets a chance to run, we won't report the metadata. So what does the test expect? Is it expecting the metadata report even though we know error will happen? Are we violating any spec text? [1] https://cs.chromium.org/chromium/src/media/base/pipeline_impl.cc?rcl=0&l=244
,
Sep 12 2016
The test expects metadata to be reported if initialization segment is valid, which I think is reasonable. The error happens *after* demuxer has successfully initialized and posted init_cb. It just did not get a chance to run.
,
Sep 12 2016
Do you know when/how the "error" happens? Typically how we deal with this in media/ is that initialization errors is only reported through the init_cb, not the host call (or an error_cb). Then the pipeline will not ask the demuxer (or decoder) to do any further work (hence no further error) before the init_cb is received. Will that model work here? If not, either solution works for me, but in either case, we should clarify the expectations of our API, otherwise the code will be harder to understand in the future.
,
Sep 12 2016
This is *not* an initialization error. This is a demuxer error which happens *after* initialization, but gets reported earlier than init_cb because of reasons explained in comment #5. I think we should go with solution #1 since it requires minimal change. It is also correct because it executes the callback directly instead of posting it, which is consistent with how demuxer error is reported (DemuxerHost::OnDemuxerError).
,
Sep 12 2016
So from the pipeline's perspective, we do start demuxing before init_cb is received by the pipeline. This is how it's different from the rest of the pipeline. Solution #1 is good for now with some comments and preferably a test. In the future, we should tighten the demuxer API a bit more so we don't surprise people.
,
Sep 12 2016
You bring up a good point. Is there a Demuxer API that pipeline uses to "start" demuxing? Is it Demuxer::Seek or DemuxerStream::Read? May be we do not need DemuxerHost::OnDemuxerError? All demuxer errors can be reported by one of the above APIs?
,
Sep 14 2016
I don't think there's an API for the pipeline to start demuxing. For the case of ChunkDemuxer, it seems the demuxer will start demuxing by itself (given there's data appended). There's definitely some improvement we can do to clean up these APIs. Could you please file a bug on that? For now, we can just go with #1 if you also agree.
,
Sep 16 2016
OK. Please see new patch with updated tests: https://codereview.chromium.org/2273273002/
,
Sep 26 2016
,
Sep 27 2016
@#16 demuxing means multiple things: parsing input media into some notion of a buffered timeline, and feeding such buffers out of the timeline when requested. The app is in control of at least appending the info necessary to reach normal "init_cb_" PIPELINE_OK execution. The pipeline and renderers control when reads are requested of the demuxer. What cleanup of the "start demuxing" notion, at least in case of MSE implementation, would you suggest?
,
Oct 6 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0c4e92a05559f716367fdef9928ae6df0110ff65 commit 0c4e92a05559f716367fdef9928ae6df0110ff65 Author: alokp <alokp@chromium.org> Date: Thu Oct 06 07:06:47 2016 Directly call ChunkDemuxer::Initialize completion callback. ChunkDemuxer posts init_cb while calls DemuxerHost::OnDemuxerError directly, which creates a race between the two. In some cases if there is an error right after a successful intialization, the error will be reported before init_cb has a chance to run which violates upstream expectations. BUG= 633016 Review-Url: https://codereview.chromium.org/2273273002 Cr-Commit-Position: refs/heads/master@{#423463} [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.cc [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.h [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer_unittest.cc [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/third_party/WebKit/LayoutTests/TestExpectations
,
Oct 6 2016
,
Oct 27 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0c4e92a05559f716367fdef9928ae6df0110ff65 commit 0c4e92a05559f716367fdef9928ae6df0110ff65 Author: alokp <alokp@chromium.org> Date: Thu Oct 06 07:06:47 2016 Directly call ChunkDemuxer::Initialize completion callback. ChunkDemuxer posts init_cb while calls DemuxerHost::OnDemuxerError directly, which creates a race between the two. In some cases if there is an error right after a successful intialization, the error will be reported before init_cb has a chance to run which violates upstream expectations. BUG= 633016 Review-Url: https://codereview.chromium.org/2273273002 Cr-Commit-Position: refs/heads/master@{#423463} [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.cc [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer.h [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/media/filters/chunk_demuxer_unittest.cc [modify] https://crrev.com/0c4e92a05559f716367fdef9928ae6df0110ff65/third_party/WebKit/LayoutTests/TestExpectations
,
Nov 4 2016
Vs the bot in #22, I *don't* see this having merged to branch M54/2840. I only see "mark the broken test as flaky" commit in #1 in the M54/2840 branch. bustamante@: FYI - I assume this doesn't make the M54 merge bar, right? How did that merge-merged-2840 label get assigned incorrectly?
,
Nov 4 2016
I am not sure how merge-merged-2840 got attached to this bug. I did not intend to merge the fix.
,
Nov 4 2016
[Automated comment] removing mislabelled merge-merged-2840 |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by bugdroid1@chromium.org
, Aug 1 2016