WebM parser reports video frame durations incorrectly for 60fps videos
Issue description

I have noticed that when playing high-framerate 60fps videos on YouTube, the frame durations emitted from the WebM stream parser are incorrect and indicate an effective frame rate of 62.5 fps. How to reproduce: play any 60fps video, e.g. https://www.youtube.com/watch?v=U5dDG__Db5A, and look at the video frame durations reported in the log. Here is an example of what I see:

[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 47521
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.699 dur 0.016 kf 0 size 47517
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 3228
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.716 dur 0.016 kf 0 size 3224
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 2741
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.732 dur 0.016 kf 0 size 2737
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 3675
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.749 dur 0.016 kf 0 size 3671
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 1845
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.766 dur 0.016 kf 0 size 1841
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 3355
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.782 dur 0.016 kf 0 size 3351
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 2751
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.799 dur 0.016 kf 0 size 2747
[1:1:0805/103406:VERBOSE3:webm_parser.cc(514)] WebMParseElementHeader() : id a3 size 3144
[1:1:0805/103406:VERBOSE2:webm_cluster_parser.cc(663)] AddBuffer() : 1 ts 15.816 dur 0.016 kf 0 size 3140

Note that the durations are 16.0ms (0.016s), which indicates 1000/16 = 62.5fps. For 60fps the durations should be 16.667ms. The timestamps seem to be correct, i.e. they indicate that the actual frame durations are 16.667ms: for example, the ts increases by 0.017s both between the first pair of frames and between the last pair of frames in the log quoted above.
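For reference, here is a tiny standalone C++ snippet (not from the Chromium tree, just the arithmetic above) showing why a reported 0.016s duration implies 62.5fps, while true 60fps frames should last ~16.667ms:

#include <cstdio>

int main() {
  const double reported_duration_s = 0.016;  // duration printed by AddBuffer()
  const double true_fps = 60.0;
  // Frame rate implied by the reported duration: 1 / 0.016 = 62.5 fps.
  printf("implied fps from reported duration: %.1f\n", 1.0 / reported_duration_s);
  // Expected per-frame duration for real 60 fps content: 1000 / 60 = 16.667 ms.
  printf("expected duration at 60 fps: %.3f ms\n", 1000.0 / true_fps);
  return 0;
}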
Aug 8 2016
I'll take a look
Oct 18 2016
I think this may be a very big issue, as it may be causing dropped frames across the board for 720p60 and 1080p60 playback in Chrome. I can't get a 60fps YouTube video to play without dropped frames, but Microsoft Edge rarely has issues (consistently 0 dropped). See the attached comparison of the stats for this 720p60 YouTube video (https://www.youtube.com/watch?v=hXeH0i0GtkQ) played to completion in Chrome and in Edge. Notice that the total frame count differs by only 1 for Chrome, but the dropped count is much greater than 0. Is this bug a potential cause of these dropped frames? If so, can this issue be investigated further? I started a reddit discussion which, based on upvotes, seems to suggest that others also cannot watch 60fps videos on YouTube in Chrome without dropped frames: https://www.reddit.com/r/youtube/comments/5823am/does_chrome_always_drop_frames_during_youtube/
Oct 18 2016
I don't know if incorrect frame durations could cause frame drops, but I've looked a bit further into the WebM parser, trying to understand why the durations are incorrect, and I believe I've found the culprit. For some reason we use PrecisionCappedDefaultDuration when calculating default frame durations for the video track (see https://cs.chromium.org/chromium/src/media/formats/webm/webm_tracks_parser.cc?rcl=0&l=114), and that rounds the precise duration value read from the WebM container (kWebMIdDefaultDuration, which is read at https://cs.chromium.org/chromium/src/media/formats/webm/webm_tracks_parser.cc?rcl=0&l=309; I see the value 16683333ns ~= 0.01668s). So the question now is: why are we using PrecisionCappedDefaultDuration instead of the precise duration value in the WebM parser? Matt, it looks like that code was added by you. Any ideas? Can we use the precise duration values instead of the rounded ones?
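To show what that capping does to this particular value, here's a minimal standalone sketch (an illustration of the behavior as I understand it, not the actual Chromium helper; CapDurationToTimecodeScaleNs is a hypothetical name):

#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the precision-capping step: truncate the precise
// DefaultDuration to whole timecode ticks.
int64_t CapDurationToTimecodeScaleNs(int64_t duration_ns, int64_t timecode_scale_ns) {
  return (duration_ns / timecode_scale_ns) * timecode_scale_ns;
}

int main() {
  const int64_t default_duration_ns = 16683333;  // value read from kWebMIdDefaultDuration
  const int64_t timecode_scale_ns = 1000000;     // default WebM timecodescale: 1 ms ticks
  const int64_t capped_ns =
      CapDurationToTimecodeScaleNs(default_duration_ns, timecode_scale_ns);
  // Prints 16000000 ns (~16.000 ms), which matches the 0.016 s durations in the log.
  printf("capped: %lld ns (~%.3f ms)\n", (long long)capped_ns, capped_ns / 1e6);
  return 0;
}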
Oct 19 2016
FWIW, I've tried removing the rounding of the default duration values for the audio and video tracks, and I still see some frames being dropped when playing YouTube 60fps content. But I haven't noticed any other side effects of this change, besides ~4 media_unittests failures (which is expected, I guess). So I'm wondering whether the rounding was really necessary here. Perhaps we can remove it. Matt, WDYT?
Oct 21 2016
If the video frame durations were slightly undercalculated by the parser, there shouldn't be frame drops. Only if decode doesn't keep up, or there are overlapped appends, should there be any frame drops by the media pipeline. Visible frame drops caused by OS compositing delay might also occur; perhaps that's what differs between Edge on Windows and Chrome. If durations were severely undercalculated, buffered range gaps or coded-frame-group discontinuities could occur, but the sequence in the original post doesn't indicate that's happening either.

Frame drops? I suspect there might be something in the decode->render->composition portion of the media pipeline that's causing frames to be dropped more on Chrome than on Edge. Or perhaps Edge *isn't* correctly accounting for dropped frames. See also crbug 657560, where our test team is investigating verification of at least Chrome's accounting of dropped frames.

PrecisionCappedDefaultDuration is simply there to allow other timecodescales to be used and mapped into Chrome's microsecond-granularity base::TimeDelta, with no greater precision than the timecodescale itself allows (also limited to a maximum of 1 microsecond precision). I would really be surprised if this precision capping alone were causing any kind of frame dropping. For instance, the default/common WebM timecodescale is 1,000,000, which means the unit of time is 1 millisecond. It would make no sense to have a 1.5 millisecond default duration when the timestamps of the frames themselves have no greater precision than 1 millisecond. So long as we don't round up to 2 milliseconds, we shouldn't get unexpected overlap appends. Also, non-default durations (those specified in BlockGroups) are *already limited to timecodescale granularity*.

If there's a compelling reason to always use 1-microsecond precision for default durations, we could reconsider, but I don't see frame drops resulting from the current default duration precision capping behavior.
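To make the "don't round up" point concrete, here's a small standalone example (my own illustration, not Chromium code) with 60fps timestamps quantized to 1ms ticks: a default duration rounded up to 17ms occasionally ends a frame past the next frame's timestamp (a spurious overlap), while capping down to 16ms only leaves 1ms gaps:

#include <cstdio>

int main() {
  // 60 fps pts, already quantized to the 1 ms timecode granularity.
  const int pts_ms[] = {0, 17, 33, 50, 67, 83, 100};
  const int kFrames = 6;
  for (int duration_ms : {17, 16}) {
    int overlaps = 0;
    for (int i = 0; i < kFrames; ++i) {
      // An overlap append would happen if a frame's end time passes the next pts.
      if (pts_ms[i] + duration_ms > pts_ms[i + 1]) ++overlaps;
    }
    printf("duration %d ms -> %d spurious overlaps in %d frames\n",
           duration_ms, overlaps, kFrames);
  }
  // Prints: duration 17 ms -> 2 spurious overlaps; duration 16 ms -> 0.
  return 0;
}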
Oct 21 2016
Matt, I agree that incorrect frame durations are probably not causing the issue described in comment #3 (at least not directly). But I still think we don't need to round the default duration to 1 millisecond. I can see that the frame timestamps are actually calculated with greater precision, even when the timecodescale is 1ms. For 60fps videos I can see that the first 4 frames have pts of 0ms, 17ms, 33ms, 50ms - note that this is consistent with a frame duration of 16.683ms, not 16ms or 17ms!
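A quick standalone check (illustration only): rounding n * 16.683333ms onto the 1ms timecode grid reproduces exactly those pts values:

#include <cmath>
#include <cstdio>

int main() {
  const double frame_duration_ms = 16683333 / 1e6;  // 16.683333 ms from DefaultDuration
  // Rounding the precise presentation times to 1 ms prints: 0 17 33 50
  for (int n = 0; n < 4; ++n)
    printf("%d ", (int)std::lround(n * frame_duration_ms));
  printf("\n");
  return 0;
}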
Oct 21 2016
Sergey, I thought about it a bit more and I believe I've found the reasoning behind the original precision-capped-duration logic for WebM default block durations. Imagine the following: the timecodescale is the usual one, resulting in timestamps with 1ms granularity, and the following sequence of blocks, where "pcdd" = precision-capped default duration, "rawdd" = uncapped default duration, and "{pcdd,rawdd}-fet" = timestamp + duration of the coded frame/block (for simplicity, all times here are in fractional milliseconds):
timestamp   pcdd   pcdd-fet   rawdd   rawdd-fet
0           1      1          1.2     1.2   (really 0-1.2)
1           1      2          1.2     2.2   (really 1.2-2.4)
2           1      3          1.2     3.2   (really 2.4-3.6)
3*          1      4          1.2     4.2   (really 3.6-4.8)
   or
4**         1      5          1.2     5.2   (really 3.6-4.8)
4*          1      5          1.2     5.2   (really 4.8-6)
   or
5**         1      6          1.2     6.2   (really 4.8-6)
Note that, since the *granularity of timestamps* is already precision-capped, and the MSE coded frame processing algorithm operates on per-coded-frame timestamp and duration values (and splices/removes/or assumes discontinuity for any presentation interval overlaps), precision-capping the default duration eliminates possible spurious overlaps that aren't really in the original media, at the expense of tiny gaps (which the algorithm shouldn't normally detect as discontinuities). Note that rawdd-fet is frequently overlapped by the next frame's timestamp. That's the core of why we precision-cap the duration.
The muxer just needs to be consistent in which of * or ** routes it chooses (truncate or round) to precision-cap the timestamps.
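The table above can also be checked programmatically; here's a small standalone sketch (illustration only, following the truncate route for the muxed timestamps) counting how often each duration choice ends a block past the next block's muxed timestamp:

#include <cmath>
#include <cstdio>

int main() {
  const double real_duration_ms = 1.2;  // real frame cadence from the example above
  const int kFrames = 6;
  double muxed_ts[kFrames];
  for (int i = 0; i < kFrames; ++i)
    muxed_ts[i] = std::floor(i * real_duration_ms);  // the "*" (truncate) route: 0,1,2,3,4,6

  for (double duration_ms : {1.0 /* pcdd */, 1.2 /* rawdd */}) {
    int overlaps = 0;
    for (int i = 0; i + 1 < kFrames; ++i) {
      // Spurious overlap: the block's end time passes the next block's muxed timestamp.
      if (muxed_ts[i] + duration_ms > muxed_ts[i + 1]) ++overlaps;
    }
    printf("duration %.1f ms -> %d overlaps of the next block's timestamp\n",
           duration_ms, overlaps);
  }
  // Prints: duration 1.0 ms -> 0 overlaps; duration 1.2 ms -> 4 overlaps.
  return 0;
}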
Oct 21 2016
Correction to c#8: remove the substring "/or assumes discontinuity" from my comment.