
Issue 859604

Starred by 5 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 639174




Add uncompressed video recording option

Project Member Reported by emir...@chromium.org, Jul 2

Issue description

y4m can be a better option for accessing the actual pixels of the video. Currently, the only way to do the equivalent is to paint the video onto a <canvas> and read the data back, which involves extra steps.

The problems to be solved are 1) figuring out the MIME type and 2) deciding what to do with audio, since y4m has no audio.
 
Cc: tomfinegan@chromium.org
A potential solution would be to use Matroska's V_UNCOMPRESSED CodecID [1]
(where we'd save the actual FourCC in the ColourSpace [2] field); this would
solve both problems at the same time: we could use video/x-matroska;codecs=raw-video
or some such for the MIME type, and audio could be PCM or Opus-encoded. The problem
is that libwebm doesn't wire the 0x2EB524 ID.


[1] https://www.matroska.org/technical/specs/codecid/index.html
[2] https://matroska.org/technical/specs/index.html#ColourSpace
Cc: hta@chromium.org
Summary: Add uncompressed video recording option (was: Add y4m recording option)
Cc: katherinewu@chromium.org
Got it running on a playground CL; a couple of comments:

- There doesn't seem to be a MIME type for raw video, so I'm using
 "video/x-matroska;codec=yuv", see https://codepen.io/miguelao/full/LgWyGP

- Needs a patch to libwebm to wire ColourSpace:
 https://matroska.org/technical/specs/index.html#ColourSpace
 essentially wiring this libmatroska symbol:
 https://github.com/Matroska-Org/libmatroska/blob/HEAD/src/KaxSemantic.cpp#L388
 (I'm attaching the patch here just in case)

- The resulting file can be played in my VLC (gLinux one,
 3.0.3 Vetinari (revision 3.0.3-1-0-gc2bb759264)). Attached is also
 a recording.

- VideoTrackRecorder + MediaRecorderHandler communicate the codec
 to WebmMuxer using media's VideoCodec enum [1], which doesn't
 have an entry representing "uncompressed", so the CL uses
 kUnknownVideoCodec :-/

Playground CL: https://chromium-review.googlesource.com/c/chromium/src/+/1273562

[1] https://cs.chromium.org/chromium/src/media/base/video_codecs.h?type=cs&sq=package:chromium&g=0&l=20
libwebm.KMkvColourSpace.patch
2.2 KB Download
test.webm
23.3 MB Download
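
For reference, a minimal JS sketch of how the playground CL is exercised. The "video/x-matroska;codec=yuv" MIME string is the experimental one from the comment above (not a standardized type), and recordRaw() is just an illustrative helper name:

async function recordRaw(stream, durationMs) {
  // Experimental MIME string from the playground CL; not standardized.
  const mimeType = 'video/x-matroska;codec=yuv';
  if (!MediaRecorder.isTypeSupported(mimeType))
    throw new Error('raw recording not supported in this build');
  const recorder = new MediaRecorder(stream, {mimeType: mimeType});
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  const stopped = new Promise((resolve) => { recorder.onstop = resolve; });
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;
  // The resulting blob is an .mkv with uncompressed video (playable in VLC).
  return new Blob(chunks, {type: mimeType});
}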
Actually libwebm already has the code in the patch (landed in [1]),
so we just need to roll it into Chromium. Will do soon.


[1] https://chromium.googlesource.com/webm/libwebm/+/c2dcd8213949169e87ad116052fed54ea72ab166
Project Member

Comment 7 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/de34e70c29e24daaed6df41d93191ec5613327b7

commit de34e70c29e24daaed6df41d93191ec5613327b7
Author: Miguel Casas <mcasas@chromium.org>
Date: Wed Oct 10 23:29:48 2018

Roll src/third_party/libwebm 01c1d1d7..e4931ebc0

This CL rolls libwebm to use some recent functionality
(concretely, the ColourSpace CL).  libwebm is only used
from WebmMuxer, which in turn is used for MediaRecorder
only; if the unit tests and content_browsertests are
passing we should be good.

* e4931eb - (HEAD -> crbug859604__colour_space_fourcc, origin/master, origin/HEAD, master) AUTHORS.TXT: dos2unix (7 days ago) <Johann>
* 8e1ae49 - Fix undefined behavior due to unassigned action (13 days ago) <Michael Bradshaw>
* f1fe631 - clang-format v6.0.1 (2 weeks ago) <Johann>
* c2dcd82 - Add ability to write and read track ColourSpace (3 weeks ago) <Kyle Sunderland>
* 361aec0 - Improve AV1 support. (7 weeks ago) <Tom Finegan>
* 055a84d - muxer_sample: Fix VP9 profile/level handling. (7 weeks ago) <Tom Finegan>


Bug: 859604
Change-Id: I40c5adefc571335a27b569e9aa40b2cadbf59e4e
Reviewed-on: https://chromium-review.googlesource.com/c/1274565
Reviewed-by: Tom Finegan <tomfinegan@chromium.org>
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#598567}
[modify] https://crrev.com/de34e70c29e24daaed6df41d93191ec5613327b7/DEPS

A few points for the broader topic then:
- WebmMuxer should probably be renamed to MkvMuxer then.
- We won't add src=video support for this format.
- How sure are you that this actually works? Your prototype CL is just writing out our in-memory representation of YUV420, not some specified YUV420 format. I.e. AllocationSize() has a bunch of Chrome-specific tweaks which are not codified anywhere.
Also why? Clients can already get this via WebGL/Canvas? What's the value in recording it to a file?
Thanks for the comments in #8 and #9,
> 1- WebmMuxer should probably be renamed to MkvMuxer then.

 Ok, the name is a bit outdated now.

> 2- We won't add src=video support for this format.

 Ok.

> 3- How sure are you that this actually works? Your prototype CL is just writing out our in-memory representation of YUV420, not some specified YUV420 format. I.e. AllocationSize() has a bunch of Chrome-specific tweaks which are not codified anywhere.

The prototype CL is meant to showcase how this works; it's not a
landable piece of code. The vast majority of MediaStreamTrack VideoFrames
are going to be YUV 4:2:0, fully planar, in CPU memory, so their contents can be
parsed perfectly and are compatible with libyuv::I420Copy() (and these
frames are IsMappable(); they are not e.g. <video> VideoFrames). What would
be your concern then?

> 4- Also why? Clients can already get this via WebGL/Canvas? What's the value in recording it to a file?

Several reasons. When we connect a MediaStream to a canvas (2D/3D), we
lose the timing, because blitting from one to the other has to
happen via explicit JS intervention: the developer doesn't know whether the MSTrack
is producing 30 fps, 60 fps or something else, and has to guess. Second, retrieving
data from a canvas context is, in general, costly, in many cases involving
readbacks from gpu/cc structures. Third, MediaRecorder doesn't need to
record to a file (although you can do that too if you want); JS can process
the "recorded" data, or send it remotely, as it is produced. Summing up, MediaRecorder
is a faster way to get raw data to JS than the alternatives.
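
For illustration, a minimal sketch of that last point using only the standard MediaRecorder API; the 100 ms timeslice and the onChunk() consumer are illustrative placeholders, not part of any CL:

// stream: a video MediaStream; onChunk(): hypothetical consumer.
const recorder = new MediaRecorder(stream);
recorder.ondataavailable = async (event) => {
  const bytes = new Uint8Array(await event.data.arrayBuffer());
  onChunk(bytes);  // parse, analyze or send remotely as data is produced
};
recorder.start(100);  // request a dataavailable event roughly every 100 ms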

Cc: kainino@chromium.org
For 3, VideoFrame::AllocationSize changes the actual w/h recommendations; these are burned into the VideoFrame, and I420Copy picks them up from the parameters you provide to it, but I'm not confident that maps directly to what another reader is expecting. I.e. I suspect an odd-sized video will not work with this method. Why not use something like VP9/H264 lossless, which has a well-formed packetization?

For 4, you're still paying all those prices and suffering the same problems? At least your prototype CL doesn't seem to help with them. IIRC there's another bug for MediaRecorder where it's dropping frames under main thread contention -- so I don't think this resolves any of the costs. It probably makes them worse with regard to memory usage.
https://cs.chromium.org/chromium/src/content/renderer/media_recorder/video_track_recorder.cc?l=263

If someone wants to retrieve the frames and send them off I think there are better ways to do that. There is work underway to expand the information available via WebGL via something like video.requestAnimationFrame(), which would provide the timestamp and other information. +kainino for more details.

Blockedon: 639174
Linking to issue 639174 about getting the metadata for the frame uploaded to a WebGL texture (and avoiding redundant uploads).

I haven't done any more work in this area since we last talked about having some kind of "frame latching"/VideoFrame concept in the web platform, but we're still interested.
Re #11 and #12, WebGL is not the problem here; let me explain.
Let's say we want to read the pixels from a video MediaStreamTrack;
we need the following connections:

MediaStreamTrack --<1>--> <video> --<2>--> <canvas>

and the only way to blit <video> to <canvas> is via a timed copy
similar to this function:

function renderVideoFrame(canvas, video) {
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  // the period has to be guessed, e.g. assume ~30 fps:
  setTimeout(() => renderVideoFrame(canvas, video), 1000 / 30);
}

the crucial point here is noticing that there's no way to get
render events from a <video> when it is fed from a MSTrack,
so the user has to guess the appropriate drawImage() period.
This was already described in #10 (response to 4), but I hope it's
clearer now. Expanding the user's view of requestAnimationFrame()
would not help.

It's crucial to realize that a <video> element has different
implementations if it's fed by MediaStreamTrack(s) than if it's
e.g. playing a URL or a SourceBuffer.

All this assumes there's no direct way, that I'm unaware of, to feed
a MediaStreamTrack into a <canvas> (2D or WebGL); kainino, WDYT?
I don't know of any way but the drawImage way. Can you talk a bit more about what the actual goal of doing that is, though? Do they want frame-accurate background samples, sync video -> canvas, or something else?

What if we had something like video.requestAnimationFrame() that updated every time a new frame was presented to the video element and, most importantly, either provided a ref to that frame or ensured a drawImage() call within the scope of the rAF would paint that frame.

That said, even if we provide this, the frame on the video and what's on the canvas would/could get out of sync since the rendering into the <video> tag will happen on compositor thread and these callbacks by necessity would be delivered on the render thread. So several may queue while the video continues updating.

Unless we allow JS to inject on the composition thread (and potentially trigger a relayout) I don't think we'll ever solve that out of sync issue though. 
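
Purely for illustration, a sketch of how such a hypothetical per-frame callback might look from JS. The name video.requestAnimationFrame() and the callback shape are assumptions, not a shipped API; canvas/ctx/video are as in the snippet in #13:

// Hypothetical API shape, not something that exists today.
function onNewVideoFrame() {
  // Paint exactly the frame that was just presented to the <video>.
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  video.requestAnimationFrame(onNewVideoFrame);  // re-arm for the next presented frame
}
video.requestAnimationFrame(onNewVideoFrame);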
#14: sure, the idea is not to need:

  MediaStreamTrack --<1>--> <video> --<2>--> <canvas> --<3>--> pseudo-raw pixels in JS

but instead we want to enable:

  MediaStreamTrack --<4>--> MediaRecorder --<5>--> raw pixels in JS

the final goal is for JS to have access to the pixels of the
video MediaStreamTrack, without having to resort to <video> or
<canvas>, and skipping any potential color transformations that
might have been done by either of those elements.

Moreover, skipping <video> and <canvas> is (way) faster:

- because canvas (2d) usually gives the data to Skia, which might
introduce various penalties when retrieving those pixels back.
- because MediaRecorder can be instructed not to encode the video,
hence being way faster. Also, encoding VP9/H264 even at a large
target bitrate would make some image-processing algorithms (that
would run in JS) impossible, because the video frame statistics
would be changed.

Summing up: faster, simpler and we give JS access to the untreated
pixels. We want to take <video> and <canvas> out of the equation.

Also note that this Issue doesn't introduce any new vulnerability
surface, e.g. we don't allow any new access; we're just making
faster what can already be done today.


You're still mostly describing the technical details rather than the use case. I.e., do you know what folks want to do with those pixels? It's important to understand that use case given the threading issues I've mentioned above. Do they care if it's RGB or YUV, or whatever?

Do folks even want MediaRecorder in this pathway? I.e., what about something like the WebAudio Worklet for MediaStreamTracks instead:
https://developers.google.com/web/updates/2017/12/audio-worklet

FWIW, I'm not against uncompressed video/audio support for MediaRecorder -- I'm just not sure you're actually solving the problem and I'm worried that exposing our internal YUV format (and associated internal quirks) without layout details is going to be broken in weird ways. In all other areas we only provide JS with RGB pixels; i.e., that's what getImageData() returns.
The use case is to access the pixels from JS with the idea
of extracting metrics or running image/video algorithms, e.g.
optical flow, or object detection algorithms. Using any
codec, even at high bitrates, destroys the image/video
statistics. 

Spec-wise we have explored other options, in particular Streams [1]
(which are different from MediaStreams, confusing, huh?), but all
of them involve landing a significant amount of code and/or new
web-exposed interfaces, which is generally frowned upon by the
W3C. It's much easier to use an existing API.

If the thing that bugs you is that "YUV" is not defined well
enough to be interoperable etc., we could just send RGBx, 4 bpp;
that would indeed make it less surprising vis-a-vis what users
get today from CanvasRenderingContext2D.getImageData().

[1] https://streams.spec.whatwg.org/
Hi Miguel and Dale,

First, thanks for working towards making decoded videos available for video processing and recording in JS. I'm an engineer on Yeti / Project Stream - we're streaming games through Chrome (and other endpoints), which currently means that we stream video through the <video> tag & WebRTC / PeerConnection APIs.

I'm working on an SDK to help game developers profile / debug / test their game in the Chrome environment. In Q4, the features that we are looking to enable that are related to this bug are:

1) Recording a video exactly as the player experienced it (in software, as opposed to using hardware capture cards). The videos will be used by game developers to identify video compression artifacts at varying bitrates (e.g. 5 Mbps vs. 15 Mbps); the action item for the game developer is then to modify their graphics settings to be more encoder-friendly (in addition to encoder improvements that Yeti will do on the platform). We want to make sure that any encoding artifacts that are recorded are representative of those experienced on the Yeti platform.
2) Tracking motion vectors on decoded video (e.g. by running OpenCV). Currently, OpenCV takes in a video element to retrieve the frame -- I'm not sure if there is additional work required for OpenCV.js to use the performant way of retrieving the decoded frame.

(Please feel free to take a look at a small doc I put together with more details: go/yeti-game-dev-tools-q4-synapse)

As I understand this bug, removing the extra re-encode to remove additional compression artifacts and ensuring the frames would not be dropped in the recording would be a direct improvement for (1) and a performance / precision improvement for (2).
New web-exposed interfaces are just fine if we think it's the right way to do something. In fact, I was just talking with someone from Apple at FOMS and they seemed interested in finding some way to deliver per-frame callbacks as well from streams. So I think there's appetite for doing this in the most sane way.

If you go this route you should definitely standardize the raw format that will be output, be it RGBx or whatever, so that MediaRecorder in other browsers works the same way. Otherwise it's the wrong thing to duct-tape something web-facing together via a less-standardized portion of an API just because we don't want to go through the standards process. This has the same outcome as launching a Chrome-only feature, which is not what we want to do.

@katherinewu, delivering 1080p60 uncompressed frames that you handle on the main thread (or even in a worker) for some sort of recording is definitely going to cause even more frames to be dropped due to contention on the render thread. There's just nothing cheap you can do with that amount of data. The most efficient thing to do for the recording case would be to find some way to get every frame to a lossless hardware encoder without touching the main thread; h264 lossless is common on recent nvidia cards https://developer.nvidia.com/video-encode-decode-gpu-support-matrix

For the second case, I think you can already do this today w/o MediaRecorder using WebGL. With an offscreen <video>, when you connect it to WebGL you end up being the arbiter of video rendering. Doing this using rAF + texImage2D => render => post(readPixels()) to get the frame data for OpenCV should work fairly well (kainino@'s proposal should make this better); see the sketch below. This is no worse than what MediaRecorder can do IIRC. You would just sample on the main thread and have access to whatever showed up.

To do better than that, we'd really need to rearchitect and likely standardize some new path connecting the MediaStreamTrack directly to WebGL.
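
For illustration, a minimal sketch of that rAF + texImage2D + readPixels pipeline. To keep it short it reads the uploaded texture back through a framebuffer attachment instead of an explicit render pass, and assumes video, worker and a WebGL context gl already exist:

const tex = gl.createTexture();
const fbo = gl.createFramebuffer();

function captureFrame() {
  // Upload the current video frame into a texture.
  gl.bindTexture(gl.TEXTURE_2D, tex);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  // Attach it to a framebuffer so readPixels can see it.
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                          gl.TEXTURE_2D, tex, 0);
  const pixels = new Uint8Array(video.videoWidth * video.videoHeight * 4);
  gl.readPixels(0, 0, video.videoWidth, video.videoHeight,
                gl.RGBA, gl.UNSIGNED_BYTE, pixels);    // synchronous readback
  worker.postMessage(pixels.buffer, [pixels.buffer]);  // hand RGBA data to e.g. OpenCV.js
  requestAnimationFrame(captureFrame);
}
requestAnimationFrame(captureFrame);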
1) Ah I think there might be one misunderstanding. While the bug title says "Add uncompressed video recording option", I'm actually interpreting it to mean "Add decoded video recording option". We already use H264 to compress our game video stream that we send through WebRTC -- our max bitrate is 25mbps at 1080p. We just don't want the H264 stream to be decoded and then re-encoded / re-decoded after the video is downloaded from MediaRecorder, which is my understanding of the current functionality.

2) I see, thank you! You're correct that we don't depend on MediaRecorder currently (can see some of the logic cl/216044857). I'll have to look more at WebGL / do more profiling to see if we need to replace the VideoCapturer class in the CL.
For 1) now I'm even more confused :) It sounds like you just want to record the output of the screen losslessly to a stream / file. It doesn't sound like you want to parse any of the pixel data. For that purpose using a lossless hardware encoder would be most efficient and compatible with the existing spec.
Correct, we are not parsing pixel data for use-case 1). We are using an AMD encoder with the H264 codec to send data from one machine (the VM where the game is running, e.g. in SFO) to another machine (the player's machine, e.g. in Oakland). We already have a way to capture the stream on the VM (e.g. in SFO). We want to capture the stream as experienced by the player (e.g. in Oakland), which is the encoded stream as impacted by the network (e.g. after some amount of packet loss, with changing download bandwidth).
Some of the use cases are described in a WebRTC NV use cases document:

https://w3c.github.io/webrtc-nv-use-cases/

In particular the augmented reality and machine learning use cases.
This document is going to be discussed next week at TPAC.

Sorry still confused, for 1) do you want to capture the encoded stream before decode?
we should probably try to make the game-recording-at-player use case into a case for that document - it seems like a different use case from the ones already described.

I'm mixing two feature requests up (my bad). We would like to use the decoded stream for recording (use-case #1) -- i.e. this bug :)

I do have a separate request for getting the encoded stream (same SFO / Oakland situation mentioned above) dumped to local disk so that we have the H264 headers available to us. It only exists in an email thread right now.
Sounds good Harald! Let me know how I could start to help.
I believe case 1 is
<----------machine A ----------->      <------------------ machine B ----------------->
 feed --> encode --> WebRTC ----------> webRTC --> decode --<A>--> MediaRecorder --> 👍

This is the most efficient way to get the pixels in <A> with the appropriate timing
information, which is necessary per #22 ("as experienced by the player"). The
alternatives of rendering <A> to a <video> and then to a 2D/3D <canvas> would have
higher latency, consume more CPU and crucially would lose the timing information.

Gotcha. For the recording use case #1 you mentioned, I think modifying MediaRecorder to allow H.264/VP9 lossless is probably the most effective solution. Those will be too slow in software, but a hardware encoder should be fine. It'll still be huge though, i.e., for 1080p60 it's ~1GB/minute (compared to ~22GB/minute uncompressed).

@mcasas: Does MediaRecorder get frames in parallel to the WebMediaPlayerMS or in sequence prior to/after? I.e. is it possible to end up in a situation where MediaRecorder receives frames the WebMediaPlayerMS never renders? That would end up hurting use case #1. 
Re #19, I'm surprised that a "rAF + texImage2D => render => post(readPixels())"
pipeline can be performant in any way; texImage2D + drawArrays + readPixels() is
one of the slowest things you can do in GL ...? kainino@, is there a WebGL fast path ...?
@c#28 -- are you talking about case #2? Per c#22, case #1 does not want the pixel data.
@30, I didn't say it wasn't that :) It's the exact same thing you'd be doing in MediaRecorder though, so I don't think it's any slower -- and in fact can be much faster if the OpenCV.js work can be done in a shader.
Re #32 - if a MediaRecorder is recording from a MediaStreamTrack, there is a point in the pipeline where the bytes are present in a video buffer before being turned into pixels - that's the point I think mcasas expects to be recording from; the drawArrays + readPixels part of the pipeline would not be invoked at all.

@hta, that's only true for software-decoded frames, not hardware-decoded ones.
#34 We still have the code in [1] to use PaintCanvasVideoRenderer to recover
the pixels from whatever texture they are in, in the cases you seem to be
pointing at.  This code path is still much faster than WebGL because we
are using an explicit Skia rendering API (in all likelihood using its own
GL context, and in the future a super duper fast Vulkan context), whereas 
WebGL is a) not expecting that anyone is going to read back the rastered
pixels, which introduces a deep flush in its rendering pipeline (big 
penalty) and b) WebGL draw calls are clocked to VSync for obvious reasons,
and this would introduce even more latency (destroying the rendered time
that we want to preserve); note that offline render-to-texture contexts
like the Skia one I mentioned don't have this limitation. kainino@ to keep
me real ;-)

[1] https://cs.chromium.org/chromium/src/content/renderer/media_recorder/video_track_recorder.cc?q=videotrackrecorder&sq=package:chromium&dr=CSs&l=280

in general we are shying away from hardware decoding in WebRTC, but anyway
what we do in 
In WebGL 2.0 it's possible to use a fenceSync to wait for the GPU to finish, then read the result back. However a readPixels still requires a roundtrip to the GPU process and a fetch from the GPU. We recently added an optimization to allow getBufferSubData to work WITHOUT that round-trip. Take a look at the "non-normative" text on getBufferSubData (3.7.3) and readPixels (3.7.10) here:

https://www.khronos.org/registry/webgl/specs/latest/2.0/

However if you use this path there's still:
- copy from video frame into WebGL RGB texture (decode, plus at least one extra copy)
- copy from WebGL texture into WebGL buffer (readPixels with PIXEL_PACK_BUFFER)
- gpu process waits asynchronously for all that to finish
- copy from WebGL buffer to gpu-renderer shmem (implicit in optimization)
- renderer waits asynchronously for gpu process to pass that point
- copy from gpuprocess-renderer shmem to getBufferSubData's dstBuffer
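
For illustration, a sketch of that optimized path: readPixels into a PIXEL_PACK_BUFFER, fence, then getBufferSubData once the GPU has passed the fence. It assumes a WebGL 2.0 context gl with the frame already readable from the currently bound framebuffer (e.g. as in the earlier sketch), plus known width/height; readPixelsAsync() is just an illustrative helper name:

function readPixelsAsync(gl, width, height, onPixels) {
  const buf = gl.createBuffer();
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, buf);
  gl.bufferData(gl.PIXEL_PACK_BUFFER, width * height * 4, gl.STREAM_READ);
  // Read into the PBO; no CPU-side copy happens yet.
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, 0);
  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  gl.flush();

  function poll() {
    const status = gl.clientWaitSync(sync, 0, 0);  // non-blocking check
    if (status === gl.TIMEOUT_EXPIRED) {
      setTimeout(poll, 1);                         // GPU not done yet, try again
      return;
    }
    gl.deleteSync(sync);
    const pixels = new Uint8Array(width * height * 4);
    gl.bindBuffer(gl.PIXEL_PACK_BUFFER, buf);
    gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, pixels);  // copy out of the PBO
    gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
    gl.deleteBuffer(buf);
    onPixels(pixels);
  }
  poll();
}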
Eh, it's not that slow these days if it's hardware decoded video. You'll get no arguments from me on speeding this up / fixing or creating new APIs that do better. I'm sure kainino@ would love help in making this faster :)

http://xorax.sea/webgl/simple.html?src=../watk_vid/buck1080_h264_60.mp4 runs real time video (~2ms/frame on Linux/Win, ~0.3ms/frame on Mac) => canvas (based on http://codeflow.org/issues/slow_video_to_texture/simple.html). It's much slower on Windows for software decode w/o GPUMemoryBuffers.

I think for a debugging tool that's probably sufficient? 
Dale, a few things:

a) MediaRecorder is a cross-platform API that represents the preferred way
of getting offline access to real time media on the Web. It's implemented
by Chrome and Firefox, and it's in the making by Safari. The alternative you
propose is complicated and confusing: needs not one, but two unrelated 
elements (a <video> and a <canvas>) to do the same job. It also depends on 
implementation details like the type of canvas (WebGL 2.0) and how performant
these unrelated elements are to the goal. This would make it hard to explain
to the user. Moving some or all of these elements to offscreen would only
make things harder to explain and debug.

b) Developing a whole new API, as you mention, would incur the friction of
the Web, because we'd need to answer the question of "who wants this so
badly??". The Web cannot grow senselessly, and we are very careful about landing
APIs that are never or rarely used, because then they are hard to remove.

 Nonetheless, a while ago I made a proposal to integrate the WHATWG
Streams API with the W3C MediaStreams, which would cover the presently discussed
use case 100%, see
 https://discourse.wicg.io/t/rfc-proposal-for-integration-streams-mediastreamtrack-api/2256
but it did not get much traction. I'd be supportive (re #19, #37) of your
proposal; do you have any draft we could discuss?

c) VLC, GStreamer and FFmpeg can play the produced Matroska with uncompressed
video, which means this is an existing combination, although it is perhaps
a novelty for the <video> element.

IOW I think we have a use case, a strong Web background and a straightforward
technical solution, but none of this convinces you...? 
Sorry, I think there's some confusion. I'm not opposed to an uncompressed video option; see c#16. Your Streams proposal looks good to me -- though I suspect it will have the same limitations I mention below for high bandwidth content. Have you reached out to Apple or Firefox about it? I'm happy to help with contacts.

My recommendations based on the use cases discussed above, in order of importance, are the following:
1. Optimize out the main thread hop for MediaRecorder retrieving hardware frames.
2. Add support for configuring h264 lossless in MediaRecorder if the hardware encoder supports it.
3. Further optimize the video -> WebGL path. Firefox is ahead of Chrome here. See issue 91208 and others linked from http://codeflow.org/issues/slow_video_to_texture/
4. Standardize the WebGL video texture, https://www.khronos.org/registry/webgl/extensions/proposals/WEBGL_video_texture/ for frame accurate textures.
5. Standardize a raw format *without* a container that MediaRecorder can stream out.

Specifically #1,#2,#3 solve all of katherinewu@'s use cases. #4 would improve the accuracy if that's necessary.

#5 would not be for recording to a file (#2 should be used for that), but only working with the pixel data (ideally from low bandwidth <60fps, <1080p) in JS. Having it muxed solves none of the use cases discussed above. E.g., you say that the WebGL way is confusing, but bundling libwebm wouldn't be? :) Further, for higher bandwidth, e.g., gigabytes/second, having the JS demux the raw frames out of the WebM container is a huge amount of overhead.

Ultimately, I don't think anything other than WebGL is viable for working with real-time raw 60fps material or >= 1080p resolution. The bandwidth is too high. I don't even think OpenCV should be reading the pixel data at all, but rather working in a shader. https://github.com/jamt9000/webcv looks like they are attempting this. For software decoded codecs that can avoid the main thread and are not yet uploaded to the GPU, I can see the MediaRecorder solution being better than WebGL -- but that situation is rare outside of YouTube.
