
Issue 749973

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Consider using DMABuf-backed VideoCaptureBufferTrackers

Project Member Reported by mcasas@chromium.org, Jul 28 2017

Issue description

VideoCaptureBufferPool [1] is used by video capture to allocate buffers
for VideoFrame transport across processes. Currently it only uses
VideoCaptureBufferHandles [2] that are SharedMemory-backed [3].

In some devices, IIUC, we can:
- create DMABufs from user space (with specific syntax)
- map these DMABufs with a similar (same?) syntax as ShMem
- share these DMABufs across processes with a similar (same?)
 syntax as ShMem.

=> IOW, in these devices, DMABufs are allocated specifically, but after
that point they behave like ShMems.
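
As a rough illustration of the point above, the sketch below allocates one buffer as anonymous shared memory and one as a DMA-buf, then maps both through the same helper. It assumes a Linux kernel that exposes the DMA-BUF heaps interface at /dev/dma_heap/system (an assumption; devices of this era may instead use ION or a driver-specific allocator). All function names are hypothetical and none of this is Chromium code.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

// The DMA-BUF heaps header only exists with newer kernel headers.
#if __has_include(<linux/dma-heap.h>)
#include <linux/dma-heap.h>
#define HAS_DMA_HEAP 1
#else
#define HAS_DMA_HEAP 0
#endif

// Hypothetical helper: anonymous shared memory, returned as a file descriptor.
// Returns -1 on failure.
int AllocateShmemFd(size_t size) {
  int fd = memfd_create("capture-buffer", MFD_CLOEXEC);
  if (fd < 0) return -1;
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Hypothetical helper: DMA-buf from the system heap (assumed to be present).
// Returns -1 when the heap is missing or not accessible.
int AllocateDmaBufFd(size_t size) {
#if HAS_DMA_HEAP
  int heap = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
  if (heap < 0) return -1;
  dma_heap_allocation_data alloc = {};
  alloc.len = size;
  alloc.fd_flags = O_RDWR | O_CLOEXEC;
  int rc = ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &alloc);
  close(heap);
  return rc == 0 ? static_cast<int>(alloc.fd) : -1;
#else
  (void)size;
  return -1;
#endif
}

// After allocation both kinds of buffer are just file descriptors, so the
// mapping code is identical. Returns nullptr on failure.
void* MapBuffer(int fd, size_t size) {
  assert(size > 0);
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  return p == MAP_FAILED ? nullptr : p;
}
```

Only the allocation step differs. Sharing works the same way for both as well: either descriptor can be passed to another process over a Unix domain socket with SCM_RIGHTS.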

_IF_ DMABufs are more performant than standard ShMem buffers for video
capture loads, it would be interesting to use them!




[1] https://cs.chromium.org/chromium/src/media/capture/video/video_capture_buffer_pool.h?sq=package:chromium&dr=CSs&l=39
[2] https://cs.chromium.org/chromium/src/media/capture/video/video_capture_buffer_handle.h?sq=package:chromium&dr=CSs
[3] https://cs.chromium.org/chromium/src/media/capture/video/shared_memory_buffer_handle.h?sq=package:chromium&dr=CSs&l=31
 

Comment 1 by mcasas@chromium.org, Jul 28 2017

Cc: emir...@chromium.org chfremer@chromium.org tfiga@chromium.org
tfiga@, dongseong.hwang@: y'all know more than me about these buffers;
is there some improvement to be found by using them?

+ chfremer@ and emircan@ FYI.


Comment 2 by tfiga@chromium.org, Jul 28 2017

If we use DMA-bufs and a given capture device (camera) can produce a pixel format supported by Chrome (or Chrome can be made to support such a format), then we can avoid a memcpy between the buffers used by the camera and the shmem buffers used by Chrome. This would obviously cut memory bandwidth and CPU usage significantly.
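
To put a rough number on the cost of one such memcpy, here is a back-of-the-envelope sketch. The I420 format, the 1280x720 resolution and the 30 fps rate are illustrative assumptions, not figures from this bug, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one I420 frame: a full-resolution Y plane plus two chroma planes
// subsampled by 2 in each dimension (rounded up for odd sizes).
constexpr std::uint64_t I420FrameBytes(std::uint64_t width,
                                       std::uint64_t height) {
  return width * height + 2 * ((width + 1) / 2) * ((height + 1) / 2);
}

static_assert(I420FrameBytes(1280, 720) == 1382400,
              "a 720p I420 frame is 1.5 bytes per pixel");

// Memory traffic in bytes per second caused by copying every frame `copies`
// times. Each copy reads the source and writes the destination, hence the
// factor of 2.
std::uint64_t CopyTrafficBytesPerSecond(std::uint64_t width,
                                        std::uint64_t height,
                                        std::uint64_t fps,
                                        std::uint64_t copies) {
  assert(fps > 0);
  return 2 * copies * fps * I420FrameBytes(width, height);
}
```

Under these assumptions a single copy of a 720p stream at 30 fps moves about 83 MB/s (82,944,000 bytes) through memory, and every additional copy in the pipeline adds the same amount again.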

Comment 3 by tfiga@chromium.org, Jul 28 2017

Cc: jcliang@chromium.org posciak@chromium.org

Comment 4 by mcasas@chromium.org, Jul 28 2017

#2: sparing a copy is good of course, but the idea here was to compare
ShMem versus DMABuf for the particular workloads talked about before.

Also, this issue is _not_ about using DMABufs -- which was explored in 
the past, extensively, I should say :-)
I have a bug to use DMA-bufs on Chrome OS VideoCaptureDevice.

Even if the camera does not produce I420 (the preferred Chrome pixel format), using DMA-bufs is still preferred over shmem because DMA-bufs generally play well with kernel drivers, while with shmem we need to import the buffer to the kernel drivers as user pointers, which may have subtle alignment issues.
To be precise, I intend to use DMA-bufs in ChromeOS VideoCaptureDevice to get the frames from the camera drivers to Chrome. In the current Chrome architecture we will need to memcpy the frames from the DMA-bufs to SharedMemory buffers, and inside Chrome we'll still be using SharedMemory handles.

Comment 7 by tfiga@chromium.org, Jul 28 2017

#4, well, both ShMem and DMABuf are just a way to refer to system memory. There is no benefit for the current Chrome capture stack from just changing the way memory is allocated. The benefit is in how the memory can be used, i.e. directly by hardware without the need to copy.

Actually accessing a DMABuf from CPU on certain platforms might be slower than ShMem, because an uncached mapping might be provided to guarantee coherency with hardware. The whole point is that these days almost everything can be done directly by dedicated hardware (e.g. GPU), so there should be little need to access such memory by CPU.

Given how I imagine the typical capture pipeline, we would save at least two memory copies, because the same DMABuf could be used as an input for video encoder and GPU compositor at the same time.

> Actually accessing a DMABuf from CPU on certain platforms might be slower than ShMem, because an uncached mapping might be provided to guarantee coherency with hardware. The whole point is that these days almost everything can be done directly by dedicated hardware (e.g. GPU), so there should be little need to access such memory by CPU.

This is exactly what we saw in the earlier experiments as well. See issue 440843 (dd on #32) for all the patches that laid the groundwork. We were able to pass GMB-backed buffers all the way to HW encode if the flag was turned on. If WebRTC decides to use a SW codec, or copies/scales/simulcasts the input to ShMem buffers for any reason, we lose most of the benefit.
You can see the latest patches where I cleaned up this code because of the ongoing mojo project. Since the mojo project has now nearly landed, we can bring this back as described in the first post, i.e. support other buffer types in capture: DMABuf, IOSurface, etc. If the renderer and the capture service can communicate so that the buffer type is switched dynamically based on HW usage, then we would have a clear benefit everywhere.
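
A minimal sketch of what such a switching policy could look like is below. Every name in it (BufferType, ConsumerInfo, ChooseBufferType) is hypothetical and invented for illustration; it is not the capture service's API, and a real policy would need more inputs than two booleans.

```cpp
#include <cassert>

// Hypothetical buffer kinds a capture buffer pool could hand out.
enum class BufferType { kSharedMemory, kDmaBuf };

// Hypothetical summary of how the renderer side will consume the frames.
struct ConsumerInfo {
  bool uses_hw_encoder;   // frames go straight to a hardware encoder
  bool needs_cpu_access;  // a SW codec, or a CPU copy/scale/simulcast step
};

// Pick a DMA-buf only when the frames stay in hardware end to end. As soon
// as the CPU has to touch them, most of the benefit is lost, so fall back to
// shared memory.
BufferType ChooseBufferType(const ConsumerInfo& consumer) {
  if (consumer.uses_hw_encoder && !consumer.needs_cpu_access) {
    return BufferType::kDmaBuf;
  }
  return BufferType::kSharedMemory;
}

// Usage example covering the cases described above.
void ExampleUsage() {
  assert(ChooseBufferType({true, false}) == BufferType::kDmaBuf);
  assert(ChooseBufferType({true, true}) == BufferType::kSharedMemory);
  assert(ChooseBufferType({false, true}) == BufferType::kSharedMemory);
}
```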


Comment 9 by tfiga@chromium.org, Jul 28 2017

Note that the allocator providing uncached memory is not a hardware
limitation, but rather an allocator design decision in all cases known to me.

We are actually considering allowing cached allocations on a per-allocation
and/or per-mapping basis, because there are some cases where CPU access is
still needed, such as
- Chrome single-copy rendering (CPU renders to DMAbufs, GPU copies from
DMAbufs to internal staging buffers),
- badly designed hardware (or software), which needs a format conversion or
other intermediate processing between hardware parts of the pipeline.

With this, we might not even need any dynamic switching.
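
For reference, when a DMA-buf is mapped cached, Linux user space is expected to bracket CPU access with DMA_BUF_IOCTL_SYNC so that caches stay coherent with the hardware. The sketch below wraps that ioctl for a read; the helper and class names are hypothetical, and error handling (for example retrying on EINTR) is left out.

```cpp
#include <cassert>
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

// Hypothetical helper: issues DMA_BUF_IOCTL_SYNC with the given flags.
// Returns true if the kernel accepted the request.
bool SyncDmaBuf(int dmabuf_fd, unsigned long long flags) {
  // A sync request must say whether the CPU reads, writes, or both.
  assert((flags & DMA_BUF_SYNC_RW) != 0);
  dma_buf_sync sync = {};
  sync.flags = flags;
  return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) == 0;
}

// Hypothetical RAII bracket around a CPU read of a mapped DMA-buf:
// SYNC_START on construction, SYNC_END on destruction.
class ScopedCpuRead {
 public:
  explicit ScopedCpuRead(int dmabuf_fd)
      : fd_(dmabuf_fd),
        ok_(SyncDmaBuf(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ)) {}
  ~ScopedCpuRead() {
    if (ok_) SyncDmaBuf(fd_, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
  }
  ScopedCpuRead(const ScopedCpuRead&) = delete;
  ScopedCpuRead& operator=(const ScopedCpuRead&) = delete;

  // False if the start of the bracket failed, e.g. the fd is not a DMA-buf.
  bool ok() const { return ok_; }

 private:
  int fd_;
  bool ok_;
};
```

A caller would construct a ScopedCpuRead, check ok(), and only then read through the mapping while the object is alive.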

Owner: tfiga@chromium.org
Status: Assigned (was: Unconfirmed)
Friendly ping: this bug is still in Unconfirmed status.
As per c#9, this bug can potentially be resolved by some feature implementation.
Assigning the bug to tfiga@; please feel free to re-assign as appropriate.
Owner: jcliang@chromium.org
Not sure what the next steps are here. The only sure thing is that we need to start supporting DMA-bufs in the capture path if we expect the performance and power efficiency of anything involving the camera to be anywhere close to other platforms. Ricky, is this something you are already looking at?
FYI, I'm designing a Chrome OS video capture improvement plan that includes both video encode and camera capture parts of the stack. A design doc is being created here: go/cros-vide0Capture .

I have mostly untangled the encode side and am trying to get a better understanding of what's going on inside the camera stack and WebRTC/Hangouts. emircan@, mcasas@, would you be able to point me to the relevant code, especially from the perspective of what was mentioned in #8?
chfremer@ has a few nice docs/diagrams of how the capture classes mix
together now with the VideoCapture Service; from the Renderer perspective 
all is encapsulated in VideoFrames.
