
Issue 867368

Starred by 1 user

Issue metadata

Status: Fixed
Closed: Aug 23
EstimatedDays: ----
NextAction: 2018-08-23
OS: Android
Pri: 2
Type: Bug

Blocked on:
issue 872551
issue 875374

Blocking:
issue 912842


texImage2D from getUserMedia video much slower than H.264 video file on Android

Reported by, Jul 25

Issue description

Steps to reproduce the problem:
I've uploaded the test case here for easier testing:

When playing a 720p MP4 on Chrome Android 67, I get around 2ms for the texImage2D call on my Samsung Galaxy S6. When switching to getUserMedia with 720p constraints, upload time jumps to over 18ms.

On desktop OS X, Chrome gives similar timings for both cases (roughly 0.6ms from video, and 0.5ms from user media).
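
For reference, a minimal sketch of how such a measurement could be taken (an approximation, not the linked test case itself; it assumes a WebGL context `gl`, a texture bound to TEXTURE_2D, and a playing <video> element):

```ts
// Sketch only: time a single texImage2D upload from a <video>.
function measureUpload(gl: WebGLRenderingContext, video: HTMLVideoElement): number {
  const t0 = performance.now();
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  return performance.now() - t0; // the call blocks, so wall clock approximates upload cost
}

// Switching sources between the two cases in the report:
// video.src = 'some-720p.mp4';                    // file case
// video.srcObject = await navigator.mediaDevices  // camera case
//   .getUserMedia({ video: { width: 1280, height: 720 } });
```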

What is the expected behavior?

What went wrong?
I suspect Chrome is using some slow path internally for user media videos, whereas it has a GPU-only path for the mp4 case.

Did this work before? N/A 

Does this work in other browsers? N/A

Chrome version: 67.0.3396.87  Channel: n/a
OS Version: 
Flash Version: 

Some additional background - now that getUserMedia, WebGL and WebAssembly are supported on both Android and iOS, I'm investigating whether the web could be a suitable runtime platform for our Augmented Reality platform.

Our image tracking algorithms are relatively efficient and can run in real time on mobile via WebAssembly, but they are designed for greyscale-only images. In a native app we get YUV from the camera and can process the Y plane directly, whereas on the web, as far as I can tell, RGB is the only option.

I was hoping texImage2D from a <video> would provide a hardware-accelerated and relatively low-overhead route to getting the latest frame into a texture, which would allow greyscale conversion (and half-sampling) to happen in a shader. I'd be happy to wait a frame before reading back the data on the WebAssembly side, to minimise the GPU-CPU sync point of the readPixels [and with WebGL 2, fence sync objects and a pixel buffer object for the readPixels would also be something I'd look into].
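
A rough sketch of that WebGL 2 idea (a fence sync object plus a pixel buffer object so the readPixels doesn't stall), assuming a WebGL2RenderingContext `gl2` and a framebuffer holding the greyscale result:

```ts
// Sketch: queue an asynchronous readback into a PBO, then poll a fence
// and only copy the data out once the GPU has finished.
const width = 1280, height = 720; // assumed frame size
const pbo = gl2.createBuffer();
gl2.bindBuffer(gl2.PIXEL_PACK_BUFFER, pbo);
gl2.bufferData(gl2.PIXEL_PACK_BUFFER, width * height * 4, gl2.STREAM_READ);
gl2.readPixels(0, 0, width, height, gl2.RGBA, gl2.UNSIGNED_BYTE, 0); // writes into the PBO
const fence = gl2.fenceSync(gl2.SYNC_GPU_COMMANDS_COMPLETE, 0)!;
gl2.bindBuffer(gl2.PIXEL_PACK_BUFFER, null);

function poll() {
  if (gl2.getSyncParameter(fence, gl2.SYNC_STATUS) === gl2.SIGNALED) {
    gl2.deleteSync(fence);
    const pixels = new Uint8Array(width * height * 4);
    gl2.bindBuffer(gl2.PIXEL_PACK_BUFFER, pbo);
    gl2.getBufferSubData(gl2.PIXEL_PACK_BUFFER, 0, pixels);
    gl2.bindBuffer(gl2.PIXEL_PACK_BUFFER, null);
    // `pixels` is now ready for the WebAssembly side, typically one frame late.
  } else {
    requestAnimationFrame(poll);
  }
}
requestAnimationFrame(poll);
```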

Unfortunately the overhead of the texImage2D call with user media appears just too high for this to be a viable route. I'll have to do some tests with the other read-back methods (drawImage to a 2d canvas followed by getImageData [which feels like a very high-overhead method but is perhaps specifically optimised], or the Image Capture grabFrame() that should be supported by Chrome). Either way, the greyscale conversion will be a tradeoff between the lower performance of WebAssembly without SIMD and the overheads, memory bandwidth and CPU/GPU sync points of just uploading the full-res RGB image as an ArrayBuffer from the JS side.
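
For what it's worth, the Image Capture route would look something like the sketch below; grabFrame() resolves to an ImageBitmap, which texImage2D also accepts directly, so it may be worth timing as well (assumes `stream` is the MediaStream from getUserMedia and `gl` as above):

```ts
// Sketch: frame readback via the MediaStream Image Capture API.
async function grabToTexture(gl: WebGLRenderingContext, stream: MediaStream) {
  const track = stream.getVideoTracks()[0];
  const capturer = new ImageCapture(track);  // Chrome's Image Capture API
  const bitmap = await capturer.grabFrame(); // resolves to an ImageBitmap
  // texImage2D accepts an ImageBitmap directly:
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);
  bitmap.close();
}
```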
Requesting 360p rather than 720p does offer more usable performance - around 7-8ms for the upload (so not fully 4x faster despite a 4x reduction in pixel count, but over a 2x improvement).

It would still be nice to get speeds closer to those from video files. It would also be nice to know whether the frame has changed since the last upload, to avoid doing pointless work, but I can't see any web APIs to access that data.
Labels: Needs-triage-Mobile
Labels: Target-70 M-70 Triaged-Mobile FoundIn-70
Status: Untriaged (was: Unconfirmed)
Tested this issue on Android and was able to reproduce it.

Steps Followed:
1. Launched Chrome and navigated to
2. Clicked on play and observed an upload time of 0.57 ms (max)
3. Clicked on getUserMedia, clicked Allow, and observed an upload time of 4.05 ms (max)

On Mac, seeing 0.3 to 0.5 ms whether selecting .mp4 or getUserMedia -- no difference observed

Chrome versions tested:
60.0.3072.0; 67.0.3396.87 (stable); 70.0.3501.0 (canary)

Android 9.0

Android Devices:
Pixel 2 XL

This seems to be a non-regression issue, as the same behavior is seen in M-60 builds. Leaving the issue as Untriaged for further inputs.

Please navigate to the link below for logs --
Kai, could you build on Android and see which code path is being taken for texImage2D in this case?

Status: Started (was: Untriaged)
Nexus 6P, 8.1.0, Canary 70.0.3503.0: 2.7ms -> 10ms
Pixel 1, 9.0.0 (PPP5), Canary 70.0.3503.0: 2.2ms -> 4.2ms

It's interesting that the difference is so large on 6P and Pixel 2 but not quite as huge on Pixel 1. But regardless, it seems to reproduce on both of these.
Oh yeah, forgot to say:

Re: comment #2, you can star issue 639174. If you want to try out what we've prototyped there (warning: it won't ship as is!) you can enable chrome://flags/#enable-experimental-canvas-features and you should be able to access the last-uploaded frame's metadata via these properties of the WebGLTexture object:
If you have any feedback on whether that works or doesn't work for you, please comment on that issue.
Sorry, that's not the right flag. You need to start Chrome with --enable-blink-features=ExtraWebGLVideoTextureMetadata
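Purely for illustration, using such metadata to skip redundant work could look like the sketch below. The property name is made up; the prototype's real names were listed above and live in issue 639174.

```ts
// Hypothetical sketch of skipping redundant work using per-upload frame
// metadata. NOTE: `lastUploadedVideoTimestamp` is an invented name for
// illustration only - see issue 639174 for the actual prototype API.
let lastTimestamp: number | undefined;
function uploadAndMaybeProcess(
    gl: WebGLRenderingContext, video: HTMLVideoElement,
    texture: WebGLTexture & { lastUploadedVideoTimestamp?: number }) {
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  if (texture.lastUploadedVideoTimestamp === lastTimestamp) {
    return; // same frame as the previous upload - skip conversion/readback
  }
  lastTimestamp = texture.lastUploadedVideoTimestamp;
  // ... greyscale conversion + readback ...
}
```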
Thanks for the notes on avoiding re-processing; have starred that issue and added a couple of comments there.

I've taken a quick look at Chrome's Android camera implementations (assuming I've found the right place?):

I notice you're using the CPU callbacks for both the legacy camera and the Camera2 APIs. Is that because Chrome's multi-process architecture means it's not possible to use a SurfaceTexture there? It looks like there's some CPU-side colour conversion too, which is going to add some overhead. Is this the same with video files, or do they somehow manage to end up as SurfaceTextures throughout?
Thanks for the pointer, I haven't investigated that area yet. What I have determined so far is basically summarized by the attached screenshot. It looks like time is going into a CPU-side color/format-conversion (I420 to RGB) and subsequent upload to the GPU. Indeed, I don't know why the data starts out on the CPU, but it seems you've found some insight on that.
[Attachment: screenshot, 71.0 KB]
What appears to be happening here is that the I420 video frames come in on the browser process, and are shared with the renderer process via shared memory. The renderer then does a CPU I420-to-RGB decode, uploads the result to a texture, and then does a GPU-GPU copy from that texture to the target WebGL texture.
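
To make the cost of that CPU decode step concrete, here is a sketch of what a naive I420-to-RGBA conversion involves (full-range BT.601 coefficients; Chromium's actual conversion is native, SIMD-optimized code, but it still touches every pixel on the CPU):

```ts
// Sketch: naive I420 (triplanar YUV, 2x2-subsampled chroma) to RGBA on the CPU.
function i420ToRgba(y: Uint8Array, u: Uint8Array, v: Uint8Array,
                    width: number, height: number): Uint8ClampedArray {
  const out = new Uint8ClampedArray(width * height * 4);
  const halfWidth = width >> 1;
  for (let r = 0; r < height; r++) {
    for (let c = 0; c < width; c++) {
      const Y = y[r * width + c];
      const U = u[(r >> 1) * halfWidth + (c >> 1)] - 128; // chroma shared by 2x2 pixels
      const V = v[(r >> 1) * halfWidth + (c >> 1)] - 128;
      const i = (r * width + c) * 4;
      out[i]     = Y + 1.402 * V;             // R
      out[i + 1] = Y - 0.344 * U - 0.714 * V; // G
      out[i + 2] = Y + 1.772 * U;             // B
      out[i + 3] = 255;                       // A
    }
  }
  return out; // ~921k pixels per 720p frame, every frame, before the GL upload
}
```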

The first possible solution is to upload the YUV planes into GPU resources first, and then do a GPU decode. Optimally:
- The data is uploaded directly from shared memory in the GPU process, avoiding an extra copy into a transfer buffer.
- The GPU decode is done directly into the WebGL texture.

The second solution would be to have the video frames start out as textures rather than CPU memory. As #9 pointed out, it should be possible to get video frames as GL textures directly from Camera2. However, this could be a big project, since the camera would have to be accessed from the GPU process instead of the browser process. I do think this would be the ideal solution, though.

+chfremer, mcasas Do you have any input on this? I don't know for sure that something isn't going wrong - maybe we're not supposed to be hitting the shared memory path. Would it be feasible to get camera frames as textures in the GPU process? Would be happy to VC to better understand the situation here - let me know.
Adding support for capturing directly into textures is definitely something we are interested in. And you are right, this would probably be a bigger effort.

A question you already raised is whether the camera would have to be accessed from the GPU process, or whether we could get away with allocating textures in the GPU process and asking the Camera2 API to fill them for us (from a different process). Another issue that needs to be addressed is that the Chromium video capture stack does not currently have signals or logic for deciding whether to capture into CPU memory, GpuMemoryBuffer, or GL textures.
FYI, the latest trace:
[Attachment: trace, 79.9 KB]
Thanks Christian, sounds like it would be a pretty big project. I think I'll keep looking into the first solution - I think it should be pretty tractable (I have most of a prototype already - but don't know if it works).
A GPU conversion path for texImage2d from I420 CPU data does make sense and should offer a significant win.
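
For illustration, the GPU decode amounts to uploading Y/U/V as three single-channel textures and doing the same matrix math per fragment - a sketch (not the actual Chromium shader), with the same full-range BT.601 coefficients as the CPU loop above:

```ts
// Sketch: fragment shader for I420-to-RGB decode on the GPU, with the
// three planes bound as separate single-channel (LUMINANCE/R8) textures.
const yuvToRgbFrag = `
  precision mediump float;
  uniform sampler2D yTex, uTex, vTex;
  varying vec2 vTexCoord;
  void main() {
    float y = texture2D(yTex, vTexCoord).r;
    float u = texture2D(uTex, vTexCoord).r - 0.5;
    float v = texture2D(vTex, vTexCoord).r - 0.5;
    gl_FragColor = vec4(y + 1.402 * v,
                        y - 0.344 * u - 0.714 * v,
                        y + 1.772 * u,
                        1.0);
  }
`;
```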

I feel the SurfaceTexture route is a better long term solution though, and it seems like some of the plumbing is already in place for hardware video decode. I was doing quite a bit of reading around Chrome architecture docs and source at the weekend.

See for example this page:

There's a content::StreamTextureProxyImpl on Android that appears to plumb SurfaceTexture between the required layers. I also read that on Android there isn't a separate GPU process; the GPU-process threads run in the Browser process instead. Not sure where the capture stuff runs though?

#12 mentioned Camera2 can use a SurfaceTexture for output, but just for clarity this is also possible in the old camera API, and you're already setting a "dummy" one (either a SurfaceTexture or SurfaceView is required for the old API):

One other advantage of SurfaceTexture output with the old camera API - you can also get timestamps, which are not accessible from the CPU callback route.

I'd expect the path to WebGL to be significantly faster if the camera frames are directly sent to SurfaceTexture. Would be interesting to see a trace of the texImage2d from the file source, just to see the difference.

In terms of other code paths, I suspect the compositor and Accelerated 2D canvas would both prefer the data in a SurfaceTexture.

I don't know the WebRTC spec well, so I'm not sure what is required there - but if Android hardware encoding of the video is used, there is definitely an interface to pump data in via a SurfaceTexture there too, which should avoid any need for CPU read-back of the pixels.

The only read-back I can think of where CPU data is definitely required is grabFrame() from the MediaStream Image Capture spec. Even then YUV -> RGB is still required, so CPU data combined with CPU color conversion won't necessarily beat SurfaceTexture data, GPU color conversion, and read-back from the GPU.
Just to chime in, in all likelihood Android captures video into some sort of GPU-side buffer, which we then proceed to download to the CPU [1]. That's not good for performance. This frame is then copied again [2] (but not a third time, phew [3]). So we get a GPU-friendly buffer capture, which we then download and copy to shared memory, only to reupload it to the GPU for display (and usually we encode it as well).

simon@: please also keep in mind the migration to use SurfaceLayer instead of VideoLayer. That bug does not apply to MediaStreams (live feeds), but a similar concept will be applied to WebMediaPlayerMS soon


Even without the major improvement of keeping camera data on the GPU, we get a huge win here by moving the YUV decode to the GPU, even with the extraneous copies.

It would seem that the slow code path here (CanvasResourceProvider+PaintCurrentFrame+StaticBitmapImage::CopyToTexture), which was supposed to accelerate some video upload cases, may no longer be serving its purpose. It was added in , before the CopyVideoTextureToPlatformTexture path existed at all. I am pretty sure that nowadays, the CanvasResourceProvider path is only ever hitting the non-accelerated case, and therefore isn't providing any benefit over the last fallback case (VideoFrameToImage+TexImageImpl, which is actually a CanvasResourceProvider+PaintCurrentFrame+WebGLRenderingContextBase::TexImageImpl).

I'm looking into whether we can remove that path (which would be nice) and add something like the one I prototyped (which does work, by the way).
chfremer/mcasas: I have a WIP CL here:

However, I'm still struggling to figure out how to test it. Do you have pixel tests, layout tests, or anything else that can exercise this kind of media stream case (where video frames come in as STORAGE_SHMEM)?
Blocking: 871417
+ emircan@

I don't know what test coverage there is for webmediaplayer_ms.
#19: any local video capture and playback would exercise the WMPMS using the ShMem storage (because it's the default, except in code paths/platforms where emircan@ might have connected the GpuMemoryBufferVideoFramePool).

LayoutTests probably don't exercise this Renderer code, but content_browsertests and/or browser_tests starting with WebRtcGetUserMedia should do it (they draw whatever is produced by a FakeVideoCaptureDevice). You should be able to repro those by running Chrome with --use-fake-device-for-media-stream --use-fake-ui-for-media-stream (the second one is to avoid asking for permission), e.g. with the site
mcasas@ thanks for the pointers. Can we ask for some more pointers to what those flags can do? It looks like --use-fake-device-for-media-stream is mainly used with browser tests which mock out the video capture device like .

Is it possible (for example) to give the browser a video file and tell it to treat it as though it were input from a camera? Or what else can --use-fake-device-for-media-stream do?

--use-fake-device-for-media-stream replaces any webcams or capture devices in the system with a Chromium one that looks like a rolling pacman with a timer (and it also produces a beep every second).

--use-file-for-fake-video-capture=bla.y4m can be used to replace the system capture devices with a file (that plays in a loop, forever). The file format accepted is a subset of the Y4M container [0], which is essentially uncompressed video frames in I420 format. Any of the 420 files in e.g. [1] should work (but not the 422 nor 444 ones).

Fake or file, however, the browser converts any incoming data to I420 triplanar (or fully planar, depending on your terminology) before sending the VideoFrames to the Renderer, so that's all the latter sees anyway. (*)

(*) Not entirely true: depth capture produces and sends Y16, but it's a smaller use case and not supported by the file flag; there's a way to configure --use-fake-device-for-media-stream to produce depth-like frames, let me know if you guys want that path.
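
For reference, a Y4M file of the accepted kind is just a plain-text header followed by raw I420 frames, roughly like this (the bracketed line stands in for binary plane data):

```
YUV4MPEG2 W1280 H720 F30:1 Ip A1:1 C420
FRAME
<1280x720 bytes of Y, then 640x360 bytes each of U and V, raw>
FRAME
...
```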

Blockedon: 872551
Project Member

Comment 26 by, Aug 10

The following revision refers to this bug:

commit 1236a1623e8f1eb638e33d3d145739fe1ca081d4
Author: Kai Ninomiya <>
Date: Fri Aug 10 20:50:39 2018

Remove qualcomm from ToughWebglPage skipped_gpus

Historically, we haven't been running these WebGL perf tests
on our Android perf bots (which are all Qualcomm devices).
This change should hopefully allow us to start tracking WebGL
perf on mobile.

Bug:  867368 
Change-Id: I8e409d649f6238094928dfdac4cc6f7d2c444ca5
Reviewed-by: Kenneth Russell <>
Reviewed-by: Ned Nguyen <>
Commit-Queue: Kai Ninomiya <>
Cr-Commit-Position: refs/heads/master@{#582326}

Project Member

Comment 27 by, Aug 11

The following revision refers to this bug:

commit 074cecb79d9d30ec9f2bf6b5b10edb6cd77a8b4f
Author: Kai Ninomiya <>
Date: Sat Aug 11 01:26:13 2018

Normalize rendering_desktop.json, rendering_mobile.json

This is needed because re-recording archives (using record_wpr)
causes these files to get re-alphabetized automatically.
This commit lets us re-record an archive without producing an
unreadable diff.

Doing this also removes the duplicate key
"androidpolice_mobile_sync_scroll_2018" from rendering_mobile.json.

Tangentially, also delete the unused archive

Bug:  872551 ,  867368 
Change-Id: Ib15c9c0a8600e7fa510a146ad896aa92e4f9de45
Reviewed-by: Ned Nguyen <>
Commit-Queue: Kai Ninomiya <>
Cr-Commit-Position: refs/heads/master@{#582414}

Components: Speed>Benchmarks>Waterfall
It looks like the aquarium benchmark is running on the Nexus 5X perf bot. See:

and in particular shard #1:

[ RUN      ]
[       OK ] (38475 ms)
(INFO) 2018-08-13 17:13:26,608 cloud_storage.Insert:383  Uploading /b/swarming/w/itTAKiK3/tmpZJ4_6J.html to gs://chrome-telemetry-output/aquarium_2018-08-13_17-08-35_86413.html
View generated trace files online at for story aquarium
Uploading logs of page aquarium to (1 out of 1)

but later in the log:

(CRITICAL) 2018-08-13 17:16:38,755 story_runner.RunBenchmark:377  Benchmark execution interrupted by a fatal exception: <class ''>(Timed out waiting for 1 of 1 threads.)
[ RUN      ]
===== SKIPPING TEST aquarium: Telemetry interrupted =====
[  SKIPPED ] (0 ms)

I don't know how to read these logs. Is this benchmark running or not? Speed team, can you please help?

Note that there was some sort of catastrophic failure which caused a bunch of the benchmarks to fail to upload results. Here's the log excerpt from the point where the aquarium benchmark successfully ran, to the point where the harness claimed that it was skipped.

[Attachment: log excerpt, 741 KB]
Ken, the benchmark ran successfully (see the "json.output" link in

To double-check the result of the aquarium test, I can find "aquarium" in the {"perf_results"} entry in the "Results DashboardUpload Failure ..." link.

Both the "Results DashboardUpload..." and "Merge script log" links confirmed that we failed to upload the results. The "Merge script log" link also showed that the failure is due to a 500 error, which is tracked in issue 867379.

I also looked at the SKIPPING TEST issue; it was a known problem, and only the "aquarium_20k" test was skipped, not "aquarium":

===== SKIPPING TEST aquarium_20k: =====

I cannot find "===== SKIPPING TEST aquarium: Telemetry interrupted =====", it was probably in another build run?
Also, the uploading error is flaky, so we do have some recent data for the aquarium test:
Project Member

Comment 32 by, Aug 17

The following revision refers to this bug:

commit 8ae85144bca45f0f0897d3deb3fc998e360e439b
Author: Kai Ninomiya <>
Date: Fri Aug 17 04:22:10 2018

Add camera_to_webgl perf test to ToughWebglCases

This adds a new perf test case for camera-to-WebGL uploads.
This new case is intended to detect the performance improvements
being made in  issue 867368 .

Bug:  867368 
Change-Id: Ia8b05f422e15bb7625491ac10f33903dcc1c55d0
Reviewed-by: Sadrul Chowdhury <>
Reviewed-by: Ned Nguyen <>
Reviewed-by: Kenneth Russell <>
Commit-Queue: Kai Ninomiya <>
Cr-Commit-Position: refs/heads/master@{#583957}

Blocking: 875374
It looks like camera_to_webgl is running successfully on the android-nexus5x-perf bot:

The data for avg_surface_fps doesn't seem right (stuck at 1.0), but the frame_times_avg that Ned linked at some point looks good:

This is probably the graph I'll be watching after I land .
Blocking: -875374
Blockedon: 875374
Blocking: -871417
Project Member

Comment 39 by, Aug 23

The following revision refers to this bug:

commit 321904d3d1c4675cfe9d25b1c30f528efcf869e4
Author: Kai Ninomiya <>
Date: Thu Aug 23 00:05:49 2018

Add optimized path for YUV-to-WebGL, remove old path

For camera-to-WebGL on Nexus 6P at 720p, this improves blocking
texImage2D time from ~12ms to ~4ms (200% speedup). On some other
devices and resolutions, I think there can be up to a ~10x speedup.

* Adds an optimized upload path for CPU-side YUV video frames (e.g.
  those coming from a video camera on Android). This path uploads the
  individual Y/U/V textures to the GPU, performs a GPU YUV-RGB decode,
  and copies the result into the WebGL texture.

  This code path could potentially be further optimized in 2 ways:
    * Avoid the extra copy of the CPU-side YUV data from
      browser-renderer shared memory (VideoFrame::STORAGE_SHMEM) to
      renderer-gpu shared memory (transfer buffer, probably).
    * Avoid the extra copy from the decoded image (SkImage) into the
      WebGL texture, and instead decode directly into the WebGL texture.

* Removes an old GPU-GPU path that was obsoleted by
  CopyVideoTextureToPlatformTexture. This obsolete path was only
  handling CPU-GPU uploads instead of GPU-GPU uploads, and it was doing
  so by performing an expensive YUV-RGB conversion on the CPU.

  This also allowed some cleanup in TexImageHelperHTMLVideoElement.

Bug:  867368 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Id25d5dbfc76ec8f9dc606890588a20978f6943f6
Commit-Queue: Kai Ninomiya <>
Reviewed-by: Mounir Lamouri <>
Reviewed-by: Kenneth Russell <>
Reviewed-by: Dan Sanders <>
Cr-Commit-Position: refs/heads/master@{#585322}

NextAction: 2018-08-23
Fix is in - I'll check tomorrow to see if it has shown up in the perf results yet.
The NextAction date has arrived: 2018-08-23
Status: Fixed (was: Started)
Excellent work Kai! The performance improvement on your new test is superb!

Thanks for all the work on this; it sounds like a big step forward. Is there a way to opt in to canary builds on Android so I can try this out on my test case?

Shifting the YUV conversion to the GPU will definitely free up the main JavaScript thread sooner (which is the most important thing for CPU-intensive applications such as mine). The downside is that I imagine it becomes significantly harder to measure the overall overhead, as some of it will now be in the GPU process.

I've been doing a lot of detailed systrace investigation of the Android camera pipeline and noticed that the CPU callbacks are associated with pretty high overhead within the system-level "camera server" process (at least on the Galaxy S8). Even though I need greyscale data on the CPU in my native app, I'm getting significantly better overall performance using the SurfaceTexture camera interface and doing the RGB -> greyscale conversion in a shader, followed by a glReadPixels. I suspect many Chromium use cases don't actually require the data on the CPU side at all, so a SurfaceTexture-throughout pipeline will almost certainly give the lowest overall overhead and would be worth pursuing IMHO.
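
For completeness, the greyscale-in-shader step mentioned above is just a luma dot product - a sketch with standard Rec. 601 weights:

```ts
// Sketch: RGB -> greyscale in a fragment shader, Rec. 601 luma weights.
const greyscaleFrag = `
  precision mediump float;
  uniform sampler2D cameraTex;
  varying vec2 vTexCoord;
  void main() {
    vec3 rgb = texture2D(cameraTex, vTexCoord).rgb;
    float luma = dot(rgb, vec3(0.299, 0.587, 0.114));
    gl_FragColor = vec4(vec3(luma), 1.0);
  }
`;
```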

Finally I just want to thank you all for the openness and responsiveness here; it's much appreciated and such a massive difference from how Apple treats bug reports :)
simon@: Chrome Canary is directly available in the Play Store :-)

Re: how capture works, you're probably right that, in an ideal world, we should capture video into a SurfaceTexture (generally into an abstract handle representing a platform-dependent thing, e.g. IOSurface or dma-buf) and just move that one around; for the main use cases of playback (seeing the cam feed in a <video> or rendering it in a WebGL canvas) or encoding, this should work just fine, right? Since we use Android APIs and those are compatible with SurfaceTextures.

The issue is manifold here: historically, the only capture scenario was WebRTC, and that one used only software encoding, hence the captured pixels were read back and, for good measure, converted to a single transport pixel format, I420. Some time after, we started using platform encoders, which would be happy to take SurfaceTextures, but WebRTC likes to encode the same feed _several_ times at different resolutions >_< -- and, assuming only one platform encoder can be used at once, the others would still be software encoders, and for those, you guessed it, the pixels still need to be on the CPU. Even if we had only one encoding at one resolution, WebRTC still likes to have the option of switching to software encoding to tweak parameters, and as a fallback in case the platform encoder bursts into flames, hence we still need the pixels on the CPU side. Argh! I'm not saying this is ideal, far from it; as a matter of fact, in other areas we use GpuMemoryBuffers, which are precisely wrappers around those platform abstractions to some extent, but changing the capture code would mean filing bugs and writing code.
Blocking: 912842
