
Issue 820167 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




7%-141.7% regression in media.desktop at 541222:541405

Project Member Reported by sandersd@google.com, Mar 8 2018

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=820167

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=ac9ae2c75b50df0b2d13d9d59eac9e70f71a1e355aea4ba4ac815eaf663dc5aa


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-air
chromium-rel-mac11-pro
chromium-rel-mac12
Cc: dcasta...@chromium.org dalecur...@chromium.org
Owner: dalecur...@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/12b3fbca440000

Move GpuMemoryBuffer upload stage earlier in pipeline. by dalecurtis@chromium.org
https://chromium.googlesource.com/chromium/src/+/2068cef4dc94410597183df1ddc9b66eb08ef3f1

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Status: WontFix (was: Assigned)
As noted in the CL there are no real changes here; the tests are just exposing memory consumption that may already have been occurring on faster machines. Marking this as WontFix, but I'll watch incoming alerts for anything unexpected.
Issue 820165 has been merged into this issue.
Issue 820164 has been merged into this issue.
Status: Assigned (was: WontFix)
Reopening since there was one big change: we'll now kick off multiple concurrent copies. dcastagna@ is working on a fix that should resolve this; it will likely bring total memory usage back down to previous levels.
Cc: jrumm...@chromium.org
Issue 821083 has been merged into this issue.
Project Member

Comment 9 by bugdroid1@chromium.org, Mar 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9b0676478d3fe3dd41f09211c7644a7acacdcfcc

commit 9b0676478d3fe3dd41f09211c7644a7acacdcfcc
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Wed Mar 14 06:53:30 2018

Don't super-saturate the decoder when GMB copies are slow.

When GMB copies fall behind we would previously also slow down the
Decode() calls. After GMBDecoderWrapper was introduced we would
return the DecodeCB while the copy was still pending, letting
DecoderStream think it could schedule another Decode. If the decode
rate is higher than the copy rate, this generates absurd amounts of
decode results waiting to be copied.

This patch changes the behavior to be based on GetMaxDecodeRequests();
e.g., when GetMaxDecodeRequests() == 1, this allows another decode if
there is at most 1 copy outstanding. When GMDR() == 2, this allows
another decode if there are at most 2 copies outstanding.

BUG=801245, 820167
TEST=new test and existing tests modified to enforce this.

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I06415feda36643eb65aac69b17125737e12254c0
Reviewed-on: https://chromium-review.googlesource.com/959149
Reviewed-by: Xiaohan Wang <xhwang@chromium.org>
Reviewed-by: Daniele Castagna <dcastagna@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543021}
[modify] https://crrev.com/9b0676478d3fe3dd41f09211c7644a7acacdcfcc/media/filters/gpu_memory_buffer_decoder_wrapper.cc
[modify] https://crrev.com/9b0676478d3fe3dd41f09211c7644a7acacdcfcc/media/filters/gpu_memory_buffer_decoder_wrapper.h
[modify] https://crrev.com/9b0676478d3fe3dd41f09211c7644a7acacdcfcc/media/filters/gpu_memory_buffer_decoder_wrapper_unittest.cc
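
For readers following along, here is a minimal compilable sketch of the throttling rule the commit message above describes: another Decode() is allowed only while the number of outstanding GMB copies stays within GetMaxDecodeRequests(). This is not the actual Chromium code; the class and member names are illustrative (the real logic lives in gpu_memory_buffer_decoder_wrapper.cc).

  #include <cstddef>

  // Illustrative only; names are hypothetical stand-ins for the wrapper's state.
  class DecodeThrottleSketch {
   public:
    explicit DecodeThrottleSketch(size_t max_decode_requests)
        : max_decode_requests_(max_decode_requests) {}

    // Mirrors the rule above: with GetMaxDecodeRequests() == 1, another decode
    // is allowed while at most 1 copy is outstanding; with == 2, while at most
    // 2 copies are outstanding.
    bool CanIssueAnotherDecode() const {
      return pending_copies_ <= max_decode_requests_;
    }

    void OnCopyStarted() { ++pending_copies_; }
    void OnCopyDone() { --pending_copies_; }

   private:
    const size_t max_decode_requests_;  // GetMaxDecodeRequests() in the real code.
    size_t pending_copies_ = 0;         // GMB copies currently in flight.
  };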

Cc: xhw...@chromium.org
That fixed some of the issues, but it seems OutputCB() is similarly bursty. I.e., ffmpeg will queue up to N frames, where N is the number of threads it is started with, so after N decode calls, N frames may be emitted in a burst. This is what's happening on the garden video: we end up with 14 GMBs vs. 7 before, and even 7 feels too high.

Eventually memory usage will fall back down as we reach a steady state, but I think we can do better without any performance loss by capping the size of the GMB pool. There are two ways to do this: either in the decoder or in the GMB pool. Either one is fairly easy; here are the two prototypes:

https://chromium-review.googlesource.com/c/chromium/src/+/963722
https://chromium-review.googlesource.com/c/chromium/src/+/963386

Personally, it seems simpler to limit this inside the GpuMemoryBufferPool since it's already tracking a lot of this state, but it is a little unwieldy to add an AtCap() function to the pool.

Opinions?
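
(To make the pool-side option concrete, here's a rough sketch of what an AtCap() check could look like. These names are illustrative and are not the actual Chromium pool interface; see the two CLs above for the real prototypes.)

  #include <cstddef>

  // Illustrative only: the pool tracks how many GMB-backed frames it has
  // vended and exposes AtCap(); callers stop scheduling copies while at cap.
  class CappedPoolSketch {
   public:
    explicit CappedPoolSketch(size_t max_in_flight_frames)
        : max_in_flight_frames_(max_in_flight_frames) {}

    bool AtCap() const { return in_flight_frames_ >= max_in_flight_frames_; }

    void OnFrameVended() { ++in_flight_frames_; }
    void OnFrameReturned() { --in_flight_frames_; }

   private:
    const size_t max_in_flight_frames_;
    size_t in_flight_frames_ = 0;  // Frames created but not yet returned.
  };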
It feels like the GpuMemoryBufferPool should be in charge of memory usage, so I slightly prefer tracking this in the pool. But it's sad that we have to add AtCap()...

Is it possible for gmb_pool_->MaybeCreateHardwareFrame() to just hold the callback until memory is available (i.e., we're back under the cap)? This would keep |pending_copies_| high, which at some point would prevent us from issuing new decode calls.
That's what my prototype does, but AFAICT we still need the AtCap() to signal CanReadWithoutStalling(). Unless you're suggesting something like CanReadWithoutStalling() == (pending_copies <= max_decodes)?
Yeah, that's exactly what I meant, though I don't have any specific formula in mind at this point.
CanReadWithoutStalling() would be a lie once we've vended max_pool_frames though, which would hang the pipeline when min_buffered_frames > max_pool_frames.
Overnight I realized this might work, though I haven't tested it:

CanReadWithoutStalling() == pending_copies < max_decodes && !eos_status_.has_value();

dcastagna@ do you have a preference?
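
(For clarity, a compilable version of the condition above; the member names are approximate and this is not the real Chromium code.)

  #include <cstddef>
  #include <optional>

  struct WrapperStateSketch {
    size_t pending_copies_ = 0;
    size_t max_decodes_ = 1;         // GetMaxDecodeRequests() in the real code.
    std::optional<int> eos_status_;  // Set once end of stream is reached.

    // Report "can read without stalling" only while outstanding copies have
    // not saturated the decoder's parallelism and we haven't hit EOS.
    bool CanReadWithoutStalling() const {
      return pending_copies_ < max_decodes_ && !eos_status_.has_value();
    }
  };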
I'm OK with either solution but would prefer the first one, maybe with CanReadWithoutStalling() querying the pool for the number of in-flight VideoFrames.

We had a similar discussion when they started using the pool in media stream. The pool would get a burst of copy requests and we'd end up OOMing, since VideoFrames + GMBs would double the memory usage.
Limiting the number of copies in that case would just make OOMing less likely to happen, but wouldn't fix the actual issue (an unbounded queue of VideoFrames on the main thread).

What you're suggesting in #15 might work, but IIUC it doesn't take into account GMB-backed VideoFrames that have been sent to the compositor but not yet returned.
So if what we're looking for is a global limit, we should expose the number of in-flight VideoFrames and add that to the equation.
I have an additional question: when we have a burst of VideoFrames to copy, how likely is it that we'll display all of them? Is it possible we'll end up displaying only the last one?

If that's the case, should we consider not copying the VideoFrames that are unlikely to be displayed?
They'll all be displayed in this case; the startup burst comes from ffmpeg's threading model. It'd be great if we could cancel undisplayed/dropped frames, but I don't think we have the right signals for that yet. There's a fix we can do for seeks, which I've been planning to implement once I get the memory regression handled: tell the decoder the likely first frame timestamp and skip uploads until that timestamp.

Yes, the logic in c#15 has some implicit assumptions about what's going to happen with the outstanding VideoFrames. If our compositor ever started holding onto N > 2 frames, we'd have a problem.

Hmm, if we need to query the pool for in-flight frames I think it's better to just have the pool handle the cap. It already has infrastructure in place for handling and aborting this cleanly; the cap should be provided by an optional constructor parameter.
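
(A sketch of that direction, under stated assumptions: the cap arrives as an optional constructor parameter and MaybeCreateHardwareFrame() simply queues work while the pool is at the cap, releasing it as frames come back. None of these names are the real GpuMemoryBufferVideoFramePool API.)

  #include <cstddef>
  #include <functional>
  #include <optional>
  #include <queue>
  #include <utility>

  // Illustrative only; a hypothetical pool that owns its own cap.
  class PoolWithOptionalCapSketch {
   public:
    using CopyRequest = std::function<void()>;

    // No cap by default; callers that care about memory pass a limit.
    explicit PoolWithOptionalCapSketch(
        std::optional<size_t> max_frames = std::nullopt)
        : max_frames_(max_frames) {}

    // Queue the copy while at the cap; otherwise start it immediately.
    void MaybeCreateHardwareFrame(CopyRequest request) {
      if (max_frames_ && in_flight_ >= *max_frames_) {
        pending_.push(std::move(request));
        return;
      }
      Start(std::move(request));
    }

    // Called when a vended frame is returned; kick the next queued copy.
    void OnFrameReturned() {
      --in_flight_;
      if (!pending_.empty()) {
        CopyRequest next = std::move(pending_.front());
        pending_.pop();
        Start(std::move(next));
      }
    }

   private:
    void Start(CopyRequest request) {
      ++in_flight_;
      request();  // The real pool would post the copy to a worker task runner.
    }

    std::optional<size_t> max_frames_;
    size_t in_flight_ = 0;
    std::queue<CopyRequest> pending_;
  };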
Project Member

Comment 19 by bugdroid1@chromium.org, Mar 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69

commit 5c175997b4d9c42c7c3ad40468e7b5ab9950ca69
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Tue Mar 20 22:52:43 2018

Switch back to uploading GpuMemoryBuffers in VideoRendererImpl.

Uploading these after decode is leading to issues with bursts
of outputs that are difficult to fix without giving the pool
knowledge of VideoRendererImpl's queue size. See this change
for more information on why: https://crrev.com/c/963386

There's another approach where we use DecoderStream and add
the concept of prepared buffers that would allow us to work
around this while preserving the early upload benefits that
the decoder approach provides; see this comment for details:

https://crrev.com/c/963386/13/media/renderers/video_renderer_impl.cc#484

This is a partial restoration of the code prior to my changes; it
preserves some of the improvements: only one pool is necessary, and
there is no need to provide GpuFactories + WorkerTaskRunner.

This should resolve the memory usage issues and allow easier
experimentation locally while I work towards the prepared
buffers approach.

BUG=801245, 820167
TEST=existing tests all pass.

Change-Id: Iaedd298aba11f48345237c1c69f04086d4656f64
Reviewed-on: https://chromium-review.googlesource.com/969658
Reviewed-by: Daniele Castagna <dcastagna@chromium.org>
Reviewed-by: Xiaohan Wang <xhwang@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#544576}
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/default_renderer_factory.cc
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/video_renderer_impl.cc
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/video_renderer_impl.h
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/video_renderer_impl_unittest.cc
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/test/pipeline_integration_test_base.cc

Status: Fixed (was: Assigned)
Restored to original levels.
