7%-141.7% regression in media.desktop at 541222:541405
Issue description: See the link to graphs below.
Mar 8 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12b3fbca440000
Mar 8 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/12b3fbca440000

Move GpuMemoryBuffer upload stage earlier in pipeline. by dalecurtis@chromium.org
https://chromium.googlesource.com/chromium/src/+/2068cef4dc94410597183df1ddc9b66eb08ef3f1

Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Mar 8 2018
As noted in the CL, there are no real changes here; the tests are just exposing memory consumption that was likely already occurring on faster machines. I'll mark this as WontFix, but will watch incoming alerts for anything unexpected.
Mar 8 2018
Issue 820165 has been merged into this issue.
Mar 8 2018
Issue 820164 has been merged into this issue.
Mar 9 2018
Reopening since there was one big change: we now kick off multiple concurrent copies. dcastagna@ is working on a fix which should resolve this and likely reduce total memory usage to previous levels.
Mar 14 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/9b0676478d3fe3dd41f09211c7644a7acacdcfcc

commit 9b0676478d3fe3dd41f09211c7644a7acacdcfcc
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Wed Mar 14 06:53:30 2018

Don't super-saturate the decoder when GMB copies are slow.

When GMB copies fall behind we would previously also slow down the
Decode() calls; after GMBDecoderWrapper was introduced we would return
the DecodeCB while the copy was pending, letting DecoderStream think it
could schedule another Decode. If the decoding rate is higher than the
copy rate, this generates absurd amounts of decode results waiting for
copying.

This patch changes the behavior to be based on GetMaxDecodeRequests();
e.g., when GetMaxDecodeRequests() == 1, this allows another decode if
there is at most 1 copy outstanding. When GetMaxDecodeRequests() == 2,
this allows another decode if there are at most 2 copies outstanding.

BUG=801245, 820167
TEST=new test and existing tests modified to enforce this.

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I06415feda36643eb65aac69b17125737e12254c0
Reviewed-on: https://chromium-review.googlesource.com/959149
Reviewed-by: Xiaohan Wang <xhwang@chromium.org>
Reviewed-by: Daniele Castagna <dcastagna@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543021}

[modify] https://crrev.com/9b0676478d3fe3dd41f09211c7644a7acacdcfcc/media/filters/gpu_memory_buffer_decoder_wrapper.cc
[modify] https://crrev.com/9b0676478d3fe3dd41f09211c7644a7acacdcfcc/media/filters/gpu_memory_buffer_decoder_wrapper.h
[modify] https://crrev.com/9b0676478d3fe3dd41f09211c7644a7acacdcfcc/media/filters/gpu_memory_buffer_decoder_wrapper_unittest.cc
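As an illustration, here is a minimal standalone sketch of the gating rule the commit describes (hypothetical class and method names, not the actual GpuMemoryBufferDecoderWrapper code): another decode is only scheduled while the number of outstanding copies is at or below GetMaxDecodeRequests().

#include <cstddef>
#include <iostream>

// Illustrative stand-in for the wrapper's decode-throttling state.
class DecodeThrottle {
 public:
  explicit DecodeThrottle(std::size_t max_decode_requests)
      : max_decode_requests_(max_decode_requests) {}

  // Mirrors the rule from the commit message: with
  // GetMaxDecodeRequests() == 1, another decode is allowed while at
  // most 1 copy is outstanding; with == 2, while at most 2 are.
  bool CanScheduleDecode() const {
    return pending_copies_ <= max_decode_requests_;
  }

  void OnCopyStarted() { ++pending_copies_; }
  void OnCopyDone() { --pending_copies_; }

 private:
  const std::size_t max_decode_requests_;  // GetMaxDecodeRequests() stand-in.
  std::size_t pending_copies_ = 0;
};

int main() {
  DecodeThrottle throttle(/*max_decode_requests=*/1);
  throttle.OnCopyStarted();
  std::cout << throttle.CanScheduleDecode() << "\n";  // 1: one copy is fine.
  throttle.OnCopyStarted();
  std::cout << throttle.CanScheduleDecode() << "\n";  // 0: copies fell behind.
}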
Mar 14 2018
That fixed some of the issues, but it seems OutputCB() is similarly bursty. I.e., ffmpeg will queue up to N frames, where N is the number of threads it was started with, so after N decode calls, N frames may be bursted out. This is what's happening on the garden video: we end up with 14 GMBs vs 7 before, and even 7 felt too high. Eventually memory usage falls back down as we reach a steady state, but I think we can do better without any performance loss by capping the size of the GMB pool. There are two ways to do this: in the decoder or in the GMB pool. Either one is fairly easy; here are the two prototypes:

https://chromium-review.googlesource.com/c/chromium/src/+/963722
https://chromium-review.googlesource.com/c/chromium/src/+/963386

Personally I think it's simpler to limit this inside the GpuMemoryBufferPool since it's already tracking most of this state, but it is a little unwieldy to add an AtCap() function on the pool. Opinions?
Mar 14 2018
It feels like the GpuMemoryBufferPool should be in charge of memory usage, so I slightly prefer tracking this in the pool. But it's sad that we have to add AtCap()... Is it possible for gmb_pool_->MaybeCreateHardwareFrame() to just hold the callback until memory is available (i.e., we're back under the cap)? This would keep |pending_copies_| high, which at some point would prevent us from issuing new decode calls.
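Roughly something like this sketch, perhaps (illustrative names only, not the real pool API): while at capacity the pool holds each callback in a queue and runs one when a frame is returned.

#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

class DeferringPoolSketch {
 public:
  using FrameReadyCB = std::function<void()>;

  explicit DeferringPoolSketch(std::size_t max_frames)
      : max_frames_(max_frames) {}

  // Stand-in for MaybeCreateHardwareFrame(): defer the callback while
  // the pool is at capacity instead of exposing an AtCap() query.
  void MaybeCreateHardwareFrame(FrameReadyCB cb) {
    if (in_flight_ >= max_frames_) {
      pending_.push(std::move(cb));  // Held until memory is available.
      return;
    }
    ++in_flight_;
    cb();  // The real pool would kick off the copy here.
  }

  // Called when a vended frame is returned; wakes one deferred request.
  void OnFrameReturned() {
    --in_flight_;
    if (!pending_.empty()) {
      FrameReadyCB cb = std::move(pending_.front());
      pending_.pop();
      ++in_flight_;
      cb();
    }
  }

 private:
  const std::size_t max_frames_;
  std::size_t in_flight_ = 0;
  std::queue<FrameReadyCB> pending_;  // Keeps |pending_copies_| high upstream.
};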
Mar 15 2018
That's what my prototype does, but we need AtCap() to signal CanReadWithoutStalling(), AFAICT. Unless you're suggesting something like CanReadWithoutStalling() == pending_copies <= max_decodes?
Mar 15 2018
Yeah, that's exactly what I meant, though I don't have any specific formula in mind at this point.
Mar 15 2018
CanReadWithoutStalling() would be a lie in the case where we've vended max_pool_frames, though, which would hang the pipeline when min_buffered_frames > max_pool_frames.
Mar 15 2018
Overnight I realized this might work, but I haven't tested it:

CanReadWithoutStalling() == pending_copies < max_decodes && !eos_status_.has_value();

dcastagna@, do you have a preference?
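In sketch form (pending_copies_, max_decodes_, and eos_status_ are stand-ins for the wrapper's actual members):

#include <optional>

struct WrapperStateSketch {
  int pending_copies_ = 0;
  int max_decodes_ = 1;            // What GetMaxDecodeRequests() returns.
  std::optional<int> eos_status_;  // Set once end of stream is reached.

  // The formula proposed above, verbatim.
  bool CanReadWithoutStalling() const {
    return pending_copies_ < max_decodes_ && !eos_status_.has_value();
  }
};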
Mar 15 2018
I'm OK with either solution but would prefer the first one, maybe with CanReadWithoutStalling() querying the pool for the number of in-flight VideoFrames. We had a similar discussion when media stream started using the pool: the pool would get a burst of copy requests and we'd end up OOMing, since VideoFrames + GMBs would double the memory usage. Limiting the number of copies in that case would just make OOMing less likely, but wouldn't fix the actual issue (an unbounded queue of VideoFrames on the main thread). What you're suggesting in #15 might work, but IIUC it doesn't take into account GMB-backed VideoFrames that have been sent to the compositor and not returned yet. So if what we're looking for is a global limit, we should expose the number of in-flight VideoFrames and add that to the equation.
Mar 15 2018
I have an additional question: when we have a burst of VideoFrames to copy, how likely is it that we'll display all of them? Is it possible we'll end up displaying only the last one? If that's the case, should we consider not copying the VideoFrames that are unlikely to be displayed?
Mar 15 2018
They'll all be displayed in this case; the startup burst comes from the threading model ffmpeg uses. It'd be great if we could cancel undisplayed/dropped frames, but I don't think we have the right signals for that yet. There is a fix we can make for seeks, which I've been planning to implement once the memory regression is handled: tell the decoder the likely first frame timestamp and skip uploads until that timestamp.

Yes, the logic in c#15 has some implicit assumptions about what's going to happen with the outstanding VideoFrames. If our compositor ever started holding onto N > 2 frames, we'd have a problem.

Hmm, if we need to query the pool for in-flight frames, I think it's better to just have the pool handle the cap. It already has infrastructure in place for handling and aborting this cleanly; the cap could be provided via an optional constructor parameter.
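Roughly this constructor shape (hypothetical names; the real pool's interface differs), where an unset optional means no cap, i.e. today's behavior:

#include <cstddef>
#include <optional>

class CappedPoolSketch {
 public:
  explicit CappedPoolSketch(
      std::optional<std::size_t> max_in_flight = std::nullopt)
      : max_in_flight_(max_in_flight) {}

  // Counts both frames pending copy and frames vended to the compositor
  // but not yet returned, so the limit is global, addressing the
  // in-flight VideoFrames concern raised above.
  bool AtCap() const {
    return max_in_flight_.has_value() && in_flight_ >= *max_in_flight_;
  }

  void OnFrameVended() { ++in_flight_; }
  void OnFrameReturned() { --in_flight_; }

 private:
  const std::optional<std::size_t> max_in_flight_;
  std::size_t in_flight_ = 0;
};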
Mar 20 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69

commit 5c175997b4d9c42c7c3ad40468e7b5ab9950ca69
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Tue Mar 20 22:52:43 2018

Switch back to uploading GpuMemoryBuffers in VideoRendererImpl.

Uploading these after decode is leading to issues with bursts of
outputs that are difficult to fix without giving the pool knowledge of
VideoRendererImpl's queue size. See this change for more information
on why: https://crrev.com/c/963386

There's another approach where we use DecoderStream and add the
concept of prepared buffers that would allow us to work around this
while preserving the early upload benefits that the decoder approach
provides; see this comment for details:
https://crrev.com/c/963386/13/media/renderers/video_renderer_impl.cc#484

This is a partial restoration of the code prior to my changes; it
preserves some of the improvements: only one pool is necessary, and
there's no need to provide GpuFactories + WorkerTaskRunner. This
should resolve the memory usage issues and allow easier local
experimentation while I work towards the prepared buffers approach.

BUG=801245, 820167
TEST=existing tests all pass.

Change-Id: Iaedd298aba11f48345237c1c69f04086d4656f64
Reviewed-on: https://chromium-review.googlesource.com/969658
Reviewed-by: Daniele Castagna <dcastagna@chromium.org>
Reviewed-by: Xiaohan Wang <xhwang@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#544576}

[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/default_renderer_factory.cc
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/video_renderer_impl.cc
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/video_renderer_impl.h
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/renderers/video_renderer_impl_unittest.cc
[modify] https://crrev.com/5c175997b4d9c42c7c3ad40468e7b5ab9950ca69/media/test/pipeline_integration_test_base.cc
Mar 21 2018
Restored to original levels.