New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 872753 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 24
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Animating the same GIF from multiple compositors in the same renderer is janky.

Project Member Reported by khushals...@chromium.org, Aug 9

Issue description

If the same GIF is animating on multiple windows running in the same renderer, the animation can potentially become very janky. The root cause of this issue is the caching mechanism for dependent frames of the animation.

The GIFDecoder internally has a chain of dependent frames, for which frame is required to decode a particular frame. It maintains an internal cache of the previously decoded frame so that decoding the next frame of the animation doesn't require decoding the complete dependency chain. In the common case, where a single animation timeline sequentially moves forwards, this works fine.

In the case where multiple widgets and through that multiple compositors are animating the same GIF with potentially different timelines, the same ImageDecoder is asked to decode frames out of order which defeats its internal caching and causes to re-decode the complete dependency chain for each animation update.

Here is a test case which reproduces the bug (sorry google only): https://docs.google.com/presentation/d/1arqzjVE1pLqZ8G9MOxjlKlJ3PMkUZI3g-9RNyo2hVeA/edit#slide=id.g3e00ea2fb0_0_11, copied from b/112358729 which shows this bug. And a trace attached which shows the lengthy decode tasks due to decoding of multiple frames.

This became worse with moving gifs to the cc thread since earlier we had a common timeline for the animation, with the animation state being maintained on a single BitmapImage instance on the main thread which inadvertently ensured the compositors displayed the same frame and avoided this.

I think the ideal solution would be to avoid keeping this side cache in blink. CC already has a cache of decoded frames and could simply share the required frame with the decoder when asking for the next decode. But that's not trivial to do. It will require changes for the ImageDecoders to expose this dependency, have cc's cache pass through the decoded memory, and have the decoder use it. 

As a short term easier fix, I think we could maintain different ImageDecoders in blink's cache for different compositors if needed. Blink's ImageDecodingStore keys ImageDecoders based on ImageFrameGenerator which is shared across compositors. But each cc decode task could potentially pass a "client_id" identifying the unique cc instance so they use different decoders, each of which can manage a frame cache for a different timeline. I think this could be simple enough that we could merge to 69 if needed.
 
Also, just to put some context on severity for this, if the complete GIF can fit within cc's image cache, which is backed by system discardable memory, then then this won't be an issue since after the first loop we won't need to re-decode any frame. Its only an issue if it keeps getting evicted.
And actually attaching the trace this time.
trace_bad_gif_decoding.json.gz
3.7 MB Download
Labels: Hotlist-Partner-GSuite
Cc: cblume@chromium.org
This is maybe a farfetched thought, but why are we animating the same gif differently on multiple renderers?  Could we consider breaking the logic about fast forwarding gifs and synchronize these from the compositor side? Or if we are starting a gif and we already have it decoded, to start on that frame as well?
Its not multiple renderers but different widgets in the same renderer. The logic for which frame of the gif to show is tracked in cc::ImageAnimationController which is on the cc thread but is per-compositor. If we wanted to have all compositors to show the same frame, the AnimationState in the controller would need to be shared among them and its pretty intertwined with tracking its visibility and triggering invalidations based on when the next tick is needed which really needs to be per compositor.

Its possible to maybe restructure the ImageAnimationController such that it could be shared between compositors and allow for a common timeline, but I think it'll be a lot more complicated.
enne: One of the GSuite wants was that they be able to control when a gif animation starts, e.g. as soon as a slide is presented.  I think the way to do this is to remove and insert the image element, to re-start it's animation.  This doesn't apply to most sites, but I think this case means we can't rely on all having a synchronized key frame.
I agree with Khushal. I think the short-term fix makes sense for now.

But good news on a more long-term fix!! In the original comment:

> I think the ideal solution would be to avoid keeping this side cache in blink. CC already has a cache of decoded frames and could simply share the required frame with the decoder when asking for the next decode.

Back when I was doing SkCodec stuff I made a roadmap for exactly this. And as a result, the SkCodec work includes this ability. It just hasn't been exposed yet. I was going to do that as a follow-up step.

You can see an example of that here:
https://chromium-review.googlesource.com/c/chromium/src/+/783647/5/third_party/blink/renderer/platform/image-decoders/gif/gif_image_decoder.h#76
We made SkCodec require the caller to pass in a frame from the dependent range.

We landed SkGifCodec a while ago but was reverted as a suspect in a perf regression. We later found the cause was elsewhere but never re-landed.

I'm working on landing this now. Then we can expose the API.
Khushal's proposed short-term fix makes sense to me.
As a side note, I also am very tempted to have the default state coalesce animation instances.

You can imagine 100 instances of the same gif on a page. If the gif is 100 frames and each instance is 1 frame off from the previous, we have no choice but to keep all 100 frames in memory.

If we can coalesce by default we would only need 1 frame in memory. Our memory usage would go from O(n) frames to O(1).

But like Victor said certain sites need control over animation. I've only ever heard of a required use case being "start the animation from the beginning when I tell you". So maybe we can get away with a sort of ref-counted thing? Once the image count goes from 0->1 start the animation...then existing solutions continue to work. But from 1->2 we start the animation where the first instance already is.

Removing and reinserting images into the DOM definitely isn't the ideal long-term solution because we could take that as a signal to purge the cache. But since it is what users are forced to do now, it isn't making things worse. Later, we could add the ability to control individual instances and not use the default coalesced state.
Agreed withg cblume's comment that in most cases coalescing would be ideal.  I think that wouldn't always work for Slides, as of the case when you present with with "Speaker Notes" in a separate window.  The speaker notes have an instance of the image.

In general this seems like something that may benefit from a clear specification that browsers agree on.
Labels: -Type-Bug FoundIn-67 Type-Bug-Regression
Marking that this was a regression introduced in 67
This is not M69 blocker as it is regressed in M67. Pls target fix for M70. Pls let us know ASAP if there is any concern here. Thank you.
Presenting with animated GIFs is a common use case for Slides. We have reports of this happening but don't have data on often users are seeing this. Can you keep the target as M69?
M69 Stable promotion is coming VERY soon. Your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix and request a merge into the release branch ASAP. Thank you.


I'm trying out the fix I mentioned in #0 for namespacing ImageDecoders in blink's cache and while less scary than the ideal fix, its still a fairly non-trivial amount of plumbing that I don't think we should merge to 69 unless absolutely necessary.

fenil@, is there a sense of how many bug reports we have? I suspect that hitting this case is rare since its not that every case of having a GIF in 2 windows will hit this, it needs to be large enough such that it can't completely fit within our cache limits (the GIF in the page here has 430 frames, each frame taking ~3M). Unfortunately we don't have any UMA metrics for how often this happens either.

Also note that the feature that regressed this was rolled out to stable completely with M64 (issue 776216) in February with cr/186007630.
fenil@ is currently OOO.

Given the fix isn't straightforward, sounds good to target M70 instead. Please let us know if for some reason this becomes at risk for M70.
Labels: -M-69
Thanks. Removing the M-69 label. This will definitely be fixed for 70.
So. I tried the fix for assigning different decoders to different compositors and that didn't fix the issue. It partly addresses the problem but not the root cause.

Different decoders ensure that we get different caches for each timeline of the animation but the other issue here is that access to these decoders still relies on going through the same ImageFrameGenerator and access to it is guarded by a mutex which prevents the decode work from different compositors from running in parallel. Since the decode for a particular compositor is blocked, the animation is delayed causing us to catch up to the desired frame on the next update which in turns requires decoding all intermediate frames in a single batch. This blocks the other compositor's decode now so we have a cascading effect of being in continual catch up mode with lengthy decode tasks.

The correct fix is obviously to parallelize this decode work, which should be possible since now each compositor uses a different ImageDecoder instance. I'm running into some issues with making that happen while using a common ImageFrameGenerator. Just to validate that this does actually work I tried creating different ImageFrameGenerator instances for each compositor, which should have the same effect and this does address the issue. Attaching some traces.
trace_bad_decoding.json.gz
4.3 MB Download
trace_parallel_decoding.json.gz
7.7 MB Download
What is the benefit of using the same ImageFrameGenerator? It's just managing access to the ImageDecoder, right? I think your workaround is the right approach (setting aside for now, the longer-term idea of moving the frame cache out of ImageDecoder) - use separate ImageFrameGenerators which each have their own ImageDecoder.
It's not possible to assign different ImageFrameGenerators to different compositors right now. The ImageFrameGenerator is created at paint time and is used to key the ImageDecoder in the ImageDecodingStore. If we want a cache hit and reuse the same ImageDecoder per compositor, it has to be keyed using a stable ImageFrameGenerator. I think if all caching were external to the decoders, i.e., we didn't actually need a ImageDecodingStore at all, then we could create temporary ImageFrameGenerators per compositor for each decode request. But that's not possible right now.

For testing above, I just threw in a global map for infinitely caching ImageFrameGenerators for a combination of [paint-time ImageFrameGenerator, compositor_id], which is obviously not landable. Also I don't see a reason why ImageFrameGenerator couldn't be used across multiple threads as long as the access to the decoders is correctly synchronized.
> Also I don't see a reason why ImageFrameGenerator couldn't be used across
> multiple threads as long as the access to the decoders is correctly
> synchronized.

It's not that I didn't think it could be done; I just didn't see the benefit to doing so. But that's because I didn't know the constraints of the compositor etc.
Project Member

Comment 22 by bugdroid1@chromium.org, Aug 24

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f

commit 98afd4cc3f640881b0d20aeaa89ecaa50e164b4f
Author: Khushal <khushalsagar@chromium.org>
Date: Fri Aug 24 07:23:00 2018

blink: Refactor ImageFrameGenerator to support multi-thread use.

Currently ImageFrameGenerator can only be used for decoding from a
single thread at a time, with the access to any decoder for this
generator protected using the same mutex. This can result in janks where
the same image is being animated by multiple compositors in the same
process, since the decode tasks from these compositors can not execute
in parallel.

In order to allow multiple threads to use different decoders backed by
the same generator, this change splits the piece in the generator which
uses the decoder into a ImageDecoderWrapper. This makes it explicit that
different decoders can be used concurrently, as long as the access is
synchronized with a mutex per decoder.

The change should be a no-op.

R=chrishtr@chromium.org

Bug:  872753 
Cq-Include-Trybots: luci.chromium.try:linux_layout_tests_slimming_paint_v2;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: Ib31ca739fbf858e0dc292dc9b47f9fd4a087956d
Reviewed-on: https://chromium-review.googlesource.com/1185629
Reviewed-by: Chris Harrelson <chrishtr@chromium.org>
Commit-Queue: Khushal <khushalsagar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#585740}
[modify] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/BUILD.gn
[add] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/graphics/image_decoder_wrapper.cc
[add] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/graphics/image_decoder_wrapper.h
[modify] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/graphics/image_frame_generator.cc
[modify] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/graphics/image_frame_generator.h
[modify] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/graphics/test/mock_image_decoder.h
[modify] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/image-decoders/image_decoder.cc
[modify] https://crrev.com/98afd4cc3f640881b0d20aeaa89ecaa50e164b4f/third_party/blink/renderer/platform/image-decoders/image_decoder.h

Project Member

Comment 23 by bugdroid1@chromium.org, Aug 24

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/123f71d0b3565585f4d1b78370b47fe8827bfb23

commit 123f71d0b3565585f4d1b78370b47fe8827bfb23
Author: Khushal <khushalsagar@chromium.org>
Date: Fri Aug 24 08:21:16 2018

blink/cc: Use different ImageDecoders for different compositors.

If the same image is being displayed by multiple compositors, the decode
tasks currently use the same ImageDecoder protected by a common mutex.
This leads to janks in the case of animated images where the tasks from
different compositors to decode the same image can not execute in
parallel.

For animated images in particular, using the same decoder also results
in defeating the internal caching in the decoder. The decoder currently
caches dependent frames assuming that a decode for frame n will be
followed by a decode request for frame n+1. If different compositors use
the same decoder and run different timelines for the same animation, the
decode requests can run out of order and defeat this caching causing a
re-decode of the complete dependency chain.

To avoid this, assign a GUID to each compositor which is passed through
to blink via the PaintImageGenerator callback for requesting a decode.
This allows the ImageFrameGenerator to assign different decoders to each
compositor, and also allow the decode work to run in parallel.

R=chrishtr@chromium.org
TBR=vmpstr@chromium.org

Bug:  872753 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_layout_tests_slimming_paint_v2;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I338b9a31550df1dcfaf777a3e8372d1828fa0043
Reviewed-on: https://chromium-review.googlesource.com/1185793
Commit-Queue: Khushal <khushalsagar@chromium.org>
Reviewed-by: Chris Harrelson <chrishtr@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Fredrik Hubinette <hubbe@chromium.org>
Cr-Commit-Position: refs/heads/master@{#585759}
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/oop_pixeltest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/paint_image.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/paint_image.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/paint_image_generator.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/paint_image_unittest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/paint_shader_unittest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/skia_paint_image_generator.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/paint/skia_paint_image_generator.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/test/fake_paint_image_generator.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/test/fake_paint_image_generator.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/test/fake_tile_manager.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/gpu_image_decode_cache.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/gpu_image_decode_cache.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/gpu_image_decode_cache_perftest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/gpu_image_decode_cache_unittest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/software_image_decode_cache.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/software_image_decode_cache.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/software_image_decode_cache_unittest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/software_image_decode_cache_unittest_combinations.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/software_image_decode_cache_utils.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/software_image_decode_cache_utils.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/tiles/tile_manager_unittest.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/trees/layer_tree_host_impl.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/cc/trees/layer_tree_host_impl.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/components/viz/service/display/software_renderer.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/content/renderer/webgraphicscontext3d_provider_impl.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/media/renderers/paint_canvas_video_renderer.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/bitmap_image_test.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/decoding_image_generator.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/decoding_image_generator.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/deferred_image_decoder_test_wo_platform.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_decoder_wrapper.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_decoder_wrapper.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_decoding_store.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_decoding_store.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_decoding_store_test.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_frame_generator.cc
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_frame_generator.h
[modify] https://crrev.com/123f71d0b3565585f4d1b78370b47fe8827bfb23/third_party/blink/renderer/platform/graphics/image_frame_generator_test.cc

Status: Fixed (was: Assigned)

Sign in to add a comment