New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 619173 link

Starred by 9 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug



Sign in to add a comment

Decoding 1280x720 gif takes ~1G of ram

Project Member Reported by boliu@chromium.org, Jun 10 2016

Issue description

It's this google logo gif: https://www.google.com/url?q=http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif&sa=D&usg=AFQjCNHaa3X6dNNBosMuM_1D4GYJhR1FlQ

Came from internal bug: https://buganizer.corp.google.com/issues/27334223#comment49

I'll just attach the trace here. I'm using trunk today (r399173).

There are 3 batches of decode in the trace, each getting slower and slower, because OS is killing more and more apps in the background. First batch was fast and memory infra didn't get to sample memory in between. But third batch is slow enough that memory infra managed to show ~900MB of memory under "malloc" during decode. It's freed immediately after decode is done, according to memory infra anyway.

1G for that thing seems kinda ridiculous..

In this particular use case, the image is shrunk down to 96x54 though. So tricks that help with this case, even if not in general, might be useful too.
 
trace_b27334223_malloc.json.gz
6.5 MB Download

Comment 1 by ccross@google.com, Jun 10 2016

I reproduced it by sending myself an email with an img tag to  http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif

Interestingly, it didn't trigger when it the img tag was:
<img src="http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif" width="525" height="295">

But it did with:
<img src="http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif" width="66" height="37" style="margin-right: 0px;">

So it may be specific to the large shrinking ratio

Comment 2 by cblume@chromium.org, Jun 10 2016

At that resolution, each 32 BPP frame would be 3.6 MB.
And there are many frames.
~900 MB / 3.6 MB per frame = ~250 frames. I believe that.

I am a little surprised the cache isn't limiting this. Maybe CC isn't telling the decoder to free unused frames from the decoder's cache (and maybe the decoder's cache isn't limited).

I will investigate when I can.

Comment 3 by cblume@chromium.org, Jun 10 2016

Re #1, does that style="margin-right: 0px;" influence it at all?

Comment 4 by boliu@chromium.org, Jun 10 2016

> ~900 MB / 3.6 MB per frame = ~250 frames. I believe that.

we are decoding all the frames all at once? :o

Comment 5 by ccross@google.com, Jun 10 2016

Re #3, it doesn't seem to.  The repro is a little fuzzy, but I sent myself an HTML email with
<img src="http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif" width="66" height="37">
in the body through go/send-raw-email.  Then:
1. Open a fresh gmail
2. Open the message
3. The image displays and animates, memory usage looks normal.  Let it play all the way through.
4. Go back to the inbox
5. Open the same message again
6. Gmail closes, sometimes before even showing the image.  If it doesn't crash, repeat at step 4

Comment 6 by cblume@chromium.org, Jun 10 2016

Re #4 No, we are not decoding all the frames at once.
They are decoded on demand and cached within the decoder. This is useful because frame #N might require frame #N-1 or #N-2.
The frames then stay in cache unless CC tells the decoder to purge its cache of all frames it doesn't need (so it might keep frame #N-1 or #N-2 around).

Comment 7 by aelias@chromium.org, Jun 11 2016

Labels: -Pri-3 Pri-2
Status: Assigned (was: Untriaged)

Comment 8 by boliu@chromium.org, Jun 11 2016

> Re #4 No, we are not decoding all the frames at once.

If you look at the trace, a single ImageFrameGenerator::decode event allocates that 900MB.. that's what I'm confused about.

Comment 9 by cblume@chromium.org, Jun 11 2016

Re #8 hrmmm that might be trying to catch up with a frame missing? I'll have to investigate to confirm but it sounds like it. Right now our save-past-frames is slightly hackish so it is entirely possible.

Here is how it works: If we are decoding frame 250, but it references frame 247 which we don't have, we need to decode frames 1-247. All of those will be cached. That is how decoding one frame could actually lead to many frames.

Right now we only store the last two frames, which covers a huge majority of all animated gifs. But a smarter solution would be to do a first pass to detect which frames reference which other frames and be more intelligent about storing them.

When I get around to this task I'll analyze that gif and figure out what it is doing.

Comment 10 by boliu@chromium.org, Jun 13 2016

the other thing I wasn't clear on is why is the allocation under malloc, as opposed to discardable memory. did memory-infra tracing incorrectly put it into the wrong category, or is that really malloc memory?
Re #10 - That, I don't know. I'll have to investigate.
> the other thing I wasn't clear on is why is the allocation under malloc, as opposed to discardable memory. did memory-infra tracing incorrectly put it into the wrong category, or is that really malloc memory?
Are you talking about the trace in #0? I miss the context of all this bug but from that trace the value reported by mallinfo() itself (the "allocated_objects" row of the "malloc" column) is monstruous and that is 100% coming from malloc().
Not sure how you pinned it down to cc, but definitely you guys have more context-specific knowledge than me. If you want to be 100% sure that comes from cc you can try use the heap profiler (https://chromium.googlesource.com/chromium/src/+/master/components/tracing/docs/heap_profiler.md)
I never tried with webview but should work there.

Comment 13 by boliu@chromium.org, Jun 13 2016

Just took a trace with malloc stack turned on (with instructions dskiba just sent out). The allocations all happened in GIFLZWContext::doLZW. Full stack below. Now do we have any LZW compression experts..?

base::SimpleThread::ThreadMain()  
content::CategorizedWorkerPool::Run(std::__1::vector<cc::TaskCategory, std::__1::allocator<cc::TaskCategory> > const&, base::ConditionVariable*)  
content::CategorizedWorkerPool::RunTaskWithLockAcquired(std::__1::vector<cc::TaskCategory, std::__1::allocator<cc::TaskCategory> > const&)  
content::CategorizedWorkerPool::RunTaskInCategoryWithLockAcquired(cc::TaskCategory) 
RunOnWorkerThread 
cc::SoftwareImageDecodeController::DecodeImage(cc::ImageDecodeControllerKey const&, cc::DrawImage const&) 
cc::SoftwareImageDecodeController::DecodeImageInternal(cc::ImageDecodeControllerKey const&, cc::DrawImage const&) 
cc::SoftwareImageDecodeController::GetOriginalImageDecode(cc::ImageDecodeControllerKey const&, sk_sp<SkImage const>)  
SkImage::readPixels(SkImageInfo const&, void*, unsigned int, int, int, SkImage::CachingHint) const  
SkImage_Generator::onReadPixels(SkImageInfo const&, void*, unsigned int, int, int, SkImage::CachingHint) const  
SkImageCacherator::directGeneratePixels(SkImageInfo const&, void*, unsigned int, int, int)  
SkImageGenerator::getPixels(SkImageInfo const&, void*, unsigned int)  
blink::DecodingImageGenerator::onGetPixels(SkImageInfo const&, void*, unsigned int, unsigned int*, int*)  
blink::ImageFrameGenerator::decodeAndScale(blink::SegmentReader*, bool, unsigned int, SkImageInfo const&, void*, unsigned int)  
blink::ImageFrameGenerator::tryToResumeDecode(blink::SegmentReader*, bool, unsigned int, SkTSize<int> const&, SkBitmap::Allocator*) 
blink::ImageFrameGenerator::decode(blink::SegmentReader*, bool, unsigned int, blink::ImageDecoder**, SkBitmap*, SkBitmap::Allocator*) 
blink::ImageDecoder::frameBufferAtIndex(unsigned int) 
blink::GIFImageDecoder::decode(unsigned int)  
GIFImageReader::decode(unsigned int)  
GIFFrameContext::decode(blink::FastSharedBufferReader*, blink::GIFImageDecoder*, bool*) 
GIFLZWContext::doLZW(unsigned char const*, unsigned int)
trace_symbolized.json.gz
6.2 MB Download

Comment 14 by torne@chromium.org, Jun 16 2016

Toby and I looked at the code here a bit and it wasn't obvious what the allocation might be or whether or not there was anything here that was based on the decoded frame size instead of just the gif file size. Can anyone from image decoder land suggest how we can work out why this gif takes so much memory?

Comment 15 by regisd@google.com, Jun 16 2016

I think it was easier for me to reproduce when:
- the gif is scaled down in the html
- the user scrolls in the webview while the gif is loading


Comment 16 by torne@chromium.org, Jun 16 2016

Yeah, it appeared to use significantly more memory even on desktop Linux for me when the gif was scaled down compared to rendering at native size.
I couldn't get it to reproduce with the webview shell and the following URL:

http://jsbin.com/siwevehipo

Comment 18 by torne@chromium.org, Jun 16 2016

Maybe the scrolling is relevant, then - perhaps having the gif partially onscreen by a different amount in different frames is causing something different to happen?
I tried that; that's why there's all the extra text below it. Also tried different resizings. The worst it got to was ~200M of PSS.
I believe I have found the cause.

To help paint the picture, imagine we have missed a few animation frames. CC will ask for the frame at the current time, meaning the image decoder needs to catch up. So when we are decoding frame 100, we may actually decode frames 50-99 first because we missed those frames.

That catch up phase happens inside the gif decoder and does not return execution to CC until it has caught up. This means that CC does not have a chance to tell the gif decoder to purge its cache.

I added a trace event inside GIFImageDecoder::decode to show the frame it is currently decoding. You will see that when it is catching up, it takes significantly more time and the memory grows. Once it has completed, CC gets to ask the decoder to purge and the memory usage drops back down.

In the attached trace, you will see a row for CompositorTileWorkerBackground with a GIANT green ImageFrameGenerator::decode call on the right-of-center. Inside it are many frames for the catch up + the growing memory. And shortly after it, the memory drops back down.

If it crashed, the memory won't get to drop back down.

I will now see if purging-while-catching-up fixes it.
trace_frame_index_trace.json
21.7 MB Download
Sure enough, purging while we catch up reduces memory (surprise). You can see in this new trace that there are no more memory issues.

Unfortunately, the solution I wrote just now is potentially a complete loss of cache. It maintains the needed frames, but when you loop back to the beginning it will not have the previously-decoded frames.

Maybe we could be more careful about purging-while-catching-up only if the cache would be large. Or we could set a cache size limit. I only did this to sort of check if this was indeed the cause.
trace_purge_while_catching_up.json
28.6 MB Download

Comment 22 by boliu@chromium.org, Jun 17 2016

seems the quickest and simplest change we can make is just disable the animation catch up

dunno if there are any websites that assumes gifs animations are synchronizated to when they are inserted into the dom..
Rather than disable the animation catch up, I am not enforcing the memory limit we were already being given.

If the GIF would remain below the memory limit then we can cache the frames and everything is lovely.

If the GIF would break the memory limit then when we need to constrain it somehow. LRU wouldn't work because once the animation loops, the cached frames will be gone and there was no benefit to caching them. So instead I'm purging aggressively and not caching frames longer than I need to.

You can see the change here: https://codereview.chromium.org/2075093002

Comment 24 by torne@chromium.org, Jun 17 2016

Where does that memory limit come from? How much is it typically? Just curious..
It comes from here: https://cs.chromium.org/chromium/src/content/child/blink_platform_impl.cc?cl=GROK&gsn=maxDecodedImageBytes&rcl=1466133585&l=958

To summarize it:
- if Android
--- if low end device
----- 3 MB limit
--- else
----- physical memory / 25
- else
--- no limit

Comment 26 by enne@chromium.org, Jun 17 2016

Components: -Blink>Image Internals>Compositing>Images
I should add, that is where the value is set.
It is passed into the ImageDecoder when it is created:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/image-decoders/ImageDecoder.cpp?cl=GROK&gsn=maxDecodedImageBytes&rcl=1466162237&l=82
I haven't checked if other decoders respect this value.
Project Member

Comment 28 by bugdroid1@chromium.org, Jun 28 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9899eade00c556b8086b74867f02edf8c98cb4b9

commit 9899eade00c556b8086b74867f02edf8c98cb4b9
Author: cblume <cblume@chromium.org>
Date: Tue Jun 28 20:36:12 2016

Don't cache GIF frames if it would be too large.

The GIF decoder now respects m_maxDecodedBytes and checks for overflow.

If the GIF would go over the memory limit then cached frames provide no
benefit. By the time the animation loops we would have been forced to
purge those frames. So caching any extra (not currently needed) frames
would be a waste. In that case, run lean and purge aggressively.

R=scroggo@chromium.org
BUG= 619173 

Review-Url: https://codereview.chromium.org/2075093002
Cr-Commit-Position: refs/heads/master@{#402536}

[modify] https://crrev.com/9899eade00c556b8086b74867f02edf8c98cb4b9/third_party/WebKit/Source/platform/image-decoders/gif/GIFImageDecoder.cpp
[modify] https://crrev.com/9899eade00c556b8086b74867f02edf8c98cb4b9/third_party/WebKit/Source/platform/image-decoders/gif/GIFImageDecoder.h

The code is GIF-specific, but don't forget that GIF is not the only animation format supported. 
Sure, but GIF is probably the only one crummy enough to have this bug (only one keyframe at the start).
Hello Max, :)

I took a look at webp (the only other animated image formate in Blink right now).
It does the same catch-up that the GIF decoder does.

Soon, I'll create a webp test case to see if the caching-while-decoding problem is the same. I suspect it is.

If I'm understanding this code correctly, the current webp decoder won't go from the latest keyframe. It seems to add all catch-up frames to the to-be-decoded list. But I'll test and make sure.
Why do you say that?  WebP has the same "find required previous frame index" to stop at the most recent keyframe that GIF does.  And the WebP transcoder tool purposefully inserts keyframes every so often when transcoding from GIF, to avoid the issues GIFs with no keyframes have.
It looks like it will cache all the frames from the nearest decoded keyframe, which would mean it will create a memory spike as it catches up the way the gif decoder did.

It might be a smaller spike if there are keyframes. But it would still be a spike. And it looks like it doesn't just start at the nearest keyframe...it looks like it starts at the nearest already-decoded keyframe.

I could be wrong. Would you mind helping me understand it?

WEBPImageDecoder::decodeSingleFrame seems to be the only place that sets a frame's status to FrameComplete.

https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/image-decoders/webp/WEBPImageDecoder.cpp?q=WebPImageDecoder&sq=package:chromium&dr=CSs&l=499

WEBPImageDecoder::decodeSingleFrame seems to only be called from inside WEBPImageDecoder::decode

https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/image-decoders/webp/WEBPImageDecoder.cpp?q=WebPImageDecoder&sq=package:chromium&dr=CSs&l=436

Inside WEBPImageDecoder::decode there is a do..while loop that goes from the current frame through each required previous frame, adding it to the list of frames to be decoded unless it sees that frame is already complete.

After the do..while loop is the for loop which decodes each frame.


So it looks like if we are doing a pre-catchup (no frames decoded yet) and we need frame 100, it will ignore if frame 98 is a keyframe and decode all the frames 1..100. Maybe I am misunderstanding this.

Also, the WebP decoder isn't respecting m_maxDecodedBytes like how the GIF decoder wasn't.
frame.requiredPreviousFrameIndex() is kNotFound for key (independent) frame.

>So it looks like if we are doing a pre-catchup (no frames decoded yet) and we need frame 100, it will ignore if frame 98 is a keyframe and decode all the frames 1..100. Maybe I am misunderstanding this.

do..while loop adding required previous frames would would go only until nearest keyframe - so decode only 98,99 and 100 (assuming 99 and 100 are not keyframes).

clearCacheExceptFrame causes that only the latest complete (2 in case of DisposeOverwritePrevious in GIF) frame is kept in cache (ImageDecoder::m_framebufferCache). 
clearCacheExceptFrame is implemented in such way that if e.g. frame 98 was kept in cache, frame 100 could never require frame 97 or any other frame before.

That said, purge aggressively set or not isn't causing any difference in number of frames required to decode -  it is good to apply it always, not only to "large" files.
Ah okay. Thank you! So the do..while loop goes from the current frame through each required previous frame until either 1.) the required previous frame is already decoded, or 2.) this frame is a keyframe (kNotFound).

So it likely will do less catching up since it can catch up from a keyframe, most likely. But it is still doing some catching up, and adding the frames to the cache without checking if it is going to go over the budget. So this fix should still be applied to WebP. It is just less likely to be a problem.


I don't think we should always purge aggressively. If the image is small it would be best to keep all frames in the cache. I only purge aggressively right now if the image is large enough that it would go past its budget.
> I don't think we should always purge aggressively. If the image is small it would be best to keep all frames in the cache. I only purge aggressively right now if the image is large enough that it would go past its budget.

Why it is needed to keep them in cache only during this decode() execution?
Please note that all those frames are not used in decode once decoding advances to next frame and they are anyhow deleted from the cache as soon as decode returns:
1.decode called from frameBufferatIndex:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/graphics/ImageFrameGenerator.cpp?rcl=0&l=284
2. right after it clear all except the last completely decoded from the cache:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/graphics/ImageFrameGenerator.cpp?rcl=0&l=294

So, even for small images it is problematic to accumulate them in cache as they are never used but only allocated and accumulated and then deleted.

I submitted this patch for review and it is fixing the issue on a bit different way:

crrev.com/2155973002
Save a bitmap copy when advancing to dependent GIF animation frames  

Could you please review it? Thanks.
 

Ah, I see what you are saying.
Hrmmm I may actually like what you are saying...

The caller is the one that purges immediately after the decode completes. So if we are going to purge anyway, why not purge while we go?

My reasoning was that they are different systems. The outer system knows it will purge right after. But the inner system doesn't know that.


The problem is maybe we'll change the caller to not purge every time if we decide the image is small enough to keep it in cache.


Ideally, the caller would be providing the cache so it has better control over all this. Right now, the decoder needs to assume what the caller wants.


But since right now we purge all the time anyway, maybe it should do that.
You're right about it - it is in the caller.
Maybe we could supply purgeAggresively parameter to ImageDecoder API in frameBufferAtIndex() call (and avoid the need for computation).
Cc: klo...@chromium.org
Labels: -Pri-2 Pri-1
Chris, what is the current status of this? There are some reported bugs due to this issue.
This landed back in comment #28 and I was able to test that it was fixed (comment #21).

The conversation after was about applying a similar patch for WebP.

I would like to do that but haven't yet. WebP has keyframes and won't need to do as much catching up, so it also won't have as much memory pressure.


Do you mean that there are reported bugs relating to WebP? Or perhaps the fix hasn't made its way to stable yet?

Oh, do you know whether the fixed made into M53, which branched on 30th? The date of #28 is 28th.

The referenced b/27334223 is still open. Similar for b/30677447. If we believe this is fixed, we should update there. Thanks.
Status: Fixed (was: Assigned)
Hrmmm, I guess it would have made it into M53 then. Are there reported problems when using M53?

I see b/30677447 is recent. But I suspect it is still M52. I'm reading over those bugs now and I suspect they have been fixed. I'll update those bugs.



Also, I'm pushing to fix the deeper, root cause -- synchronizing animated images. This is what caused us to use so much memory. This fix solves the memory problem. But there is still a CPU cost. There is a tread about it now on paint-dev@ and loading-dev@
I have made a separate bug for WebP --  crbug.com/636205 
Thanks for the update.

Sign in to add a comment