Decoding a 1280x720 GIF takes ~1 GB of RAM
Issue description

It's this Google logo GIF: https://www.google.com/url?q=http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif&sa=D&usg=AFQjCNHaa3X6dNNBosMuM_1D4GYJhR1FlQ

It came from internal bug https://buganizer.corp.google.com/issues/27334223#comment49. I'll attach the trace here. I'm using trunk as of today (r399173). There are three batches of decodes in the trace, each getting slower and slower, because the OS is killing more and more apps in the background. The first batch was fast, and memory-infra didn't get to sample memory in between. But the third batch is slow enough that memory-infra managed to show ~900 MB of memory under "malloc" during decode. It is freed immediately after decode is done, according to memory-infra anyway. 1 GB for that thing seems kind of ridiculous. In this particular use case the image is shrunk down to 96x54, though, so tricks that help with this case, even if not in general, might be useful too.
Jun 10 2016
At that resolution, each 32 BPP frame would be 3.6 MB. And there are many frames. ~900 MB / 3.6 MB per frame = ~250 frames. I believe that. I am a little surprised the cache isn't limiting this. Maybe CC isn't telling the decoder to free unused frames from the decoder's cache (and maybe the decoder's cache isn't limited). I will investigate when I can.
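The arithmetic above can be sanity-checked directly. This is a minimal sketch (assuming 4 bytes per pixel for a 32 BPP frame and the ~900 MB figure from the trace; the function names are mine, not Chromium's):

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one decoded frame at 32 BPP (4 bytes per pixel).
size_t FrameBytes(size_t width, size_t height) {
  return width * height * 4;  // 1280x720 -> 3,686,400 bytes, ~3.6 MB
}

// How many cached frames the observed allocation implies.
size_t ImpliedFrameCount(size_t observed_bytes, size_t frame_bytes) {
  return observed_bytes / frame_bytes;  // ~900 MB / ~3.6 MB -> ~250 frames
}
```

Dividing the observed ~900 MB by the per-frame size lands right around the ~250-frame estimate, which matches a long animated GIF being fully cached.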
Jun 10 2016
Re #1, does that style="margin-right: 0px;" influence it at all?
Jun 10 2016
> ~900 MB / 3.6 MB per frame = ~250 frames. I believe that.

We are decoding all the frames at once? :o
Jun 10 2016
Re #3, it doesn't seem to. The repro is a little fuzzy, but I sent myself an HTML email with <img src="http://o.aolcdn.com/hss/storage/midas/23cdfe3da345f994879224ae26769e13/202565355/OGB-INSIDER-BLOGS-GoogleLogox2-Animated.gif" width="66" height="37"> in the body through go/send-raw-email. Then:
1. Open a fresh Gmail.
2. Open the message.
3. The image displays and animates, and memory usage looks normal. Let it play all the way through.
4. Go back to the inbox.
5. Open the same message again.
6. Gmail closes, sometimes before even showing the image. If it doesn't crash, repeat from step 4.
Jun 10 2016
Re #4: No, we are not decoding all the frames at once. They are decoded on demand and cached within the decoder. This is useful because frame #N might require frame #N-1 or #N-2. The frames then stay in the cache unless CC tells the decoder to purge all the frames it doesn't need (so it might keep frame #N-1 or #N-2 around).
Jun 11 2016
Jun 11 2016
> Re #4 No, we are not decoding all the frames at once.

If you look at the trace, a single ImageFrameGenerator::decode event allocates that 900 MB. That's what I'm confused about.
Jun 11 2016
Re #8: Hrmmm, that might be the decoder trying to catch up on missed frames? I'll have to investigate to confirm, but it sounds like it. Right now our save-past-frames logic is slightly hackish, so it is entirely possible. Here is how it works: if we are decoding frame 250, but it references frame 247, which we don't have, we need to decode frames 1-247. All of those will be cached. That is how decoding one frame can actually lead to decoding many frames. Right now we only store the last two frames, which covers the huge majority of animated GIFs. But a smarter solution would be to do a first pass to detect which frames reference which other frames and be more intelligent about storing them. When I get around to this task I'll analyze that GIF and figure out what it is doing.
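The catch-up behavior described above can be sketched like this (hypothetical names, not the real Blink API): to produce a target frame, walk back through each frame's required previous frame until hitting something already cached or an independent frame, then decode forward, caching every frame along the way.

```cpp
#include <cassert>
#include <set>
#include <vector>

// required_prev[i] is the index of the frame that frame i depends on,
// or -1 if frame i is independent (a keyframe).
std::vector<int> FramesDecodedFor(int target,
                                  const std::vector<int>& required_prev,
                                  std::set<int>* cache) {
  // Walk back until we hit a cached frame or run out of dependencies.
  std::vector<int> missing;
  for (int i = target; i >= 0 && !cache->count(i); i = required_prev[i])
    missing.push_back(i);

  // Decode oldest-first; each decoded frame lands in the cache.
  std::vector<int> decoded(missing.rbegin(), missing.rend());
  for (int f : decoded)
    cache->insert(f);
  return decoded;
}
```

For a GIF where every frame depends on its predecessor (only frame 0 is independent), asking for a late frame on a cold cache decodes and caches the entire chain, which is exactly the blow-up being discussed.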
Jun 13 2016
The other thing I wasn't clear on is why the allocation is under malloc, as opposed to discardable memory. Did memory-infra tracing incorrectly put it in the wrong category, or is that really malloc memory?
Jun 13 2016
Re #10 - That, I don't know. I'll have to investigate.
Jun 13 2016
> the other thing I wasn't clear on is why is the allocation under malloc, as opposed to discardable memory. did memory-infra tracing incorrectly put it into the wrong category, or is that really malloc memory?

Are you talking about the trace in #0? I'm missing the context of this bug, but from that trace the value reported by mallinfo() itself (the "allocated_objects" row of the "malloc" column) is monstrous, and that is 100% coming from malloc(). I'm not sure how you pinned it down to cc, but you definitely have more context-specific knowledge than me. If you want to be 100% sure it comes from cc you can try using the heap profiler (https://chromium.googlesource.com/chromium/src/+/master/components/tracing/docs/heap_profiler.md). I never tried it with WebView, but it should work there.
Jun 13 2016
Just took a trace with malloc stack collection turned on (with the instructions dskiba just sent out). The allocations all happened in GIFLZWContext::doLZW. Full stack below. Now, do we have any LZW compression experts?

base::SimpleThread::ThreadMain()
content::CategorizedWorkerPool::Run(std::__1::vector<cc::TaskCategory, std::__1::allocator<cc::TaskCategory> > const&, base::ConditionVariable*)
content::CategorizedWorkerPool::RunTaskWithLockAcquired(std::__1::vector<cc::TaskCategory, std::__1::allocator<cc::TaskCategory> > const&)
content::CategorizedWorkerPool::RunTaskInCategoryWithLockAcquired(cc::TaskCategory)
RunOnWorkerThread
cc::SoftwareImageDecodeController::DecodeImage(cc::ImageDecodeControllerKey const&, cc::DrawImage const&)
cc::SoftwareImageDecodeController::DecodeImageInternal(cc::ImageDecodeControllerKey const&, cc::DrawImage const&)
cc::SoftwareImageDecodeController::GetOriginalImageDecode(cc::ImageDecodeControllerKey const&, sk_sp<SkImage const>)
SkImage::readPixels(SkImageInfo const&, void*, unsigned int, int, int, SkImage::CachingHint) const
SkImage_Generator::onReadPixels(SkImageInfo const&, void*, unsigned int, int, int, SkImage::CachingHint) const
SkImageCacherator::directGeneratePixels(SkImageInfo const&, void*, unsigned int, int, int)
SkImageGenerator::getPixels(SkImageInfo const&, void*, unsigned int)
blink::DecodingImageGenerator::onGetPixels(SkImageInfo const&, void*, unsigned int, unsigned int*, int*)
blink::ImageFrameGenerator::decodeAndScale(blink::SegmentReader*, bool, unsigned int, SkImageInfo const&, void*, unsigned int)
blink::ImageFrameGenerator::tryToResumeDecode(blink::SegmentReader*, bool, unsigned int, SkTSize<int> const&, SkBitmap::Allocator*)
blink::ImageFrameGenerator::decode(blink::SegmentReader*, bool, unsigned int, blink::ImageDecoder**, SkBitmap*, SkBitmap::Allocator*)
blink::ImageDecoder::frameBufferAtIndex(unsigned int)
blink::GIFImageDecoder::decode(unsigned int)
GIFImageReader::decode(unsigned int)
GIFFrameContext::decode(blink::FastSharedBufferReader*, blink::GIFImageDecoder*, bool*)
GIFLZWContext::doLZW(unsigned char const*, unsigned int)
Jun 16 2016
Toby and I looked at the code here a bit, and it wasn't obvious what the allocation might be, or whether anything here was based on the decoded frame size instead of just the GIF file size. Can anyone from image-decoder land suggest how we can work out why this GIF takes so much memory?
Jun 16 2016
I think it was easier for me to reproduce when:
- the gif is scaled down in the HTML
- the user scrolls in the WebView while the gif is loading
Jun 16 2016
Yeah, it appeared to use significantly more memory even on desktop Linux for me when the gif was scaled down compared to rendering at native size.
Jun 16 2016
I couldn't get it to reproduce with the webview shell and the following URL: http://jsbin.com/siwevehipo
Jun 16 2016
Maybe the scrolling is relevant, then - perhaps having the gif partially onscreen by a different amount in different frames is causing something different to happen?
Jun 16 2016
I tried that; that's why there's all the extra text below it. Also tried different resizings. The worst it got to was ~200M of PSS.
Jun 16 2016
I believe I have found the cause. To help paint the picture, imagine we have missed a few animation frames. CC will ask for the frame at the current time, meaning the image decoder needs to catch up. So when we are decoding frame 100, we may actually decode frames 50-99 first, because we missed those frames. That catch-up phase happens inside the GIF decoder and does not return execution to CC until it has caught up. This means that CC does not have a chance to tell the GIF decoder to purge its cache.

I added a trace event inside GIFImageDecoder::decode to show the frame it is currently decoding. You will see that when it is catching up, it takes significantly more time and the memory grows. Once it has completed, CC gets to ask the decoder to purge, and the memory usage drops back down.

In the attached trace, you will see a row for CompositorTileWorkerBackground with a GIANT green ImageFrameGenerator::decode call right of center. Inside it are many frames for the catch-up, plus the growing memory. And shortly after it, the memory drops back down. If it crashed, the memory won't get to drop back down.

I will now see if purging while catching up fixes it.
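The diagnosis above can be modeled with a toy loop. This is an illustrative sketch, not the actual decoder code: during catch-up, every decoded frame stays in the cache until decode() returns, so the peak number of cached frames equals the number of catch-up frames; purging as we go caps the peak at roughly the current frame plus its dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model of peak cache occupancy during a catch-up decode.
size_t PeakCachedFrames(size_t catch_up_frames, bool purge_while_catching_up) {
  size_t cached = 0;
  size_t peak = 0;
  for (size_t i = 0; i < catch_up_frames; ++i) {
    ++cached;  // one more frame decoded into the cache
    peak = std::max(peak, cached);
    if (purge_while_catching_up && cached > 2)
      cached = 2;  // keep only the current frame and its required previous
  }
  return peak;
}
```

With ~250 catch-up frames at ~3.6 MB each, the no-purge peak is the ~900 MB seen in the trace, while purging as we go keeps the peak to a couple of frames.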
Jun 16 2016
Sure enough, purging while we catch up reduces memory (surprise). You can see in this new trace that there are no more memory issues. Unfortunately, the solution I wrote just now potentially throws away the entire cache: it maintains the needed frames, but when you loop back to the beginning of the animation it will not have the previously decoded frames. Maybe we could be more careful and only purge while catching up if the cache would grow large. Or we could set a cache size limit. I only did this as a quick check that this was indeed the cause.
Jun 17 2016
It seems the quickest and simplest change we can make is to just disable the animation catch-up. I don't know if there are any websites that assume GIF animations are synchronized to when they are inserted into the DOM.
Jun 17 2016
Rather than disable the animation catch-up, I am now enforcing the memory limit we were already being given. If the GIF would remain below the memory limit then we can cache the frames and everything is lovely. If the GIF would break the memory limit then we need to constrain it somehow. LRU wouldn't work, because once the animation loops the cached frames would already be gone, and there would have been no benefit to caching them. So instead I'm purging aggressively and not caching frames longer than I need to. You can see the change here: https://codereview.chromium.org/2075093002
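The policy described here can be sketched roughly as follows. The names and the zero-means-unlimited sentinel are my assumptions, not the actual patch; the overflow guard mirrors the "check for overflow" idea in the landed change:

```cpp
#include <cassert>
#include <cstddef>

// Cache decoded frames only if the whole animation fits under the
// decoder's memory budget; otherwise purge aggressively, because by
// the time the animation loops the cached frames would have been
// purged anyway. max_decoded_bytes == 0 means "no limit" here.
bool ShouldCacheAllFrames(size_t frame_count,
                          size_t bytes_per_frame,
                          size_t max_decoded_bytes) {
  if (max_decoded_bytes == 0)
    return true;  // no limit configured
  // Guard the multiplication against overflow before comparing.
  if (frame_count != 0 && bytes_per_frame > max_decoded_bytes / frame_count)
    return false;
  return frame_count * bytes_per_frame <= max_decoded_bytes;
}
```

A short animation stays fully cached; the ~250-frame GIF from this bug, at ~3.6 MB per frame, blows past any plausible mobile budget and would be decoded in the lean, purge-as-you-go mode.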
Jun 17 2016
Where does that memory limit come from? How much is it typically? Just curious.
Jun 17 2016
It comes from here: https://cs.chromium.org/chromium/src/content/child/blink_platform_impl.cc?cl=GROK&gsn=maxDecodedImageBytes&rcl=1466133585&l=958

To summarize it:
- if Android
  - if low-end device: 3 MB limit
  - else: physical memory / 25
- else: no limit
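That summary can be written out as a small sketch. The real logic lives in content/child/blink_platform_impl.cc; the function name and the zero sentinel for "no limit" are stand-ins of mine, not necessarily what the real code uses:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the decoded-image memory limit summarized above.
// Returns 0 to mean "no limit".
size_t MaxDecodedImageBytes(bool is_android,
                            bool is_low_end_device,
                            size_t physical_memory_bytes) {
  if (!is_android)
    return 0;  // desktop: no limit
  if (is_low_end_device)
    return 3 * 1024 * 1024;  // 3 MB on low-end Android devices
  return physical_memory_bytes / 25;  // otherwise 1/25 of physical RAM
}
```

Note how small the budget is in practice: a 1 GB Android device gets roughly 40 MB, far below the ~900 MB this GIF was consuming.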
Jun 17 2016
Jun 17 2016
I should add, that is where the value is set. It is passed into the ImageDecoder when it is created: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/image-decoders/ImageDecoder.cpp?cl=GROK&gsn=maxDecodedImageBytes&rcl=1466162237&l=82 I haven't checked if other decoders respect this value.
Jun 28 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9899eade00c556b8086b74867f02edf8c98cb4b9

commit 9899eade00c556b8086b74867f02edf8c98cb4b9
Author: cblume <cblume@chromium.org>
Date: Tue Jun 28 20:36:12 2016

Don't cache GIF frames if it would be too large.

The GIF decoder now respects m_maxDecodedBytes and checks for overflow. If the GIF would go over the memory limit then cached frames provide no benefit. By the time the animation loops we would have been forced to purge those frames. So caching any extra (not currently needed) frames would be a waste. In that case, run lean and purge aggressively.

R=scroggo@chromium.org
BUG=619173
Review-Url: https://codereview.chromium.org/2075093002
Cr-Commit-Position: refs/heads/master@{#402536}

[modify] https://crrev.com/9899eade00c556b8086b74867f02edf8c98cb4b9/third_party/WebKit/Source/platform/image-decoders/gif/GIFImageDecoder.cpp
[modify] https://crrev.com/9899eade00c556b8086b74867f02edf8c98cb4b9/third_party/WebKit/Source/platform/image-decoders/gif/GIFImageDecoder.h
Jun 29 2016
The code is GIF-specific, but don't forget that GIF is not the only animation format supported.
Jun 29 2016
Sure, but GIF is probably the only one crummy enough to have this bug (only one keyframe at the start).
Jun 29 2016
Hello Max :) I took a look at WebP (the only other animated image format in Blink right now). It does the same catch-up that the GIF decoder does. Soon I'll create a WebP test case to see if the caching-while-decoding problem is the same; I suspect it is. If I'm understanding the code correctly, the current WebP decoder won't start from the latest keyframe. It seems to add all catch-up frames to the to-be-decoded list. But I'll test and make sure.
Jul 19 2016
Why do you say that? WebP has the same "find required previous frame index" to stop at the most recent keyframe that GIF does. And the WebP transcoder tool purposefully inserts keyframes every so often when transcoding from GIF, to avoid the issues GIFs with no keyframes have.
Jul 20 2016
It looks like it will cache all the frames from the nearest decoded keyframe, which would mean it creates a memory spike as it catches up, the way the GIF decoder did. The spike might be smaller if there are keyframes, but it would still be a spike. And it looks like it doesn't just start at the nearest keyframe; it looks like it starts at the nearest already-decoded keyframe. I could be wrong. Would you mind helping me understand it?

WEBPImageDecoder::decodeSingleFrame seems to be the only place that sets a frame's status to FrameComplete: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/image-decoders/webp/WEBPImageDecoder.cpp?q=WebPImageDecoder&sq=package:chromium&dr=CSs&l=499

WEBPImageDecoder::decodeSingleFrame seems to only be called from inside WEBPImageDecoder::decode: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/image-decoders/webp/WEBPImageDecoder.cpp?q=WebPImageDecoder&sq=package:chromium&dr=CSs&l=436

Inside WEBPImageDecoder::decode there is a do..while loop that goes from the current frame through each required previous frame, adding it to the list of frames to be decoded unless it sees that frame is already complete. After the do..while loop is the for loop which decodes each frame.

So it looks like if we are doing a pre-catch-up decode (no frames decoded yet) and we need frame 100, it will ignore that frame 98 is a keyframe and decode all of frames 1..100. Maybe I am misunderstanding this.

Also, the WebP decoder isn't respecting m_maxDecodedBytes, just like the GIF decoder wasn't.
Jul 20 2016
frame.requiredPreviousFrameIndex() is kNotFound for a key (independent) frame.

> So it looks like if we are doing a pre-catchup (no frames decoded yet) and we need frame 100, it will ignore if frame 98 is a keyframe and decode all the frames 1..100. Maybe I am misunderstanding this.

The do..while loop adding required previous frames would go only until the nearest keyframe, so it decodes only 98, 99, and 100 (assuming 99 and 100 are not keyframes). clearCacheExceptFrame causes only the latest complete frame (2 frames in the case of DisposeOverwritePrevious in GIF) to be kept in the cache (ImageDecoder::m_framebufferCache). clearCacheExceptFrame is implemented in such a way that if e.g. frame 98 was kept in the cache, frame 100 could never require frame 97 or any other earlier frame.

That said, whether "purge aggressively" is set or not makes no difference to the number of frames required to decode, so it is good to apply it always, not only to "large" files.
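The clearCacheExceptFrame behavior described here can be modeled with a small sketch (hypothetical signature; the real method lives on ImageDecoder and operates on m_framebufferCache): after decoding a frame, drop everything except that frame and, when the disposal method needs it (e.g. DisposeOverwritePrevious), its required previous frame.

```cpp
#include <cassert>
#include <set>

// Keep only `keep` and `required_prev` in the cache; erase the rest.
// Pass required_prev = -1 when the kept frame has no dependency.
void ClearCacheExceptFrame(std::set<int>* cache, int keep, int required_prev) {
  for (auto it = cache->begin(); it != cache->end();) {
    if (*it == keep || *it == required_prev)
      ++it;
    else
      it = cache->erase(it);
  }
}
```

After this runs for frame 100 (keeping 99 as its dependency), no later frame can ever require frame 97 or earlier, which is the invariant the comment above relies on.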
Jul 20 2016
Ah okay, thank you! So the do..while loop goes from the current frame through each required previous frame until either 1) the required previous frame is already decoded, or 2) the frame is a keyframe (kNotFound). So it will likely do less catching up, since it can usually catch up from a keyframe. But it is still doing some catching up, and it adds the frames to the cache without checking whether it will go over the budget. So this fix should still be applied to WebP; it is just less likely to be a problem there.

I don't think we should always purge aggressively. If the image is small it would be best to keep all frames in the cache. I only purge aggressively right now if the image is large enough that it would go past its budget.
Jul 20 2016
> I don't think we should always purge aggressively. If the image is small it would be best to keep all frames in the cache. I only purge aggressively right now if the image is large enough that it would go past its budget.

Why keep them in the cache when they only live for the duration of this decode() execution? Please note that all those frames are not used by decode once decoding advances to the next frame, and they are deleted from the cache as soon as decode returns anyway:

1. decode is called from frameBufferAtIndex: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/graphics/ImageFrameGenerator.cpp?rcl=0&l=284
2. Right after it, everything except the last completely decoded frame is cleared from the cache: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/graphics/ImageFrameGenerator.cpp?rcl=0&l=294

So even for small images it is problematic to accumulate them in the cache: they are never used, only allocated, accumulated, and then deleted.

I submitted this patch for review; it fixes the issue in a slightly different way: crrev.com/2155973002 "Save a bitmap copy when advancing to dependent GIF animation frames". Could you please review it? Thanks.
Jul 22 2016
Ah, I see what you are saying. Hrmmm, I may actually like it. The caller is the one that purges immediately after the decode completes. So if we are going to purge anyway, why not purge as we go? My reasoning was that they are different systems: the outer system knows it will purge right after, but the inner system doesn't know that. The problem is that we may later change the caller to not purge every time, if we decide the image is small enough to keep in the cache. Ideally, the caller would provide the cache, so it has better control over all this. Right now, the decoder has to assume what the caller wants. But since we purge every time anyway right now, maybe it should do that.
Jul 28 2016
You're right about it - it is in the caller. Maybe we could supply a purgeAggressively parameter to the ImageDecoder API in the frameBufferAtIndex() call (and avoid the need for that computation).
Aug 10 2016
Chris, what is the current status of this? There are some reported bugs due to this issue.
Aug 10 2016
This landed back in comment #28 and I was able to test that it was fixed (comment #21). The conversation after was about applying a similar patch for WebP. I would like to do that but haven't yet. WebP has keyframes and won't need to do as much catching up, so it also won't have as much memory pressure. Do you mean that there are reported bugs relating to WebP? Or perhaps the fix hasn't made its way to stable yet?
Aug 10 2016
Oh, do you know whether the fix made it into M53, which branched on the 30th? The date of #28 is the 28th. The referenced b/27334223 is still open; similar for b/30677447. If we believe this is fixed, we should update them. Thanks.
Aug 10 2016
Hrmmm, I guess it would have made it into M53 then. Are there reported problems when using M53? I see b/30677447 is recent, but I suspect it is still on M52. I'm reading over those bugs now and I suspect they have been fixed; I'll update them. Also, I'm pushing to fix the deeper root cause: synchronizing animated images. This is what caused us to use so much memory. This fix solves the memory problem, but there is still a CPU cost. There is a thread about it now on paint-dev@ and loading-dev@.
Aug 10 2016
I have made a separate bug for WebP -- crbug.com/636205
Aug 10 2016
Thanks for the update.
Comment 1 by ccross@google.com, Jun 10 2016