Image.decode() fails/throws an exception inconsistently
Reported by p...@sketchfab.com, Jul 5
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36

Steps to reproduce the problem:
1. Open https://sketchfab.com/models/7514863f948345469625ae366e14092c/embed?autostart=1
2. Wait for the blue loading bar at the top of the page to finish.
3. Observe that Image.decode() of a 4K image fails inconsistently (e.g. https://media.sketchfab.com/urls/7514863f948345469625ae366e14092c/dist/textures/ca9825e841f247529240976b5daaff63/9dbf88bb2aee4b5595430bd3dc4dfe9b.jpeg).

What is the expected behavior?
The image should be decoded and the promise resolved, since the image is valid (and works when not using decode()).

What went wrong?
- Image.decode() should not fail more often asynchronously than synchronously.
- Image.decode() should either fail or work consistently.

Did this work before? N/A
Does this work in other browsers? N/A
Chrome version: 67.0.3396.99  Channel: stable
OS Version: 10.0
Flash Version:

Using Image.decode as documented here: https://www.chromestatus.com/feature/5637156160667648
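For reference, a minimal sketch of the reported usage (the texture URL is the one from the report; the surrounding loader code is assumed, not taken from the site):

  const img = new Image();
  img.src = 'https://media.sketchfab.com/urls/7514863f948345469625ae366e14092c/dist/textures/ca9825e841f247529240976b5daaff63/9dbf88bb2aee4b5595430bd3dc4dfe9b.jpeg';
  img.decode()
    .then(() => {
      // Decoded off the main thread; the image can be used without a decode stall.
    })
    .catch((err) => {
      // This rejection is what fires inconsistently for the 4K textures.
      console.error('decode() rejected:', err);
    });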
Jul 6
paul@ Thanks for the issue. Tested this on Windows 10 and Mac OS 10.13.3 on the reported version 67.0.3396.99 and the latest Canary 69.0.3483.0, and was unable to reproduce it with the following steps.
1. Launched Chrome and navigated to the link above.
2. Waited for the page to load, then opened DevTools -> Console and did not observe any exception being thrown.
A screencast is attached for reference. Please check and confirm whether anything was missed on our end in triaging the issue. Thanks.
Jul 6
You probably did reproduce the bug: you got a browser freeze at 0:39-0:40. If you rotate the model while it loads, rendering stops for 500 ms to 1 s, effectively freezing while the image decode is done synchronously instead of asynchronously. There is no error in the console because we now have to handle the decode() rejection in our production code, so users still get the image loaded even if the asynchronous decode failed. To see it, I think you have two options:
- Debug Chrome and set a breakpoint in the Chrome code where it rejects the promise from the decode() call.
- Much less direct, but still hinting at the error: use the profiler (DevTools -> Performance -> record) until the model finishes loading, run multiple sessions, and look at the inconsistent decode calls in the flame graph section called "TaskSchedulerForegroundBlockingWorker" (toward the bottom of the profiler, below the texImage2D calls). If the decode is on that worker thread, it worked; if it is on the main thread under texImage2D, it failed.
Our use case is to alleviate the huge synchronous texture loading time in WebGL, clearly shown by the "decode" part; it makes quite a big difference in UX.
Jul 6
Note: as usual with multithreaded code, it is sadly not easy to reproduce or debug. It would probably be better if the Image.decode() authors (vmpstr@chromium.org, domenic@chromium.org) could check it out themselves.
Jul 6
Thank you for providing more feedback. Adding the requester to the cc list. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Jul 9
[+vmpstr, domenic per comment 4]
Jul 11
vmpstr@, could you try to repro? Then re-assign?
Jul 11
This is what I'm seeing happening on the Chrome side (note the sizes are 4 * width * height of the image, i.e. the decoded size):
- we queue a decode request (1) for a 524288 byte image
- we queue a decode request (2) for a 16384 byte image
- decode for request 1 succeeds
- decode for request 2 succeeds
- we queue a decode request (3) for a 65536 byte image
- we queue a decode request (4) for a 65536 byte image
- decode for request 3 succeeds
- decode for request 4 succeeds
- we queue a decode request (5) for a 65536 byte image
- we queue a decode request (6) for a 65536 byte image
- we queue a decode request (7) for a 65536 byte image
- decode for request 5 succeeds
- decode for request 6 succeeds
- decode for request 7 succeeds
- we queue a decode request (8) for a 3686400 byte image
- decode request 8 succeeds
- we queue a decode request (9) for a 67108864 byte image
- we queue a decode request (10) for a 67108864 byte image
- we queue a decode request (11) for a 67108864 byte image
- we queue a decode request (12) for a 67108864 byte image
- we queue a decode request (13) for a 67108864 byte image
- decode request 13 fails on the spot, because we're out of the discardable memory budget (256 MB)
- decode request 9 succeeds
- decode request 10 succeeds
- we queue a decode request (14) for a 8294400 byte image
- decode request 11 succeeds
- decode request 12 succeeds
- decode request 13 fails (the failure was already determined earlier)
- decode request 14 succeeds

So one of the decodes does fail, and from the pattern we're seeing here it's somewhat expected: if we request enough decodes at once, we run out of memory and can't hold all of the decodes locked at the same time. It's expected because the intent of the API is to provide a performance win _if possible_, and we do limit the amount of memory we can lock at the same time. FWIW, this failure (with the promise rejection handled, as the site is doing) should be equivalent to having an onload handler without the decode API.

paul@, could you elaborate on the pattern this site uses when requesting image decodes? Is it in fact requesting this many decodes close together? If not, it might be the case that these decodes are somehow being duplicated within Chrome, which would be unexpected.

The other thing is that when we receive a decode, we budget the memory for it right then and there, so if it doesn't fit it's destined to fail. However, by the time the decode actually runs, which may be some time later, we might have more available memory, so the decode might still fit. That would only change the timing of the events, though; we could still run into the out-of-budget situation later.

+khushalsagar, +ericrk FYI
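To make that pattern concrete, here is a hedged sketch (placeholder URLs, not the site's actual code) of what firing many decode() calls at once looks like; with enough 4K images in flight, one of the promises rejects once the budget is exhausted:

  const urls = [/* several 4096x4096 JPEG URLs */];
  const decodes = urls.map((url) => {
    const img = new Image();
    img.src = url;
    // Track success/failure per image instead of letting one rejection fail the whole batch.
    return img.decode().then(() => ({ img, ok: true }), (err) => ({ img, ok: false, err }));
  });
  Promise.all(decodes).then((results) => {
    console.log('rejected decodes:', results.filter((r) => !r.ok).length);
  });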
Jul 11
btw, the 67108864 byte requests correspond to 4096x4096 images
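Rough arithmetic behind those numbers (assuming RGBA, i.e. 4 bytes per pixel; the variable names are just for illustration):

  const bytesPerDecode = 4096 * 4096 * 4;          // 67,108,864 bytes, i.e. 64 MiB per decoded 4K image
  const discardableBudget = 256 * 1024 * 1024;     // the ~256 MB budget mentioned above
  console.log(discardableBudget / bytesPerDecode); // 4 -- a fifth concurrent 4K decode won't fit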
Jul 11
Also note that this site is using WebGL, so it's unclear to me how the decode API would help here. The decode API locks the images in the image cache for "regular web content". I _think_ WebGL would not use this cache and would do a decode anyway.
Jul 11
Thanks for bringing this up; I've filed issue 862686 for potentially supporting the decode API for canvas use cases, but I doubt we'll get to it anytime soon.

As noted in #10, the img.decode API wouldn't help if an image is used with WebGL or 2D canvas, since the decode is cached in a different part of the stack than when an img is used directly in the DOM.

Secondly, the API is not a great fit when multiple decodes are requested, since we try to ensure that all pending images fit in our cache's budget and start rejecting decodes if that limit is breached. We could potentially throttle budget allocation for these requests, since the actual decode work is scheduled one by one anyway, but the same could be done by the page itself: limit the number of outstanding decode requests.

Also, if all you want is for the decode to be done asynchronously and for the images to appear one by one as they are decoded, with no other side effect, consider using the async attribute on the img. That allows us to decode images off the critical path and prioritize based on visibility. But the async attribute is also not supported for canvas at the moment. Sigh.
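A sketch of that "async attribute" suggestion, assuming it refers to the img decoding hint; as noted, this helps for images placed in the DOM, not for canvas/WebGL uploads (the URL is a placeholder):

  const img = new Image();
  img.decoding = 'async';       // hint: decode off the critical path
  img.src = 'texture.jpeg';     // placeholder URL
  document.body.appendChild(img);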
Jul 12
OK, thanks for the follow-up. It seems "Error in decode" misled me (perhaps the exception or the error message could be clearer). We load the textures for WebGL in batches, and decode() seemed the perfect fit to avoid the freeze we get on texture load.

When we tested with the profiler, it seemed that when Image.decode() works, the decode no longer appears in the flame graph under the texImage2D call. Is the profiler giving wrong information, or is there some decode caching that breaks across profiling runs? If decode() works for those textures, even with some failures due to too much batching, it would still be worth it; the fewer of those complete freezes, the better.

Before the subject comes up, here's why we don't use a worker with createImageBitmap: https://twitter.com/stephomi/status/987009396726226945. Our use case of loading several 4K textures is not an easy one for memory, threading, and CPU :)
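For context, a hedged sketch of the kind of loading path described above (names like gl and url are placeholders, not Sketchfab's actual code); per the later comments, WebGL currently re-decodes on the main thread regardless:

  async function uploadTexture(gl, url) {
    const img = new Image();
    img.crossOrigin = 'anonymous';   // assumed: needed for cross-origin WebGL uploads
    img.src = url;
    try {
      await img.decode();            // intended to move the decode off the main thread
    } catch (e) {
      // Budget rejection: fall back to a plain load, since the image itself is fine.
      if (!img.complete) {
        await new Promise((resolve, reject) => { img.onload = resolve; img.onerror = reject; });
      }
    }
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, img);
    return tex;
  }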
Jul 12
@ericrk Ah OK, we thought it would be handled by the decode thread, taking advantage of multicore CPUs to allow multiple decodes at once. So a good "pattern" would be to chain the decode() calls, avoiding multiple calls at once? (Again, with the big "if" that it works in our WebGL case.)
Jul 12
Chaining decodes will certainly allow you to avoid the particular failure you saw. It should also be possible to batch them, but limit the total size of the in-flight decodes to some amount. I'm going to assign this to khushalsagar to investigate what is happening with WebGL; it seems there is a benefit to calling decode(). Is it possible that it's the decoder that is doing the caching here?
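A minimal sketch of that chaining/batching idea (the batch size of 2 is an arbitrary example value, not a recommended constant):

  async function decodeInBatches(imgs, batchSize = 2) {
    const results = [];
    for (let i = 0; i < imgs.length; i += batchSize) {
      const batch = imgs.slice(i, i + batchSize);
      // Decode a small batch in parallel; never more than batchSize decodes in flight.
      results.push(...await Promise.all(
        batch.map((img) => img.decode().then(() => true, () => false))
      ));
    }
    return results;   // true = decoded ahead of time, false = rejected (will decode on use)
  }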
Jul 12
paul@, could you specify what framework you're using to verify the impact of the decode API? I used chrome://tracing to validate that the img.decode API is interacting with canvas the way we'd expect, and it is working as expected, though not as desired. I'm attaching these traces here as well.

1) trace_webgl_decode_tot is from a ToT build. Between 8-10s you can see 5 decodes running on TaskSchedulerForegroundWorker, which is the decode work from img.decode on a background thread. Then between 15-16s you can see the frame that uses these images and the decode happening again on CrRendererMain, which is the app's main thread. As expected, img.decode warms up a different cache that is not used by WebGL, so you have to re-decode when you actually use the image in a draw.

2) trace_webgl_debug is with a local Chrome build that effectively disables img.decode: it resolves the promise without actually doing the decode work. The traces on the background thread disappear, but the main thread is still the same.

At the moment, img.decode can't be used with WebGL...
Jul 13
We used the DevTools Performance profiler flame graph.
It wasn't clear enough to be sure (between cached decodes and decodes failing because of no chaining).
We don't know: is there a decode cache? Is there a way to disable it ("disable cache" doesn't work)?
With decode(), the decode part under GPU "seemed" smaller, if present at all, but the chrome://tracing traces seem clear enough that that's not the case.
Jul 20
There isn't a way to disable the decode cache from script, but on load the first use of the image should always be a cache miss. Thanks for filing the bug; the unfortunate answer here is that the decode API is not ready for use with canvas yet. You can follow issue 862686 to track progress on that.