Canvas drawImage calls 10x slower until orientation change
Reported by
j...@ironhelmet.com,
Sep 26 2016
Issue description

Example URL: https://blight.ironhelmet.com

Steps to reproduce the problem:
1. Load my large web game, create an account, navigate to multiplayer, open one of the games, and observe the unresponsive map movement.
2. Change the device orientation.
3. Observe smooth, responsive map movement.

A YouTube video that shows the issue: https://youtu.be/7rSqkJuOQjc

What is the expected behavior?
The map would render smoothly without requiring an orientation change.

What went wrong?
In our JavaScript game we are having a problem where "touchmove" events are slow and "janky" until an orientation change, at which point they become smooth again. The slowness returns when the game does a lot of work, like loading a level. Here is a video that shows the slow touchmove in action, and the orientation change which fixes the issue: https://youtu.be/7rSqkJuOQjc
----
The following warning does appear in the log:
> Handling of 'touchmove' input event was delayed for 141 ms due to main thread being busy. Consider marking event handler as 'passive' to make the page more responive.
However, the slow events persist well after the game has stopped doing heavy lifting like loading a level and building sprites. I cannot use passive handlers in any case, because panning the map must call preventDefault to stop the interface scrolling as well.
----
I assume the slowness is related to touch events, as I have measured how often the touch handlers are called and the timings seem strange. I will receive 5-10 events in a row in under ~20ms, then there will be one that takes about 150-200ms, then another batch of fast events, and another slow one. After the orientation change, all events are received in under 20ms.
----
I have also measured my responses from requestAnimationFrame and they seem to be a fairly solid 16.8ms.
----
The problems do not occur on Safari on iOS or any tested desktop browser.
----
If this is not a bug, is it possible to force an "orientation change", or whatever happens during an orientation change, using JavaScript?

Does it occur on multiple sites: N/A
Is it a problem with a plugin? No
Did this work before? Yes Not sure exactly
Does this work in other browsers? Yes
Chrome version: 53.0.2785.124 Channel: stable
OS Version: Android 6.0.1
Flash Version: Shockwave Flash 23.0 r0
,
Sep 26 2016
Are you calling preventDefault() from the touch handler? My theory is that the browser thinks we're scrolling the page and therefore deprioritizes touch handling.
,
Sep 26 2016
Yes, I'm calling preventDefault specifically to prevent the UI scrolling while the player is panning the map.
,
Sep 26 2016
Also, can I do anything in JavaScript to prevent this behaviour, and what does an orientation change do that re-prioritises touch handling?
,
Sep 27 2016
Have you tried using touch-action: none on the map? Likewise, a trace might yield us some useful data. skyostil@ what trace parameters do you need to see throttling?
,
Sep 27 2016
Hey perfect, that did the trick. I didn't know about the touch-action property. Thanks for the heads up. I'll do some more tests later but I think this fixes my issues.
,
Sep 28 2016
jay@ if you can provide a trace, we'd like to understand the cause in case someone else encounters it, because the behaviour you describe is pretty odd. I'm glad you were able to fix it through touch-action, but I'm not sure why it helped.
,
Sep 28 2016
If you can point me towards some instructions for how to make the trace you need, I would be happy to help. I discovered, looking at my CSS, that touch-action: manipulation; was set on an element right at the top of my UI DIV hierarchy. I have no recollection of turning it on, but after googling it a little today, it was probably to prevent double-tap zooming, or perhaps something to do with the 300ms delay.
,
Sep 29 2016
It sounds to me like either preventDefault wasn't reliably being called on the touchstart event, or we were somehow losing that fact in Chrome somewhere. If you're calling preventDefault on every touch event over an element, that should behave identically to 'touch-action: none'. The way I typically validate this is by injecting script like the following via Chrome devtools:
var origpd = TouchEvent.prototype.preventDefault;
TouchEvent.prototype.preventDefault = function() {
  // Log each preventDefault call together with the event type.
  console.log("preventDefault " + this.type);
  origpd.call(this);
};
Regardless, 'touch-action: none' is a better solution anyway - tells the browser declaratively that you don't want any touch scrolling or zooming (and yes, also disables the annoying 300ms click delay you get from the double-tap-to-zoom feature).
,
Sep 29 2016
I'm not calling preventDefault on touchstart, only on touchmove. Would I need it on the start as well?
,
Sep 29 2016
Calling on either touchstart or ALL touchmove events should prevent scrolling (the difference being that calling it on touchstart will also prevent the generation of 'click' events and associated browser tap behavior). But if you don't call preventDefault on touchstart, and only some of the touchmove events in a sequence but not all, then a scroll could start (leading to throttled low-frequency touchmove events).
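A minimal sketch of the "ALL touchmove events" case described above, where preventDefault is called before any early exit so that no touchmove in the sequence is left uncancelled. The names `makeTouchMoveHandler`, `mapElement`, and `onPan` are hypothetical and are not taken from the game's code.

```javascript
// Hypothetical factory (names are assumptions, not from the game):
// builds a touchmove handler that cancels every touchmove it sees.
function makeTouchMoveHandler(mapElement, onPan) {
  return function handleTouchMove(event) {
    // Cancel first and unconditionally, so an early return below can
    // never leave a touchmove uncancelled and let a scroll start.
    if (event.cancelable) {
      event.preventDefault();
    }
    // Only pan when the touch is on the map itself.
    if (event.target !== mapElement) {
      return false;
    }
    onPan(event);
    return true;
  };
}
```

In a page this would be registered with something like `mapElement.addEventListener('touchmove', makeTouchMoveHandler(mapElement, onPan), { passive: false })`. Whether cancelling touches that land on overlays is acceptable depends on the page, so treat this as an illustration of the ordering only.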
,
Sep 29 2016
Hrm, the only line in the touch handler before event.preventDefault is a line that aborts the handler if the event.target is not the map itself. (So you can't scroll the map by interacting with a screen open over the top of the map.) Anyhow, let me know if you want that trace.
,
Sep 29 2016
I was able to reproduce this, and touch-action:none doesn't appear to help. I verified that preventDefault is indeed called on every touchmove and the touchmoves are marked cancelable=true, so we're not actually scrolling. It looks to me like there's just a ton of time being spent per frame in the blight JavaScript when panning the map. See the ~200ms long JS blocks in the attached devtools timeline profile. A JavaScript profile shows all the CPU time is being spent in drawImage and fillRect calls from the 'drawSprite' and 'fillGrass' functions in blight (respectively). After rotating, when I pan, the animation frame calls are ~1-2ms instead of the ~100ms-200ms seen previously. It looks like they're just doing a LOT less work. I suspect some sort of bug in the app that is just doing a ton of extra (redundant) drawing for some reason.
,
Sep 29 2016
I did a quick check using the below script and don't see drawImage being called a lot more before rotate than after (~50 in both cases). Maybe one particular drawImage call is particularly expensive or something?
var origdi = CanvasRenderingContext2D.prototype.drawImage;
var drawImageCount = 0;
// Once per animation frame, report how many drawImage calls were made.
function tick() {
  if (drawImageCount) {
    console.log("drawImage calls per frame: " + drawImageCount);
    drawImageCount = 0;
  }
  requestAnimationFrame(tick);
}
// Wrap drawImage so that every call is counted.
CanvasRenderingContext2D.prototype.drawImage = function() {
  drawImageCount++;
  return origdi.apply(this, arguments);
};
requestAnimationFrame(tick);
,
Sep 29 2016
Ugg, sorry for leading us down the wrong path. I did a very quick test on another project that uses the same map renderer and it seemed to fix this issue. I jumped on the touchmove thing because requestAnimationFrame times seemed smooth.
,
Sep 29 2016
No worries, we have a bunch of different complex things in Chrome that could have caused a similar symptom (although I should have realized the "main thread is busy for ~140ms" warning on the console was a sign that it probably wasn't our scrolling code). So there's no evidence of a browser bug here, right? Hopefully you can debug from here - e.g. measure the times of your different drawImage calls and try to see why some are very slow? Of course, if Canvas is somehow being crazy and having very slow drawImage/fillRect calls for no apparent good reason, then you should file another bug for that. But hopefully it's just some sort of bug in the app that is resulting in the slow graphics operations :-)
,
Sep 29 2016
Errr no, I still think there is a browser bug here. Why would exactly the same functions run significantly faster after an orientation change? Also remember Chrome on Android is the only platform where there is an issue. I have no problems on iOS or any desktop browser. Perhaps the loaded textures are slow to read or write for some reason? A problem with the canvas I'm painting into?
,
Sep 29 2016
Ok, well if there is a browser bug then it's in Canvas not input, so changing the labels and +junov.
I injected this instrumentation:
var origdi = CanvasRenderingContext2D.prototype.drawImage;
CanvasRenderingContext2D.prototype.drawImage = function() {
  var s = performance.now();
  origdi.apply(this, arguments);
  console.log('drawImage', performance.now() - s, arguments[0].src, arguments);
};
And found it's drawing the 1024x1024 JPEG tiles that are sometimes slow (~20ms) before rotate, and fast after:
VM223:5 drawImage 1.7150000000037835 undefined [canvas, 0, 0, 144, 228, -72, -228, 144, 228]
VM223:5 drawImage 0.030000000006111804 undefined [canvas, 0, 0, 144, 228, -72, -228, 144, 228]
VM223:5 drawImage 0.024999999994179234 undefined [canvas, 0, 0, 144, 228, -72, -228, 144, 228]
VM223:5 drawImage 0.02500000000145519 undefined [canvas, 0, 0, 144, 228, -72, -228, 144, 228]
VM223:5 drawImage 1.4700000000011642 https://blight.ironhelmet.com/images/map/pin_selection.png [img, 0, 0, 128, 192, -64, -192, 128, 192]
VM223:5 drawImage 1.6999999999970896 undefined [canvas, 0, 0, 144, 228, -72, -228, 144, 228]
VM223:5 drawImage 5.875000000007276 https://blight.ironhelmet.com/images/missions/celestial/004/4_4.jpeg [img, 0, 0, 1024, 1024, -0, -0, 1024, 1024]
VM223:5 drawImage 21.55500000000029 https://blight.ironhelmet.com/images/missions/celestial/004/4_5.jpeg [img, 0, 0, 1024, 1024, -0, -0, 1024, 1024]
VM223:5 drawImage 1.8300000000017462 https://blight.ironhelmet.com/images/missions/celestial/004/4_6.jpeg [img, 0, 0, 1024, 1024, -0, -0, 1024, 1024]
VM223:5 drawImage 6.404999999998836 https://blight.ironhelmet.com/images/missions/celestial/004/5_4.jpeg [img, 0, 0, 1024, 1024, -0, -0, 1024, 1024]
VM223:5 drawImage 22.05000000000291 https://blight.ironhelmet.com/images/missions/celestial/004/5_5.jpeg [img, 0, 0, 1024, 1024, -0, -0, 1024, 1024]
VM223:5 drawImage 1.9650000000037835 https://blight.ironhelmet.com/images/missions/celestial/004/5_6.jpeg [img, 0, 0, 1024, 1024, -0, -0, 1024, 1024]
I don't see anything interesting in a trace, just all unattributed v8 time. I don't know enough about canvas to debug why some drawImage calls might be an order of magnitude slower sometimes. junov: can you please triage / route further? I reproduced this on a Nexus 6p running Chrome 54.0.2840.34
,
Sep 29 2016
Oh actually, I do see something in a trace - a bunch of ~20ms ImageFrameGenerator::tryToResumeDecode entries. Perhaps we're not caching the decoded JPGs properly for some reason?
,
Sep 29 2016
thanks again for looking into this rbyers!
,
Sep 29 2016
No problem! Attached is a more detailed timeline with paint and JS profile data - it makes it look like image decodes are indeed part of the problem but not necessarily all of it.
Just for fun I tried caching the HTMLImageElements into an ImageBitmap (thinking that would force a decode and save the results) as follows:
var origdi = CanvasRenderingContext2D.prototype.drawImage;
var cachedImages = {};
CanvasRenderingContext2D.prototype.drawImage = function() {
  if (arguments[0] instanceof HTMLImageElement) {
    var key = arguments[0].src;
    if (cachedImages[key]) {
      // Bitmap is ready: draw from the cached ImageBitmap instead.
      arguments[0] = cachedImages[key];
    } else if (!(key in cachedImages)) {
      // First time this image is seen: mark it pending and start one decode.
      cachedImages[key] = null;
      createImageBitmap(arguments[0]).then(function(bitmap) {
        cachedImages[key] = bitmap;
      });
    }
  }
  origdi.apply(this, arguments);
};
But that didn't appear to help much (even when all the images were cached and so HTMLImageElements were never being passed directly to drawImage). I no longer saw the image decode tasks in the timeline, but the drawImage calls were often still quite expensive.
Part of the problem may be all the anti-aliasing you're doing (both the transform and scale values used in the 'drawSprite' function are non-integer). I tried setting r.context.imageSmoothingEnabled=false and the performance got much better. But it was still a little smoother after rotating the device.
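The mitigation mentioned above can be sketched roughly as follows: turn off image smoothing on the 2D context and snap the destination position to whole pixels. `drawSpriteCrisp` and its parameters are hypothetical names used for illustration only; `imageSmoothingEnabled` and `drawImage` are standard CanvasRenderingContext2D members. This sketch does not deal with a non-integer scale factor, and disabling smoothing trades visual quality for speed.

```javascript
// Hypothetical helper (not from the game's code): draw an image at a
// whole-pixel position with smoothing disabled.
function drawSpriteCrisp(context, image, x, y) {
  // Skip interpolation when the browser samples the source image.
  context.imageSmoothingEnabled = false;
  // Round so the destination lands on whole pixels.
  context.drawImage(image, Math.round(x), Math.round(y));
}
```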
,
Sep 29 2016
Yes, I also noticed that reducing my pixelRatio from 3 on the phone to 1 helped the frame rate a lot. (pixelRatio = devicePixelRatio / backingStoreRatio) I've never seen 3 before. My laptop is 2 and a normal monitor is just 1. I'm going to cap it at 2 in future. (But I won't deploy any changes while we try to get to the bottom of this issue.) I will have a look at the visual impact of disabling smoothing.
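A rough sketch of the cap described above, using the same formula (pixelRatio = devicePixelRatio / backingStoreRatio). The function names `cappedPixelRatio` and `sizeCanvas`, and the fallback to 1 for missing values, are assumptions made for illustration.

```javascript
// Hypothetical helper: compute the canvas pixel ratio and cap it.
// Missing or zero inputs fall back to 1 (an assumption of this sketch).
function cappedPixelRatio(devicePixelRatio, backingStoreRatio, maxRatio) {
  const ratio = (devicePixelRatio || 1) / (backingStoreRatio || 1);
  return Math.min(ratio, maxRatio);
}

// Hypothetical helper: size the backing store from the CSS size and ratio.
function sizeCanvas(canvas, cssWidth, cssHeight, ratio) {
  canvas.width = Math.round(cssWidth * ratio);
  canvas.height = Math.round(cssHeight * ratio);
  canvas.style.width = cssWidth + 'px';
  canvas.style.height = cssHeight + 'px';
  return canvas;
}
```

With a cap of 2, a phone reporting a ratio of 3 would get a backing store twice the CSS size in each dimension instead of three times, which is less than half as many pixels to fill per frame.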
,
Sep 29 2016
Also, if you are interested, you can get a version of the game with uncompressed scripts by using the URL http://blight.ironhelmet.com/u#/game/4726641566679040 (note the u before the hashtag)
,
Sep 30 2016
I have attached tracing for both the good and the bad case. In the bad case, the rendering takes about 5-6 times longer than in the good case. Also, in the bad case, the scheduler seems to be waiting for the compositor to finish playing back. That makes me think that maybe we have a gpu-accelerated canvas in the good case and a software canvas in the bad case. I will make a Chromium build to test my theory.
,
Sep 30 2016
From the traces it is easy to confirm that gpu acceleration is being used in the good case (Canvas2DLayerBridge::flush) on the main thread. In the bad case, we are in software rendering mode (non-display list). You should try to figure out why the initial canvas is not getting gpu acceleration (which fallback condition is kicking in). Either HTMLCanvasElement::shouldAccelerate is returning false for some reason, or disableAcceleration is getting called due to a performance heuristic kicking in.
,
Sep 30 2016
Hi Jay,
What I found is that in the javascript you have this:
canvas = document.createElement('canvas');
in a for loop. My guess is that these are temporary canvases, and when the game starts, chrome detects that there are too many canvases and it decides to not gpu-accelerate. But once we change the orientation of the phone, garbage collection kicks in, and those temporary canvases all go away, then we are running gpu-accelerated canvas.
If these canvases are temporary, could you set their width and height to 0 once you are done with them in the for loop? I believe that will fix the problem. Thanks.
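A minimal sketch of the suggestion above, assuming the temporary canvases are kept in an array. `releaseCanvas` and `releaseAll` are hypothetical names, and this only illustrates the idea; nothing here confirms that it resolves the issue.

```javascript
// Hypothetical helper: shrink a canvas to 0x0 so that its backing
// store can be freed without waiting for garbage collection.
function releaseCanvas(canvas) {
  canvas.width = 0;
  canvas.height = 0;
}

// Hypothetical helper: release every canvas in a list, then empty the
// list so that no references to the old canvases remain.
function releaseAll(canvases) {
  canvases.forEach(releaseCanvas);
  canvases.length = 0;
}
```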
,
Oct 1 2016
Unfortunately they aren't temporary - I use canvases as composites of text and images to form a larger sprite (fewer draw calls, no text rendering while trying to scroll the map). They are rebuilt very often, every time we receive an update from the server. I would think that if there was some issue with too many canvases it would kick in as the user plays, but I could not force the problem to happen by just playing.
I attempted to write a function to zero off the size of each canvas before discarding them and building new ones, but for some reason this made things worse and the orientation change stopped fixing the problem.
There is an issue where the canvas sprites are built three times when the page is first loaded. I fixed that but it had no impact on this issue.
Is there any way I can help you identify why HTMLCanvasElement::shouldAccelerate is returning false, or why disableAcceleration is getting called? Something to do with other transforms on the page perhaps? Perhaps the fact that images are not completely loaded before I start drawing them into the canvases (whereas after an orientation change they are)?
edit: in some more tests this afternoon I discovered that the place sprites are 264*320 each. If I halve the height of each of these to 264x160, but create twice as many of them, the problem seems to go away. Would it make sense that it does?
,
Oct 3 2016
When we detect that the total number of HTMLCanvasElements is larger than 25 (on Android), we turn off gpu-acceleration, because a large number of canvases could take a lot of gpu resources. That is the reason why HTMLCanvasElement::shouldAccelerate returns false when the game starts. When I was debugging last time, I think I saw a lot of canvases get gpu-acceleration disabled, and that is the reason. But when I change the orientation, HTMLCanvasElement::shouldAccelerate doesn't return false anymore. Could you confirm something: when the game starts, the script should go into the for loop that creates the canvas elements (which will cause slowness), and when you change orientation, it should not go into those for loops. If the above is true, then the reason that it is slow at the beginning is that the script is trying to create too many canvases, which disables gpu-acceleration for canvases. I am confused about why the script won't go into the for loop when the orientation is changed; maybe you can do some debugging on that? Please let us know. Thanks.
,
Oct 3 2016
The for loops will create canvases every time we receive new data from the server, say when the player submits a turn. The orientation change is not an event I explicitly catch. I do have some events that trigger when the screen size changes, but I don't need to recreate the canvases then because the data in them is the same.
So, a little more detail on this issue (and some questions):
The games make a canvas for each army and city in the game. It seems the city canvas sprites are somehow related to this problem. The canvas for each city is 264x320. In my tests to see if I was using too many canvases or too much canvas memory, I tried halving the height of each city canvas to 264x160. This solves the problem. Then I noticed that if I make twice as many canvases at the smaller size the problem is still fixed. Then I compared a small and a large map: 29 cities vs 230 cities. Even the large map, with nearly 8 times as many cities, is fixed by rendering each city as two small canvases rather than one full-size canvas (460 canvases).
I don't really want to ship my game using twice as many canvases (and twice as many drawImage calls), so it would be nice to know if there is some other way around the issue.
Is there something about the larger canvases that causes them to be inefficient until an orientation change? Are smaller canvases all packed into a single larger canvas? Should I just make one huge canvas and pull cities out of it like a sprite sheet? How big is the maximum size canvas? Are there any other tests you want me to do to try to narrow down the issue?
,
Oct 3 2016
trial and error tells me it's canvases on or around 256^2 in various dimensions.
,
Oct 3 2016
--In my tests to see if I was using too many canvases or too much canvas memory, I tried halving the height of each city canvas to 264x160. This solves the problem.
That makes sense. Chrome will enable gpu-acceleration on a canvas only when the canvas size is larger than 256*257.
--Then I noticed that if I make twice as many canvases at the smaller size the problem is still fixed.
That would also make sense. Rendering two 264*160 canvases should be faster than rendering one 264*320 canvas when both of them are software. This is because Chrome will use the power of a multi-core cpu.
The root cause of the slowness is that the game is creating many canvases (>25), and then Chrome decides to use software rendering on these canvases. As you mentioned, most of the canvases are about 264*320, and doing software rendering on them will be much slower than using gpu-acceleration.
--Should I just make one huge canvas and pull cities out of it like a sprite sheet? How big is the maximum size canvas?
If this is not hard to do on your side, could you please give it a try? Using a small number of large canvases is certainly better than rendering a bunch of 256^2 canvases. I would suggest keeping the size of the large canvas smaller than 4096 * 4096. Thanks.
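The sprite-sheet idea discussed above could be laid out along these lines: compute where each equally sized sprite goes in a grid, spilling onto additional sheets when one sheet is full. `layoutSpriteSheets` and its result shape are hypothetical, and the 4096 limit is simply the size suggested in this thread, not a documented maximum.

```javascript
// Hypothetical layout helper: place `count` equally sized sprites into as
// few sheets as possible, each sheet at most maxSize x maxSize pixels.
function layoutSpriteSheets(count, spriteWidth, spriteHeight, maxSize) {
  const columns = Math.floor(maxSize / spriteWidth);
  const rows = Math.floor(maxSize / spriteHeight);
  if (columns < 1 || rows < 1) {
    throw new RangeError('sprite does not fit in a sheet of the given size');
  }
  const perSheet = columns * rows;
  const cells = [];
  for (let i = 0; i < count; i++) {
    const indexInSheet = i % perSheet;
    cells.push({
      sheet: Math.floor(i / perSheet),                        // which sheet
      x: (indexInSheet % columns) * spriteWidth,              // left edge
      y: Math.floor(indexInSheet / columns) * spriteHeight    // top edge
    });
  }
  return {
    columns: columns,
    rows: rows,
    perSheet: perSheet,
    sheetCount: Math.ceil(count / perSheet),
    cells: cells
  };
}
```

For the 230-city map with 264x320 sprites and a 4096 limit, this gives 15 columns by 12 rows, so 180 sprites per sheet and two sheets in total. A sprite would then be drawn from its sheet with the nine-argument form of drawImage, for example `context.drawImage(sheetCanvas, cell.x, cell.y, 264, 320, dx, dy, 264, 320)`, where `sheetCanvas`, `dx`, and `dy` are placeholders.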
,
Oct 3 2016
Also, I think a very important thing to make sure is that when the game starts, the number of created canvases should be smaller than 25.
,
Oct 3 2016
I disagree. Game developers should not have to design around our own performance heuristics. Besides, the threshold of 25 is sort of arbitrary and is not set in stone. It should not be a design parameter for web developers. Chrome just needs to do a better job of scaling for different use cases. The big problem here is that when you have a mix of gpu-accelerated and non-accelerated canvases, you can end up in a situation where you are drawing a gpu-accelerated canvas into a non-accelerated canvas, which is the worst thing for performance because it causes gpu readbacks, which are notoriously slow. It would probably be faster to just have everything non-accelerated at that point. I am currently working on a fix for another bug, which I think will address this one at the same time. The fix is a bit counter-intuitive: it will consist of downgrading a canvas to be non-accelerated whenever it is used as a source image for drawing into a non-accelerated canvas. This will prevent the app from paying a gpu readback tax on every single animation frame.
,
Oct 3 2016
Thanks for your help on this. Hey err, while I have your attention, I've started looking at Windows. I normally do all my dev on Mac so didn't notice how bad Windows was. The game is silky smooth in Firefox and Edge but feels like it has a similar software rendering issue on Windows. Are there canvas constraints there as well? This bug might not be the best place to discuss. Can I email you guys directly?
,
Oct 3 2016
Another quick note: I pushed out my workaround for this bug today because it was impacting players. You should not be able to reproduce the issue on the live site anymore. If you like, I can deploy a special version somewhere with the bug intact. Just let me know.
,
Oct 4 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2b0d65c9311d7e87784e90d60bfc68efb5f03555

commit 2b0d65c9311d7e87784e90d60bfc68efb5f03555
Author: junov <junov@chromium.org>
Date: Tue Oct 04 17:36:28 2016

Disable GPU acceleration on 2D canvas when readbacks are needed

We are already disabling GPU acceleration to avoid readbacks caused by calls to getImageData. This change applies the same principle to canvas-to-canvas drawImage calls in order to avoid probable future readbacks.

BUG=652126, 651517, 650116, 642539, 640144
Review-Url: https://codereview.chromium.org/2388293002
Cr-Commit-Position: refs/heads/master@{#422848}

[modify] https://crrev.com/2b0d65c9311d7e87784e90d60bfc68efb5f03555/third_party/WebKit/Source/core/html/HTMLCanvasElement.cpp
[modify] https://crrev.com/2b0d65c9311d7e87784e90d60bfc68efb5f03555/third_party/WebKit/Source/modules/canvas2d/CanvasRenderingContext2DTest.cpp
[modify] https://crrev.com/2b0d65c9311d7e87784e90d60bfc68efb5f03555/third_party/WebKit/Source/platform/graphics/ExpensiveCanvasHeuristicParameters.h
,
Oct 4 2016
> I disagree. Game developers should not have to design around our own performance heuristics.
I couldn't agree more - these sorts of performance cliffs are terrible for predictability. Thank you for investing in making the performance properties less surprising! It might also be worth studying what other engines do here. If we're going to have some performance cliffs, it would at least be less surprising for developers if the cliffs were in roughly the same places across multiple browsers :-)
,
Oct 4 2016
This particular performance cliff was all about GPU readbacks. The solution here is to cascade the decision to not use the GPU in order to avoid mixes of gpu-accelerated and non-gpu-accelerated canvases, which is the worst possible situation. @jay: the differences in performance characteristics you observed are most likely due not to the difference in OS, but to the difference in GPU model and display driver version. Anyways, the problem should be fixed now. Try it out in the next (almost daily) Canary release, found here: https://www.google.com/intl/en/chrome/browser/canary.html
,
Oct 4 2016
Did we ever find out why an orientation change fixed the problem?
,
Oct 6 2016
Guys, did you see this bug? Kind of related. https://bugs.chromium.org/p/chromium/issues/detail?id=652906