Issue 785939

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 703608
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug-Regression

Canvas rendering appears to be debounced in some cases without user interaction

Reported by sger...@gmail.com, Nov 16 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36

Steps to reproduce the problem:
Application specific, see below.

What is the expected behavior?
The behavior in both attached images should be the same.

What went wrong?
I'm working on a project that makes extensive use of canvas rendering, and last night, following a Chrome update, I noticed a change in our application's behavior. I can confirm that none of our related application code changed between last night and this morning. I've been trying to come up with a simple test case that reproduces the problem; however, I suspect the problem doesn't appear unless there is enough being rendered on the screen. I'm hoping the following description and screenshots can help diagnose, or point to, what might have changed.

Basically, we fetch a large number of individual image slices and render each one as the browser receives it. This produces a continuous "loading" effect in which you can see new slices as they become available. Up until v62, the behavior shown in both attached images would have been the same; since v62, however, nomouse.gif has become the default behavior, where it appears that not all rendering calls actually update the canvas. Oddly enough, if I move the mouse off the canvas elements (there are three separate canvas elements in these screenshots) and, while the images are downloading, move the mouse wheel back and forth (so there is no interaction with the canvas, only with the browser), then every rendering call updates the canvas, and you can see how much smoother the rendering is in mouse.gif.

To make sure we weren't missing any events, I added a counter to the function that performs the canvas rendering, and it reports the same number of calls in both cases. Furthermore, the fact that this "debounced" behavior isn't seen in any other browser (or in previous versions of Chrome) makes me believe that a change in Chrome is the root cause and not an application issue.

I'll be happy to assist in any way I can. However, after trying (and failing) for a few hours to create an independent, minimal reproducible case, I'm a bit at a loss, and I'm hoping that an expert can quickly identify the probable root cause (or at least narrow down the possibilities).

Did this work before? Yes 61.0.3163

Chrome version: 62.0.3202.94  Channel: stable
OS Version: OS X 10.12.6
Flash Version:
 
nomouse.gif (132 KB)
mouse.gif (322 KB)
Components: -Blink Blink>Canvas
Labels: Needs-Bisect Needs-Feedback Needs-Triage-M62
sgerace@, thank you for the report. Do you have a repro case so we can triage this further?

Comment 2 by sger...@gmail.com, Nov 16 2017

No, and unfortunately that's part of the problem: I wasn't able to reproduce it with relatively simple datasets, and since at the moment we only see the issue in the context of our application, there is clearly a lot going on to get the data to populate in a similar manner. I should be able to spend more time this afternoon trying to reproduce it, but I was hoping something obvious might stand out to a developer and help narrow down what to focus on when creating the repro case.
Project Member

Comment 3 by sheriffbot@chromium.org, Nov 16 2017

Cc: manoranj...@chromium.org
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "manoranjanr@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: Triaged-ET Needs-Feedback
@sgerace: It would be great if you could provide a sample URL; that would help us triage the issue from the TE end.

Comment 5 by sger...@gmail.com, Nov 21 2017

I'm actually thinking this might be related to WebWorkers now; I'll continue to try to narrow down the problem into a repro case and I'll keep you posted.
Project Member

Comment 6 by sheriffbot@chromium.org, Nov 21 2017

Cc: divya.pa...@techmahindra.com
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "divya.padigela@techmahindra.com" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: Needs-Feedback
Adding label Needs-Feedback as we are still waiting for the reporter as per comment #5.

Comment 8 by sger...@gmail.com, Nov 28 2017

After playing with this over the long weekend, I was finally able to create a simple repro case. I attached the relevant files here (index.html, script.js, and worker.js), and you can also find a working copy (with live-reload server) at https://github.com/sgerace/bugs-chromium-785939

Upon further investigation, it appears that the issue occurs when the CPU is maxed out in the browser while a lot of rendering is happening. I'm not sure whether the WebWorker actually contributes to the root cause (my guess is that it doesn't); however, it was the only way I could both isolate the rendering and have it triggered by an external event. (I had trouble reproducing the behavior with setInterval, and this setup most closely resembles what we do in our application.)

In the example, if I let it render without interacting with the page (nomouse2.gif), there is a very clear lag in the animation; however, if I scroll the mouse wheel over the page, the lag mostly goes away (mouse2.gif). Looking at the performance profile in Chrome, the mouse events seem to break up the high-CPU function calls and "allow" the rendering to take place.

If I load the same example in Firefox, the animation is smooth. Unfortunately, I don't know of a way to install previous versions of Chrome for testing; however, given the similarities to our actual application, I'd guess that an earlier version would *not* exhibit the lag.

Let me know if you have any questions or if you have any issues with the repro case.

Thanks!
index.html (518 bytes)
script.js (701 bytes)
worker.js (866 bytes)
nomouse2.gif (131 KB)
mouse2.gif (1.0 MB)
Project Member

Comment 9 by sheriffbot@chromium.org, Nov 28 2017

Cc: krajshree@chromium.org
Labels: -Needs-Feedback
Thank you for providing more feedback. Adding requester "krajshree@chromium.org" to the cc list and removing "Needs-Feedback" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Pri-2 -Needs-Bisect hasbisect-per-revision M-64 Pri-1
Owner: jbroman@chromium.org
Status: Assigned (was: Unconfirmed)
Able to reproduce the issue on reported version 62.0.3202.94, latest Canary 64.0.3279.0 using Mac 10.12.6. 

Note: Issue is not reproducible on Ubuntu 14.04 and Windows 10

Bisect Info:
================
Good Build: 57.0.2935.0 
Bad Build:  57.0.2936.0

CHANGELOG URL:
You are probably looking for a change made after 434684 (known good), but no later than 434685 (first known bad).
https://chromium.googlesource.com/chromium/src/+log/ce1c7996be64c3af7243256c5032f11aef6fe7cd..19b847a9301f7139b7ff85cb32eb13e514fa1ab1

Suspect: https://codereview.chromium.org/2517813002
Suspecting the same CL based on the changelog.

@jbroman: Please confirm the issue and help in re-assigning if it is not related to your change.

Thanks!


Comment 11 by junov@chromium.org, Nov 29 2017

Components: -Blink>Canvas Internals>GPU>Scheduling
Re-triaging under "Scheduling"

Comment 12 by sger...@gmail.com, Nov 30 2017

I'm currently searching for a possible workaround to improve performance until this is fixed (or to improve performance in general), and I've been experimenting with requestAnimationFrame. In the repro case, introducing a simple call to requestAnimationFrame (see the attached updated script.js) seems to improve the behavior significantly. My question is whether I should consider this a "best practice" regardless, and if so, whether it is safe to treat it as a general workaround for the current issue.
script.js (964 bytes)
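The requestAnimationFrame workaround asked about above can be sketched roughly as follows. This is not the reporter's updated script.js (its contents aren't reproduced here); `createFrameBatcher`, `drawSlice`, and `ctx` are hypothetical names, and the `schedule` parameter defaults to `requestAnimationFrame` but can be replaced with a fake for testing. Because the application renders incremental image slices, the sketch keeps every submitted item and draws them in one batch per frame rather than dropping older ones.

```javascript
// Hypothetical helper: batch incoming updates and flush them at most once per
// animation frame. Every submitted item is kept (nothing is dropped), which
// suits incremental data such as image slices; all items received since the
// last frame are drawn together in a single callback.
function createFrameBatcher(drawBatch, schedule = (cb) => requestAnimationFrame(cb)) {
  let queue = [];
  let frameRequested = false;

  function onFrame() {
    frameRequested = false;
    const batch = queue;
    queue = []; // items arriving during drawBatch go to the next frame
    if (batch.length > 0) drawBatch(batch);
  }

  // Call submit() from the message handler instead of drawing immediately.
  return function submit(item) {
    queue.push(item);
    if (!frameRequested) {
      frameRequested = true;
      schedule(onFrame);
    }
  };
}

// Browser usage (drawSlice and ctx are placeholders for the app's own code):
// const submit = createFrameBatcher((slices) => slices.forEach((s) => drawSlice(ctx, s)));
// worker.onmessage = (event) => submit(event.data);
```

This collapses canvas work into one draw per frame, but it does not by itself limit how fast the worker posts messages or how much deserialization the main thread performs.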
Components: -Internals>GPU>Scheduling Blink>Scheduling
Owner: alexclarke@chromium.org
A few thoughts here after a quick investigation (apologies for the delay):

1. The test case creates very large arrays of doubles, which we currently don't serialize/deserialize as efficiently as we could. We don't use packed double arrays today because the serialization format has no special tag for them, but we could add one, which would radically reduce the time spent deserializing (currently about 60% of the work done by each task). However, this would only change the CPU speed at which the issue starts to occur.

2. I hypothesize that serialization has gotten cheaper: previously, the worker thread's tasks took long enough that it couldn't enqueue tasks fast enough to keep the main thread busy, but now it can. (Deserialization here seems to be bounded in large part by #1.)

3. The renderer scheduler appears to allow tasks in the pausable task queue (where posted-message tasks go) to drown out compositor tasks, even though control is being yielded to the run loop. Tracing shows that the BeginMainFrame tasks get progressively more spread out as time goes on (and as the worker enqueues more and more tasks).

The root Chromium issue is #3, which is a scheduling question. (Perhaps our heuristic ought to deal with this better.)

Reporter: For your particular reduced test case (and, I imagine, your application), having a worker continually add tasks for the main thread without any backpressure or rate limiting is what leads to this issue. Consider a mechanism such as using requestAnimationFrame on the main thread to watch frame timing, and having the worker thread post back a new state only after that (though you may want to populate the canvas with the previous known state in the meantime). At the cost of a small amount of latency (quite possibly less than a frame), you'll reduce main-thread CPU usage and make your application more responsive in general. Even if we scheduled this more nicely, other main-thread work in this process might jank due to overloading.
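The backpressure suggestion above can be sketched as a pull model: the worker keeps computing, but only posts its newest state after the main thread says it is ready for one. This is a minimal sketch under stated assumptions: each message is a complete state (as in the repro's animation), so an older unsent state can be replaced; the `'ready'` message string, `PullProducer`, `computeNextState`, `drawState`, and `ctx` are all hypothetical names.

```javascript
// Worker-side helper (hypothetical): keep only the newest computed state and
// post it once the main thread has asked for one. If the main thread sends
// one 'ready' per state it receives, at most one state message is queued
// toward the main thread at any time.
class PullProducer {
  constructor(post) {
    this.post = post;         // e.g. (state) => self.postMessage(state)
    this.pending = undefined; // newest state not yet sent
    this.hasPending = false;
    this.requested = false;   // has the main thread asked for a state?
  }

  // Call whenever the worker finishes computing a new state.
  update(state) {
    this.pending = state;     // an older unsent state is simply replaced
    this.hasPending = true;
    this.flush();
  }

  // Call when the main thread posts 'ready'.
  request() {
    this.requested = true;
    this.flush();
  }

  flush() {
    if (!this.requested || !this.hasPending) return;
    const state = this.pending;
    this.pending = undefined;
    this.hasPending = false;
    this.requested = false;
    this.post(state);
  }
}

// worker.js (sketch; computeNextState is a placeholder):
//   const producer = new PullProducer((s) => self.postMessage(s));
//   self.onmessage = (e) => { if (e.data === 'ready') producer.request(); };
//   // ...call producer.update(computeNextState()) as new states are computed
//
// Main thread (sketch; drawState and ctx are placeholders):
//   worker.onmessage = (e) => {
//     drawState(ctx, e.data);
//     requestAnimationFrame(() => worker.postMessage('ready'));
//   };
//   worker.postMessage('ready'); // ask for the first state
```

For incremental slices rather than whole states, `update` would need to accumulate pending slices instead of replacing them.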

(Aside: you may also be interested in OffscreenCanvas which may mitigate the need to send this data across at all.)
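For the OffscreenCanvas aside, the hand-off might look like the sketch below, assuming a browser that supports `HTMLCanvasElement.transferControlToOffscreen()`. `handCanvasToWorker`, the `{ type: 'canvas' }` message shape, and `drawSlice` are hypothetical names, not part of the original thread.

```javascript
// Main thread (hypothetical helper): transfer control of a <canvas> to the
// worker once. Afterwards the worker draws directly, so slice data no longer
// has to be posted back to the main thread for rendering.
function handCanvasToWorker(canvas, worker) {
  const offscreen = canvas.transferControlToOffscreen();
  // An OffscreenCanvas cannot be structured-cloned; it must appear in the
  // transfer list (second argument) of postMessage.
  worker.postMessage({ type: 'canvas', canvas: offscreen }, [offscreen]);
}

// Worker side (sketch; drawSlice is a placeholder for the app's own code):
//   let ctx = null;
//   self.onmessage = (e) => {
//     if (e.data.type === 'canvas') ctx = e.data.canvas.getContext('2d');
//   };
//   // ...when a slice arrives in the worker: if (ctx) drawSlice(ctx, slice);
```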

Bouncing to alexclarke@ to see whether the scheduler team wants to take a closer look at this (for #3). Unless packed double arrays are particularly important to make fast, I'm not inclined to prioritize #1. (If they are, you should actually consider transferring a Float64Array or similar, which we can do without any copy at all; that's even faster than fixing #1 would be.)
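Following the suggestion above to transfer a Float64Array instead of posting a plain array of doubles, a minimal sketch: `postDoubles` is a hypothetical name, and `target` is assumed to be anything with a `postMessage(message, transfer)` method (a `Worker`, `self` inside a dedicated worker, or a `MessagePort`).

```javascript
// Hypothetical helper: send numeric data as a Float64Array and transfer its
// ArrayBuffer, so the buffer is moved to the receiver rather than copied or
// serialized element by element.
function postDoubles(target, values) {
  // Copies once from a plain array; if the worker already writes its results
  // into a Float64Array, that array is sent as-is and this copy is skipped.
  const packed = values instanceof Float64Array ? values : Float64Array.from(values);
  // After this call the buffer is detached on the sending side
  // (packed.length becomes 0), so the sender must not reuse it.
  target.postMessage(packed, [packed.buffer]);
}

// In a worker: postDoubles(self, results);
// On the main thread, event.data is a Float64Array holding the same values.
```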
785939.png (87.0 KB)
^ in case it wasn't clear, the green tasks on the main thread are V8 tasks running the onmessage handler, and the magenta ones are cc::ThreadProxy::BeginMainFrame.
#12: Whoops, I missed your comment in which you already mentioned requestAnimationFrame. Yes, rAF is helpful, but you probably also want to avoid the extra serialization/deserialization work. Since you're still decoding the message (in Blink we do that when you read the |data| property of the event; I think some other vendors might do it immediately), part of the work is still happening. Skipping the canvas work alone improves things a little, but probably only on a sufficiently fast CPU.
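The point about lazy decoding suggests a handler that keeps only the newest MessageEvent and reads `event.data` inside the animation-frame callback. Per the comment above, Blink decodes a message when `data` is read, so superseded messages would not be deserialized on the main thread there. Other engines may decode eagerly, and the worker still pays the serialization cost either way. This sketch assumes each message is a complete state that can replace older ones; `createLatestMessageHandler`, `drawState`, and `ctx` are hypothetical names.

```javascript
// Hypothetical handler: remember only the newest MessageEvent and read its
// `data` once per animation frame, so older events are never decoded by
// this code (relevant in engines that decode lazily on `data` access).
function createLatestMessageHandler(draw, schedule = (cb) => requestAnimationFrame(cb)) {
  let latestEvent = null; // newest MessageEvent not yet drawn
  let frameRequested = false;

  function onFrame() {
    frameRequested = false;
    const newest = latestEvent;
    latestEvent = null;
    // The only access to .data for this message happens here.
    if (newest !== null) draw(newest.data);
  }

  return function onmessage(event) {
    latestEvent = event; // deliberately not touching event.data here
    if (!frameRequested) {
      frameRequested = true;
      schedule(onFrame);
    }
  };
}

// Browser usage (drawState and ctx are placeholders):
// worker.onmessage = createLatestMessageHandler((state) => drawState(ctx, state));
```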
Cc: alexclarke@chromium.org
Owner: altimin@chromium.org
Mergedinto: 703608
Status: Duplicate (was: Assigned)
It seems that's another case of compositor task queue starvation. Merging it into 703608.