New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 468098 link

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Apr 2015
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug-Regression



Sign in to add a comment

JavaScript requestAnimationFrame() performance regression from r319827 to r319833

Reported by jer...@duckware.com, Mar 17 2015

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36

Steps to reproduce the problem:
1. Visit www.vsynctester.com and take notice that the blue line (time spend in JavaScript animation code, self timed). 

What is the expected behavior?

What went wrong?
Large regression between r319827 to r319833.  Attached is what I see.

Did this work before? Yes r319827

Chrome version: 41.0.2272.89  Channel: stable
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 17.0 r0

function rafCallback() {
  var t1 = window.performance.now();
  // optional spin delay
  // render
  var t2 = window.performance.now();
  // blue line is graph of t2-t1
  }

Interestingly, when there is a spin delay added to the beginning of the render time, around 6ms, the additional time to finish rendering is once again on par with r319827 (much lower).
 
r319827.jpg
39.4 KB View Download
r319833.jpg
36.2 KB View Download

Comment 1 by jer...@duckware.com, Mar 18 2015

traces only show more time in "V8.Execute" (see attached).
before.jpg
29.8 KB View Download
after.jpg
29.3 KB View Download

Comment 2 by jer...@duckware.com, Mar 18 2015

Very strange, F12 / Profile / Collect JavaScript CPU profile / Start causes render times to DROP.
start.jpg
14.6 KB View Download
Labels: TE-NeedsFurtherTriage

Comment 4 by tkent@chromium.org, Mar 20 2015

Labels: Cr-Blink-Animation
Labels: Hotlist-Scheduling
Labels: -Cr-Blink-Animation Cr-Blink-Canvas
Owner: junov@chromium.org
Chrome range: https://chromium.googlesource.com/chromium/src/+log/5014cf375daac0746cc4725a3c76e5ff026caa06..60b42286b4f63882bc5f1ffc4fd7e94bc5d319f0
This includes a Blink roll: https://chromium.googlesource.com/chromium/blink/+log/baae7d7..95e34ab

Based on the canvas operations in doRenderAnimationFrame() on www.vsynctester.com suspecting:
https://chromium.googlesource.com/chromium/blink/+/2868c0c15a1860c6217f898528830c5678fda780 "Inhibit paint invalidations on accelerated 2D canvases"


Comment 7 by junov@chromium.org, Apr 1 2015

Labels: Needs-Feedback
Ironically, the change that probably introduced this issue was specifically targeted at improving the performance on www.vsynctester.com. See  issue 464034 

I did confirm using chrome://tracing that the code change cited above reduces the total CPU workload for rendering a frame on www.vsynctester.com

@jerryj: The regression you are measuring could be caused by the main thread of the chrome render process being de-scheduled in order to perform work on separate threads.  The probability of this happening during JavaScript execution is higher on machines with a small number of CPU cores (I am guessing that is the case of the machine on which you ran this test?). The fact that running the profiler reduces the JS execution wall clock time is another hint that your metric is senstive to thread scheduling and can therefore be thrown off by the slightest change in workload balance between threads.  In the graphs produced by chrome://tracing you can observe threads being de-scheduled and the graph also reports actual CPU time as well as wall clock time for each task.

If you look at the CPU usage reported by the OS or the Chrome task manager, you should actually see an overall improvement, which is what I observed.

Please confirm whether this is in fact what you are observing. To confirm the existence of a regression in JavaScript execution time--thread scheduling not withstanding--I would recommend capturing a trace in chrome://tracing and looking for a task called v8.callFunction (that would be the call to requestAnimationFrame) in a thread called CrRendererMain. If you see a regression in the *CPU* time, then we may have an actual regression.

I performed exactly this analysis when I landed that change, in order to verify that performance is in fact improved.  That being said, my hardware is probably very different from yours, so please report back if you have any additional evidence of a regression.  And if that is the case, please provide additional information on the system where the regression was observed (CPU, GPU, RAM, OS).

Thanks.
My hardware: Dell L702X notebook (XPS 17, Intel i7 2630QM 2.00Ghz with 8 threads, Intel HD Graphics 3000, 8GB, Windows 7, Aero Glass ON).

V8.Execute between the two revs provided in comment #1

Attached are traces from the two revs.
r319827.zip
2.0 MB Download
r319833.zip
1.8 MB Download
I also see the problem on this hardware:  Dell Inspiron 15R notebook (Intel i7-4500U 1.80Ghz with 4 threads, Intel HD Graphics 4400, 12GB, Windows 8.1).

junov, you see no dramatic increase in the blue line at vsynctester.com between these two revs?
Half the threads on my CPU have virtually 'no work' during a vsynctester.com animation?  See attached.
threads.jpg
90.2 KB View Download
Thanks for the data.  This is really helpful. The trace does reveal that the objective of reducing the total CPU workload was achieved (work on the Compositor and CompositorTileWorker threads was cut drastically). Nonetheless, there really is a measurable difference in the JS execution time that cannot be attributed to thread scheduling.  This was unexpected. What seems to be happening is that the change in timing of the various tasks in a frame lifecycle induced by the code change is causing GPU rasterization work to line-up directly with the JS execution task, which is a coincidence. What we are probably observing is the the spike in GPU activity having a side-effects on on CPU performance (memory bus traffic?). On my workstation I am not observing this phenomenon, probably because I have discrete GPU (with its own dedicated memory) and processors with large caches.

The net effect of the change, all things considered, is very positive even on your system if you look at total CPU usage as well as total GPU activity. Overall performance and power consumption are improved, despite a hiccup that is lined-up with requestAnimationFrame.  I am closing this issue without further resolution. 

@jerryj: Your high quality input is still very much appreciated even though it is not leading to a fix this time.  Keep the great reports coming!
Status: WontFix
The difference in the 'before' render time vs the 'after' render time is 'precisely' 1ms -- my bet is on a timing problem inside Chrome.

OK, the difference of 'precisely' 1ms is a red herring and a coincidence.  I have just played around with the vsynctester.com code and the difference in render times can be made to be any delay desired -- simply by adjusting the number of graphics (ctx) calls that are issued.

Anytime overhead is added into the system (delivering mail, opening an app, anything), render times in Chrome decrease in r319833 from high to low (to their level in r319827).

When I put these two things together -- along with research that I did a couple of weeks ago regarding the overhead of thread context switch in Windows -- this screams of a producer/consumer (or locking) timing issue in Chrome.

Anytime a thread needs to wait (whereas before when it did not have to wait), the overhead of waiting is noticed -- as it is here.
jerryj@: In #14, is the issue equivalent to issue 464740? The bad trace in #8 looks like it has the same problem as issue 464740.

If so maybe we should move the conversation there. I've come up with a quick test we can try out to see if it helps. Details in issue 464740.
briander/15: no, different issues.

The problem is, switching tabs (between about://tracing and web site), it is impossible to prevent issue 464740 from first happening.  The 'trigger' is any mouse movement in the tab area.  I then click on the title bar of the window (as per that issue) to hopefully remove that issue's impact.
Any additional load on the system, even a simple 'spin wait' thread (see attached) causes Chrome render time to decrease.
spinwait.jpg
188 KB View Download
As per #17, the load on the system can also come from a 'Worker'...

  var t1 = Date.now();
  while ((Date.now()-t1)<3000) {
    }

... which also cases Chrome render time to decrease (for three seconds).
junov/11: There is no GPU activity during JS execution in this  issue 468098 .  If you are seeing this alignment in the traces, then it is very likely you are just seeing issue 464740.

Instead, look at frame 403 (high render time) in the attached trace, ts 559007729475 to 559007730744:

  559007729475 v8       v8.callFunction dur=1276
  559007729479 v8/B     V8.Execute
  559007729526 cc       ThreadProxy::SetNeedsAnimate dur=2
  559007729653 skia     BitmapImage::draw dur=22
  559007729672 renderer MemoryCache::prune() dur=2
  559007729733 skia     BitmapImage::draw dur=12
  559007729743 renderer MemoryCache::prune() dur=1
  559007730744 v8/E     V8.Execute

Then compare to frame 408 (low render time) in the attached trace, ts 559007812943 to 559007813455

  559007812943 v8       v8.callFunction dur=515
  559007812945 v8/B     V8.Execute
  559007812970 cc       ThreadProxy::SetNeedsAnimate dur=0
  559007813040 skia     BitmapImage::draw dur=11
  559007813049 renderer MemoryCache::prune() dur=1
  559007813077 skia     BitmapImage::draw dur=5
  559007813081 renderer MemoryCache::prune() dur=0
  559007813455 v8/E     V8.Execute

If you are seeing any GPU activity that overlaps these time frames, please provide the 'ts' of trace line that impacts these V8 Execute time frames...

trace.zip
1.9 MB Download
I just found the problem.

It has nothing to do with the GPU (replicated with no canvas), nothing to do with the memory bus (it happens with an inactive bus), nothing to do with  issue 464034 , and is easily replicated in 12 lines of pure JavaScript that makes no calls whatsoever (eg: calculate a prime).

It presents an interesting problem for animation code, which has the potential to now run 3.5x slower than before.

Sign in to add a comment