New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 771209 link

Starred by 3 users

Issue metadata

Status: Duplicate
Merged: issue 701925
Owner:
Last visit > 30 days ago
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

2.6%-99.4% regression in blink_perf.canvas at 505367:505466

Project Member Reported by majidvp@chromium.org, Oct 3 2017

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=771209

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=a717df9f8dfb2cca7a81fc6a542d6ab14a2e5953a38424bfa8fcca27d48a22b4


Bot(s) for this bug's original alert(s):

android-nexus7v2
android-webview-nexus5X
chromium-rel-mac-retina
chromium-rel-mac11-air
chromium-rel-mac12
chromium-rel-win7-dual
chromium-rel-win8-dual
win-high-dpi
Cc: noel@chromium.org
Owner: noel@chromium.org
Status: Assigned (was: Untriaged)

=== Auto-CCing suspected CL author noel@chromium.org ===

Hi noel@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Noel Gordon
  Commit : 09b784fd12f255a9da38107ac6e0386f4dde6d68
  Date   : Fri Sep 29 19:44:25 2017
  Subject: zlib adler_simd.c

Bisect Details
  Configuration: win_8_perf_bisect
  Benchmark    : smoothness.tough_canvas_cases
  Metric       : frame_times/tough_canvas_cases_canvas_toBlob.html
  Change       : 2.06% | 21.4123169791 -> 21.8523687874

Revision             Result                   N
chromium@505418      21.4123 +- 0.234754      6      good
chromium@505442      21.4523 +- 0.112732      6      good
chromium@505445      21.4373 +- 0.140612      6      good
chromium@505446      21.5079 +- 0.511595      9      good
chromium@505447      21.9197 +- 0.309735      6      bad       <--
chromium@505448      21.9579 +- 0.489674      6      bad
chromium@505454      21.8885 +- 0.346735      6      bad
chromium@505465      21.8524 +- 0.151766      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=tough.canvas.cases.canvas.toBlob.html smoothness.tough_canvas_cases

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8966739790339929728


For feedback, file a bug with component Speed>Bisection

Comment 4 by noel@chromium.org, Oct 12 2017

The patch is from  issue 762564  where we added SIMD implementations of the adler32 checksum used by zlib for Intel SSSE3 (run-time detected), and ARM NEON.  

The adler checksum affects PNG decoding and encoding speed, and our measurements of PNG decoding speed [1]

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=762564#c41

showed good wins for the various device tested, and win8-dual has a PNG image decode bench mark [2]

[2] https://chromeperf.appspot.com/report?sid=d5261445fe006eda5a3902f41dec10b015a522e769bed111d77eadb72ef9bc75&rev=505465

and it improved when the adler SIMD change landed.  That suggests to me that the win8-dual machines have SSSE3.

Our measurements of PNG encoding speed with the adler SIMD on Intel devices [2]

[2] https://bugs.chromium.org/p/chromium/issues/detail?id=762564#c46

showed +15% improve over the PNG 140 image test corpus.  So we do not expect a regression in PNG encode performance.  Yet we have this report ...

Comment 5 by noel@chromium.org, Oct 12 2017

Cc: xlai@chromium.org
Looking at this test, a grey color is produced by a "random number" generator that depends on how fast a source <canvas> is drawn by requestAnimationFrame(). At most 20 different shades of grey are produced by the generator.

Main part of the test samples the current grey color from the generator and fills a 4000 x 4000 test <canvas> with that color, then PNG encodes it with .toBlob(). When the blob is ready, the test repeats.

All this test does then is: PNG encode a 4000 x 4000 solid color image, composed of one of 20 possible shades of grey, over and over, where the grey color used in each PNG encode is sampled from the generator.

+xlai@ for comment.

Comment 6 by noel@chromium.org, Oct 12 2017

I ran frame_times/tough_canvas_cases_canvas_toBlob.html test locally, and I collected 80, 4000x4000 PNG images from this test to create an corpus of test images.

I then ran the image_decode_bench [1] encoding rate benchmark over the test images to measure Blink PNG encode speed, with and without the adler32 SIMD change.

[1] https://chromium-review.googlesource.com/c/chromium/src/+/706744

MAC BOOK PRO 13" 2016 (TOUCH BAR) LAPTOP
 * Dual-core Intel Core i7-6567U, 3.3GHz Skylake-U (14nm)
png mac-book-pro-canvas-encoding-rate-sse-adler
https://docs.google.com/spreadsheets/d/1wryp3Za0dcNOgXR-yDgPWU6ktiqeQEgPsmwrJNDPkXc/edit#gid=595603569

I measured on OSX (I do not have Win8) but I expect Intel chips with SSSE3 should maybe perform the same.  The improve in encoding speed is average +48% faster for this corpus.

Not sure why this test regressed, maybe I'm not understanding this test.
junov, as test owner, any ideas?

Comment 8 by xlai@chromium.org, Oct 12 2017

@noel: This test is to measure the smoothness impact of async toBlob on the other main-thread activities. In the ideal case of 60fps, requestAnimationFrame(draw) on the drawCanvas should be running at a fast rate like 16.6ms; when async toBlob (acting on testCanvas) comes into play, the fps should drop and test measurement is maintained at 27~28ms. The CL r505447 raised it to 29~31ms; it's like dropping the frame rate from about 36fps to 33fps. 

One possible direction you can look at, is probably whether the new PNG encoder has preempted other main-thread activities. I see from your comment #4 that the encoding speed has improved because of the CL 505447 and I guess that's why--the improvement of toBlob speed comes in the sacrifice of other main-thread activities.

Comment 9 by noel@chromium.org, Nov 20 2017

The other main-thread activity seems to be the canvas.toBlob call.  I instrumented the test to measure the frame rate, how long the canvas.toBlob call takes and the rate at which it is called, and also rate of blob creation (test-case attached).

Looking the test in devtools, the FPS is a square wave.  No frames are produced while main thread JS is running, canvas.toBlob call takes up the majority of that time where it makes a snapshot of the canvas image and schedules it for encoding.  Once main thread JS ends, frames start to be produced (image attached).
test-case.html
3.9 KB View Download
devtools-performace.png
178 KB View Download

Comment 10 by noel@chromium.org, Nov 20 2017

Before my change, the blob PNG encode takes avg 143ms, and with my change it takes 95ms, so the PNG encode time is much faster.  This allows main thread canvas.toBlob to be called more often with my change, from 4.43 calls/sec before, to 5.59 calls/sec with my change (see the attached graphs).

The main thread canvas.toBlob time cost does not change before or after my change (avg 84ms before and after).  However, since is called more often with my change, and blocks frame production, the frame rate reported by this test decreases. 
to-blob-before.png
84.1 KB View Download
to-blob-after.png
82.2 KB View Download

Comment 11 by noel@chromium.org, Nov 20 2017

In general, as the PNG encoding time decreases (faster encoding, good), the frame rate reported by this test will _decrease_ (weird but true).

To test that idea, I changed the canvas.toBlob code to still take the snapshot, but instead schedule a null blob callback.  This skips the PNG encode step to simulate what happens if we have a very fast PNG encoder.
to-blob-fast-encode.png
79.0 KB View Download

Comment 12 by noel@chromium.org, Nov 20 2017

Result of #11 is, canvas.toBlob is called more often (11.33 calls/sec) and the frame rate of the test decreases to 11 frames per second.

Seems to me, if we improve the performance of the PNG encoder, this test will report a regression in frame rate by design.

Comment 13 by junov@chromium.org, Nov 21 2017

Sounds to me like toBlob's work load is being scheduled too aggressively.  I thought the code was supposed to chop up the work into a series of idle tasks.  Each task processes scanlines until the idle period deadline is reached. Is the code not doing this anymore?

Comment 14 by xlai@chromium.org, Nov 22 2017

Cc: zakerinasab@chromium.org
I think Reza and I would look into this issue, using some chrome metrics data. There is a possibility that canvas.toBlob always fall into the "force-encoding" path as the previous idle task encoding was too slow; and now with the new encoder it becomes faster and now it
starts doing row-by-row encoding more frequently. 

Comment 15 by noel@chromium.org, Nov 23 2017

#14 xlai@ > There is a possibility that canvas.toBlob always fall into the "force-encoding" path as the

If, by the "force-encoding" code path, you mean this one:

  CanvasAsyncBlobCreator::ForceEncodeRowsOnCurrentThread() { ... }

note I put a CHECK(false) therein (to crash my browser) and re-ran the test-case to test your idea.  My browser did not crash :) so I don't think "force-encoding" is the problem here.

#13 junov@ > I thought the code was supposed to chop up the work into a series of idle tasks.  Each task processes scanlines until the idle period deadline is reached. 

That's my observation: I see C++ callbacks in idle time to

  CanvasAsyncBlobCreator::IdleEncodeRows(deadline)

which encodes a row-at-a-time until |deadline| is reached, and re-schedules itself on Platform Scheduler()->PostIdleTask() if there still more rows to encode.

Seems fine, and zooming in to the image in #9, the FPS during IdleEncodeRows (grey area) is 60 frames per second.
to-blob-idle-time-rows.png
61.4 KB View Download

Comment 16 by noel@chromium.org, Nov 23 2017

60 FPS -> IdleEncodeRows looks good to me.  But while we are in canvas.toBlob(), the main thread is blocked, the FPS drops to 0 (yellow area).

With a faster PNG encoder, we call toBlob more often.  Thus the proportion of time spent in toBlob() increases.  That means longer periods of FPS 0.  That causes the test's reported frame rate to decrease.

----

Aside: things would be awesome if we could move the work being done in toBlob() (viz., the snapshot and paint) into the idle time task.  That was the design we envisaged for toBlob, so that it would not jank the main thread.  The analysis above tells me that toBlob, as currently implemented, can jank the main thread.

Comment 17 by noel@chromium.org, Nov 23 2017

Components: Blink>Canvas

Comment 18 by noel@chromium.org, Dec 15 2017

Owner: xlai@chromium.org
Anyho, a see a number of patches going in about this, and, so long as we are aware the improvements in image encoding time cause this test to report a _lower_ FPS and therefore a regression, well it's working as intended.

Reassigning to the test owners, for resolve.


Comment 19 by xlai@chromium.org, Dec 19 2017

Mergedinto: 701925
Status: Duplicate (was: Assigned)
As commented by noel@ previously, the faster the encoder, the more frequently canvas.toBlob is getting called. If canvas.toBlob does occupy some main-thread resources already, then such an increase of frequency will manifest itself as a regression of frame rate in the smoothness test as we see the report here.

Currently, canvas.toBlob is able to delegate the encoding jobs to idle tasks to avoid main thread janks. In the picture shown by noel@ in comment 15, when the idle tasks are executing, the main thread is about 60 frames per second. This shows that the encoding step is working well in relieving main thread jank.

But, canvas.toBlob is calling HTMLCanvasElement::ToStaticBitmapImage to read the canvas image before encoding, which is a synchronous step. This step occupies main-thread resources and is the main reason of the regression.

This issue should be related to crbug.com/701925, a request to improve canvas.toBlob by making its image readback step to become async. I want to close this issue and links to crbug.com/701925, because the regression is not caused by the change of encoder in r505447, but an inherent implementation problem within canvas.toBlob.

Sign in to add a comment