Issue metadata
Sign in to add a comment
|
2.6%-99.4% regression in blink_perf.canvas at 505367:505466 |
||||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Oct 3 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8966739790339929728
,
Oct 3 2017
=== Auto-CCing suspected CL author noel@chromium.org === Hi noel@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Noel Gordon Commit : 09b784fd12f255a9da38107ac6e0386f4dde6d68 Date : Fri Sep 29 19:44:25 2017 Subject: zlib adler_simd.c Bisect Details Configuration: win_8_perf_bisect Benchmark : smoothness.tough_canvas_cases Metric : frame_times/tough_canvas_cases_canvas_toBlob.html Change : 2.06% | 21.4123169791 -> 21.8523687874 Revision Result N chromium@505418 21.4123 +- 0.234754 6 good chromium@505442 21.4523 +- 0.112732 6 good chromium@505445 21.4373 +- 0.140612 6 good chromium@505446 21.5079 +- 0.511595 9 good chromium@505447 21.9197 +- 0.309735 6 bad <-- chromium@505448 21.9579 +- 0.489674 6 bad chromium@505454 21.8885 +- 0.346735 6 bad chromium@505465 21.8524 +- 0.151766 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=tough.canvas.cases.canvas.toBlob.html smoothness.tough_canvas_cases More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8966739790339929728 For feedback, file a bug with component Speed>Bisection
,
Oct 12 2017
The patch is from issue 762564 where we added SIMD implementations of the adler32 checksum used by zlib for Intel SSSE3 (run-time detected), and ARM NEON. The adler checksum affects PNG decoding and encoding speed, and our measurements of PNG decoding speed [1] [1] https://bugs.chromium.org/p/chromium/issues/detail?id=762564#c41 showed good wins for the various device tested, and win8-dual has a PNG image decode bench mark [2] [2] https://chromeperf.appspot.com/report?sid=d5261445fe006eda5a3902f41dec10b015a522e769bed111d77eadb72ef9bc75&rev=505465 and it improved when the adler SIMD change landed. That suggests to me that the win8-dual machines have SSSE3. Our measurements of PNG encoding speed with the adler SIMD on Intel devices [2] [2] https://bugs.chromium.org/p/chromium/issues/detail?id=762564#c46 showed +15% improve over the PNG 140 image test corpus. So we do not expect a regression in PNG encode performance. Yet we have this report ...
,
Oct 12 2017
Looking at this test, a grey color is produced by a "random number" generator that depends on how fast a source <canvas> is drawn by requestAnimationFrame(). At most 20 different shades of grey are produced by the generator. Main part of the test samples the current grey color from the generator and fills a 4000 x 4000 test <canvas> with that color, then PNG encodes it with .toBlob(). When the blob is ready, the test repeats. All this test does then is: PNG encode a 4000 x 4000 solid color image, composed of one of 20 possible shades of grey, over and over, where the grey color used in each PNG encode is sampled from the generator. +xlai@ for comment.
,
Oct 12 2017
I ran frame_times/tough_canvas_cases_canvas_toBlob.html test locally, and I collected 80, 4000x4000 PNG images from this test to create an corpus of test images. I then ran the image_decode_bench [1] encoding rate benchmark over the test images to measure Blink PNG encode speed, with and without the adler32 SIMD change. [1] https://chromium-review.googlesource.com/c/chromium/src/+/706744 MAC BOOK PRO 13" 2016 (TOUCH BAR) LAPTOP * Dual-core Intel Core i7-6567U, 3.3GHz Skylake-U (14nm) png mac-book-pro-canvas-encoding-rate-sse-adler https://docs.google.com/spreadsheets/d/1wryp3Za0dcNOgXR-yDgPWU6ktiqeQEgPsmwrJNDPkXc/edit#gid=595603569 I measured on OSX (I do not have Win8) but I expect Intel chips with SSSE3 should maybe perform the same. The improve in encoding speed is average +48% faster for this corpus. Not sure why this test regressed, maybe I'm not understanding this test.
,
Oct 12 2017
junov, as test owner, any ideas?
,
Oct 12 2017
@noel: This test is to measure the smoothness impact of async toBlob on the other main-thread activities. In the ideal case of 60fps, requestAnimationFrame(draw) on the drawCanvas should be running at a fast rate like 16.6ms; when async toBlob (acting on testCanvas) comes into play, the fps should drop and test measurement is maintained at 27~28ms. The CL r505447 raised it to 29~31ms; it's like dropping the frame rate from about 36fps to 33fps. One possible direction you can look at, is probably whether the new PNG encoder has preempted other main-thread activities. I see from your comment #4 that the encoding speed has improved because of the CL 505447 and I guess that's why--the improvement of toBlob speed comes in the sacrifice of other main-thread activities.
,
Nov 20 2017
The other main-thread activity seems to be the canvas.toBlob call. I instrumented the test to measure the frame rate, how long the canvas.toBlob call takes and the rate at which it is called, and also rate of blob creation (test-case attached). Looking the test in devtools, the FPS is a square wave. No frames are produced while main thread JS is running, canvas.toBlob call takes up the majority of that time where it makes a snapshot of the canvas image and schedules it for encoding. Once main thread JS ends, frames start to be produced (image attached).
,
Nov 20 2017
Before my change, the blob PNG encode takes avg 143ms, and with my change it takes 95ms, so the PNG encode time is much faster. This allows main thread canvas.toBlob to be called more often with my change, from 4.43 calls/sec before, to 5.59 calls/sec with my change (see the attached graphs). The main thread canvas.toBlob time cost does not change before or after my change (avg 84ms before and after). However, since is called more often with my change, and blocks frame production, the frame rate reported by this test decreases.
,
Nov 20 2017
In general, as the PNG encoding time decreases (faster encoding, good), the frame rate reported by this test will _decrease_ (weird but true). To test that idea, I changed the canvas.toBlob code to still take the snapshot, but instead schedule a null blob callback. This skips the PNG encode step to simulate what happens if we have a very fast PNG encoder.
,
Nov 20 2017
Result of #11 is, canvas.toBlob is called more often (11.33 calls/sec) and the frame rate of the test decreases to 11 frames per second. Seems to me, if we improve the performance of the PNG encoder, this test will report a regression in frame rate by design.
,
Nov 21 2017
Sounds to me like toBlob's work load is being scheduled too aggressively. I thought the code was supposed to chop up the work into a series of idle tasks. Each task processes scanlines until the idle period deadline is reached. Is the code not doing this anymore?
,
Nov 22 2017
I think Reza and I would look into this issue, using some chrome metrics data. There is a possibility that canvas.toBlob always fall into the "force-encoding" path as the previous idle task encoding was too slow; and now with the new encoder it becomes faster and now it starts doing row-by-row encoding more frequently.
,
Nov 23 2017
#14 xlai@ > There is a possibility that canvas.toBlob always fall into the "force-encoding" path as the
If, by the "force-encoding" code path, you mean this one:
CanvasAsyncBlobCreator::ForceEncodeRowsOnCurrentThread() { ... }
note I put a CHECK(false) therein (to crash my browser) and re-ran the test-case to test your idea. My browser did not crash :) so I don't think "force-encoding" is the problem here.
#13 junov@ > I thought the code was supposed to chop up the work into a series of idle tasks. Each task processes scanlines until the idle period deadline is reached.
That's my observation: I see C++ callbacks in idle time to
CanvasAsyncBlobCreator::IdleEncodeRows(deadline)
which encodes a row-at-a-time until |deadline| is reached, and re-schedules itself on Platform Scheduler()->PostIdleTask() if there still more rows to encode.
Seems fine, and zooming in to the image in #9, the FPS during IdleEncodeRows (grey area) is 60 frames per second.
,
Nov 23 2017
60 FPS -> IdleEncodeRows looks good to me. But while we are in canvas.toBlob(), the main thread is blocked, the FPS drops to 0 (yellow area). With a faster PNG encoder, we call toBlob more often. Thus the proportion of time spent in toBlob() increases. That means longer periods of FPS 0. That causes the test's reported frame rate to decrease. ---- Aside: things would be awesome if we could move the work being done in toBlob() (viz., the snapshot and paint) into the idle time task. That was the design we envisaged for toBlob, so that it would not jank the main thread. The analysis above tells me that toBlob, as currently implemented, can jank the main thread.
,
Nov 23 2017
,
Dec 15 2017
Anyho, a see a number of patches going in about this, and, so long as we are aware the improvements in image encoding time cause this test to report a _lower_ FPS and therefore a regression, well it's working as intended. Reassigning to the test owners, for resolve.
,
Dec 19 2017
As commented by noel@ previously, the faster the encoder, the more frequently canvas.toBlob is getting called. If canvas.toBlob does occupy some main-thread resources already, then such an increase of frequency will manifest itself as a regression of frame rate in the smoothness test as we see the report here. Currently, canvas.toBlob is able to delegate the encoding jobs to idle tasks to avoid main thread janks. In the picture shown by noel@ in comment 15, when the idle tasks are executing, the main thread is about 60 frames per second. This shows that the encoding step is working well in relieving main thread jank. But, canvas.toBlob is calling HTMLCanvasElement::ToStaticBitmapImage to read the canvas image before encoding, which is a synchronous step. This step occupies main-thread resources and is the main reason of the regression. This issue should be related to crbug.com/701925, a request to improve canvas.toBlob by making its image readback step to become async. I want to close this issue and links to crbug.com/701925, because the regression is not caused by the change of encoder in r505447, but an inherent implementation problem within canvas.toBlob. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Oct 3 2017