Tile iteration order is bad for ganesh tiles. |
||||
Issue descriptionThe way the spiral iterator is implemented, we iterate tiles in a strict spiral ignoring the fact that some tiles have aspect ratios != 1. For ganesh tiles (that are wide but not that tall), this causes us to try and process tiles to the left/right of the viewport before tiles below and above the viewport even if above/below tiles are closer to the viewport. Note that this problem is somewhat alleviated by the fact that we still respect the rect that we're iterating. That is, if we're iterating the soon rect and the left/right tiles are outside of it, we will skip them. In other words, we still iterate only the tiles within a given rect, but the ordering of those tiles in there is not ideal.
,
Jun 10 2016
If this is not too urgent I'll take a look at this as I'll need some time understand the current solution.
,
Jun 11 2016
Suppose the example given below. The tiling data as (gfx::Size(10, 10), gfx::Size(30, 30), false) and center at rect (15, 15, 1, 1). Suppose tiles at locations 1, 2, 3, 7 have texture size 8x8 and tiles at locations 4,5,6 have texture size 10*8 (wide tiles). Then is the sequence expected to be 1, 2, 3, 7, 8, 4, 5, 6? x 0 1 2 y.------ 0| 4 3 2 1| 5 * 1 2| 6 7 8
,
Jun 13 2016
Read the above example as (gfx::Size(10, 8), gfx::Size(30, 30), false), so that all tiles have max texture size as 10*8.
,
Jun 13 2016
If the tiles are "square", ie the ratio of width/height is 1, then the iteration order in #3 is correct. However, when we have ganesh tiles they are sized as full viewport width, and 1/4 of viewport height so we typically end up with wide tiles, something like 1000x250. In that case, you get the following: x 0 1 2 3 4 y.-------------- 0|16 15 14 13 12 1|17 4 3 2 11 2|18 5 * 1 10 3|19 6 7 8 9 4|20 21 22 23 24 However, for example tiles 17, 18, 19 are much further away than 21, 22, 23, which means we're iterating tiles that are much further away from the center than we want.. the ideal iteration would be something like: x 0 1 2 3 4 y.-------------- 0|20 11 10 9 19 1|21 4 3 2 18 2|22 5 * 1 17 3|23 6 7 8 16 4|24 12 13 14 15 That is, considering that the tiles are smaller in the y direction, we'd consider iterating those tiles before proceeding further out in the x direction. How much further? My initial thought is to use height/width or width/height as that proxy (whichever is bigger). So in the case listed above, you'd get a ratio of 4, so you can go 4 rows in y direction before extending the x direction. This is probably a fairly complicated patch (but small), but the idea is to retain a roughtly distance increasing iteration as much as possible. We'll need a lot of unittests for this as well.
,
Jun 14 2016
Thank you vmpstr@ for clear explaination. I'll work on it and add a patch with unittests.
,
Jun 15 2016
,
Jul 18 2016
,
Jul 18 2016
https://codereview.chromium.org/2067213002/ is added for the review.
,
Sep 24 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/051927915d98be61b03119b7bce2301d33829bcf commit 051927915d98be61b03119b7bce2301d33829bcf Author: prashant.n <prashant.n@samsung.com> Date: Sat Sep 24 03:38:20 2016 cc: Detach spiral iterator implementation to separate file. Currently SpiralDifferenceIterator implements logic based on directions around the center rect. This logic does not take care of aspect ratios of tile size. To consider the aspect ratios during iteration new logic has to be implanted. Until the new logic gets tested thoroughly we need the current logic. This patch detaches the current spiral iterator to separate classes (SpiralIterator and ReverseSpiralIterator) and moves them to their own files making SpiralDifferenceIterator thinner. Now new logic can be added to SpiralDifferenceIterator based on some runtime switch. This patch also separates out unit tests related to SpiralIterator from tiling_data_unittest.cc to its own file. BUG=613695 TESTS=TilingDataSpiralIteratorTest.* CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel Review-Url: https://codereview.chromium.org/2352393002 Cr-Commit-Position: refs/heads/master@{#420823} [modify] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/BUILD.gn [modify] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/BUILD.gn [add] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/reverse_spiral_iterator.cc [add] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/reverse_spiral_iterator.h [add] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/spiral_iterator.cc [add] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/spiral_iterator.h [add] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/spiral_iterator_unittest.cc [modify] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/tiling_data.cc [modify] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/tiling_data.h [modify] https://crrev.com/051927915d98be61b03119b7bce2301d33829bcf/cc/base/tiling_data_unittest.cc
,
Nov 15 2016
For considering aspect ratios, https://codereview.chromium.org/2498383002/ is added which implements tiling iterator based on pyramid sequence. The iterator cost results are as given below (measured on Ubuntu 14.04, i7 4 cores machine). With pyramid sequence *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_16= 137947.578125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_32= 76511.7109375 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_64= 47162.99609375 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_128= 27428.8203125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_16= 67154.8359375 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_32= 56307.26953125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_64= 32311.705078125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_128= 17798.923828125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_16= 16145.591796875 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_32= 15190.4052734375 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_64= 11827.1142578125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_128= 9451.248046875 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_16= 102962.8359375 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_32= 62303.69140625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_64= 42613.2109375 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_128= 23738.669921875 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_16= 44010.2890625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_32= 36766.83984375 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_64= 25923.08203125 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_128= 16826.423828125 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_16= 11491.5009765625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_32= 10769.197265625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_64= 9052.876953125 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_128= 7199.39892578125 runs/s With spiral iterator (without pyramid sequence) *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_16= 169157.375 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_32= 97946.765625 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_64= 57101.88671875 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 2_128= 32033.173828125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_16= 69190.3671875 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_32= 58077.4453125 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_64= 38567.109375 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 10_128= 22744.51171875 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_16= 16227.841796875 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_32= 15297.0556640625 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_64= 12956.171875 runs/s *RESULT tile_manager_raster_tile_queue_construct_and_iterate: 50_128= 10680.6796875 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_16= 137091.234375 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_32= 75950.75 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_64= 50263.96875 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 2_128= 27047.78125 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_16= 78565.9140625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_32= 58623.97265625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_64= 37526.19140625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 10_128= 22035.119140625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_16= 24779.541015625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_32= 21959.9453125 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_64= 17531.625 runs/s *RESULT tile_manager_eviction_tile_queue_construct_and_iterate: 50_128= 12932.736328125 runs/s The new algorithm is little-bit slower than previous spiral iterator algorithm (for less number of tiles), but it has a lot of flexibility like considering tile aspect ratio, precedence of coverage directions, manhattan distance like iteration order, etc.
,
Nov 15 2016
It looks like the iteration is 10-20% slower. I don't think that's acceptable for the relatively minor feature set. For context, there is some talk about unifying ganesh tiles to be square as well, which would mean this series of patches would only regress performance. Is there a way that we can use the current iterator code and only slightly adjust it to consider aspect ratio? I understand that the general solution has added benefits, but I don't think the performance hit is worth it.
,
Feb 21 2017
The perf tests mentioned in comment #11 checks the algorithmic time for pyramid sequence over spiral iterator. I've improved this by 5% by sharing the iterator object throughout iteration. Now it is ~ 5%-15% slower. Even tough pyramid sequence is slower, the iteration order returned by it is better than spiral iterator. I triggered few smoothness tests and on those pyramid sequence looks to be little better. Give below are the results for smoothness tests. 1. smoothness.key_silk_cases https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-02-21_04-03-35 2. smoothness.tough_scrolling_cases https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-02-21_04-06-38 3. smoothness.tough_scrolling_cases https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-02-21_04-03-31
,
Feb 21 2017
In above comment, the smoothness tests are triggered on https://codereview.chromium.org/2067213002 , patch set 77 (rebase -> used for smoothness tests).
,
Feb 22 2017
The results for smoothness.key_silk_cases are as given below.
TOT Patch
avg_surface_fps 52.607 53.444
first_gesture_scroll_update_latency 10.678 ms 10.647 ms
frame_lengths 1.082 1.09
frame_time_discrepancy 141.217 ms 151.576 ms
frame_times 18.041 ms 18.159 ms
input_event_latency_discrepancy 111.290 ms 103.630 ms
jank_count 4 4
main_thread_scroll_latency_discrepancy 307.816 ms 282.097 ms
max_frame_delay 6.571 7.111
mean_frame_time 20.720 ms 20.173 ms
mean_input_event_latency 29.581 ms 24.644 ms
mean_main_thread_scroll_latency 18.000 ms 17.626 ms
mean_pixels_approximated 16.782% 17.633%
mean_pixels_checkerboarded 0.350% 0.196%
percentage_smooth 92.046 91.835
queueing_durations 0.213 ms 0.204 ms
,
Feb 23 2017
I triggered few more tests and results are as given below. (All the above and below tests are done android-nexus6 device.) For many of the histograms, patch is better than TOT. Even if there is not much significant improvement, at least patch is not worsening the perf. With added advantages of the pyramid sequence it would outweigh the spiral iterator, IMO. P.S. * means patch is worse than TOT for the given histogram. 1. smoothness.sync_scroll.key_mobile_sites_smooth https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-02-23_00-52-44 TOT Patch avg_surface_fps 56.641 56.692 first_gesture_scroll_update_latency 45.342 ms 42.528 ms frame_lengths 1.044 1.043 frame_time_discrepancy 88.571 ms 87.743 ms frame_times 17.402 ms 17.384 ms input_event_latency_discrepancy 434.652 ms 452.524 ms * jank_count 3 2 max_frame_delay 3.487 3.538 * mean_frame_time 17.821 ms 17.851 ms * mean_input_event_latency 13.952 ms 13.883 ms mean_pixels_approximated 0.000% 0.000% mean_pixels_checkerboarded 0.264% 0.256% percentage_smooth 94.953 95.264 queueing_durations 0.603 ms 0.594 ms 2. smoothness.key_silk_cases https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-02-22_06-19-45 TOT Patch avg_surface_fps 52.179 53.321 first_gesture_scroll_update_latency 10.371 ms 11.224 ms * frame_lengths 1.087 1.084 frame_time_discrepancy 142.465 ms 139.433 ms frame_times 18.110 ms 18.073 ms input_event_latency_discrepancy 172.892 ms 91.092 ms jank_coun 4 4 main_thread_scroll_latency_discrepancy 627.640 ms 283.048 ms max_frame_delay 6.357 6.357 mean_frame_time 20.725 ms 19.928 ms mean_input_event_latency 27.873 ms 29.870 ms * mean_main_thread_scroll_latency 18.889 ms 17.752 ms mean_pixels_approximated 17.154% 17.004% mean_pixels_checkerboarded 0.196% 0.161% percentage_smooth 90.835 91.367 queueing_durations 0.208 ms 0.206 ms 3. smoothness.gpu_rasterization.top_25_smooth https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-02-22_06-13-37 TOT Patch avg_surface_fps 57.174 57.696 first_gesture_scroll_update_latency 14.597 ms 17.657 ms * frame_lengths 1.045 1.04 frame_time_discrepancy 75.393 ms 67.510 ms frame_times 17.417 ms 17.340 ms input_event_latency_discrepancy 262.444 ms 227.050 ms jank_count 2 1 main_thread_scroll_latency_discrepancy 2,467.497 ms 2,266.334 ms max_frame_delay 3 2.87 mean_frame_time 17.471 ms 17.356 ms mean_input_event_latency 9.701 ms 9.732 ms * mean_main_thread_scroll_latency 79.559 ms 81.632 ms * mean_pixels_approximated 0.000% 0.000% mean_pixels_checkerboarded 5.643% 5.287% percentage_smooth 96.446 96.615 queueing_durations 2.505 ms 2.493 ms |
||||
►
Sign in to add a comment |
||||
Comment 1 by vmp...@chromium.org
, Jun 9 2016