Transfer Cache should customize allocations based on size of entry |
||||
Issue descriptionCurrently, the transfer cache always uses mapped memory to transfer data to the GPU process. This is the heaviest-weight transfer mechanism, designed for large data blocks. The point at which we allocate memory is here: https://cs.chromium.org/chromium/src/gpu/command_buffer/client/client_transfer_cache.cc?rcl=0e60340df4c5c91b1d53787f0cc94414984ebac0&l=19 In order to more efficiently send smaller items, the cache should dynamically chose between three options: 1) Allocating command buffer memory for very small items (size of a GL command). 2) Allocating transfer buffer memory if possible (all items, if size permits). 3) Allocating mapped memory if transfer buffer is full. Note that option (2) is currently problematic due to the way OOP raster uses the transfer cache (reserving memory then shrinking). Maybe we need an additional oop transfer cache?
,
Mar 1 2018
,
Sep 28
Over to enne@ who is planning on looking at this.
,
Oct 8
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c8d0d44728928704030f8a901174d649f164cbe5 commit c8d0d44728928704030f8a901174d649f164cbe5 Author: Adrienne Walker <enne@chromium.org> Date: Mon Oct 08 19:55:31 2018 Use transfer buffer for small transfer cache entries OOP-R in RasterImplementation currently allocates the entire free space available in the transfer buffer so that it doesn't have to measure first then serialize and can just optimistically serialize into that space. This means that other transfer buffer consumers can't use it, and are instead forced to use mapped memory. This turns out to not be very efficient for large numbers of small allocations. To sidestep this problem, small transfer cache entries are serialized into the heap when encountered. Then when raster is complete, it shrinks the transfer buffer down to the correct size and these small transfer cache entries are added after it. Then the commands for these entries are submitted first before raster. For example, for three transfer cache tasks (A B C) and one raster (R) the transfer buffer and ring buffer will look like this: transfer buffer ring buffer: R A B C command buffer: A B C R This patch also loosens the restriction on gpu::RingBuffer that there can be only one in use block at any time. This leads to potential exhaustion issues because the ring buffer won't reallocate while there are in use blocks. To avoid this, this optimization is only used when there is room in the ring buffer without waiting. An alternative to this patch would have been to have yet another transfer buffer only for the transfer cache or to rewrite oopr serialization, but both of those are more invasive solutions. On OSX, on the rendering.desktop telemetry benchmark, on the story web_animation_value_type_path, this results in the following results: gpu-r: raster cpu time: 2.011ms gpu cpu time: 12.891ms total time: 14.902ms oop-r: raster cpu time: 7.314ms <- the bug gpu cpu time: 3.804ms total time: 11.118ms oop-r + this patch: raster cpu time: 0.812ms gpu cpu time: 3.901ms total time: 4.713ms Bug: 804380 , 877168 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5 Reviewed-on: https://chromium-review.googlesource.com/c/1262955 Commit-Queue: enne <enne@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Commit-Position: refs/heads/master@{#597654} [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/cc/paint/transfer_cache_serialize_helper.h [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/cc/paint/transfer_cache_unittest.cc [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/client_transfer_cache.cc [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/client_transfer_cache.h [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/raster_implementation.cc [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/raster_implementation.h [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/ring_buffer.cc [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/ring_buffer.h [modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/transfer_buffer_unittest.cc
,
Oct 8
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6bbcf5cf66c5e9540c93857d2fac6a357b273884 commit 6bbcf5cf66c5e9540c93857d2fac6a357b273884 Author: enne <enne@chromium.org> Date: Mon Oct 08 22:33:20 2018 Revert "Use transfer buffer for small transfer cache entries" This reverts commit c8d0d44728928704030f8a901174d649f164cbe5. Reason for revert: causing flaky webgl test failures Original change's description: > Use transfer buffer for small transfer cache entries > > OOP-R in RasterImplementation currently allocates the entire free space > available in the transfer buffer so that it doesn't have to measure > first then serialize and can just optimistically serialize into that > space. This means that other transfer buffer consumers can't use it, > and are instead forced to use mapped memory. This turns out to not > be very efficient for large numbers of small allocations. > > To sidestep this problem, small transfer cache entries are serialized > into the heap when encountered. Then when raster is complete, it > shrinks the transfer buffer down to the correct size and these small > transfer cache entries are added after it. Then the commands for these > entries are submitted first before raster. > > For example, for three transfer cache tasks (A B C) and one raster (R) > the transfer buffer and ring buffer will look like this: > > transfer buffer ring buffer: R A B C > command buffer: A B C R > > This patch also loosens the restriction on gpu::RingBuffer that there > can be only one in use block at any time. This leads to potential > exhaustion issues because the ring buffer won't reallocate while there > are in use blocks. To avoid this, this optimization is only used when > there is room in the ring buffer without waiting. > > An alternative to this patch would have been to have yet another > transfer buffer only for the transfer cache or to rewrite oopr > serialization, but both of those are more invasive solutions. > > On OSX, on the rendering.desktop telemetry benchmark, on the story > web_animation_value_type_path, this results in the following results: > > gpu-r: > raster cpu time: 2.011ms > gpu cpu time: 12.891ms > total time: 14.902ms > > oop-r: > raster cpu time: 7.314ms <- the bug > gpu cpu time: 3.804ms > total time: 11.118ms > > oop-r + this patch: > raster cpu time: 0.812ms > gpu cpu time: 3.901ms > total time: 4.713ms > > Bug: 804380 , 877168 > Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel > Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5 > Reviewed-on: https://chromium-review.googlesource.com/c/1262955 > Commit-Queue: enne <enne@chromium.org> > Reviewed-by: Antoine Labour <piman@chromium.org> > Cr-Commit-Position: refs/heads/master@{#597654} TBR=enne@chromium.org,piman@chromium.org,ericrk@chromium.org Change-Id: Ia135febae14f3515a8a6447071d3ddd94bd0e89f No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 804380 , 877168 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Reviewed-on: https://chromium-review.googlesource.com/c/1269632 Reviewed-by: enne <enne@chromium.org> Commit-Queue: enne <enne@chromium.org> Cr-Commit-Position: refs/heads/master@{#597708} [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/cc/paint/transfer_cache_serialize_helper.h [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/cc/paint/transfer_cache_unittest.cc [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/client_transfer_cache.cc [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/client_transfer_cache.h [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/raster_implementation.cc [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/raster_implementation.h [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/ring_buffer.cc [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/ring_buffer.h [modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/transfer_buffer_unittest.cc
,
Oct 12
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/458236cb920e35864c7983b2b7e9c301a29edde9 commit 458236cb920e35864c7983b2b7e9c301a29edde9 Author: Adrienne Walker <enne@chromium.org> Date: Fri Oct 12 22:09:46 2018 Reland "Use transfer buffer for small transfer cache entries" This is a reland of c8d0d44728928704030f8a901174d649f164cbe5 Original change's description: > Use transfer buffer for small transfer cache entries > > OOP-R in RasterImplementation currently allocates the entire free space > available in the transfer buffer so that it doesn't have to measure > first then serialize and can just optimistically serialize into that > space. This means that other transfer buffer consumers can't use it, > and are instead forced to use mapped memory. This turns out to not > be very efficient for large numbers of small allocations. > > To sidestep this problem, small transfer cache entries are serialized > into the heap when encountered. Then when raster is complete, it > shrinks the transfer buffer down to the correct size and these small > transfer cache entries are added after it. Then the commands for these > entries are submitted first before raster. > > For example, for three transfer cache tasks (A B C) and one raster (R) > the transfer buffer and ring buffer will look like this: > > transfer buffer ring buffer: R A B C > command buffer: A B C R > > This patch also loosens the restriction on gpu::RingBuffer that there > can be only one in use block at any time. This leads to potential > exhaustion issues because the ring buffer won't reallocate while there > are in use blocks. To avoid this, this optimization is only used when > there is room in the ring buffer without waiting. > > An alternative to this patch would have been to have yet another > transfer buffer only for the transfer cache or to rewrite oopr > serialization, but both of those are more invasive solutions. > > On OSX, on the rendering.desktop telemetry benchmark, on the story > web_animation_value_type_path, this results in the following results: > > gpu-r: > raster cpu time: 2.011ms > gpu cpu time: 12.891ms > total time: 14.902ms > > oop-r: > raster cpu time: 7.314ms <- the bug > gpu cpu time: 3.804ms > total time: 11.118ms > > oop-r + this patch: > raster cpu time: 0.812ms > gpu cpu time: 3.901ms > total time: 4.713ms > > Bug: 804380 , 877168 > Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel > Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5 > Reviewed-on: https://chromium-review.googlesource.com/c/1262955 > Commit-Queue: enne <enne@chromium.org> > Reviewed-by: Antoine Labour <piman@chromium.org> > Cr-Commit-Position: refs/heads/master@{#597654} Bug: 804380 , 877168 Change-Id: I5c7b0d3b43c0c6f57eb7e5a9c43b4b8ecb22518a Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel Reviewed-on: https://chromium-review.googlesource.com/c/1277949 Commit-Queue: enne <enne@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Commit-Position: refs/heads/master@{#599376} [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/cc/paint/transfer_cache_serialize_helper.h [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/cc/paint/transfer_cache_unittest.cc [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/client_transfer_cache.cc [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/client_transfer_cache.h [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/raster_implementation.cc [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/raster_implementation.h [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer.cc [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer.h [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer_test.cc [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/transfer_buffer.cc [modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/transfer_buffer_unittest.cc
,
Oct 16
|
||||
►
Sign in to add a comment |
||||
Comment 1 by piman@chromium.org
, Jan 22 2018