New issue
Advanced search Search tips

Issue 804380 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 2
Type: Feature



Sign in to add a comment

Transfer Cache should customize allocations based on size of entry

Project Member Reported by ericrk@chromium.org, Jan 22 2018

Issue description

Currently, the transfer cache always uses mapped memory to transfer data to the GPU process. This is the heaviest-weight transfer mechanism, designed for large data blocks.

The point at which we allocate memory is here: https://cs.chromium.org/chromium/src/gpu/command_buffer/client/client_transfer_cache.cc?rcl=0e60340df4c5c91b1d53787f0cc94414984ebac0&l=19

In order to more efficiently send smaller items, the cache should dynamically chose between three options:
1) Allocating command buffer memory for very small items (size of a GL command).
2) Allocating transfer buffer memory if possible (all items, if size permits).
3) Allocating mapped memory if transfer buffer is full.

Note that option (2) is currently problematic due to the way OOP raster uses the transfer cache (reserving memory then shrinking). Maybe we need an additional oop transfer cache?
 

Comment 1 by piman@chromium.org, Jan 22 2018

To be honest, (3) is not that different from (2). The main difference is that (3) will grow the memory pool if running out of space, whereas (2) will block waiting for the service to make progress and free up some space, so I'm not sure there's a ton of value in trying to make (2) work. (1) can be interesting though, for small items if there are a lot of them.
Cc: khushals...@chromium.org
Owner: enne@chromium.org
Status: Assigned (was: Available)
Over to enne@ who is planning on looking at this.
Project Member

Comment 4 by bugdroid1@chromium.org, Oct 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c8d0d44728928704030f8a901174d649f164cbe5

commit c8d0d44728928704030f8a901174d649f164cbe5
Author: Adrienne Walker <enne@chromium.org>
Date: Mon Oct 08 19:55:31 2018

Use transfer buffer for small transfer cache entries

OOP-R in RasterImplementation currently allocates the entire free space
available in the transfer buffer so that it doesn't have to measure
first then serialize and can just optimistically serialize into that
space.  This means that other transfer buffer consumers can't use it,
and are instead forced to use mapped memory.  This turns out to not
be very efficient for large numbers of small allocations.

To sidestep this problem, small transfer cache entries are serialized
into the heap when encountered.  Then when raster is complete, it
shrinks the transfer buffer down to the correct size and these small
transfer cache entries are added after it.  Then the commands for these
entries are submitted first before raster.

For example, for three transfer cache tasks (A B C) and one raster (R)
the transfer buffer and ring buffer will look like this:

    transfer buffer ring buffer: R A B C
    command buffer: A B C R

This patch also loosens the restriction on gpu::RingBuffer that there
can be only one in use block at any time.  This leads to potential
exhaustion issues because the ring buffer won't reallocate while there
are in use blocks.  To avoid this, this optimization is only used when
there is room in the ring buffer without waiting.

An alternative to this patch would have been to have yet another
transfer buffer only for the transfer cache or to rewrite oopr
serialization, but both of those are more invasive solutions.

On OSX, on the rendering.desktop telemetry benchmark, on the story
web_animation_value_type_path, this results in the following results:

gpu-r:
  raster cpu time: 2.011ms
  gpu cpu time: 12.891ms
  total time: 14.902ms

oop-r:
  raster cpu time: 7.314ms <- the bug
  gpu cpu time: 3.804ms
  total time: 11.118ms

oop-r + this patch:
  raster cpu time: 0.812ms
  gpu cpu time: 3.901ms
  total time: 4.713ms

Bug:  804380 , 877168
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5
Reviewed-on: https://chromium-review.googlesource.com/c/1262955
Commit-Queue: enne <enne@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597654}
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/cc/paint/transfer_cache_serialize_helper.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/client_transfer_cache.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/client_transfer_cache.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/raster_implementation.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/raster_implementation.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/ring_buffer.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/ring_buffer.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/transfer_buffer_unittest.cc

Project Member

Comment 5 by bugdroid1@chromium.org, Oct 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6bbcf5cf66c5e9540c93857d2fac6a357b273884

commit 6bbcf5cf66c5e9540c93857d2fac6a357b273884
Author: enne <enne@chromium.org>
Date: Mon Oct 08 22:33:20 2018

Revert "Use transfer buffer for small transfer cache entries"

This reverts commit c8d0d44728928704030f8a901174d649f164cbe5.

Reason for revert: causing flaky webgl test failures

Original change's description:
> Use transfer buffer for small transfer cache entries
> 
> OOP-R in RasterImplementation currently allocates the entire free space
> available in the transfer buffer so that it doesn't have to measure
> first then serialize and can just optimistically serialize into that
> space.  This means that other transfer buffer consumers can't use it,
> and are instead forced to use mapped memory.  This turns out to not
> be very efficient for large numbers of small allocations.
> 
> To sidestep this problem, small transfer cache entries are serialized
> into the heap when encountered.  Then when raster is complete, it
> shrinks the transfer buffer down to the correct size and these small
> transfer cache entries are added after it.  Then the commands for these
> entries are submitted first before raster.
> 
> For example, for three transfer cache tasks (A B C) and one raster (R)
> the transfer buffer and ring buffer will look like this:
> 
>     transfer buffer ring buffer: R A B C
>     command buffer: A B C R
> 
> This patch also loosens the restriction on gpu::RingBuffer that there
> can be only one in use block at any time.  This leads to potential
> exhaustion issues because the ring buffer won't reallocate while there
> are in use blocks.  To avoid this, this optimization is only used when
> there is room in the ring buffer without waiting.
> 
> An alternative to this patch would have been to have yet another
> transfer buffer only for the transfer cache or to rewrite oopr
> serialization, but both of those are more invasive solutions.
> 
> On OSX, on the rendering.desktop telemetry benchmark, on the story
> web_animation_value_type_path, this results in the following results:
> 
> gpu-r:
>   raster cpu time: 2.011ms
>   gpu cpu time: 12.891ms
>   total time: 14.902ms
> 
> oop-r:
>   raster cpu time: 7.314ms <- the bug
>   gpu cpu time: 3.804ms
>   total time: 11.118ms
> 
> oop-r + this patch:
>   raster cpu time: 0.812ms
>   gpu cpu time: 3.901ms
>   total time: 4.713ms
> 
> Bug:  804380 , 877168
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
> Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5
> Reviewed-on: https://chromium-review.googlesource.com/c/1262955
> Commit-Queue: enne <enne@chromium.org>
> Reviewed-by: Antoine Labour <piman@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#597654}

TBR=enne@chromium.org,piman@chromium.org,ericrk@chromium.org

Change-Id: Ia135febae14f3515a8a6447071d3ddd94bd0e89f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  804380 , 877168
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Reviewed-on: https://chromium-review.googlesource.com/c/1269632
Reviewed-by: enne <enne@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597708}
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/cc/paint/transfer_cache_serialize_helper.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/client_transfer_cache.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/client_transfer_cache.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/raster_implementation.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/raster_implementation.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/ring_buffer.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/ring_buffer.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/transfer_buffer_unittest.cc

Project Member

Comment 6 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/458236cb920e35864c7983b2b7e9c301a29edde9

commit 458236cb920e35864c7983b2b7e9c301a29edde9
Author: Adrienne Walker <enne@chromium.org>
Date: Fri Oct 12 22:09:46 2018

Reland "Use transfer buffer for small transfer cache entries"

This is a reland of c8d0d44728928704030f8a901174d649f164cbe5

Original change's description:
> Use transfer buffer for small transfer cache entries
> 
> OOP-R in RasterImplementation currently allocates the entire free space
> available in the transfer buffer so that it doesn't have to measure
> first then serialize and can just optimistically serialize into that
> space.  This means that other transfer buffer consumers can't use it,
> and are instead forced to use mapped memory.  This turns out to not
> be very efficient for large numbers of small allocations.
> 
> To sidestep this problem, small transfer cache entries are serialized
> into the heap when encountered.  Then when raster is complete, it
> shrinks the transfer buffer down to the correct size and these small
> transfer cache entries are added after it.  Then the commands for these
> entries are submitted first before raster.
> 
> For example, for three transfer cache tasks (A B C) and one raster (R)
> the transfer buffer and ring buffer will look like this:
> 
>     transfer buffer ring buffer: R A B C
>     command buffer: A B C R
> 
> This patch also loosens the restriction on gpu::RingBuffer that there
> can be only one in use block at any time.  This leads to potential
> exhaustion issues because the ring buffer won't reallocate while there
> are in use blocks.  To avoid this, this optimization is only used when
> there is room in the ring buffer without waiting.
> 
> An alternative to this patch would have been to have yet another
> transfer buffer only for the transfer cache or to rewrite oopr
> serialization, but both of those are more invasive solutions.
> 
> On OSX, on the rendering.desktop telemetry benchmark, on the story
> web_animation_value_type_path, this results in the following results:
> 
> gpu-r:
>   raster cpu time: 2.011ms
>   gpu cpu time: 12.891ms
>   total time: 14.902ms
> 
> oop-r:
>   raster cpu time: 7.314ms <- the bug
>   gpu cpu time: 3.804ms
>   total time: 11.118ms
> 
> oop-r + this patch:
>   raster cpu time: 0.812ms
>   gpu cpu time: 3.901ms
>   total time: 4.713ms
> 
> Bug:  804380 , 877168
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
> Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5
> Reviewed-on: https://chromium-review.googlesource.com/c/1262955
> Commit-Queue: enne <enne@chromium.org>
> Reviewed-by: Antoine Labour <piman@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#597654}

Bug:  804380 , 877168
Change-Id: I5c7b0d3b43c0c6f57eb7e5a9c43b4b8ecb22518a
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Reviewed-on: https://chromium-review.googlesource.com/c/1277949
Commit-Queue: enne <enne@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#599376}
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/cc/paint/transfer_cache_serialize_helper.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/client_transfer_cache.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/client_transfer_cache.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/raster_implementation.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/raster_implementation.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer_test.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/transfer_buffer.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/transfer_buffer_unittest.cc

Status: Fixed (was: Assigned)

Sign in to add a comment