New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 830084 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

GL_COMMANDS_COMPLETED queries are slow on Windows

Project Member Reported by sunn...@chromium.org, Apr 6 2018

Issue description

One copy raster uses GL_COMMANDS_COMPLETED queries to know when it can reuse staging buffers. See attached ETW trace CPU profile screenshots for pinch-zoom on cnn.com on SurfaceBook with Intel HD 620. HandleEndQueryEXT takes up most of the CPU time for GPU main thread, 715ms out of 1143ms for DoCommands and 5s wall time, which is approx 3x the time for the next item, HandleCopySubTextureCHROMIUM. That time is split up ~50-50 between the glFlush (295ms), and creating the sync object (366ms).

piman@ suggested experimenting with GL_COMMANDS_ISSUED which should be sufficient for shm staging buffers. See attached trace screenshot for that. Total CPU time of DoCommands is 715ms for 9s wall time, and HandleEndQueryEXT doesn't take up any time.

If this is worthwhile, we can switch on staging buffer type, and decide whether to use GL_COMMANDS_COMPLETED or GL_COMMANDS_ISSUED.

wdyt reveman@?

geofflang@ let me know if you need more detailed stacks for ANGLE.
 
commands_completed.png
682 KB View Download
commands_issued.png
541 KB View Download
Having a stack trace into ANGLE would be great but I suspect it will mostly be the cost of creating the D3D query objects and then polling them.  The command decoders both poll pending queries when receiving certain commands so maybe we could be smarter about scheduling the polling.
I think we should be switching to GpuFence framework for improved performance for synchronizing access to GMBs [1],[2]. That is what that framework has been built for.

[1] https://cs.chromium.org/chromium/src/ui/gfx/gpu_fence.h
[2] https://chromium.googlesource.com/chromium/src/+/master/docs/design/gpu_synchronization.md

In an earlier prototype of this framework, I implemented a simple cross platform GpuFence based on shmem. This not only solved the COMMANDS_COMPLETED performance problem but also improved over COMMANDS_ISSUES significantly, as no context switches were needed for the client to detect that a GMB can be reused.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/13a440c5640000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/12982351640000
Project Member

Comment 8 by bugdroid1@chromium.org, Aug 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/370c8bc2082a3665ccc45fef39955fd550170b61

commit 370c8bc2082a3665ccc45fef39955fd550170b61
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Thu Aug 30 19:18:53 2018

Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers

GL_COMMANDS_ISSUED_CHROMIUM is sufficient for shared memory
GpuMemoryBuffers because they're uploaded using glTexImage2D (see
GLImageMemory::CopyTexImage).

This should reduce the number of idle wakeups in the GPU process, and
reduce the CPU time spent on such queries which is quite high on
Windows.

Bug: 830084, 622491
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I8111b3d7f12abbf900383d45e2a74394848e83fd
Reviewed-on: https://chromium-review.googlesource.com/1189126
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: Eric Karl <ericrk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#587695}
[modify] https://crrev.com/370c8bc2082a3665ccc45fef39955fd550170b61/cc/raster/one_copy_raster_buffer_provider.cc

Project Member

Comment 9 by bugdroid1@chromium.org, Sep 1

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6eb9eee331affe92c52a2926f57422bcbf40fa43

commit 6eb9eee331affe92c52a2926f57422bcbf40fa43
Author: Eric Karl <ericrk@chromium.org>
Date: Sat Sep 01 06:52:21 2018

Revert "Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers"

This reverts commit 370c8bc2082a3665ccc45fef39955fd550170b61.

Reason for revert: This is breaking desktop sites (SW raster) on Android when OOP-R is enabled. see crbug.com/879823

Original change's description:
> Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers
> 
> GL_COMMANDS_ISSUED_CHROMIUM is sufficient for shared memory
> GpuMemoryBuffers because they're uploaded using glTexImage2D (see
> GLImageMemory::CopyTexImage).
> 
> This should reduce the number of idle wakeups in the GPU process, and
> reduce the CPU time spent on such queries which is quite high on
> Windows.
> 
> Bug: 830084, 622491
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
> Change-Id: I8111b3d7f12abbf900383d45e2a74394848e83fd
> Reviewed-on: https://chromium-review.googlesource.com/1189126
> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
> Reviewed-by: Eric Karl <ericrk@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#587695}

TBR=sunnyps@chromium.org,ericrk@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 830084, 622491, 879823
Change-Id: I7c78d2cc92e9ad274e8f443969d2a72d2410d5db
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Reviewed-on: https://chromium-review.googlesource.com/1199629
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Eric Karl <ericrk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#588274}
[modify] https://crrev.com/6eb9eee331affe92c52a2926f57422bcbf40fa43/cc/raster/one_copy_raster_buffer_provider.cc

Project Member

Comment 10 by bugdroid1@chromium.org, Sep 5

Labels: merge-merged-3538
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5646424ddff0c473c68f8dac3dacc7036bc119bc

commit 5646424ddff0c473c68f8dac3dacc7036bc119bc
Author: Eric Karl <ericrk@chromium.org>
Date: Tue Sep 04 22:22:22 2018

[merge to M70] Revert "Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers"

This reverts commit 370c8bc2082a3665ccc45fef39955fd550170b61.

Reason for revert: This is breaking desktop sites (SW raster) on Android when OOP-R is enabled. see crbug.com/879823

Original change's description:
> Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers
> 
> GL_COMMANDS_ISSUED_CHROMIUM is sufficient for shared memory
> GpuMemoryBuffers because they're uploaded using glTexImage2D (see
> GLImageMemory::CopyTexImage).
> 
> This should reduce the number of idle wakeups in the GPU process, and
> reduce the CPU time spent on such queries which is quite high on
> Windows.
> 
> Bug: 830084, 622491
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
> Change-Id: I8111b3d7f12abbf900383d45e2a74394848e83fd
> Reviewed-on: https://chromium-review.googlesource.com/1189126
> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
> Reviewed-by: Eric Karl <ericrk@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#587695}

TBR=sunnyps@chromium.org,ericrk@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 830084, 622491, 879823, 879937
Change-Id: I7c78d2cc92e9ad274e8f443969d2a72d2410d5db
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Reviewed-on: https://chromium-review.googlesource.com/1199629
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Eric Karl <ericrk@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#588274}(cherry picked from commit 6eb9eee331affe92c52a2926f57422bcbf40fa43)
Reviewed-on: https://chromium-review.googlesource.com/1204114
Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/branch-heads/3538@{#36}
Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811}
[modify] https://crrev.com/5646424ddff0c473c68f8dac3dacc7036bc119bc/cc/raster/one_copy_raster_buffer_provider.cc

Project Member

Comment 11 by bugdroid1@chromium.org, Sep 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/80d0a681332ba4c48d10e11f17d833eea89758ce

commit 80d0a681332ba4c48d10e11f17d833eea89758ce
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Thu Sep 13 18:57:47 2018

Reland "Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers"

This is a reland of 370c8bc2082a3665ccc45fef39955fd550170b61

This is similar to the original change, except in two significant ways:
1. RasterImplementation now supports COMMANDS_ISSUED queries, and
2. The decision whether to use queries is handled in a single place by
   OneCopyRasterBufferProvider, and communicated to StagingBufferPool by
   setting |query_id| on the StagingBuffer.

Original change's description:
> Use GL_COMMANDS_ISSUED_CHROMIUM query for shared memory GpuMemoryBuffers
>
> GL_COMMANDS_ISSUED_CHROMIUM is sufficient for shared memory
> GpuMemoryBuffers because they're uploaded using glTexImage2D (see
> GLImageMemory::CopyTexImage).
>
> This should reduce the number of idle wakeups in the GPU process, and
> reduce the CPU time spent on such queries which is quite high on
> Windows.
>
> Bug: 830084, 622491
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
> Change-Id: I8111b3d7f12abbf900383d45e2a74394848e83fd
> Reviewed-on: https://chromium-review.googlesource.com/1189126
> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
> Reviewed-by: Eric Karl <ericrk@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#587695}

Bug: 830084, 622491
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: Ia8bec687f42fef8242689fb014c1d18d2194b1e3
Reviewed-on: https://chromium-review.googlesource.com/1205990
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: Jonathan Backer <backer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#591097}
[modify] https://crrev.com/80d0a681332ba4c48d10e11f17d833eea89758ce/cc/DEPS
[modify] https://crrev.com/80d0a681332ba4c48d10e11f17d833eea89758ce/cc/raster/one_copy_raster_buffer_provider.cc
[modify] https://crrev.com/80d0a681332ba4c48d10e11f17d833eea89758ce/cc/raster/staging_buffer_pool.cc
[modify] https://crrev.com/80d0a681332ba4c48d10e11f17d833eea89758ce/cc/raster/staging_buffer_pool.h
[modify] https://crrev.com/80d0a681332ba4c48d10e11f17d833eea89758ce/gpu/command_buffer/client/raster_implementation.cc

Sign in to add a comment