Issue 877168

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



1%-16379.8% regression in loading.desktop at 585103:585228

Project Member Reported by majidvp@chromium.org, Aug 23

Issue description

Regression on mac low-end bot.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=877168

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=76f412bc1670ebc96ec29b959c979cc68b89ce5237155362898d216329a63b56


Bot(s) for this bug's original alert(s):

Android Nexus5 Perf
android-nexus5x-perf
mac-10_12_laptop_low_end-perf
mac-10_13_laptop_high_end-perf
Cc: enne@chromium.org
Owner: enne@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/139811b1640000

Turn on OOP Raster on mac bots by enne@chromium.org
https://chromium.googlesource.com/chromium/src/+/ee35f329c8e615505326a3b4df9ae4b7764037ae
2.662e+05 → 4.387e+07 (+4.36e+07)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
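
As a quick sanity check on the scale of this alert, the relative change implied by the two Pinpoint values above can be computed directly. This is a minimal sketch; the helper name `percent_change` is hypothetical and is not part of any Chromium tooling.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Values copied from the Pinpoint result above.
before, after = 2.662e05, 4.387e07
print(f"{percent_change(before, after):.0f}%")
```

With the rounded values shown by Pinpoint this comes out to roughly 16,380%, which is in line with the upper end of the range in the bug title.
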
Cc: khushals...@chromium.org
Components: Internals>Compositing>OOP-Raster
This bug has some Android bugs duped in as well, which are definitely unrelated to the mac regression.
The time to first paint cold regressions all seem to be dupes of issue 877587.

There are a number of memory issues here (gpu, skia, cc effective size) that need investigating.

There are a number of mean frame time, input event latency, and percentage smooth issues that need investigating.

blink perf bindings seem unrelated.

cpu_time_percentage_avg/TrivialFullscreenVideoPageSharedPageState appears to have recovered.
Here's a set of representative examples from each regression category:

ChromiumPerf/Android Nexus5 Perf/system_health.memory_mobile / memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg / load_tools / load_tools_dropbox
  graph: https://chromeperf.appspot.com/report?sid=2174ca7332af0d54b1efae111c34e42b4170461199e5c968972a9bb9cfdc47eb&rev=585228
  pinpoint bisect: https://pinpoint-dot-chromeperf.appspot.com/job/15dfff97640000

ChromiumPerf/mac-10_13_laptop_high_end-perf/system_health.memory_desktop / memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg / browse_media / browse_media_youtube
  graph: https://chromeperf.appspot.com/report?sid=77c618b51edab76fc747a3e167f8c843b7476aa7eadaff618905d708cd43a678&start_rev=584306&end_rev=592771
  pinpoint bisect: https://pinpoint-dot-chromeperf.appspot.com/job/150390c7640000

ChromiumPerf/mac-10_13_laptop_high_end-perf/system_health.memory_desktop / memory:chrome:all_processes:reported_by_os:system_memory:private_footprint_size_avg / browse_media / browse_media_pinterest
  graph: https://chromeperf.appspot.com/report?sid=e9523f720237fda34c7be4288976db8b5319bc9b4cd3fef917cb8d59d0f47fda&rev=585143
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/1328ec3f640000

ChromiumPerf/mac-10_12_laptop_low_end-perf/system_health.memory_desktop / memory:chrome:all_processes:reported_by_chrome:skia:effective_size_avg / browse_media / browse_media_flickr_infinite_scroll
  graph: https://chromeperf.appspot.com/report?sid=ebb114f166b1861a9165b1624e9a6e451ef301f2ff0311e2e1c5f8832f8bad0d&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/1744b7a8e40000

ChromiumPerf/mac-10_12_laptop_low_end-perf/system_health.memory_desktop / memory:chrome:all_processes:reported_by_chrome:skia:effective_size_avg / browse_news / browse_news_flipboard
  graph: https://chromeperf.appspot.com/report?sid=d18c4442b50c8d289c332c550afdba1541519bff9d5d36189075d7fc194c5bda&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/132d9fa7640000

ChromiumPerf/mac-10_13_laptop_high_end-perf/system_health.memory_desktop / memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg / load_news / load_news_bbc
  graph: https://chromeperf.appspot.com/report?sid=4e33ffef8ac779121d2a282a8bfc6fa10dfb98e466b9a2f85543ad5b058315c8&rev=585143
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/127b92b8e40000

ChromiumPerf/mac-10_13_laptop_high_end-perf/system_health.memory_desktop / memory:chrome:all_processes:reported_by_chrome:malloc:allocated_objects_size_avg / browse_news / browse_news_reddit
  graph: https://chromeperf.appspot.com/report?sid=d0e401e8d7d100680e12dc6f6f059bb4f4d6eca9f96e5585fcef27f8ee965443&rev=585143
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/139cc597640000

ChromiumPerf/mac-10_12_laptop_low_end-perf/rendering.desktop / thread_raster_cpu_time_per_frame / web_animation_value_type_path
  graph: https://chromeperf.appspot.com/report?sid=a05aa0a4c1b9279003f6de7d6d37cd888430187960bfb0751736e82026a2f746&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/11d4f3db640000

ChromiumPerf/mac-10_12_laptop_low_end-perf/rendering.desktop / input_event_latency_discrepancy / twitter_2018
  graph: https://chromeperf.appspot.com/report?sid=b41fb48b73a591cb3d5f189f9b2a3c58567815a971fcf9483de3573352b73694&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/11d47670e40000

ChromiumPerf/mac-10_12_laptop_low_end-perf/rendering.desktop / mean_frame_time_renderer_compositor / web_animation_value_type_path
  graph: https://chromeperf.appspot.com/report?sid=bf6fb2acdab99c56da8c8e99f7590cdefd1364ba32d39930953579d8d629ace2&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/17b05188e40000

ChromiumPerf/mac-10_12_laptop_low_end-perf/rendering.desktop / mean_frame_time / css_value_type_path
  graph: https://chromeperf.appspot.com/report?sid=a863f344166703e2f26524f68a89d060b40e8e42d7aae257d408c61ab0a97b6f&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/16f796a4e40000

ChromiumPerf/mac-10_12_laptop_low_end-perf/rendering.desktop / percentage_smooth / css_value_type_shadow
  graph: https://chromeperf.appspot.com/report?sid=87366d38d304155b0afed5d01f14f7e6cd3480ab4342eba02983783f09017799&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/131626e0e40000

ChromiumPerf/mac-10_12_laptop_low_end-perf/rendering.desktop / percentage_smooth / web_animation_value_type_shadow
  graph: https://chromeperf.appspot.com/report?sid=16f9a635a356b94cca6bc2ba64cd55a1c04078206f0dd6d332390aa22bef16f1&rev=585139
  pinpoint trace: https://pinpoint-dot-chromeperf.appspot.com/job/15d47670e40000
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/150390c7640000

Turn on OOP Raster on mac bots by enne@chromium.org
https://chromium.googlesource.com/chromium/src/+/ee35f329c8e615505326a3b4df9ae4b7764037ae
2.844e+07 → 5.297e+07 (+2.453e+07)

Benchmark documentation link:
  https://bit.ly/system-health-benchmarks
Cc: twelling...@chromium.org
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/15dfff97640000

Remove Modern flag and hardcode #isChromeModernDesignEnabled to true by twellington@chromium.org
https://chromium.googlesource.com/chromium/src/+/5955e6d67f3799a0cdd9dba7df8fa5460499f99f
2.19e+06 → 2.279e+06 (+8.926e+04)

Cc: -twelling...@chromium.org
The change in #11 is for Android only (it wouldn't affect a desktop loading metric). A gpu memory regression was detected for it and is tracked in issue 877247.

Removing myself as I don't think my change is applicable to this bug. Please feel free to re-add if that's not the case.
Agreed that issue 877247 captures the Android changes that were grouped up in this bug, thanks!
Re: "thread_raster_cpu_time_per_frame / web_animation_value_type_path".  Looking at the traces, there's nothing that pops out (e.g. image decodes, etc).  It's just that everything bottoms out in ~10ms RasterCHROMIUM vs ~2ms GpuRasterBuffer::Playback.  I think some more directed profiling on Mac will be required to understand this one.
Cc: ericrk@chromium.org
Re: "thread_raster_cpu_time_per_frame / web_animation_value_type_path".  Running this through Instruments makes it look like the majority of the cost is mapped memory when allocating in the transfer cache.

It looks like there aren't a lot of paths involved.  There are ~630 paths, where each one repeats twice (so the second one is cached).  The cached path is 1% of the time vs 95% of the time spent creating the transfer cache entry.  It looks like paths aren't reused from frame to frame.
[Attachment: Screen Shot 2018-09-27 at 2.23.14 PM.png, 336 KB]
I guess this is where we need to consider the optimization for inlining the data for small transfer cache entries in the command buffer instead of using mapped memory?
Re: "thread_raster_cpu_time_per_frame / web_animation_value_type_path"

Each path is also only 44 bytes, so I think it's just that "300 transfer cache transactions is too many".  I'll try reverting the "use transfer cache for paths" change and see what that looks like on the bots.
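
To put those numbers side by side, here is a rough back-of-the-envelope sketch. It assumes that "each one repeats twice" means only half of the ~630 paths create a new transfer cache entry, and that every entry carries the 44-byte payload mentioned above. The function name is hypothetical.

```python
PATHS_DRAWN = 630      # "~630 paths" seen in the profile
REPEATS_PER_PATH = 2   # each path appears twice; the second use hits the cache
BYTES_PER_PATH = 44    # serialized size of one path, per the comment above


def transfer_cache_load(paths_drawn: int, repeats: int, bytes_per_path: int):
    """Return (new cache entries, total payload bytes) for one frame."""
    unique_entries = paths_drawn // repeats
    return unique_entries, unique_entries * bytes_per_path


entries, payload = transfer_cache_load(PATHS_DRAWN, REPEATS_PER_PATH, BYTES_PER_PATH)
print(f"{entries} entries, {payload} bytes per frame")
```

Under these assumptions the frame creates about 315 entries carrying under 14 KB in total, which fits the reading that the cost comes from the number of transactions rather than from the amount of data.
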
Project Member

Comment 18 by bugdroid1@chromium.org, Oct 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c8d0d44728928704030f8a901174d649f164cbe5

commit c8d0d44728928704030f8a901174d649f164cbe5
Author: Adrienne Walker <enne@chromium.org>
Date: Mon Oct 08 19:55:31 2018

Use transfer buffer for small transfer cache entries

OOP-R in RasterImplementation currently allocates the entire free space
available in the transfer buffer so that it doesn't have to measure
first then serialize and can just optimistically serialize into that
space.  This means that other transfer buffer consumers can't use it,
and are instead forced to use mapped memory.  This turns out to not
be very efficient for large numbers of small allocations.

To sidestep this problem, small transfer cache entries are serialized
into the heap when encountered.  Then when raster is complete, it
shrinks the transfer buffer down to the correct size and these small
transfer cache entries are added after it.  Then the commands for these
entries are submitted first before raster.

For example, for three transfer cache tasks (A B C) and one raster (R)
the transfer buffer and ring buffer will look like this:

    transfer buffer ring buffer: R A B C
    command buffer: A B C R

This patch also loosens the restriction on gpu::RingBuffer that there
can be only one in use block at any time.  This leads to potential
exhaustion issues because the ring buffer won't reallocate while there
are in use blocks.  To avoid this, this optimization is only used when
there is room in the ring buffer without waiting.

An alternative to this patch would have been to have yet another
transfer buffer only for the transfer cache or to rewrite oopr
serialization, but both of those are more invasive solutions.

On OSX, on the rendering.desktop telemetry benchmark, on the story
web_animation_value_type_path, this results in the following results:

gpu-r:
  raster cpu time: 2.011ms
  gpu cpu time: 12.891ms
  total time: 14.902ms

oop-r:
  raster cpu time: 7.314ms <- the bug
  gpu cpu time: 3.804ms
  total time: 11.118ms

oop-r + this patch:
  raster cpu time: 0.812ms
  gpu cpu time: 3.901ms
  total time: 4.713ms

Bug:  804380 , 877168
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I586cbca2acb13b8de7d3490c5eb5d6b415f6eda5
Reviewed-on: https://chromium-review.googlesource.com/c/1262955
Commit-Queue: enne <enne@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597654}
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/cc/paint/transfer_cache_serialize_helper.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/client_transfer_cache.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/client_transfer_cache.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/raster_implementation.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/raster_implementation.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/ring_buffer.cc
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/ring_buffer.h
[modify] https://crrev.com/c8d0d44728928704030f8a901174d649f164cbe5/gpu/command_buffer/client/transfer_buffer_unittest.cc
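
The ordering described in the commit message above can be illustrated with a toy model. This is only a sketch in Python, not the C++ code in `raster_implementation.cc`: the class `RasterSession`, its methods, and the `SMALL_ENTRY_LIMIT` threshold are all hypothetical names and values chosen for illustration. It parks small entries on the heap while raster owns the transfer buffer, then places them after the shrunken raster block and issues their commands first. Entries that do not fit without waiting fall back to mapped memory.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff in bytes; the thread does not state the real threshold.
SMALL_ENTRY_LIMIT = 1024


@dataclass
class RasterSession:
    """Toy model of one raster call that optimistically owns the transfer buffer."""
    buffer_size: int
    pending: list = field(default_factory=list)  # small entries parked on the heap
    mapped: list = field(default_factory=list)   # entries sent via mapped memory

    def add_entry(self, name: str, size: int) -> None:
        # Small entries are deferred; larger ones keep using mapped memory.
        if size <= SMALL_ENTRY_LIMIT:
            self.pending.append((name, size))
        else:
            self.mapped.append((name, size))

    def finish(self, raster_bytes: int):
        """Shrink raster to its real size and append deferred entries after it."""
        layout = ["R"]
        free = self.buffer_size - raster_bytes
        deferred = []
        for name, size in self.pending:
            if size <= free:
                layout.append(name)
                deferred.append(name)
                free -= size
            else:
                # No room without waiting: fall back to mapped memory.
                self.mapped.append((name, size))
        self.pending.clear()
        # Commands for the deferred entries are submitted before raster.
        return layout, deferred + ["R"]


session = RasterSession(buffer_size=4096)
for name in ("A", "B", "C"):
    session.add_entry(name, 44)
layout, commands = session.finish(raster_bytes=1000)
print("transfer buffer:", " ".join(layout))
print("command buffer: ", " ".join(commands))
```

The buffer layout and the command order differ on purpose: raster data is written first because it was serialized into the buffer optimistically, but the entries it refers to must be processed before the raster commands that use them.
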

Project Member

Comment 19 by bugdroid1@chromium.org, Oct 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6bbcf5cf66c5e9540c93857d2fac6a357b273884

commit 6bbcf5cf66c5e9540c93857d2fac6a357b273884
Author: enne <enne@chromium.org>
Date: Mon Oct 08 22:33:20 2018

Revert "Use transfer buffer for small transfer cache entries"

This reverts commit c8d0d44728928704030f8a901174d649f164cbe5.

Reason for revert: causing flaky webgl test failures

TBR=enne@chromium.org,piman@chromium.org,ericrk@chromium.org

Change-Id: Ia135febae14f3515a8a6447071d3ddd94bd0e89f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  804380 , 877168
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Reviewed-on: https://chromium-review.googlesource.com/c/1269632
Reviewed-by: enne <enne@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597708}
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/cc/paint/transfer_cache_serialize_helper.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/client_transfer_cache.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/client_transfer_cache.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/raster_implementation.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/raster_implementation.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/ring_buffer.cc
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/ring_buffer.h
[modify] https://crrev.com/6bbcf5cf66c5e9540c93857d2fac6a357b273884/gpu/command_buffer/client/transfer_buffer_unittest.cc

Project Member

Comment 20 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/458236cb920e35864c7983b2b7e9c301a29edde9

commit 458236cb920e35864c7983b2b7e9c301a29edde9
Author: Adrienne Walker <enne@chromium.org>
Date: Fri Oct 12 22:09:46 2018

Reland "Use transfer buffer for small transfer cache entries"

This is a reland of c8d0d44728928704030f8a901174d649f164cbe5

Bug:  804380 , 877168
Change-Id: I5c7b0d3b43c0c6f57eb7e5a9c43b4b8ecb22518a
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Reviewed-on: https://chromium-review.googlesource.com/c/1277949
Commit-Queue: enne <enne@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#599376}
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/cc/paint/transfer_cache_serialize_helper.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/client_transfer_cache.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/client_transfer_cache.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/raster_implementation.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/raster_implementation.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer.h
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/ring_buffer_test.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/transfer_buffer.cc
[modify] https://crrev.com/458236cb920e35864c7983b2b7e9c301a29edde9/gpu/command_buffer/client/transfer_buffer_unittest.cc
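
For reference, the timings quoted in the description of this change can be tabulated and compared as follows. The numbers are copied from the commit message; the dictionary layout and helper names are hypothetical and only serve this sketch.

```python
# Per-frame CPU timings in milliseconds for web_animation_value_type_path on OSX,
# copied from the commit message.
timings_ms = {
    "gpu-r": {"raster_cpu": 2.011, "gpu_cpu": 12.891},
    "oop-r": {"raster_cpu": 7.314, "gpu_cpu": 3.804},
    "oop-r + patch": {"raster_cpu": 0.812, "gpu_cpu": 3.901},
}


def total_ms(config: str) -> float:
    """Sum of raster and gpu CPU time for one configuration."""
    return round(sum(timings_ms[config].values()), 3)


def speedup(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


raster_gain = speedup(timings_ms["oop-r"]["raster_cpu"],
                      timings_ms["oop-r + patch"]["raster_cpu"])
total_gain = speedup(total_ms("oop-r"), total_ms("oop-r + patch"))
print(f"raster cpu time: {raster_gain:.1f}x lower, total: {total_gain:.2f}x lower")
```

The computed totals match the ones listed in the commit message (14.902 ms, 11.118 ms, and 4.713 ms). Raster CPU time drops by about 9x relative to unpatched OOP-R, and total time by about 2.4x.
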

Owner: khushals...@chromium.org
Khushal said he was going to investigate mac oopr issues, so assigning this bug to him.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/15b00c51940000

All of the runs failed. The most common error (20/20 runs) was:
SwarmingTaskError: The swarming task failed with state "BOT_DIED".
I caught up late on the thread and had started the bisect for the Android regression before I realized it was a dupe of issue 877247. I've re-triaged those alerts to the same issue to avoid any confusion.
Starting with memory regressions: the one in gpu:effective_size_avg seems to be all transfer buffer memory. It's not totally unexpected, since OOP raster does use the transfer buffer instead of the command buffer for raster command serialization. But we've had a lot of changes in this area since this change, including some for auto-shrinking the buffer, so I'll do a local run to see what the current state is.
I did a local run for browse:media:imgur to compare GPU and OOP raster and looked at memory:chrome:all_processes:reported_by_os:private_footprint_size. The value was 429M for OOP and 435M for GPU over 3 runs. The slight difference came from one OOP run reporting 362M; otherwise all runs in both cases reported 430M. Going to try a pinpoint run for this benchmark.
I tried a local run of the memory.desktop benchmark with the TrivialGifPageSharedPageState case and, interestingly, OOP had better numbers for memory:chrome:all_processes:reported_by_os:system_memory:private_footprint_size than GPU. This was over 5 runs and the results were consistent across all runs.

OOP was at ~120M while GPU is at ~150M.

I tried digging through the sub-categories in the 2 traces to see if I could find something to explain the difference, and for the most part they look the same. There is some additional cc resource memory in the GPU process in the OOP case, which I'm assuming is the display compositor (likely a timing thing). Skia has 8M worth of GPU resources in the OOP case; since this memory is cleaned up by an idle timer, this is again something that can be affected by timing, because the cleanup moved from the renderer to the GPU process. Lastly, there is 16M worth of extra transfer buffer memory in the OOP case which, while not surprising, I want to understand better. In comparison, GPU just has an additional 1M worth of command buffer memory.

So all in all, OOP has more reported things from chrome categories but GPU has more memory reported_by_os.
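
The tally behind that summary can be written out as a small sketch. It uses only the approximate figures quoted above and leaves out the additional cc resource memory, for which no size is given. The variable and function names are hypothetical.

```python
# Extra memory attributed by chrome's own categories, in MB, per the comment above.
oop_extra_mb = {"skia_gpu_resources": 8, "transfer_buffer": 16}
gpu_extra_mb = {"command_buffer": 1}

# Approximate OS-reported private footprint, in MB.
os_footprint_mb = {"oop": 120, "gpu": 150}


def net_chrome_reported_extra(oop: dict, gpu: dict) -> int:
    """MB that OOP reports on top of GPU in chrome-reported categories."""
    return sum(oop.values()) - sum(gpu.values())


chrome_delta = net_chrome_reported_extra(oop_extra_mb, gpu_extra_mb)
os_delta = os_footprint_mb["oop"] - os_footprint_mb["gpu"]
print(f"chrome-reported: OOP {chrome_delta:+d} MB, os-reported: OOP {os_delta:+d} MB")
```

With these figures OOP shows about 23 MB more in chrome-reported categories while its OS-reported footprint is about 30 MB lower, so the two views point in opposite directions.
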
Also, for gpu:effective_size_avg, the difference due to transfer buffer memory is what is reported by the renderer. The gpu:effective_size_avg reported by the GPU process also has a difference, but that's just a reporting change: this memory used to be reported in the renderer under cc:images but is now reported in the GPU process under gpu:transfer_cache. The delta between the two lines up, which confirms that it's just a reporting change.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1522057b940000

guid
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1283d477940000

guid
