New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 742345 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression

Blocked on:
issue 748841



Sign in to add a comment

13.1%-15.2% regression in performance_browser_tests at 485606:485695

Project Member Reported by pmeenan@chromium.org, Jul 13 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 13 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=742345

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=2a482b609fb91176ff2d92785c46be174f92eda3f620c3882ac4e4c08f4afd79


Bot(s) for this bug's original alert(s):

chromium-rel-mac11-air
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jul 13 2017

Cc: sunn...@chromium.org
Owner: sunn...@chromium.org

=== Auto-CCing suspected CL author sunnyps@chromium.org ===

Hi sunnyps@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Sunny Sachanandani
  Commit : 63fbdcc12a7ef762e0652c972986fe74835ba00e
  Date   : Tue Jul 11 17:26:47 2017
  Subject: gpu: Enable gpu scheduler by default.

Bisect Details
  Configuration: mac_air_perf_bisect
  Benchmark    : performance_browser_tests
  Metric       : TabCapturePerformance_comp_gpu_novsync/Capture
  Change       : 12.57% | 16.795059 -> 18.9054815

Revision             Result                   N
chromium@485605      16.7951 +- 0.174227      6      good
chromium@485650      16.8906 +- 0.228688      6      good
chromium@485662      16.8738 +- 0.236854      6      good
chromium@485663      18.7626 +- 0.803885      6      bad       <--
chromium@485664      18.7178 +- 1.23281       6      bad
chromium@485665      18.7674 +- 0.672383      6      bad
chromium@485668      18.9298 +- 1.07151       6      bad
chromium@485673      18.5953 +- 0.88885       6      bad
chromium@485695      18.9055 +- 1.34249       6      bad

To Run This Test
  ./src/out/Release/performance_browser_tests --test-launcher-print-test-stdio=always --enable-gpu

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8974178120078147024


For feedback, file a bug with component Speed>Bisection
Cc: m...@chromium.org
Status: Assigned (was: Untriaged)
cc miu@

I enabled this finch feature on waterfall bots (perf bots included) and developer builds over 3 weeks ago. https://chromium-review.googlesource.com/c/517412/

I got some regressions from perf bots then. Do you know why this benchmark only regressed after I set the feature to enabled everywhere? Is the waterfall field trial config not applying here?
Is this using is_chrome_branded = true GN arg by any chance? Because that would prevent the waterfall fieldtrial config from being applied.
Cc: nedngu...@google.com
+Ned: do the field trial configs only work for telemetry (not c++ perf tests)?
miu@ is the owner of performance_browser_tests and can answer this. I have no knowledge on how the C++ perf tests are currently implemented.

Comment 8 by m...@chromium.org, Jul 14 2017

Unfortunately, I don't know anything about when fieldtrial configs are turned on. :) However, it seems that, for whatever reason, the fieldtrial config was not being applied. I'm not even sure what the policy should be: Should the Chrome Perf Infra be testing what's live to end-users, or what we want to make live in the future? I always assumed it was the former...

Back to the issue at-hand: It seems that the latency for Capture on MacBook Air increased due to the GPU scheduler. So, for some reason the scheduler increased latency as a result of its scheduling decisions, and maybe this is WAI? I don't know anything about the GPU Scheduler, so I can't say what is good or bad behavior here.

For tab capture, what matters most is that, over a time window of ~200ms, the number of frames captured has remained the same. And, other metrics, such as dropped frames, capture resolution, etc.; should not be regressing. In other words, the throughput of GPU processing and data xfer, while possibly multiple frames are in-flight, should not be regressing.
Canvas hangs in WaitForTokenInRange (for BufferData called from Skia Ganesh). In the gpu process, the canvas command buffer seems to run ~1 frame behind the display compositor context. The WaitForTokenInRange unblocks when the previous frame's commands for canvas run. Ideally we should have two frames worth of space in the ring buffer but that doesn't seem to be the case for this benchmark. I'll add some logging to verify that we're close to hitting the ring buffer memory limit even without the scheduler.

I also investigated if we had issues with sync tokens but that doesn't seem to be the case. The only sync token waits are by the display compositor (indirect waits by the scheduler) when it waits for the canvas frame from 2 frames ago. Those sync token waits are short because the canvas commands run quickly on the gpu process.

Attaching traces with and without the scheduler on macbook pro. The scheduler trace also has flow events for command buffer flushes.

Stack trace for WaitForTokenInRange:
[93925:775:0714/150821.934166:INFO:command_buffer_proxy_impl.cc(347)] 0   libbase.dylib                       0x00000001057e5a2c base::debug::StackTrace::StackTrace(unsigned long) + 28
1   libgpu.dylib                        0x0000000110bea709 gpu::CommandBufferProxyImpl::WaitForTokenInRange(int, int) + 105
2   libgpu.dylib                        0x0000000110ae8af3 gpu::CommandBufferHelper::WaitForToken(int) + 163
3   libgpu.dylib                        0x0000000110aeb4da gpu::RingBuffer::FreeOldestBlock() + 58
4   libgpu.dylib                        0x0000000110aeb588 gpu::RingBuffer::Alloc(unsigned int) + 56
5   libgpu.dylib                        0x0000000110aecfd1 gpu::ScopedTransferBufferPtr::Reset(unsigned int) + 97
6   libgles2_implementation.dylib       0x00000001175e56d8 gpu::gles2::GLES2Implementation::BufferDataHelper(unsigned int, long, void const*, unsigned int) + 344
7   libskia.dylib                       0x000000010c0ca735 GrGLBuffer::onUpdateData(void const*, unsigned long) + 133
8   libskia.dylib                       0x000000010c003ef7 GrBufferAllocPool::flushCpuData(GrBufferAllocPool::BufferBlock const&, unsigned long) + 199
9   libskia.dylib                       0x000000010c0042ae GrBufferAllocPool::createBlock(unsigned long) + 382
10  libskia.dylib                       0x000000010c004099 GrBufferAllocPool::makeSpace(unsigned long, unsigned long, GrBuffer const**, unsigned long*) + 153
11  libskia.dylib                       0x000000010c00491c GrVertexBufferAllocPool::makeSpace(unsigned long, int, GrBuffer const**, int*) + 44
12  libskia.dylib                       0x000000010c04504f (anonymous namespace)::AAConvexPathOp::onPrepareDraws(GrMeshDrawOp::Target*) const + 1535
13  libskia.dylib                       0x000000010c05e82a GrMeshDrawOp::onPrepare(GrOpFlushState*) + 26
14  libskia.dylib                       0x000000010c029b79 GrRenderTargetOpList::prepareOps(GrOpFlushState*) + 361
15  libskia.dylib                       0x000000010c00cb99 GrDrawingManager::internalFlush(GrSurfaceProxy*, GrResourceCache::FlushType) + 841
16  libskia.dylib                       0x000000010c00cec3 GrDrawingManager::prepareSurfaceForExternalIO(GrSurfaceProxy*) + 67
17  libskia.dylib                       0x000000010c028269 GrRenderTargetContext::prepareForExternalIO(int, GrBackendSemaphore*) + 713
18  libblink_platform.dylib             0x00000001153cc7d2 blink::Canvas2DLayerBridge::Flush() + 114
19  libblink_platform.dylib             0x00000001153ccf39 blink::Canvas2DLayerBridge::NewImageSnapshot(blink::AccelerationHint, blink::SnapshotReason) + 73
20  libblink_platform.dylib             0x00000001153cccf2 blink::Canvas2DLayerBridge::PrepareTextureMailbox(viz::TextureMailbox*, std::__1::unique_ptr<cc::SingleReleaseCallback, std::__1::default_delete<cc::SingleReleaseCallback> >*) + 146
21  libcc.dylib                         0x000000011079704b cc::TextureLayer::Update() + 107
22  libcc.dylib                         0x0000000110849c19 cc::LayerTreeHost::DoUpdateLayers(cc::Layer*) + 1145
23  libcc.dylib                         0x00000001108496b6 cc::LayerTreeHost::UpdateLayers() + 54
24  libcc.dylib                         0x000000011088f6d6 cc::ProxyMain::BeginMainFrame(std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> >) + 1046
25  libcc.dylib                         0x000000011088e6e5 void base::internal::Invoker<base::internal::BindState<void (cc::ProxyMain::*)(std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> >), base::WeakPtr<cc::ProxyMain>, base::internal::PassedWrapper<std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> > > >, void ()>::RunImpl<void (cc::ProxyMain::*)(std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> >), std::__1::tuple<base::WeakPtr<cc::ProxyMain>, base::internal::PassedWrapper<std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> > > >, 0ul, 1ul>(void (cc::ProxyMain::*&&)(std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> >), std::__1::tuple<base::WeakPtr<cc::ProxyMain>, base::internal::PassedWrapper<std::__1::unique_ptr<cc::BeginMainFrameAndCommitState, std::__1::default_delete<cc::BeginMainFrameAndCommitState> > > >&&, base::IndexSequence<0ul, 1ul>) + 165
26  libbase.dylib                       0x00000001057e6362 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 242
27  libblink_platform.dylib             0x00000001154a5b84 blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue(blink::scheduler::internal::WorkQueue*, bool, blink::scheduler::LazyNow, base::TimeTicks*) + 1364
28  libblink_platform.dylib             0x00000001154a3be2 blink::scheduler::TaskQueueManager::DoWork(bool) + 434
29  libbase.dylib                       0x00000001057e6362 base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) + 242
30  libbase.dylib                       0x000000010581990b base::MessageLoop::RunTask(base::PendingTask*) + 507
31  libbase.dylib                       0x0000000105819cf5 base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) + 213
32  libbase.dylib                       0x000000010581a049 base::MessageLoop::DoWork() + 425
33  libbase.dylib                       0x000000010581d00a base::MessagePumpCFRunLoopBase::RunWork() + 42
34  libbase.dylib                       0x000000010580a42a base::mac::CallWithEHFrame(void () block_pointer) + 10
35  libbase.dylib                       0x000000010581c92f base::MessagePumpCFRunLoopBase::RunWorkSource(void*) + 63
36  CoreFoundation                      0x00007fff997bc321 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
37  CoreFoundation                      0x00007fff9979d21d __CFRunLoopDoSources0 + 557
38  CoreFoundation                      0x00007fff9979c716 __CFRunLoopRun + 934
39  CoreFoundation                      0x00007fff9979c114 CFRunLoopRunSpecific + 420
40  Foundation                          0x00007fff9b1af4e2 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 277
41  libbase.dylib                       0x000000010581d76e base::MessagePumpNSRunLoop::DoRun(base::MessagePump::Delegate*) + 126
42  libbase.dylib                       0x000000010581c17c base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 92
43  libbase.dylib                       0x00000001058489d3 base::RunLoop::Run() + 51
44  libcontent.dylib                    0x000000010e0d7d41 content::RendererMain(content::MainFunctionParams const&) + 641
45  libcontent.dylib                    0x000000010e2088f0 content::ContentMainRunnerImpl::Run() + 368
46  libembedder.dylib                   0x000000010bc8738d service_manager::Main(service_manager::MainParams const&) + 2445
47  libcontent.dylib                    0x000000010e207e24 content::ContentMain(content::ContentMainParams const&) + 68
48  libchrome_dll.dylib                 0x0000000105b54ac8 ChromeMain + 168
49  Chromium Helper                     0x00000001057a69f3 main + 1571
50  libdyld.dylib                       0x00007fffaef16235 start + 1 
no scheduler.json.zip
3.6 MB Download
scheduler.json.zip
3.9 MB Download
Cc: piman@chromium.org vmi...@chromium.org
+piman@ and vmiura@ who suggested that we're running out of ring buffer memory
Status: Started (was: Assigned)
CL for dumping traces from the benchmark and extra traces:
https://chromium-review.googlesource.com/c/572158/

The test to run is TabCapturePerformance.Performance/3

Blockedon: 748841
Status: Assigned (was: Started)
sunnyps: any update here?
sunnyps: ping on this bug?
The benchmark has recovered. I expected that because of a scheduling change I made (https://chromium-review.googlesource.com/c/chromium/src/+/706129). However, the improvement is in this range: https://chromium.googlesource.com/chromium/src/+log/c9e738432a1bce852e99ba4deb9f9dcac8450fdd%5E..2e3e5e893c8cfd98aca703f32ec0f97fafaf5f5f?pretty=fuller&n=1000

I've started a bisect.
Project Member

Comment 18 by 42576172...@developer.gserviceaccount.com, Oct 26 2017

๐Ÿ“ Pinpoint job started.
https://pinpoint-dot-chromeperf.appspot.com/job/13f687b0f80000
Project Member

Comment 19 by 42576172...@developer.gserviceaccount.com, Oct 26 2017

Cc: fsam...@chromium.org eseckler@chromium.org
Owner: eseckler@chromium.org
๐Ÿ“ Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/13f687b0f80000

[viz] Ensure that ClientLTFS acknowledges racy BeginFrames.
By eseckler@chromium.org ยท Sun Sep 24 07:41:53 2017
chromium @ 1b697fc546af51e355c458bc4e10f821a1c8d551

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Hm, interesting. Funnily enough, that same patch is also regressing another related benchmark slightly, see  bug 771292 . I haven't had a chance yet to grab a trace though.
eseckler: Any update on this bug?
Status: Fixed (was: Assigned)
The patch in #19 has improved this benchmark (but regressed the one in  bug 771292 ). I don't think there's anything left to do for this bug.

Sign in to add a comment