rendering.desktop/text_hover_50000_pixels_per_second fails on win-10-perf |
|||||
Issue description
Failure log:
Duplicate draw events: 5
at Object.addPipelineHistograms (/tracing/metrics/rendering/pipeline.html:127:19)
at new renderingMetric (/tracing/metrics/rendering/rendering_metric.html:90:26)
at runMetrics (c:\b\s\w\ir\third_party\catapult\tracing\tracing\metrics\metric_map_function.html:61:16)
at metricMapFunction (c:\b\s\w\ir\third_party\catapult\tracing\tracing\metrics\metric_map_function.html:190:24)
at Object.mapSingleTrace (/tracing/mre/map_single_trace.html:39:7)
at eval (c:\b\s\w\ir\third_party\catapult\tracing\tracing\mre\map_single_trace_cmdline.html:61:18)
at Object.runAndConvertErrorsToFailures (/tracing/mre/map_single_trace.html:24:10)
at mapSingleTraceWithResult (c:\b\s\w\ir\third_party\catapult\tracing\tracing\mre\map_single_trace_cmdline.html:52:12)
at Object.mapSingleTraceMain (c:\b\s\w\ir\third_party\catapult\tracing\tracing\mre\map_single_trace_cmdline.html:77:20)
at eval (c:\b\s\w\ir\third_party\catapult\tracing\tracing\mre\map_single_trace_cmdline.html:90:15)
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/1bf9ebde-ae3a-11e8-a992-d89ef387258f
Sadrul@: can you own this as well?
,
Sep 11
,
Sep 11
Sample trace from a failing run: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/text_hover_20000_pixels_per_second_2018-09-10_20-15-14_90819.html It looks like there are overlapping DrawAndSwap()s happening (see in VizCompositorThread in Gpu process): one DrawAndSwap() starts at 2381ms, and while it's waiting for the async-ack, it does another DrawAndSwap() at 2382ms. I seem to remember we avoid doing a DrawAndSwap() while still waiting for the ack for the previous DrawAndSwap(). +sunnyps@: fyi. Anything you can think of to investigate for this? I am continuing to look ...
,
Sep 11
,
Sep 11
Some initial findings: it looks like technically, there can be overlapping DrawAndSwap(), depending on |max_pending_swaps_|, which is determined by the |OutputSurface::Capabilities::max_frames_pending| (which defaults to 1 [0]). GLOutputSurfaceBufferQueue sets |max_frames_pending| to 2 [1], but it looks like that is used only on ozone or mac, so this shouldn't be affecting Windows. GLOutputSurface sets |max_frames_pending| to |gpu::Capabilities::num_surface_buffers - 1| [2]. |gpu::Capabilities::num_surface_buffers| defaults to 2 ([3, 4]), and is set to 3 only for chrome-cast when using triple-buffering [5]. So as far as I can tell, there should still be no overlapping DrawAndSwap() on Windows. [0] https://cs.chromium.org/chromium/src/components/viz/service/display/output_surface.h?l=47 [1] https://cs.chromium.org/chromium/src/components/viz/service/display_embedder/gl_output_surface_buffer_queue.cc?type=cs&sq=package:chromium&g=0&l=38 [2] https://cs.chromium.org/chromium/src/components/viz/service/display_embedder/gl_output_surface.cc?type=cs&sq=package:chromium&g=0&l=41 [3] https://cs.chromium.org/chromium/src/gpu/command_buffer/common/capabilities.h?l=128 [4] https://cs.chromium.org/chromium/src/ui/gl/gl_surface.cc?l=239 [5] https://cs.chromium.org/chromium/src/ui/ozone/platform/cast/gl_surface_cast.cc?l=130
,
Sep 11
/cc some more folks who might have ideas.
,
Sep 12
Hm, looks like there may be two browser windows submitting compositor-frames, which would also explain overlapping DrawAndSwap()s. For example, see "LayerTreeHost::DoUpdateLayers" event in CrBrowserMain at ~2376ms with source_frame_number = 83, followed by another DoUpdateLayers event at ~2381ms, with source_frame_number = 0, which likely means a new browser window (or some other toplevel widget with its own ui::Compositor) opened and is submitting frames. Hrm ... Is it possible to get screenshots at the time of failure?
,
Sep 12
Latest failure is in https://ci.chromium.org/buildbot/chromium.perf/win-10-perf/575 Sadly there are two reasons why we don't have screenshot here: 1) We don't support platform screenshots on windows yet, just the devtool screenshot 2) We currently don't take screenshot on metrics failure. Note that at this point, the browser under test is already closed.
,
Sep 12
The second window could be the URL popup when you hover the mouse over a link, that is a separate window with a separate ui::Compositor on Windows.
,
Sep 12
It could probably be a tooltip too :( I will put up a fix for the benchmark to ignore the second window for now, instead of fataling out. But perhaps we should consider ignoring native input events during the tests.
,
Sep 12
Shouldn't those async events have different IDs?
,
Sep 12
Here: https://cs.chromium.org/chromium/src/components/viz/service/display/display.cc?l=288 Instead of increasing swapped_trace_id_ and using it later in trace events, shouldn't we do the following? int local_swapped_trace_id = GetNextSwappedTraceId(); // <- increments swapped_trace_id_ and returns it. // later traces in DrawAndSwap() TRACE_EVENT...(..., local_swapped_trace_id);
,
Sep 12
Do you mean |swapped_trace_id_| becomes a static variable, so that multiple instances of Display doesn't end up re-using the ids? That becomes trickier with |last_acked_trace_id_| etc.
,
Sep 12
Ah, I thought it's a reentrancy issue. If the issue is that we have multiple display instances, then can we have a static variable that increases only when you create an instance of Display, to build a display number, and then scope trace event IDs with that display number so that they are not mixed with events from other displays? Or, a little bit easier but hackier way is to scope IDs by the memory address of the display, maybe?
,
Sep 12
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1647a8ece33b4b8bba5d2d59ed3b3c8f0d1da0da commit 1647a8ece33b4b8bba5d2d59ed3b3c8f0d1da0da Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Wed Sep 12 20:27:41 2018 viz: Assign Display-specific trace ids. Assign each Display a separate starting point for the trace ids, so that the trace-ids do not all start from zero, which makes it difficult to associate the trace-events with the Display intances. BUG= 879884 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel Change-Id: I79146409a5acff362c348fb32493a247f6736f72 Reviewed-on: https://chromium-review.googlesource.com/1222586 Reviewed-by: Fady Samuel <fsamuel@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#590805} [modify] https://crrev.com/1647a8ece33b4b8bba5d2d59ed3b3c8f0d1da0da/components/viz/service/display/display.cc [modify] https://crrev.com/1647a8ece33b4b8bba5d2d59ed3b3c8f0d1da0da/components/viz/service/display/display.h
,
Sep 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c741eb8100cf931d3c4a6ffec135dcbc1f42143e commit c741eb8100cf931d3c4a6ffec135dcbc1f42143e Author: Sadrul Habib Chowdhury <sadrul@chromium.org> Date: Fri Sep 14 01:52:43 2018 viz: Fix trace-id for canceled draw events. It is possible that while waiting for the ack of a previous draw, another draw is attempted and aborted. In such cases, the trace-event for the first draw would incorrectly be terminated. Fix that bug by fixing the trace-id for such canceled draw events. Also, instrument such events so that telemetry tests can identify such events. Also, fix the trace-id to be int64_t in SurfaceAggregator, which was mistakenly left unchanged in crrev.com/590805 BUG=883592, 879884 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel Change-Id: Id39bf22559bd9f02e2765e69cc32a8cb1e9c62a2 Reviewed-on: https://chromium-review.googlesource.com/1223326 Reviewed-by: Ehsan Chiniforooshan <chiniforooshan@chromium.org> Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#591259} [modify] https://crrev.com/c741eb8100cf931d3c4a6ffec135dcbc1f42143e/components/viz/service/display/display.cc [modify] https://crrev.com/c741eb8100cf931d3c4a6ffec135dcbc1f42143e/components/viz/service/display/surface_aggregator.cc [modify] https://crrev.com/c741eb8100cf931d3c4a6ffec135dcbc1f42143e/components/viz/service/display/surface_aggregator.h
,
Sep 17
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by nednguyen@chromium.org
, Sep 6