Perfetto: Support flushing or recovering data from worker threads |
||||||||||||
Issue descriptionWe would currently have to flush worker threads (i.e. commit their SMB chunks) after each trace event to ensure that their data isn't lost (perfetto only provides committed SMB chunks to consumers), but this is too inefficient to do. We can't simply flush once stopping, because we can't post messages to the individual worker threads, as they don't have a MessageLoop (and thus they are not part of TraceLog's thread_task_runners_ [1]). This includes TaskScheduler workers as well as Compositor workers (content::CategorizedWorkerPoolThread), and other non-MessageLoop threads. Can we either a) support recovering this uncommitted data on perfetto service side or b) support a generic way to flush all these threads on demand? [1] https://cs.chromium.org/chromium/src/base/trace_event/trace_log.cc?type=cs&sq=package:chromium&g=0&l=417
,
Nov 20
,
Nov 20
,
Nov 20
After some discussion with hjd@ and skyostil@ it sounds like a perfetto service-side solution would be most suitable to this problem. The service would have to scrape the remaining uncommitted chunks out of the shared memory buffers once tracing is disabled (i.e. after all data sources have acked stop).
,
Nov 20
,
Nov 21
,
Nov 27
,
Dec 4
,
Dec 11
Any update on this? This is causing redness on the waterfall because we're not writing clock sync markers that are necessary to import and sync the subtraces and compute trace-based metrics for Telemetry.
,
Dec 11
We're actively working on scraping buffers from worker threads in the perfetto service. It's technically complex (we're basically want to steal data in shared memory from untrusted producers, so we need to be careful...), so likely will stretch into Q1. You can follow b/73828976 which tracks the work in perfetto. Dunno if there's a work around for the clock sync marker issue in the meantime. How are they emitted and by whom? Oystein, do you have any suggestions?
,
Dec 11
If clock syncs don't happen super frequently (do they?) maybe we could force a flush on the recording thread immediately after?
,
Dec 11
What's the current impact of this? Looking at the https://ci.chromium.org/buildbot/internal.client.clank/integration-android-low-end-phone/ waterfall for example, the flakiness is super-rare: One run: PASSED ] 1199 tests. [ FAILED ] 1 test, listed below: [ FAILED ] memory.long_running_dual_browser_test/http_search_yahoo_com_search__ylt_p_google@{'phase': 'on_chrome'} 1 FAILED TEST Another run: [ PASSED ] 1199 tests. [ FAILED ] 1 test, listed below: [ FAILED ] memory.long_running_dual_browser_test/http_en_m_wikipedia_org_wiki_Science@{'phase': 'on_chrome'} 1 FAILED TEST i.e. one random flake out of 1200 memory.long_running_dual_browser_test tests each run. The tree redness is annoying and causes extra sheriff attention drain in the meantime, of course, so I agree a quick fix (if we can find one) would be nice. Random idea: Can we just emit the clocksync markers twice? One at the beginning of the trace, and one at the end (like we currently do)?
,
Dec 12
,
Dec 17
,
Dec 17
#12: Looks like we currently emit at the beginning which is biting us here. See issue 900935#c24.
,
Jan 9
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0f6182210893904e32d5d623de12cf0f2a2912b0 commit 0f6182210893904e32d5d623de12cf0f2a2912b0 Author: Eric Seckler <eseckler@chromium.org> Date: Wed Jan 09 11:10:30 2019 perfetto: Enable SMB scraping Enable scraping of shared memory buffers used by the perfetto tracing service. This should help reduce missing data in traces. Because of that, this change is expected to cause some changes in perf waterfall metrics. Bug: 907069 Change-Id: I3a4592ffa9d5113885d5859d87a2f5708255ee26 Reviewed-on: https://chromium-review.googlesource.com/c/1402462 Commit-Queue: Eric Seckler <eseckler@chromium.org> Reviewed-by: Primiano Tucci <primiano@chromium.org> Cr-Commit-Position: refs/heads/master@{#621112} [modify] https://crrev.com/0f6182210893904e32d5d623de12cf0f2a2912b0/services/tracing/perfetto/perfetto_service.cc
,
Jan 10
I'm gonna optimistically close this. Let's see how many regressions we'll see :D
,
Jan 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3359b08464e431c49494e8678125b1063d1bbcc5 commit 3359b08464e431c49494e8678125b1063d1bbcc5 Author: Eric Seckler <eseckler@chromium.org> Date: Thu Jan 10 17:06:59 2019 Revert "perfetto: Enable SMB scraping" This reverts commit 0f6182210893904e32d5d623de12cf0f2a2912b0. Reason for revert: No effect yet because trace writer registration is still missing in ProducerClient. Will reland once that's fixed, so that any performance regressions are attributed correctly. Original change's description: > perfetto: Enable SMB scraping > > Enable scraping of shared memory buffers used by the perfetto tracing > service. This should help reduce missing data in traces. Because of > that, this change is expected to cause some changes in perf waterfall > metrics. > > Bug: 907069 > Change-Id: I3a4592ffa9d5113885d5859d87a2f5708255ee26 > Reviewed-on: https://chromium-review.googlesource.com/c/1402462 > Commit-Queue: Eric Seckler <eseckler@chromium.org> > Reviewed-by: Primiano Tucci <primiano@chromium.org> > Cr-Commit-Position: refs/heads/master@{#621112} TBR=primiano@chromium.org,eseckler@chromium.org # Skipping presubmit as this is a safe change without any effect. No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 907069 Change-Id: Id3a5a3fe8864caa2e95cad7051d2faef0cd343f5 Reviewed-on: https://chromium-review.googlesource.com/c/1405309 Commit-Queue: Eric Seckler <eseckler@chromium.org> Reviewed-by: Eric Seckler <eseckler@chromium.org> Cr-Commit-Position: refs/heads/master@{#621615} [modify] https://crrev.com/3359b08464e431c49494e8678125b1063d1bbcc5/services/tracing/perfetto/perfetto_service.cc
,
Jan 10
,
Jan 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/555e6b16175aabd036ec7d9780c730ae91f9031c commit 555e6b16175aabd036ec7d9780c730ae91f9031c Author: Eric Seckler <eseckler@chromium.org> Date: Thu Jan 10 19:13:57 2019 perfetto: Add support for registering TraceWriters to ProducerHost This is necessary to support SMB scraping. Bug: 907069 Change-Id: I6368dfa10566e1331948d35a4807ff2475e11560 Reviewed-on: https://chromium-review.googlesource.com/c/1405330 Reviewed-by: Primiano Tucci <primiano@chromium.org> Reviewed-by: Tom Sepez <tsepez@chromium.org> Commit-Queue: Eric Seckler <eseckler@chromium.org> Cr-Commit-Position: refs/heads/master@{#621685} [modify] https://crrev.com/555e6b16175aabd036ec7d9780c730ae91f9031c/services/tracing/perfetto/producer_host.cc [modify] https://crrev.com/555e6b16175aabd036ec7d9780c730ae91f9031c/services/tracing/perfetto/producer_host.h [modify] https://crrev.com/555e6b16175aabd036ec7d9780c730ae91f9031c/services/tracing/public/cpp/perfetto/producer_client.cc [modify] https://crrev.com/555e6b16175aabd036ec7d9780c730ae91f9031c/services/tracing/public/mojom/perfetto_service.mojom
,
Jan 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/884929d17a33fadadde7c772e08128e9191c56b6 commit 884929d17a33fadadde7c772e08128e9191c56b6 Author: Eric Seckler <eseckler@chromium.org> Date: Thu Jan 10 20:15:30 2019 Reland "perfetto: Enable SMB scraping" This is a reland of 0f6182210893904e32d5d623de12cf0f2a2912b0 Now that the TraceWriters are actually registered, this will actually have some effect :) Original change's description: > perfetto: Enable SMB scraping > > Enable scraping of shared memory buffers used by the perfetto tracing > service. This should help reduce missing data in traces. Because of > that, this change is expected to cause some changes in perf waterfall > metrics. > > Bug: 907069 > Change-Id: I3a4592ffa9d5113885d5859d87a2f5708255ee26 > Reviewed-on: https://chromium-review.googlesource.com/c/1402462 > Commit-Queue: Eric Seckler <eseckler@chromium.org> > Reviewed-by: Primiano Tucci <primiano@chromium.org> > Cr-Commit-Position: refs/heads/master@{#621112} TBR=primiano@chromium.org Bug: 907069 Change-Id: I63c890671ffe12a0d466a7b8667dd81c685b2036 Reviewed-on: https://chromium-review.googlesource.com/c/1405511 Reviewed-by: Eric Seckler <eseckler@chromium.org> Commit-Queue: Eric Seckler <eseckler@chromium.org> Cr-Commit-Position: refs/heads/master@{#621722} [modify] https://crrev.com/884929d17a33fadadde7c772e08128e9191c56b6/services/tracing/perfetto/perfetto_service.cc
,
Jan 11
This seems to work now. For example, this trace [1] shows data from two worker threads in the last few ms before the trace is completed. Whereas a trace before the change [2] shows that the last few ms from the TaskSchedulerSingleThreadBackgroundBlocking1 task runner are missing. [1] https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/balls_svg_animations_2019-01-11_03-29-51_19152.html [2] https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/balls_svg_animations_2019-01-10_21-56-24_40113.html |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by eseckler@chromium.org
, Nov 20