Optimizing perf dashboard upload to be faster |
||||||||||
Issue descriptionThe current algorithm to deduplicate diagnostics bins diagnostics into lists of candidates by name and then compares candidates exhaustively to one another. We should be able to switch to something hash-based to eliminate this quadratic behavior.
,
Jun 28 2018
We'll keep the quadratic slow path for diagnostics that happen to be more difficult to hash (and that happen to be less common anyway), but introduce a hashing-based fast path for GenericSet and DateRange, the two most common types of shared diagnostics we have.
,
Jun 29 2018
The change Emily did to parallelize the upload didn't help much with improve the runtime (got 1hour upload down to ~40 minutes). So I think this is our best bet.
,
Jun 29 2018
Ethan: if you can compare, you can hash?
,
Jun 29 2018
,
Jun 29 2018
Ethan: resolving this bug is very critical to us. If you're busy, feel free to let me & Emily knows and we can take over this
,
Jun 29 2018
,
Jul 11
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a commit 9e83c9e3c446d4a06910ace4d06b7d70378d4a1a Author: Ethan Kuefner <eakuefner@chromium.org> Date: Wed Jul 11 00:08:48 2018 [TBMv2] Implement deduplication fast path for GenericSet Currently, diagnostic deduplication is a quadratic algorithm, but we can make it linear by comparing diagnostics by key instead of iterating to find the one that a given candidate `.equals()`. Bug: chromium:857283 Change-Id: I3e8d0a8b151e07eb94a15fbe9a8e73a54994ff21 Reviewed-on: https://chromium-review.googlesource.com/1122963 Commit-Queue: Ethan Kuefner <eakuefner@chromium.org> Reviewed-by: Simon Hatch <simonhatch@chromium.org> [modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/generic_set_test.html [modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/generic_set.html [modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/diagnostic_map_test.html [modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/histogram_set.html [modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/tag_map_test.html
,
Jul 11
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a450d72dfbda13ed53aa5ca1dfe78a41bc4bcd35 commit a450d72dfbda13ed53aa5ca1dfe78a41bc4bcd35 Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Wed Jul 11 04:40:20 2018 Roll src/third_party/catapult 1230eb961dc4..10041796ae14 (4 commits) https://chromium.googlesource.com/catapult.git/+log/1230eb961dc4..10041796ae14 git log 1230eb961dc4..10041796ae14 --date=short --no-merges --format='%ad %ae %s' 2018-07-11 cphlipot0@gmail.com [ftrace importer] Use tid instead of pid for kernel threads 2018-07-11 cphlipot0@gmail.com [ftrace importer] Fix recursive binder reply arguments 2018-07-11 eakuefner@chromium.org [TBMv2] Implement deduplication fast path for GenericSet 2018-07-10 dtu@chromium.org [pinpoint] Add stubs for Commit and Change. Created with: gclient setdep -r src/third_party/catapult@10041796ae14 The AutoRoll server is located here: https://catapult-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel BUG=chromium:857283 TBR=sullivan@chromium.org Change-Id: Ieb82b740e4d4b458f5bc5a80efb511d22ff6d36f Reviewed-on: https://chromium-review.googlesource.com/1132444 Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#574074} [modify] https://crrev.com/a450d72dfbda13ed53aa5ca1dfe78a41bc4bcd35/DEPS
,
Jul 12
The roll has landed in https://ci.chromium.org/buildbot/chromium.perf/mac-10_13_laptop_high_end-perf/341 However it doesn't seem to improve the perf dashboard uploading latency much :-/ With this change: Duration of Total process_perf_results: 1583 seconds (https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Fmac-10_13_laptop_high_end-perf%2F341%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_ATI_GPU_on_Mac%2F0%2Flogs%2FMerge_script_log%2F0) Without this change: Duration of Total process_perf_results: 1725 seconds (https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Fmac-10_13_laptop_high_end-perf%2F337%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_ATI_GPU_on_Mac%2F0%2Flogs%2FMerge_script_log%2F0) Ethan, the integration test is now landed. To test the impact of latency improvement, you can just run: ./tools/perf/process_perf_results_unittest.py ... . ---------------------------------------------------------------------- Ran 1 test in 39.580s OK
,
Jul 16
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bde731479ea836fc5aac72c0f862af28e4db3a05 commit bde731479ea836fc5aac72c0f862af28e4db3a05 Author: Ethan Kuefner <eakuefner@chromium.org> Date: Mon Jul 16 21:27:36 2018 [Perf] Add logging for making HistogramSet performance Let's see how much time making the HistogramSet (which includes the call to add_reserved_diagnostics) takes. TBR=eyaich Bug: 857283 Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi Change-Id: I0f153df2653612fabe627e70e88fcee1c88f9f20 Reviewed-on: https://chromium-review.googlesource.com/1138730 Commit-Queue: Ethan Kuefner <eakuefner@chromium.org> Reviewed-by: Simon Hatch <simonhatch@chromium.org> Reviewed-by: Ethan Kuefner <eakuefner@chromium.org> Cr-Commit-Position: refs/heads/master@{#575426} [modify] https://crrev.com/bde731479ea836fc5aac72c0f862af28e4db3a05/tools/perf/core/upload_results_to_perf_dashboard.py
,
Jul 17
re: #c10 So poking through the logging statements Ethan added, it does look like add_reserved_diagnostics still takes up a large portion of the time. Most likely much smaller than before, but we should investigate to see why it didn't have more impact. Secondly, that script ./tools/perf/process_perf_results_unittest.py mocks out the upload time, making it essentially a wrapper around add_reserved_diagnostics. It will be great as a quick tool to test that, but isn't great as an overall profiling tool. Thirdly, from poking around the implementation of the parallelization, I think there's a lot of low hanging fruit there still. From what I see, each step is parallelized, but runs in sync. ie. if a step on 1 core consists of a 10 second process, 10 second upload, you should just async off the upload and free up the thread to process the next dataset rather than waiting around the full 20s. I believe the suggestion I made was to have 1 pool spread across N cpu's that processed diagnostics, and then async off each upload.
,
Jul 17
#12: I recall the upload time are actually very fast, hence mocking it out shouldn't affect much. To make sure, I will make another logging in results_dasbhoard.SendResults to make sure
,
Jul 17
,
Jul 17
,
Jul 17
Yeah I think they're still faster than the processing, but I'd guess they're minimally between a couple seconds and nearly a minute, depending on the size of the dataset. The point of my suggestion in #c12 is you may be able to reclaim that time entirely with an architectural change and be entirely processing bound.
,
Jul 17
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f1717d24eb02cbe374d01ef29dff42199bd636de commit f1717d24eb02cbe374d01ef29dff42199bd636de Author: nednguyen <nednguyen@google.com> Date: Tue Jul 17 20:19:57 2018 Add logging to measure time spent sending results to perf dashboard server NOTRY=true # Flake Bug: 857283 Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi Change-Id: I28b99bebbd705045c2f62a74d1324af480483309 Reviewed-on: https://chromium-review.googlesource.com/1140688 Commit-Queue: Ned Nguyen <nednguyen@google.com> Reviewed-by: Simon Hatch <simonhatch@chromium.org> Cr-Commit-Position: refs/heads/master@{#575755} [modify] https://crrev.com/f1717d24eb02cbe374d01ef29dff42199bd636de/tools/perf/core/results_dashboard.py
,
Jul 18
With the new log, we found that the step of uploading to the perf dashboard is indeed slow for some test. e.g: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Flinux-perf%2F542%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_NVIDIA_GPU_on_Linux%2F0%2Flogs%2FMerge_script_log%2F0 Time spent sending results to https://chromeperf.appspot.com: 44.4346148968 Duration of system_health.memory_desktop upload time: 214 seconds
,
Jul 18
That jives with the logging statements on the backend, which show the time taken for each /add_histograms call. I think with some minor changes to the script, you can reclaim most, if not all, of the upload time, especially if you re-order the uploads so that the heaviest ones go first.
,
Jul 18
Interesting tip. I will try to reorder the upload based on the size of the data. Thanks Simon!
,
Aug 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0 commit a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0 Author: Ned Nguyen <nednguyen@google.com> Date: Fri Aug 10 00:55:07 2018 Maximize the number of parallel processes for processing & uploading perf data A big portion of perf data uploading latency is due to I/O operation (uploading), hence maximize the number of parallel processes to improve the wall-time. Bug:857283 Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0 TBR=simonhatch@chromium.org Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0 NOTRY=true # android_arm64_dbg_recipe flake Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0 Reviewed-on: https://chromium-review.googlesource.com/1170083 Commit-Queue: Ned Nguyen <nednguyen@google.com> Reviewed-by: Ned Nguyen <nednguyen@google.com> Reviewed-by: Ethan Kuefner <eakuefner@chromium.org> Cr-Commit-Position: refs/heads/master@{#581991} [modify] https://crrev.com/a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0/tools/perf/process_perf_results.py
,
Aug 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1018bbc64430009d70edff683a55633207dc097f commit 1018bbc64430009d70edff683a55633207dc097f Author: Ned Nguyen <nednguyen@google.com> Date: Fri Aug 10 11:03:36 2018 Revert "Maximize the number of parallel processes for processing & uploading perf data" This reverts commit a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0. Reason for revert: suspect this causes dashboard internal error. BUG: chromium:873102 Original change's description: > Maximize the number of parallel processes for processing & uploading perf data > > A big portion of perf data uploading latency is due to I/O operation (uploading), > hence maximize the number of parallel processes to improve the wall-time. > > Bug:857283 > Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0 > > TBR=simonhatch@chromium.org > > Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0 > > NOTRY=true # android_arm64_dbg_recipe flake > > Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0 > Reviewed-on: https://chromium-review.googlesource.com/1170083 > Commit-Queue: Ned Nguyen <nednguyen@google.com> > Reviewed-by: Ned Nguyen <nednguyen@google.com> > Reviewed-by: Ethan Kuefner <eakuefner@chromium.org> > Cr-Commit-Position: refs/heads/master@{#581991} TBR=simonhatch@chromium.org,eakuefner@chromium.org,nednguyen@google.com Change-Id: I02d1e84a065bd9c315ec2242fc6a8a0bbd6593a1 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 857283 Reviewed-on: https://chromium-review.googlesource.com/1169800 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Ned Nguyen <nednguyen@google.com> Cr-Commit-Position: refs/heads/master@{#582113} [modify] https://crrev.com/1018bbc64430009d70edff683a55633207dc097f/tools/perf/process_perf_results.py
,
Aug 30
,
Sep 6
John: it looks like you are working on this, so I update you as owner of this bug. If we can get this down to under 5 minutes, we won't need to get high spec highware (issue 880569)
,
Nov 12
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by nednguyen@chromium.org
, Jun 28 2018