New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 857283 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 864621
issue 879159

Blocking:
issue 713345



Sign in to add a comment

Optimizing perf dashboard upload to be faster

Project Member Reported by eakuefner@chromium.org, Jun 28 2018

Issue description

The current algorithm to deduplicate diagnostics bins diagnostics into lists of candidates by name and then compares candidates exhaustively to one another. We should be able to switch to something hash-based to eliminate this quadratic behavior.
 
Blocking: 713345
We'll keep the quadratic slow path for diagnostics that happen to be more difficult to hash (and that happen to be less common anyway), but introduce a hashing-based fast path for GenericSet and DateRange, the two most common types of shared diagnostics we have.
Cc: eyaich@chromium.org
Components: Speed>Benchmarks>Waterfall
The change Emily did to parallelize the upload didn't help much with improve the runtime (got 1hour upload down to ~40 minutes). So I think this is our best bet.
Ethan: if you can compare, you can hash?
Labels: -Pri-2 Pri-1
Ethan: resolving this bug is very critical to us. If you're busy, feel free to let me & Emily knows and we can take over this
Status: Started (was: Assigned)
Project Member

Comment 8 by bugdroid1@chromium.org, Jul 11

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a

commit 9e83c9e3c446d4a06910ace4d06b7d70378d4a1a
Author: Ethan Kuefner <eakuefner@chromium.org>
Date: Wed Jul 11 00:08:48 2018

[TBMv2] Implement deduplication fast path for GenericSet

Currently, diagnostic deduplication is a quadratic algorithm, but we can make
it linear by comparing diagnostics by key instead of iterating to find the one
that a given candidate `.equals()`.

Bug: chromium:857283
Change-Id: I3e8d0a8b151e07eb94a15fbe9a8e73a54994ff21
Reviewed-on: https://chromium-review.googlesource.com/1122963
Commit-Queue: Ethan Kuefner <eakuefner@chromium.org>
Reviewed-by: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/generic_set_test.html
[modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/generic_set.html
[modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/diagnostic_map_test.html
[modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/histogram_set.html
[modify] https://crrev.com/9e83c9e3c446d4a06910ace4d06b7d70378d4a1a/tracing/tracing/value/diagnostics/tag_map_test.html

Project Member

Comment 9 by bugdroid1@chromium.org, Jul 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a450d72dfbda13ed53aa5ca1dfe78a41bc4bcd35

commit a450d72dfbda13ed53aa5ca1dfe78a41bc4bcd35
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Wed Jul 11 04:40:20 2018

Roll src/third_party/catapult 1230eb961dc4..10041796ae14 (4 commits)

https://chromium.googlesource.com/catapult.git/+log/1230eb961dc4..10041796ae14


git log 1230eb961dc4..10041796ae14 --date=short --no-merges --format='%ad %ae %s'
2018-07-11 cphlipot0@gmail.com [ftrace importer] Use tid instead of pid for kernel threads
2018-07-11 cphlipot0@gmail.com [ftrace importer] Fix recursive binder reply arguments
2018-07-11 eakuefner@chromium.org [TBMv2] Implement deduplication fast path for GenericSet
2018-07-10 dtu@chromium.org [pinpoint] Add stubs for Commit and Change.


Created with:
  gclient setdep -r src/third_party/catapult@10041796ae14

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:857283
TBR=sullivan@chromium.org

Change-Id: Ieb82b740e4d4b458f5bc5a80efb511d22ff6d36f
Reviewed-on: https://chromium-review.googlesource.com/1132444
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#574074}
[modify] https://crrev.com/a450d72dfbda13ed53aa5ca1dfe78a41bc4bcd35/DEPS

The roll has landed in https://ci.chromium.org/buildbot/chromium.perf/mac-10_13_laptop_high_end-perf/341

However it doesn't seem to improve the perf dashboard uploading latency much :-/

With this change:
Duration of Total process_perf_results: 1583 seconds
(https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Fmac-10_13_laptop_high_end-perf%2F341%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_ATI_GPU_on_Mac%2F0%2Flogs%2FMerge_script_log%2F0)

Without this change:
Duration of Total process_perf_results: 1725 seconds
(https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Fmac-10_13_laptop_high_end-perf%2F337%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_ATI_GPU_on_Mac%2F0%2Flogs%2FMerge_script_log%2F0)


Ethan, the integration test is now landed. To test the impact of latency improvement, you can just run:

./tools/perf/process_perf_results_unittest.py
...
.
----------------------------------------------------------------------
Ran 1 test in 39.580s

OK


Project Member

Comment 11 by bugdroid1@chromium.org, Jul 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bde731479ea836fc5aac72c0f862af28e4db3a05

commit bde731479ea836fc5aac72c0f862af28e4db3a05
Author: Ethan Kuefner <eakuefner@chromium.org>
Date: Mon Jul 16 21:27:36 2018

[Perf] Add logging for making HistogramSet performance

Let's see how much time making the HistogramSet (which includes the call
to add_reserved_diagnostics) takes.

TBR=eyaich

Bug: 857283
Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi
Change-Id: I0f153df2653612fabe627e70e88fcee1c88f9f20
Reviewed-on: https://chromium-review.googlesource.com/1138730
Commit-Queue: Ethan Kuefner <eakuefner@chromium.org>
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Reviewed-by: Ethan Kuefner <eakuefner@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575426}
[modify] https://crrev.com/bde731479ea836fc5aac72c0f862af28e4db3a05/tools/perf/core/upload_results_to_perf_dashboard.py

Cc: nednguyen@chromium.org
re: #c10

So poking through the logging statements Ethan added, it does look like add_reserved_diagnostics still takes up a large portion of the time. Most likely much smaller than before, but we should investigate to see why it didn't have more impact.

Secondly, that script ./tools/perf/process_perf_results_unittest.py mocks out the upload time, making it essentially a wrapper around add_reserved_diagnostics. It will be great as a quick tool to test that, but isn't great as an overall profiling tool.

Thirdly, from poking around the implementation of the parallelization, I think there's a lot of low hanging fruit there still. From what I see, each step is parallelized, but runs in sync. ie. if a step on 1 core consists of a 10 second process, 10 second upload, you should just async off the upload and free up the thread to process the next dataset rather than waiting around the full 20s.

I believe the suggestion I made was to have 1 pool spread across N cpu's that processed diagnostics, and then async off each upload.
#12: I recall the upload time are actually very fast, hence mocking it out shouldn't affect much. To make sure, I will make another logging in results_dasbhoard.SendResults to make sure
Summary: Optimizing perf dashboard upload to be faster (was: deduplicateDiagnostics is quadratic)
Blocking: 864621
Yeah I think they're still faster than the processing, but I'd guess they're minimally between a couple seconds and nearly a minute, depending on the size of the dataset. The point of my suggestion in #c12 is you may be able to reclaim that time entirely with an architectural change and be entirely processing bound.
Blockedon: 864621
Blocking: -864621
Project Member

Comment 18 by bugdroid1@chromium.org, Jul 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f1717d24eb02cbe374d01ef29dff42199bd636de

commit f1717d24eb02cbe374d01ef29dff42199bd636de
Author: nednguyen <nednguyen@google.com>
Date: Tue Jul 17 20:19:57 2018

Add logging to measure time spent sending results to perf dashboard server

NOTRY=true  # Flake

Bug: 857283
Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi
Change-Id: I28b99bebbd705045c2f62a74d1324af480483309
Reviewed-on: https://chromium-review.googlesource.com/1140688
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575755}
[modify] https://crrev.com/f1717d24eb02cbe374d01ef29dff42199bd636de/tools/perf/core/results_dashboard.py

With the new log, we found that the step of uploading to the perf dashboard is indeed slow for some test.

e.g: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Flinux-perf%2F542%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_NVIDIA_GPU_on_Linux%2F0%2Flogs%2FMerge_script_log%2F0

Time spent sending results to https://chromeperf.appspot.com: 44.4346148968
Duration of system_health.memory_desktop upload time: 214 seconds

That jives with the logging statements on the backend, which show the time taken for each /add_histograms call.

I think with some minor changes to the script, you can reclaim most, if not all, of the upload time, especially if you re-order the uploads so that the heaviest ones go first.
Interesting tip. I will try to reorder the upload based on the size of the data. Thanks Simon!
Project Member

Comment 22 by bugdroid1@chromium.org, Aug 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0

commit a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0
Author: Ned Nguyen <nednguyen@google.com>
Date: Fri Aug 10 00:55:07 2018

Maximize the number of parallel processes for processing & uploading perf data

A big portion of perf data uploading latency is due to I/O operation (uploading),
hence maximize the number of parallel processes to improve the wall-time.

Bug:857283
Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0

TBR=simonhatch@chromium.org

Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0

NOTRY=true  # android_arm64_dbg_recipe flake

Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0
Reviewed-on: https://chromium-review.googlesource.com/1170083
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ethan Kuefner <eakuefner@chromium.org>
Cr-Commit-Position: refs/heads/master@{#581991}
[modify] https://crrev.com/a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0/tools/perf/process_perf_results.py

Project Member

Comment 23 by bugdroid1@chromium.org, Aug 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1018bbc64430009d70edff683a55633207dc097f

commit 1018bbc64430009d70edff683a55633207dc097f
Author: Ned Nguyen <nednguyen@google.com>
Date: Fri Aug 10 11:03:36 2018

Revert "Maximize the number of parallel processes for processing & uploading perf data"

This reverts commit a92b95bb53e21fc4874ab3b8b1833fe3b990e3b0.

Reason for revert: suspect this causes dashboard internal error. 
BUG: chromium:873102 

Original change's description:
> Maximize the number of parallel processes for processing & uploading perf data
> 
> A big portion of perf data uploading latency is due to I/O operation (uploading),
> hence maximize the number of parallel processes to improve the wall-time.
> 
> Bug:857283
> Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0
> 
> TBR=simonhatch@chromium.org
> 
> Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0
> 
> NOTRY=true  # android_arm64_dbg_recipe flake
> 
> Change-Id: I0c3246849676a56801fe99846ade7265d13d94c0
> Reviewed-on: https://chromium-review.googlesource.com/1170083
> Commit-Queue: Ned Nguyen <nednguyen@google.com>
> Reviewed-by: Ned Nguyen <nednguyen@google.com>
> Reviewed-by: Ethan Kuefner <eakuefner@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#581991}

TBR=simonhatch@chromium.org,eakuefner@chromium.org,nednguyen@google.com

Change-Id: I02d1e84a065bd9c315ec2242fc6a8a0bbd6593a1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 857283
Reviewed-on: https://chromium-review.googlesource.com/1169800
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#582113}
[modify] https://crrev.com/1018bbc64430009d70edff683a55633207dc097f/tools/perf/process_perf_results.py

Blockedon: 879159
Cc: eakuefner@chromium.org simonhatch@chromium.org
Owner: jbudorick@chromium.org
John: it looks like you are working on this, so I update you as owner of this bug. If we can get this down to under 5 minutes, we won't need to get high spec highware (issue 880569)
Cc: -eakuefner@chromium.org

Sign in to add a comment