353.4% regression in rasterize_and_record_micro.top_25 at 514314:514418
Issue description: The cycle time of the benchmark increased from 20 minutes to 100 minutes (an 80-minute regression). This is too much. Walter: can you take a look at this?
,
Nov 12 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8963092629050850080
,
Nov 12 2017
,
Nov 13 2017
I am in the office tomorrow and will take a look after the bisect completes. The obvious suspect is the recent update to the static snapshots. We may need to do the work to have Telemetry handle un-archiving archives fetched from GCS. Where are you viewing the cycle time? I want to make sure I'm looking at the same thing you are.
,
Nov 13 2017
=== BISECT JOB RESULTS ===
Bisect was unable to run to completion
Error: INFRA_FAILURE
The bisect was able to narrow the range, you can try running with:
  good_revision: e659fbe5f6aadc4dcd655bf7e9f3768fa4d561c3
  bad_revision:  b5624e0de59bf01c17073f12b3a9f0ce9f83c87d
If failures persist contact the team (see below) and report the error.

Bisect Details
  Configuration: android_one_perf_bisect
  Benchmark:     rasterize_and_record_micro.top_25
  Metric:        benchmark_duration/benchmark_duration

  Revision          Result               N
  chromium@514313   22.0163 +- 1.7245    21   good
  chromium@514340   22.0481 +- 0.717216   9   good
  chromium@514366   22.4394 +- 10.863     9   bad
  chromium@514418   26.513 +- 63.4267    14   bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests rasterize_and_record_micro.top_25

More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions
Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8963092629050850080
For feedback, file a bug with component Speed>Bisection
,
Nov 13 2017
Walter: I view the cycle time in https://chromeperf.appspot.com/report?sid=e6bf0ec942d2b5ffdaf2add09059a27129207da33f7f7c80600a2d99878503ac&rev=514418
,
Nov 14 2017
That's so crazy. How could even 2500 files make it take 80 minutes longer? Locally it didn't seem at all slow to load pages, so I am just guessing it's the time to check out the additional .sha1 files and then download the corresponding data from GCS? But I would never have guessed +80 minutes.
,
Nov 14 2017
I suppose I should ask -- does benchmark_duration include time to check out the repo and download from GCS?
,
Nov 14 2017
Walter: it doesn't include time to check out the repo. But it might include time to download those files from GCS, since that logic currently lives inside the benchmark-running code (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=755a485cfb8a6e9010cdcf106933236aad114832&l=161)
,
Nov 14 2017
Do you know what the current parallelism of the download looks like, thread-count-wise? The duration increase is so long that I wonder whether it's using one thread and doing them all serially. Again, I can investigate more tomorrow, but you probably know, or would know immediately where to look.
,
Nov 14 2017
Oh, I think there is no parallelism at all. It's fetching all the files serially.
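If we go the parallel route, a thread pool over the per-file fetches is the obvious shape. A minimal sketch (not landed code): fetch_all and download_one are hypothetical names here, and download_one stands in for whatever serial per-file GCS fetch Telemetry does today.

  from concurrent.futures import ThreadPoolExecutor, as_completed

  def fetch_all(sha1_paths, download_one, max_workers=25):
      # Submit one fetch per .sha1 path and collect failures instead of
      # aborting on the first one, so a flaky file doesn't hide the others.
      failures = []
      with ThreadPoolExecutor(max_workers=max_workers) as pool:
          futures = {pool.submit(download_one, path): path for path in sha1_paths}
          for future in as_completed(futures):
              try:
                  future.result()
              except Exception as exc:
                  failures.append((futures[future], exc))
      if failures:
          raise RuntimeError('Failed to fetch %d file(s): %r' % (len(failures), failures))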
,
Nov 14 2017
Created reverts for the updated static snapshots for now, so that we can look at archiving or parallelism without urgency around the current state: https://chromium-review.googlesource.com/c/chromium/src/+/769368 https://chromium-review.googlesource.com/c/chromium/src/+/769528
,
Nov 14 2017
Walter: for the next step, I think we can consider adding support for Telemetry's page/story to specify a dependency file (with dependency_manager). This system already supports unzipping files.

The core lib is https://cs.chromium.org/chromium/src/third_party/catapult/dependency_manager/
The way to use it is to add a dep file like this: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/binary_dependencies.json

With this dep file, one can fetch data from cloud storage with:

  config = dependency_manager.BaseConfig(<path to dep file>)
  manager = dependency_manager.DependencyManager(config)
  manager.PrefetchPaths(<target_platform>)

It will automatically unzip the file if the json includes "path_within_archive" (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/binary_dependencies.json?rcl=c44654bacfc5866bbafe00b7fac50f87c82a4103&l=61)
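To make that concrete, here is a rough sketch of the same flow with placeholder values filled in; the dep-file path and the platform string are made up for illustration, and the exact dependency_manager signatures should be verified against the catapult source linked above.

  import dependency_manager

  # Hypothetical dep file listing the static_top_25 archives (path is made up).
  DEP_FILE = 'tools/perf/page_sets/data/static_top_25_deps.json'

  config = dependency_manager.BaseConfig(DEP_FILE)
  manager = dependency_manager.DependencyManager(config)
  # Download everything declared for this platform up front; entries that carry
  # "path_within_archive" are unzipped automatically after the fetch.
  manager.PrefetchPaths('android')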
,
Nov 15 2017
Thanks for your revert; the cycle time is back down to the normal range now. I might spend some time this Thursday & Friday prototyping switching Telemetry to use dependency_manager to manage the image files.
,
Nov 15 2017
OK, I'll check with you before I do more here. It seems like it's also worth looking at adding parallel download in any case. Maybe we should file two separate issues for these, (1) download/unpack zip archives and (2) parallelize GCS download, and close this one. I'll defer to your thoughts.
,
Nov 15 2017
I think once we zip the files to a significant size, there is not much time saving in parallelizing the download, since the bandwidth is fixed. I might be proven wrong though. I filed issue 785397 for download/unpack zip archives.
,
Nov 15 2017
,
Nov 15 2017
On parallelizing -- yes, I was thinking there's still value for the overall system. There are 237 .sha1 files under tools/perf outside of static_top_25. If each takes one second to round-trip, negotiate, and download (I made that up, but it's not crazy), that's almost 4 minutes, which adds up when it's going on all day, every day. Parallelizing to 25 threads would reduce that to ~10 seconds. I realize we probably don't always download all .sha1 files on all perf bots every time, but it still seems a worthwhile improvement at some point. Lower priority than unpacking archives, for sure.
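A quick back-of-the-envelope check of those numbers, using the same 1 s/file guess made above:

  num_files = 237         # .sha1 files under tools/perf outside static_top_25
  seconds_per_file = 1.0  # guessed round-trip + download time per file
  threads = 25

  serial_minutes = num_files * seconds_per_file / 60.0            # ~4 minutes
  parallel_seconds = -(-num_files // threads) * seconds_per_file  # ceil(237/25) = 10 waves -> ~10 s
  print(serial_minutes, parallel_seconds)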