Consider multithreading TBM measurement
Issue description

Running smoothness.top_25_smooth with JSON output takes 6.5 minutes on my MBP. Removing the part of TimelineBasedMeasurement.Measure() which processes trace results drops this to 4.5 minutes, a 30% saving. Should we consider threading the result processing? It runs after tracing is complete and the browser can shut down, so it seems this step could run in parallel with other page tests (see the sketch after this list). The included steps:

- Serializing a TraceResult to a trace HTML file.
- Adding the trace file path to PageTestResults.
- Computing and adding timeline based metrics to PageTestResults.

These steps may also get slower with TBMv2, which involves loading the trace file and running JS measurements on it.
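For illustration, a minimal sketch of what offloading these steps to a worker pool could look like. The process_fn callable stands in for the three steps above; none of these names are the actual Telemetry API:

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=4)
    _pending = []

    def process_trace_async(process_fn, trace_result, results):
      # process_fn would wrap the three steps listed above: serialize the
      # trace to HTML, record the file path, compute TBM metrics.
      _pending.append(_pool.submit(process_fn, trace_result, results))

    def wait_for_all_trace_processing():
      # Call once after the last story, before results are serialized.
      for future in _pending:
        future.result()  # re-raises any exception from a worker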
Apr 5 2017
Some initial timings on my MBP before & after the patch in #1. Looks promising. Some of the stories run solo, so there isn't much to parallelize; those would benefit from being grouped with others.

╔═════════════════════════════════════════════╦═════════╦═════════╦════════╗
║ Benchmark                                   ║ Before  ║ After   ║ Diff   ║
╠═════════════════════════════════════════════╬═════════╬═════════╬════════╣
║ smoothness.desktop_tough_pinch_zoom_cases   ║ 0:04:59 ║ 0:03:25 ║ 31.44% ║
║ smoothness.gpu...tough_path_rendering_cases ║ 0:01:45 ║ 0:01:12 ║ 31.43% ║
║ smoothness.gpu...image_decoding_cases       ║ 0:00:17 ║ 0:00:16 ║  5.88% ║
║ smoothness.image_decoding_cases             ║ 0:00:17 ║ 0:00:16 ║  5.88% ║
║ smoothness.maps                             ║ 0:00:21 ║ 0:00:20 ║  4.76% ║
║ smoothness.top_25_smooth                    ║ 0:06:30 ║ 0:04:32 ║ 30.26% ║
║ smoothness.tough_ad_cases                   ║ 0:04:15 ║ 0:02:57 ║ 30.59% ║
║ smoothness.tough_canvas_cases               ║ 0:08:22 ║ 0:04:53 ║ 41.63% ║
║ smoothness.tough_filters_cases              ║ 0:01:40 ║ 0:01:08 ║ 32.00% ║
║ smoothness.tough_image_decode_cases         ║ 0:01:16 ║ 0:00:45 ║ 40.79% ║
║ smoothness.tough_path_rendering_cases       ║ 0:01:32 ║ 0:01:11 ║ 22.83% ║
║ smoothness.tough_scrolling_cases            ║ 0:07:16 ║ 0:04:06 ║ 43.58% ║
║ smoothness.tough_texture_upload_cases       ║ 0:01:36 ║ 0:01:10 ║ 27.08% ║
║ smoothness.tough_webgl_ad_cases             ║ 0:03:44 ║ 0:02:52 ║ 23.21% ║
║ smoothness.tough_webgl_cases                ║ 0:02:26 ║ 0:01:47 ║ 26.71% ║
╠═════════════════════════════════════════════╬═════════╬═════════╬════════╣
║ Total                                       ║ 0:46:16 ║ 0:30:50 ║ 33.36% ║
╚═════════════════════════════════════════════╩═════════╩═════════╩════════╝
Apr 5 2017
I wonder whether this is partly because smoothness is not yet in TBMv2. The trace processing code in JS (run by the d8 engine) is much faster than the Python version. We should probably compare the trace import time in Python vs d8 to see how big the difference is. Threading the result computation could be an option, though we have been trying hard to keep Telemetry single threaded for simplicity.
Apr 5 2017
> I wonder whether this is partly because smoothness is not yet in TBMv2.
> The trace processing code in JS (run by the d8 engine) is much faster than the Python version.

Most of the time seems to go to serializing the trace to HTML, which we also do for TBMv2. IIUC TBMv2 loads the JS from the HTML file, while TBMv1 runs on an already parsed trace. The TBMv1 flow seems like it would be faster, but I haven't checked.
Apr 5 2017
TBMv1 doesn't run on an already parsed trace; it also needs to convert the trace string received from the browser to object form with json.loads(...). On another note, the default JSON parsing implementation in Python is much slower and consumes a lot more memory than v8's.
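As a rough way to put a number on the Python side of that, one could time json.loads on a saved trace. A sketch; 'trace.json' is a placeholder path, and the d8 side would need a separate JS harness:

    import json
    import time

    def time_json_load(trace_path):
      # TBMv1 pays this parsing cost before any metric code runs.
      with open(trace_path) as f:
        data = f.read()
      start = time.time()
      json.loads(data)
      return time.time() - start

    print('json.loads took %.2fs' % time_json_load('trace.json'))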
Apr 5 2017
> TBMv1 doesn't run on an already parsed trace; it also needs to convert the trace string received from the browser to object form

I see, we need to build the timeline model. Computing TBMv1 metrics is a small part overall; saving the trace to HTML is the largest part, and that step is also used by TBMv2. Do we have any data on how fast loading the trace into v8 for TBMv2 is?

╔══════════════════════════════════════════════════════╦══════════╦════════╗
║ Step                                                 ║ Time (s) ║ %      ║
╠══════════════════════════════════════════════════════╬══════════╬════════╣
║ trace_result (receive trace_data from browser)       ║ 41.2     ║ 22.92% ║
║ trace_value (save trace_data to HTML)                ║ 122.8    ║ 68.28% ║
║ compute_tbm_metrics (parse model + compute metrics)  ║ 15.8     ║  8.80% ║
║ Total                                                ║ 179.8    ║        ║
╚══════════════════════════════════════════════════════╩══════════╩════════╝
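(For reference, per-step totals like these can be collected with a small accumulator around each step. This is a sketch, not the instrumentation actually used:)

    import collections
    import contextlib
    import time

    step_totals = collections.defaultdict(float)

    @contextlib.contextmanager
    def timed_step(name):
      # Accumulates wall-clock time per step across all stories in a run.
      start = time.time()
      try:
        yield
      finally:
        step_totals[name] += time.time() - start

    # e.g. wrap each step:
    #   with timed_step('trace_value'):
    #     ...save trace_data to HTML...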
Apr 5 2017
The table in #6 is based on the total time of running smoothness.top_25_smooth.
Apr 6 2017
Here are results from switching smoothness.top_25_smooth to use the TBMv2 'responsivenessMetric'. Overall the benchmark duration increased from 6.5 to 10.7 minutes. Trace result and metric processing times:

╔══════════════════════════════════════════════════════╦══════════╦════════╗
║ Step                                                 ║ Time (s) ║ %      ║
╠══════════════════════════════════════════════════════╬══════════╬════════╣
║ trace_result (receive trace_data from browser)       ║ 202.0    ║ 48.84% ║
║ trace_value (save trace_data to HTML)                ║ 119.9    ║ 28.99% ║
║ compute_tbm_metrics (TBMv2)                          ║ 91.7     ║ 22.17% ║
║ Total                                                ║ 413.6    ║        ║
╚══════════════════════════════════════════════════════╩══════════╩════════╝

Perhaps we can better tune the trace categories required for the TBMv2 metric. For the moment it looks like switching all the benchmarks may be a problem.
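If tuning categories is the way to go, the benchmark's measurement options can restrict what Chrome records. A sketch using the Telemetry options API as I understand it; the filter string here is illustrative, not a vetted set for responsivenessMetric:

    from telemetry.timeline import chrome_trace_category_filter
    from telemetry.web_perf import timeline_based_measurement

    def CreateCoreTimelineBasedMeasurementOptions(self):
      # Record only the categories the metric needs, shrinking the trace
      # and the time spent receiving and serializing it.
      category_filter = chrome_trace_category_filter.ChromeTraceCategoryFilter(
          filter_string='benchmark,toplevel,input')  # illustrative only
      options = timeline_based_measurement.Options(category_filter)
      options.SetTimelineBasedMetrics(['responsivenessMetric'])
      return options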
Apr 7 2017
Hmhh, for some reason a sample trace of smoothness.top_25_smooth in https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_7-2017-04-06_16-02-41-63624.html shows that receiving trace_data from the browser is extremely fast: 2ms (see attached screenshot). It could be that our Python tracing implementation is buggy, hmhh... Anyway, thanks for bringing this to our attention! Can you also show me the CL where you switched smoothness.top_25_smooth to use the TBMv2 'responsivenessMetric'?
May 1 2017
Assuming we have 2 stories, A and B, that run in that order: couldn't processing the trace for story A while running story B cause problems? It would affect metrics that are sensitive to system-wide changes (such as power metrics).
May 2 2017
#11: Yes, that's true, especially for total system power metrics. Potentially the processing could pipeline with some of the next test's setup, such as the part up until navigation completes and before measurement starts, on many benchmarks.
May 2 2017
One design I think we can pursue without worrying about race conditions is to delay processing the TBMv2 metrics until the very end. At that point we have N traces (N = number of stories run), and can do a trivially parallel computation to get N JSON results back. Telemetry then just takes those N JSON results and augments the Telemetry results object with them (serial, but this should be fairly fast). The trace mapper kind of already supports this: https://github.com/catapult-project/catapult/blob/master/tracing/tracing/mre/map_runner.py#L83
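A sketch of that end-of-run step using a process pool over the saved trace files. This shells out to run_metric per trace and assumes its stdout is JSON, which may not match the actual interface; the map_runner linked above would be the real entry point:

    import json
    import multiprocessing
    import subprocess

    RUN_METRIC = 'third_party/catapult/tracing/bin/run_metric'

    def _compute_metrics(trace_html_path):
      # Each worker computes the TBMv2 metric for one trace.
      output = subprocess.check_output(
          [RUN_METRIC, trace_html_path, 'responsivenessMetric'])
      return trace_html_path, json.loads(output)

    def compute_all_metrics(trace_paths):
      # Trivially parallel: N traces in, N results out. Merging the results
      # back into the Telemetry results object stays serial.
      pool = multiprocessing.Pool()
      try:
        return dict(pool.map(_compute_metrics, trace_paths))
      finally:
        pool.close()
        pool.join()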
May 11 2017
Ok, I did more measurements locally to understand the differences between the Python based and JS based versions. In particular, I looked at the cost of computing tracing metrics in Python vs JS for https://www.google.com/#hl=en&q=barack+obama in smoothness.top_25_smooth on my Mac:

Python version: 0.15s
Tracing (JS) version: 1.44s

To reproduce the JS timing, one can add timing code to https://github.com/catapult-project/catapult/blob/master/tracing/bin/run_metric#L51 and run:

./third_party/catapult/tracing/bin/run_metric https___www_google_com__hl_en_q_barack_obama02017-05-11_16-53-34.html responsivenessMetric
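The timing code can be as simple as wrapping the metric-running call at that line. A sketch; the exact call at run_metric#L51 may differ across revisions:

    import time

    start = time.time()
    # ... the existing metric-running call from run_metric goes here ...
    elapsed = time.time() - start
    print('metric computation took %.2fs' % elapsed)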