New issue
Advanced search Search tips

Issue 753591 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

system_health benchmarks very slow [4+min for a second page]

Project Member Reported by erikc...@chromium.org, Aug 8 2017

Issue description

time ./tools/perf/run_benchmark --browser=exact --browser-executable=./out/gn/Chromium.app/Contents/MacOS/Chromium system_health.common_desktop --story-filter='load:search:amazon' --pageset-repeat=1 --output-format=json --output-format=html

(WARNING) 2017-08-08 16:09:06,898 desktop_browser_finder.FindAllAvailableBrowsers:171  Chrome build location for mac_x86_64 not found. Browser will be run without Flash.
(WARNING) 2017-08-08 16:09:06,904 desktop_browser_finder.FindAllAvailableBrowsers:171  Chrome build location for mac_x86_64 not found. Browser will be run without Flash.
(WARNING) 2017-08-08 16:09:11,435 desktop_browser_finder.FindAllAvailableBrowsers:171  Chrome build location for mac_x86_64 not found. Browser will be run without Flash.
[ RUN      ] load:search:amazon@{'case': 'load', 'group': 'search'}
[ RUN      ] /var/folders/5q/c5bkrfs948b4rr1kwtfspm1w003db3/T/tmpRH1G4k.html
[       OK ] /var/folders/5q/c5bkrfs948b4rr1kwtfspm1w003db3/T/tmpRH1G4k.html (32029 ms)
(WARNING) 2017-08-08 16:13:24,459 timeline_based_measurement._ComputeTimelineBasedMetrics:325  Processing resulting traces took 32.143 seconds
[       OK ] load:search:amazon@{'case': 'load', 'group': 'search'} (252975 ms)
[  PASSED  ] 1 test.

View result at file:///Users/erikchen/projects/chromium/src/tools/perf/results.json
View result at file:///Users/erikchen/projects/chromium/src/tools/perf/results.html
(WARNING) 2017-08-08 16:13:28,288 ps_util._ListAllSubprocesses:61  psutil is not installed on the system. Not listing possible leaked processes. To install psutil, see: https://pypi.python.org/pypi/psutil

real	4m22.467s
user	6m38.757s
sys	0m45.113s


Some debugging shows that the bulk of time [80%+] is spent in inspector websockot stream parsing. e.g. _ReadChunkFromStream and _GotChunkFromStream in tracing_backend.py. 
 
My first thought is that this was caused by a too-small request size. The comment about https://github.com/liris/websocket-client/issues/163 seems out of date, as the issue has been fixed. But removing the size parameter resulted in no change to run time.
Profiling with cProfile:

"""
Tue Aug  8 16:33:28 2017    /Users/erikchen/profile_stats

         465481263 function calls (464611972 primitive calls) in 300.790 seconds

   Ordered by: cumulative time
   List reduced from 5554 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003  300.799  300.799 ./tools/perf/run_benchmark:6(<module>)
        1    0.000    0.000  300.136  300.136 ./tools/perf/run_benchmark:18(main)
        1    0.000    0.000  300.135  300.135 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/benchmark_runner.py:283(main)
        1    0.000    0.000  299.884  299.884 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/benchmark_runner.py:221(Run)
        1    0.000    0.000  299.884  299.884 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/benchmark.py:95(Run)
        1    0.000    0.000  299.884  299.884 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:263(RunBenchmark)
        1    0.000    0.000  297.661  297.661 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:144(Run)
        1    0.000    0.000  291.891  291.891 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py:89(_RunStoryAndProcessErrorIfNeeded)
        1    0.000    0.000  269.257  269.257 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:282(Measure)
     5395    0.046    0.000  236.234    0.044 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:141(_Receive)
     5395    0.013    0.000  230.680    0.043 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:287(recv)
     5395    0.007    0.000  230.667    0.043 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:301(recv_data)
     5395    0.029    0.000  230.660    0.043 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_core.py:313(recv_data_frame)
     5336    0.012    0.000  228.401    0.043 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py:126(DispatchNotifications)
        1    0.000    0.000  227.890  227.890 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:45(StopTracing)
        1    0.000    0.000  227.890  227.890 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:107(StopTracing)
        1    0.000    0.000  227.670  227.670 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py:235(CollectAgentTraceData)
        1    0.000    0.000  227.670  227.670 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py:377(CollectChromeTracingData)
        1    0.000    0.000  227.670  227.670 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py:227(CollectTraceData)
        1    0.016    0.016  227.670  227.670 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py:233(_CollectTracingData)
     5395    0.013    0.000  209.094    0.039 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_abnf.py:414(extract)
     5395    0.003    0.000  209.081    0.039 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_utils.py:92(validate_utf8)
     5395  103.753    0.019  209.078    0.039 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_utils.py:79(_validate_utf8)
198852127   93.316    0.000   93.316    0.000 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_utils.py:70(_decode)
        1    0.000    0.000   30.688   30.688 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:314(_ComputeTimelineBasedMetrics)
        1    0.000    0.000   30.677   30.677 /Users/erikchen/projects/chromium/src/third_party/catapult/tracing/tracing/metrics/metric_runner.py:34(RunMetric)
        1    0.000    0.000   30.677   30.677 /Users/erikchen/projects/chromium/src/third_party/catapult/tracing/tracing/metrics/metric_runner.py:40(RunMetricOnTraces)
        1    0.000    0.000   30.670   30.670 /Users/erikchen/projects/chromium/src/third_party/catapult/tracing/tracing/mre/map_runner.py:83(RunMapper)
        1    0.000    0.000   30.670   30.670 /Users/erikchen/projects/chromium/src/third_party/catapult/tracing/tracing/mre/threaded_work_queue.py:27(Run)
        1    4.701    4.701   30.670   30.670 /Users/erikchen/projects/chromium/src/third_party/catapult/tracing/tracing/mre/threaded_work_queue.py:101(_RunMultiThreaded)
"""

Comment 3 Deleted

*The current pipeline is:

Telemetry collect trace data in chunks from Chrome through websocket. (1)
Telemetry write trace data to a file on disk for each chunk it received. (2)

An idea I can think of is instead of transfer tracing using websocket, we send a devtool command that tells Chrome to dump all trace to a file on disk directly.

Though I suspect the biggest problem here is the trace is too big, hence any attempt to transfer this blob of data from one place to another is slow.

+caseq: who added the trace streaming to fix the tracing timeout back then.
/Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_utils.py:92(validate_utf8)
     5395  103.753    0.019  209.078    0.039 /Users/erikchen/projects/chromium/src/third_party/catapult/telemetry/third_party/websocket-client/websocket/_utils.py:79(_validate_utf8)
198852127   93.316    0.000   93.316    0.000 

The problem is that websocket-client has implemented a custom utf8 validator which is called 198 million times.
Setting skip_utf8_validation to true in from third_party/websocket-client/websocket/_abnf.p brings the total run time down to 1m36s. This shaves off 3 minutes, as expected from the profiling. 
Owner: erikc...@chromium.org
Status: Started (was: Untriaged)
Cc: perezju@chromium.org charliea@chromium.org
Hey, that's cool detective work using sampling tool!

+charlie & perejzu@ may be interested as well
Wow, that's awesome!
Awesome work Erik! Really excited to see these sped up.
Project Member

Comment 11 by bugdroid1@chromium.org, Aug 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ad1dada273fc9bfb77dfce1052332739a68fd8cf

commit ad1dada273fc9bfb77dfce1052332739a68fd8cf
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Mon Aug 21 23:07:42 2017

Roll src/third_party/catapult/ 28fb1abdf..333dc8e65 (3 commits)

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/28fb1abdfc1e..333dc8e655b2

$ git log 28fb1abdf..333dc8e65 --date=short --no-merges --format='%ad %ae %s'
2017-08-21 erikchen Disable UTF8 validation from websocket-client for telemetry.
2017-08-21 simonhatch Dashboard - Add some logging around /email_summary.
2017-08-21 simonhatch Trying to do a test migration fails out right now with AttributeError since stoppage_alert was removed from TestMetadata.

Created with:
  roll-dep src/third_party/catapult
BUG=753591,751747


Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls


CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: Ifd45eff47fa1a2011529581c6db09a31237a4063
Reviewed-on: https://chromium-review.googlesource.com/624900
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#496109}
[modify] https://crrev.com/ad1dada273fc9bfb77dfce1052332739a68fd8cf/DEPS

Components: Test>Telemetry
Components: -Tests>Telemetry

Sign in to add a comment