[telemetry] page_cycler_v2.intl_ja_zh is broken on ChromeOS |
|||||||||||||||||||
Issue descriptionWhen I try to run this page_cycler test in ChromeOS it fails (in WillRunStory, in timeline_based_measurement.py) because the CrosPlatformBackend apparently returns False for IsChromeTracingSupported. This needs to be fixed asap.
,
Jul 29 2016
Note that at least 30% of telemetry benchmarks go through that same code path.
,
Jul 29 2016
Hopefully this is just a matter of returning True there. Ben, can you take a look?
,
Jul 29 2016
,
Aug 1 2016
,
Aug 1 2016
Note: This is a time-critical bug. If it is not fixed by Weds (California time), the Chrome PFQ waterfall is going to start failing.
,
Aug 1 2016
Let me revert PCv1 benchmarks on CrOS for now to make this not time-critical
,
Aug 1 2016
Heads up chromeos gardener
,
Aug 2 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f00d65e14a65a308a1f7b9e07c96859fff8b5280 commit f00d65e14a65a308a1f7b9e07c96859fff8b5280 Author: kouhei <kouhei@chromium.org> Date: Tue Aug 02 01:08:44 2016 Re-enable some PCv1 benchmarks for CrOS This CL re-enables page_cycler.intl_{ja_zh,es_fr_pt-BR,ar_fa_he} PCv1 benchmarks for ChromeOS This is a temporary solution until crbug.com/632815 is resolved TBR=nednguyen@google.com BUG= 632815 , 611329 Review-Url: https://codereview.chromium.org/2203593002 Cr-Commit-Position: refs/heads/master@{#409108} [modify] https://crrev.com/f00d65e14a65a308a1f7b9e07c96859fff8b5280/tools/perf/benchmarks/page_cycler.py
,
Aug 2 2016
It seems that the tracing agent used by page_cycler_v2 was never tested on CrOS. After adding chromeos to the OS list, I can get the test to go a little further:
+++ b/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.p
y
@@ -17,7 +17,7 @@ from telemetry.internal.platform import tracing_agent
from telemetry.internal.platform.tracing_agent import (
chrome_tracing_devtools_manager)
-_DESKTOP_OS_NAMES = ['linux', 'mac', 'win']
+_DESKTOP_OS_NAMES = ['chromeos', 'linux', 'mac', 'win']
_STARTUP_TRACING_OS_NAMES = _DESKTOP_OS_NAMES + ['android']
The current failure in my local tree is from attempting to record a clock sync marker. Here is the exception message:
(ERROR) 2016-08-02 21:20:09,778 chrome_tracing_agent._RecordClockSyncMarkerDevTools:146 Failed to record clock sync marker with sync_id='bf88a8b5-6cef-4796-af39-fdbb048ff0b8' via DevTools client <telemetry.internal.backends.chrome_inspector.devtools_client_backend.DevToolsClientBackend object at 0x7fd36b645b90>:
And here is the call stack:
Traceback (most recent call last):
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 87, in _RunStoryAndProcessErrorIfNeeded
test.Measure(state.platform, results)
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 281, in Measure
trace_result = platform.tracing_controller.StopTracing()
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 47, in StopTracing
return self._tracing_controller_backend.StopTracing()
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 108, in StopTracing
self._IssueClockSyncMarker()
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 199, in _IssueClockSyncMarker
self._RecordIssuerClockSyncMarker)
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 181, in RecordClockSyncMarker
devtools_clients)
File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 148, in _RecordClockSyncMarkerDevTools
raise ChromeClockSyncError()
ChromeClockSyncError
In telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py, the record clock-sync operation is attempted via the inspector_websocket but apparently it failed on cros-chrome:
def RecordClockSyncMarker(self, sync_id):
assert self.is_tracing_running, 'Tracing must be running to clock sync.'
req = {
'method': 'Tracing.recordClockSyncMarker',
'params': {
'syncId': sync_id
}
}
rc = self._inspector_websocket.SyncRequest(req, timeout=2)
if 'error' in rc:
raise ClockSyncResponseException(rc['error']['message'])
I looked at the symbols on my Samus chrome and I can see a bunch of functions related to recordClockSyncMarker. Is there a way to directly test the browser why the Tracing.recordClockSyncMarker method failed?
,
Aug 2 2016
The error looks like the cros version does not support Tracing.recordClockSyncMarker yet. Randy, do you know if we check the chrome version before doing clock sync?
,
Aug 2 2016
We have two clock sync types for chrome. One uses devtools, the other is a fallback. It is gated on the version of chrome being over branch 2661. https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py?q=%22version)+%3E%3D+2661%22&sq=package:chromium&l=178
,
Aug 3 2016
I added a print and the version on my Samus is 2785. version is 2785 (ERROR) 2016-08-03 09:44:29,986 chrome_tracing_agent._RecordClockSyncMarkerDevTools:146 Failed to record clock sync marker with sync_id='1de26df9-b51e-494f-a9e9-bdac963fd380' via DevTools client <telemetry.internal.backends.chrome_inspector.devtools_client_backend.DevToolsClientBackend object at 0x7f04a508f990>:
,
Aug 3 2016
I think the problem is that the initial CrOS logon screen doesn't support tracing, and that disrupts the agent setup flow.
First of all, the clock sync failure is due to the follwoing IsTracing check in Chrome. If I comment it out, the test can run to finish without problems.
+++ b/content/browser/devtools/protocol/tracing_handler.cc
@@ -290,8 +290,11 @@ void TracingHandler::OnMemoryDumpFinished(DevToolsCommandId command_id,
}
Response TracingHandler::RecordClockSyncMarker(const std::string& sync_id) {
+/*
if (!IsTracing())
return Response::InternalError("Tracing is not started");
+*/
+
TracingControllerImpl::GetInstance()->RecordClockSyncMarker(
Then I added a lot of logging to Telemetry code and found that the initial call to _StartStartupTracing claims that chrome_tracing_devtools_manager is not supported.
[ RUN ] http://msn.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing(chrome_tracing_devtools_manager is not supported) <<-- not supported
version is 2817
In RecordClockSyncMarker
[ OK ] http://msn.co.il/ (17004 ms)
In subsequent tests the tracing is started via the devtools completely, so record clock sync operations all pass too.
[ RUN ] http://msn.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing setup <<-- setup properly
In devtools_client_backend.py: StartChromeTracing
version is 2817
In RecordClockSyncMarker
[ OK ] http://msn.co.il/ (6775 ms)
[ RUN ] http://ynet.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing setup
In devtools_client_backend.py: StartChromeTracing
version is 2817
In RecordClockSyncMarker
[ OK ] http://ynet.co.il/ (10368 ms)
[ RUN ] http://ynet.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing setup
In devtools_client_backend.py: StartChromeTracing
version is 2817
In RecordClockSyncMarker
:
:
In chrome_tracing_agent.py/StartAgentTracing I noticed the following comment:
# Chrome tracing Agent needs to start tracing for chrome browsers that are
# not yet started, and for the ones that already are. For the former, we
# first setup the trace_config_file, which allows browsers that starts after
# this point to use it for enabling tracing upon browser startup. For the
# latter, we invoke start tracing command through devtools for browsers that
# are already started and tracked by chrome_tracing_devtools_manager.
Achuith/Ned, do you know if we have ways to delay the first call of WillRunStory after the logon activity has finished? I think historically it affects the score of the first tab in tab switch test too.
,
Aug 3 2016
The fact that logon activity affect the score of the first tab should be considered a feature, not a bug. The reason is for metrics related to startup part of the browser, we want the trace to cover as much activities of the browser as possible. Is the real bug here about chromeOs can't support startup tracing?
,
Aug 3 2016
The bug here is tracing doesn't work for the first tab/page. The subsequent ones are fine.
,
Aug 3 2016
If startup tracing work for ChromeOs, it should work for first tab/page. I believe the right fix is to make startup tracing working for chromeOs, not by changing telemetry.
,
Aug 3 2016
Whatever the final correct fix, we need some kind of workaround fix ASAP! The waterfall is going to start failing very soon. Is there any way to bypass the tracing stuff for ChromeOS, for now, until you figure out the right fix?
,
Aug 3 2016
I won't accept workaround to telemetry in this case since it complicates the architecture of the framework further. This code path has been using for many tracing based benchmarks since last year, so I am not sure why now it is urgent to do the workaround.
,
Aug 3 2016
This only started failing in the ChromeOS telemetry benchmarks about a week ago, and they've been running these benchmarks on ChromeOS for quite a while. Did someone add or turn on tracing in the page_cycler benchmarks recently?
,
Aug 3 2016
There is legacy page_cycler benchmarks which don't use tracing & the new one that are using tracing. We had an announcement to replace the legacy ones with the new one in https://groups.google.com/a/chromium.org/forum/#!topic/telemetry-announce/oqlifxmeOks THe question here is there are other dozen benchmarks also use tracing: smoothness thread_times memory blob_storage v8,... How come this only affect page_cycler_v2
,
Aug 3 2016
Actually is doesn't only affect page_cycler_v2. It's affecting the old page_cyclers as well. At first we thought it was just page_cycler_v2, and thought it was due to the recent page_cycler upgrade, but that turns out to be incorrect; it was just coincidental timing. We're running the old page_cycler test and seeing this failure, which started about one week ago.
,
Aug 3 2016
As kouhei updated in comment #9 page_cycler should start to pass again as the problematic v2 pages have been reverted. Based on additional logging the old page cycler calls RegisterDevToolsClient early: $ ./run_benchmark --browser=cros-chrome --remote=10.10.10.8 page_cycler_v2.intl_ja_zh [ RUN ] http://www.amazon.co.jp In RegisterDevToolsClient But the page_cycler_v2 attempts to write clock sync event before setting up the DevToolsClient, hence the failure: [ RUN ] http://www.amazon.co.jp In WillRunStory In tracing_controller_backend.py StartTracing In chrome_tracing_agent.py:StartAgentTracing In chrome_tracing_agent.py: _StartStartupTracing {} In chrome_tracing_agent.py: _StartDevToolsTracing(chrome_tracing_devtools_manager is not supported) In RegisterDevToolsClient Based on the current code in telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py, it is never tested on CrOS otherwise the early abort should have been noticed: DESKTOP_OS_NAMES = ['linux', 'mac', 'win'] I will try to find the cleanest fix but this seems another case that not enough testing is done on CrOS before major changes.
,
Aug 3 2016
I am confused here: talking about v1 version of page cyclers. This is still broken after the revert. As of last night 8 PM. So, there is a problem with the revert!! 08/02 20:27:00.770 INFO |telemetry_AFDOGene:0189| Starting run for Telemetry benchmark page_cycler.intl_ja_zh 08/02 20:27:00.770 DEBUG| telemetry_runner:0380| Running telemetry benchmark: page_cycler.intl_ja_zh 08/02 20:27:00.771 DEBUG| telemetry_runner:0312| Running Telemetry: ssh 100.107.160.1 python /home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src/tools/perf/run_benchmark --verbose --browser=cros-chrome --output-format=chartjson --output-dir=/home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src --remote=chromeos2-row3-rack9-host13 --pageset-repeat=1 page_cycler.intl_ja_zh 08/02 20:27:00.771 DEBUG| telemetry_runner:0273| Running: ssh 100.107.160.1 python /home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src/tools/perf/run_benchmark --verbose --browser=cros-chrome --output-format=chartjson --output-dir=/home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src --remote=chromeos2-row3-rack9-host13 --pageset-repeat=1 page_cycler.intl_ja_zh 08/02 20:27:00.771 DEBUG| base_utils:0185| Running 'ssh 100.107.160.1 python /home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src/tools/perf/run_benchmark --verbose --browser=cros-chrome --output-format=chartjson --output-dir=/home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src --remote=chromeos2-row3-rack9-host13 --pageset-repeat=1 page_cycler.intl_ja_zh' 08/02 20:27:02.136 DEBUG| telemetry_runner:0284| Error occurred executing. 08/02 20:27:02.136 DEBUG| telemetry_runner:0290| Completed with exit code: 255. stdout: stderr:Warning: Permanently added '100.107.160.1' (ECDSA) to the list of known hosts. No benchmark named "page_cycler.intl_ja_zh". Do you mean any of those benchmarks below? Available benchmarks are: page_cycler.basic_oopif A benchmark measuring performance of the out-of-process iframes page page_cycler.intl_hi_ru Page load time benchmark for a variety of pages in Hindi and Russian. page_cycler.intl_ko_th_vi Page load time for a variety of pages in Korean, Thai and Vietnamese. page_cycler.top_10_mobile Page load time benchmark for the top 10 mobile web pages. page_cycler.tough_layout_cases Page loading for the slowest layouts observed in the Alexa top 1 million. page_cycler.typical_25 Page load time benchmark for a 25 typical web pages. page_cycler_site_isolation.basic_oopif A benchmark measuring performance of out-of-process iframes. page_cycler_v2.basic_oopif A benchmark measuring performance of the out-of-process iframes page page_cycler_v2.intl_ar_fa_he Page load time for a variety of pages in Arabic, Farsi and Hebrew. page_cycler_v2.intl_es_fr_pt-BR Page load time for a pages in Spanish, French and Brazilian Portuguese. page_cycler_v2.intl_hi_ru Page load time benchmark for a variety of pages in Hindi and Russian. page_cycler_v2.intl_ja_zh Page load time benchmark for a variety of pages in Japanese and Chinese. page_cycler_v2.intl_ko_th_vi Page load time for a variety of pages in Korean, Thai and Vietnamese. page_cycler_v2.top_10_mobile Page load time benchmark for the top 10 mobile web pages. page_cycler_v2.tough_layout_cases Page loading for the slowest layouts observed in the Alexa top 1 million. page_cycler_v2.typical_25 Page load time benchmark for a 25 typical web pages. page_cycler_v2_site_isolation.basic_oopif A benchmark measuring performance of out-of-process iframes. Pass --browser to list benchmarks for another browser. (INFO) 2016-08-02 20:28:07,695 atexit_with_log._wrapped_function:10 Try running <function _ListAllSubprocesses at 0x7f7147390a28> (INFO) 2016-08-02 20:28:07,702 atexit_with_log._wrapped_function:12 Did run <function _ListAllSubprocesses at 0x7f7147390a28>
,
Aug 3 2016
actually, the revert DOES seem to work. The timing does not match but builder 8949 passes: https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/8949 Builder 8948 should have passed too but it failed with the error on #24. I don't understand why 8948 did not pass but we seem to be ok now after the revert.
,
Aug 3 2016
The PFQ does not build from the chrome ToT, it uses a tag. 8948 built with chrome tag 54.0.2816.4 8949 built with chrome tag 54.0.2817.0 So if the revert occurred between those two tags, then the result is expected.
,
Aug 3 2016
(The tag can be found looking at the SyncChrome stage)
,
Aug 3 2016
I did not know the PFQ was not picking up from Chrome ToT. That should explain it.
,
Aug 8 2016
I have root caused the failure: it is because the launch command of Chrome on CrOS needs proper setup of --trace-config-file=xxx in telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py. This part of fix is easy, as we just need to add the following to the end of GetBrowserStartupArgs like the code in desktop_browser_backend.py:
trace_config_file = (self.platform_backend.tracing_controller_backend
.GetChromeTraceConfigFile())
if trace_config_file:
args.append('--trace-config-file=%s' % trace_config_file)
The more difficult part is how to put the chrome-trace-config.json on the DUT. Since we can run Telemetry directly on the DUT or from the workstation, we have to support both ways and use the proper utility to create the trace config file.
,
Aug 8 2016
bccheng: I think your fix is right, but it's meant to apply to cros_browser_backend? https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py
,
Aug 8 2016
Yes the --trace-config-file part will be added to cros_browser_backend. Do you have any suggestion on where to find similar utilities to write files on the DUT like what it does for Android? https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py#L248
,
Aug 8 2016
For android, we push the trace file to the device using adb: https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py#L253 https://github.com/catapult-project/catapult/blob/master/devil/devil/android/device_utils.py#L1565
,
Aug 8 2016
Ben: We push unpackaged extensions to the DUT like so: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?l=32-40 cros_interface.PushFile correctly handles both the remote and local case, so maybe you can leverage that?
,
Aug 9 2016
Thanks Ned and Achuith! CL uploaded for review: https://codereview.chromium.org/2223963003/
,
Aug 10 2016
,
Aug 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/14483d2d05a7b823e00485b92263a6b36ff25d90 commit 14483d2d05a7b823e00485b92263a6b36ff25d90 Author: catapult-deps-roller <catapult-deps-roller@chromium.org> Date: Wed Aug 10 21:06:10 2016 Roll src/third_party/catapult/ 28adad6ee..49d354d56 (6 commits). https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/28adad6ee466..49d354d564ab $ git log 28adad6ee..49d354d56 --date=short --no-merges --format='%ad %ae %s' BUG= 632815 TBR=catapult-sheriff@chromium.org Review-Url: https://codereview.chromium.org/2230253003 Cr-Commit-Position: refs/heads/master@{#411145} [modify] https://crrev.com/14483d2d05a7b823e00485b92263a6b36ff25d90/DEPS
,
Aug 11 2016
Ben: is this bug fixed?
,
Aug 11 2016
Yes fixed.
,
Aug 29 2016
,
Oct 7 2016
,
Nov 19 2016
,
Jan 21 2017
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
|
|||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||
Comment 1 by achuith@chromium.org
, Jul 29 2016