New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 632815 link

Starred by 3 users

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 611329



Sign in to add a comment

[telemetry] page_cycler_v2.intl_ja_zh is broken on ChromeOS

Project Member Reported by cmt...@chromium.org, Jul 29 2016

Issue description

When I try to run this page_cycler test in ChromeOS it fails (in WillRunStory, in timeline_based_measurement.py) because the CrosPlatformBackend  apparently returns False for IsChromeTracingSupported.

This needs to be fixed asap.
 
Owner: bccheng@chromium.org
Is this a regression? Did Chromeos support tracing before?
Note that at least 30% of telemetry benchmarks go through that same code path.
Status: Assigned (was: Untriaged)
Hopefully this is just a matter of returning True there. 

Ben, can you take a look?

Comment 4 by cmt...@chromium.org, Jul 29 2016

Cc: laszio@chromium.org
Status: Started (was: Assigned)
Note:  This is a time-critical bug.  If it is not fixed by Weds (California time), the Chrome PFQ waterfall is going to start failing.
Let me revert PCv1 benchmarks on CrOS for now to make this not time-critical
Cc: steve...@chromium.org
Heads up chromeos gardener
Project Member

Comment 9 by bugdroid1@chromium.org, Aug 2 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f00d65e14a65a308a1f7b9e07c96859fff8b5280

commit f00d65e14a65a308a1f7b9e07c96859fff8b5280
Author: kouhei <kouhei@chromium.org>
Date: Tue Aug 02 01:08:44 2016

Re-enable some PCv1 benchmarks for CrOS

This CL re-enables page_cycler.intl_{ja_zh,es_fr_pt-BR,ar_fa_he} PCv1 benchmarks for ChromeOS
This is a temporary solution until  crbug.com/632815  is resolved

TBR=nednguyen@google.com
BUG= 632815 , 611329 

Review-Url: https://codereview.chromium.org/2203593002
Cr-Commit-Position: refs/heads/master@{#409108}

[modify] https://crrev.com/f00d65e14a65a308a1f7b9e07c96859fff8b5280/tools/perf/benchmarks/page_cycler.py

Cc: zh...@chromium.org shunhsingou@chromium.org deanliao@chromium.org nednguyen@chromium.org
It seems that the tracing agent used by page_cycler_v2 was never tested on CrOS. After adding chromeos to the OS list, I can get the test to go a little further:

+++ b/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.p
y
@@ -17,7 +17,7 @@ from telemetry.internal.platform import tracing_agent
 from telemetry.internal.platform.tracing_agent import (
     chrome_tracing_devtools_manager)
 
-_DESKTOP_OS_NAMES = ['linux', 'mac', 'win']
+_DESKTOP_OS_NAMES = ['chromeos', 'linux', 'mac', 'win']
 _STARTUP_TRACING_OS_NAMES = _DESKTOP_OS_NAMES + ['android']

The current failure in my local tree is from attempting to record a clock sync marker. Here is the exception message:

(ERROR) 2016-08-02 21:20:09,778 chrome_tracing_agent._RecordClockSyncMarkerDevTools:146  Failed to record clock sync marker with sync_id='bf88a8b5-6cef-4796-af39-fdbb048ff0b8' via DevTools client <telemetry.internal.backends.chrome_inspector.devtools_client_backend.DevToolsClientBackend object at 0x7fd36b645b90>:

And here is the call stack:

Traceback (most recent call last):
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 87, in _RunStoryAndProcessErrorIfNeeded
    test.Measure(state.platform, results)
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 281, in Measure
    trace_result = platform.tracing_controller.StopTracing()
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 47, in StopTracing
    return self._tracing_controller_backend.StopTracing()
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 108, in StopTracing
    self._IssueClockSyncMarker()
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 199, in _IssueClockSyncMarker
    self._RecordIssuerClockSyncMarker)
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 181, in RecordClockSyncMarker
    devtools_clients)
  File "/work/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py", line 148, in _RecordClockSyncMarkerDevTools
    raise ChromeClockSyncError()
ChromeClockSyncError

In telemetry/telemetry/internal/backends/chrome_inspector/tracing_backend.py, the record clock-sync operation is attempted via the inspector_websocket but apparently it failed on cros-chrome:
 
def RecordClockSyncMarker(self, sync_id):
    assert self.is_tracing_running, 'Tracing must be running to clock sync.'
    req = {
      'method': 'Tracing.recordClockSyncMarker',
      'params': {
        'syncId': sync_id
      }
    }
    rc = self._inspector_websocket.SyncRequest(req, timeout=2)
    if 'error' in rc:
      raise ClockSyncResponseException(rc['error']['message'])

I looked at the symbols on my Samus chrome and I can see a bunch of functions related to recordClockSyncMarker. Is there a way to directly test the browser why the Tracing.recordClockSyncMarker method failed?
Cc: rnep...@chromium.org
The error looks like the cros version does not support Tracing.recordClockSyncMarker yet. Randy, do you know if we check the chrome version before doing clock sync?
We have two clock sync types for chrome. One uses devtools, the other is a fallback. It is gated on the version of chrome being over branch 2661.

https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py?q=%22version)+%3E%3D+2661%22&sq=package:chromium&l=178
I added a print and the version on my Samus is 2785.

version is 2785
(ERROR) 2016-08-03 09:44:29,986 chrome_tracing_agent._RecordClockSyncMarkerDevTools:146  Failed to record clock sync marker with sync_id='1de26df9-b51e-494f-a9e9-bdac963fd380' via DevTools client <telemetry.internal.backends.chrome_inspector.devtools_client_backend.DevToolsClientBackend object at 0x7f04a508f990>:

I think the problem is that the initial CrOS logon screen doesn't support tracing, and that disrupts the agent setup flow.

First of all, the clock sync failure is due to the follwoing IsTracing check in Chrome. If I comment it out, the test can run to finish without problems.

+++ b/content/browser/devtools/protocol/tracing_handler.cc
@@ -290,8 +290,11 @@ void TracingHandler::OnMemoryDumpFinished(DevToolsCommandId command_id,
 }
 
 Response TracingHandler::RecordClockSyncMarker(const std::string& sync_id) {
+/*
   if (!IsTracing())
     return Response::InternalError("Tracing is not started");
+*/
+
 
   TracingControllerImpl::GetInstance()->RecordClockSyncMarker(


Then I added a lot of logging to Telemetry code and found that the initial call to _StartStartupTracing claims that chrome_tracing_devtools_manager is not supported.

[ RUN      ] http://msn.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing(chrome_tracing_devtools_manager is not supported)  <<-- not supported
version is 2817
In RecordClockSyncMarker
[       OK ] http://msn.co.il/ (17004 ms)


In subsequent tests the tracing is started via the devtools completely, so record clock sync operations all pass too.

[ RUN      ] http://msn.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing setup  <<-- setup properly
In devtools_client_backend.py: StartChromeTracing
version is 2817
In RecordClockSyncMarker
[       OK ] http://msn.co.il/ (6775 ms)
[ RUN      ] http://ynet.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing setup
In devtools_client_backend.py: StartChromeTracing
version is 2817
In RecordClockSyncMarker
[       OK ] http://ynet.co.il/ (10368 ms)
[ RUN      ] http://ynet.co.il/
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
In chrome_tracing_agent.py: _StartDevToolsTracing setup
In devtools_client_backend.py: StartChromeTracing
version is 2817
In RecordClockSyncMarker
:
:


In chrome_tracing_agent.py/StartAgentTracing I noticed the following comment:

# Chrome tracing Agent needs to start tracing for chrome browsers that are
# not yet started, and for the ones that already are. For the former, we
# first setup the trace_config_file, which allows browsers that starts after
# this point to use it for enabling tracing upon browser startup. For the
# latter, we invoke start tracing command through devtools for browsers that
# are already started and tracked by chrome_tracing_devtools_manager.

Achuith/Ned, do you know if we have ways to delay the first call of WillRunStory after the logon activity has finished? I think historically it affects the score of the first tab in tab switch test too.
The fact that logon activity affect the score of the first tab should be considered a feature, not a bug. The reason is for metrics related to startup part of the browser, we want the trace to cover as much activities of the browser as possible.

Is the real bug here about chromeOs can't support startup tracing?
The bug here is tracing doesn't work for the first tab/page. The subsequent ones are fine.
If startup tracing work for ChromeOs, it should work for first tab/page. I believe the right fix is to make startup tracing working for chromeOs, not by changing telemetry.
Whatever the final correct fix, we need some kind of workaround fix ASAP!  The waterfall is going to start failing very soon.  Is there any way to bypass the tracing stuff for ChromeOS, for now, until you figure out the right fix?
I won't accept workaround to telemetry in this case since it complicates the architecture of the framework further. 
This code path has been using for many tracing based benchmarks since last year, so I am not sure why now it is urgent to do the workaround.
This only started failing in the ChromeOS telemetry benchmarks about a week ago, and they've been running these benchmarks on ChromeOS for quite a while. Did someone add or turn on tracing in the page_cycler benchmarks recently?
There is legacy page_cycler benchmarks which don't use tracing & the new one that are using tracing. We had an announcement to replace the legacy ones with the new one in https://groups.google.com/a/chromium.org/forum/#!topic/telemetry-announce/oqlifxmeOks

THe question here is there are other dozen benchmarks also use tracing:
smoothness
thread_times
memory
blob_storage
v8,...

How come this only affect page_cycler_v2
Actually is doesn't only affect page_cycler_v2.  It's affecting the old page_cyclers as well. At first we thought it was just page_cycler_v2, and thought it was due to the recent page_cycler upgrade, but that turns out to be incorrect; it was just coincidental timing.  We're running the old page_cycler test and seeing this failure, which started about one week ago.
As kouhei updated in comment #9 page_cycler should start to pass again as the problematic v2 pages have been reverted. 

Based on additional logging the old page cycler calls RegisterDevToolsClient early:

$ ./run_benchmark --browser=cros-chrome --remote=10.10.10.8 page_cycler_v2.intl_ja_zh
[ RUN      ] http://www.amazon.co.jp
In RegisterDevToolsClient

But the page_cycler_v2 attempts to write clock sync event before setting up the DevToolsClient, hence the failure:

[ RUN      ] http://www.amazon.co.jp
In WillRunStory
In tracing_controller_backend.py StartTracing
In chrome_tracing_agent.py:StartAgentTracing
In chrome_tracing_agent.py: _StartStartupTracing
{}
In chrome_tracing_agent.py: _StartDevToolsTracing(chrome_tracing_devtools_manager is not supported)
In RegisterDevToolsClient

Based on the current code in telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py, it is never tested on CrOS otherwise the early abort should have been noticed:

DESKTOP_OS_NAMES = ['linux', 'mac', 'win']

I will try to find the cleanest fix but this seems another case that not enough testing is done on CrOS before major changes.
I am confused here: 
talking about v1 version of page cyclers. This is still broken after the revert.
As of last night 8 PM.
So, there is a problem with the revert!!

08/02 20:27:00.770 INFO |telemetry_AFDOGene:0189| Starting run for Telemetry benchmark page_cycler.intl_ja_zh
08/02 20:27:00.770 DEBUG|  telemetry_runner:0380| Running telemetry benchmark: page_cycler.intl_ja_zh
08/02 20:27:00.771 DEBUG|  telemetry_runner:0312| Running Telemetry: ssh 100.107.160.1 python /home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src/tools/perf/run_benchmark --verbose --browser=cros-chrome --output-format=chartjson --output-dir=/home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src --remote=chromeos2-row3-rack9-host13 --pageset-repeat=1 page_cycler.intl_ja_zh
08/02 20:27:00.771 DEBUG|  telemetry_runner:0273| Running: ssh 100.107.160.1 python /home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src/tools/perf/run_benchmark --verbose --browser=cros-chrome --output-format=chartjson --output-dir=/home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src --remote=chromeos2-row3-rack9-host13 --pageset-repeat=1 page_cycler.intl_ja_zh
08/02 20:27:00.771 DEBUG|        base_utils:0185| Running 'ssh 100.107.160.1 python /home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src/tools/perf/run_benchmark --verbose --browser=cros-chrome --output-format=chartjson --output-dir=/home/chromeos-test/images/lumpy-chrome-pfq/R54-8666.0.0-rc1/telemetry_src/src --remote=chromeos2-row3-rack9-host13 --pageset-repeat=1 page_cycler.intl_ja_zh'
08/02 20:27:02.136 DEBUG|  telemetry_runner:0284| Error occurred executing.
08/02 20:27:02.136 DEBUG|  telemetry_runner:0290| Completed with exit code: 255.
stdout:
stderr:Warning: Permanently added '100.107.160.1' (ECDSA) to the list of known hosts.
No benchmark named "page_cycler.intl_ja_zh".

Do you mean any of those benchmarks below?
Available benchmarks are:
  page_cycler.basic_oopif                    A benchmark measuring performance of the out-of-process iframes page
  page_cycler.intl_hi_ru                    Page load time benchmark for a variety of pages in Hindi and Russian.
  page_cycler.intl_ko_th_vi                 Page load time for a variety of pages in Korean, Thai and Vietnamese.
  page_cycler.top_10_mobile                 Page load time benchmark for the top 10 mobile web pages.
  page_cycler.tough_layout_cases            Page loading for the slowest layouts observed in the Alexa top 1 million.
  page_cycler.typical_25                    Page load time benchmark for a 25 typical web pages.
  page_cycler_site_isolation.basic_oopif     A benchmark measuring performance of out-of-process iframes. 
  page_cycler_v2.basic_oopif                 A benchmark measuring performance of the out-of-process iframes page
  page_cycler_v2.intl_ar_fa_he              Page load time for a variety of pages in Arabic, Farsi and Hebrew.
  page_cycler_v2.intl_es_fr_pt-BR           Page load time for a pages in Spanish, French and Brazilian Portuguese.
  page_cycler_v2.intl_hi_ru                 Page load time benchmark for a variety of pages in Hindi and Russian.
  page_cycler_v2.intl_ja_zh                 Page load time benchmark for a variety of pages in Japanese and Chinese.
  page_cycler_v2.intl_ko_th_vi              Page load time for a variety of pages in Korean, Thai and Vietnamese.
  page_cycler_v2.top_10_mobile              Page load time benchmark for the top 10 mobile web pages.
  page_cycler_v2.tough_layout_cases         Page loading for the slowest layouts observed in the Alexa top 1 million.
  page_cycler_v2.typical_25                 Page load time benchmark for a 25 typical web pages.
  page_cycler_v2_site_isolation.basic_oopif  A benchmark measuring performance of out-of-process iframes. 
Pass --browser to list benchmarks for another browser.

(INFO) 2016-08-02 20:28:07,695 atexit_with_log._wrapped_function:10  Try running <function _ListAllSubprocesses at 0x7f7147390a28>
(INFO) 2016-08-02 20:28:07,702 atexit_with_log._wrapped_function:12  Did run <function _ListAllSubprocesses at 0x7f7147390a28>


actually, the revert DOES seem to work.
The timing does not match but builder 8949 passes:
https://uberchromegw.corp.google.com/i/chromeos/builders/lumpy-chrome-pfq/builds/8949
Builder 8948 should have passed too but it failed with the error on #24. 
I don't understand why 8948 did not pass but we seem to be ok now after the revert.

The PFQ does not build from the chrome ToT, it uses a tag.

8948 built with chrome tag 54.0.2816.4 
8949 built with chrome tag 54.0.2817.0 

So if the revert occurred between those two tags, then the result is expected.

(The tag can be found looking at the SyncChrome stage)

I did not know the PFQ was not picking up from Chrome ToT. That should explain it.

Cc: cylee@chromium.org
I have root caused the failure: it is because the launch command of Chrome on CrOS needs proper setup of --trace-config-file=xxx in telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py. This part of fix is easy, as we just need to add the following to the end of GetBrowserStartupArgs like the code in desktop_browser_backend.py:

    trace_config_file = (self.platform_backend.tracing_controller_backend
                         .GetChromeTraceConfigFile())
    if trace_config_file:
      args.append('--trace-config-file=%s' % trace_config_file)

The more difficult part is how to put the chrome-trace-config.json on the DUT. Since we can run Telemetry directly on the DUT or from the workstation, we have to support both ways and use the proper utility to create the trace config file.
bccheng: I think your fix is right, but it's meant to apply to cros_browser_backend?

https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py
Yes the --trace-config-file part will be added to cros_browser_backend. Do you have any suggestion on where to find similar utilities to write files on the DUT like what it does for Android?

https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py#L248
Ben: We push unpackaged extensions to the DUT like so:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?l=32-40

cros_interface.PushFile correctly handles both the remote and local case, so maybe you can leverage that?
Thanks Ned and Achuith!
CL uploaded for review: https://codereview.chromium.org/2223963003/
Blocking: 611329
Project Member

Comment 36 by bugdroid1@chromium.org, Aug 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/14483d2d05a7b823e00485b92263a6b36ff25d90

commit 14483d2d05a7b823e00485b92263a6b36ff25d90
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Wed Aug 10 21:06:10 2016

Roll src/third_party/catapult/ 28adad6ee..49d354d56 (6 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/28adad6ee466..49d354d564ab

$ git log 28adad6ee..49d354d56 --date=short --no-merges --format='%ad %ae %s'

BUG= 632815 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2230253003
Cr-Commit-Position: refs/heads/master@{#411145}

[modify] https://crrev.com/14483d2d05a7b823e00485b92263a6b36ff25d90/DEPS

Ben: is this bug fixed?
Status: Fixed (was: Started)
Yes fixed.
Labels: VerifyIn-54
Labels: VerifyIn-55

Comment 41 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56

Comment 42 by dchan@google.com, Jan 21 2017

Labels: VerifyIn-57

Comment 43 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 44 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 45 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 47 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment