
Issue 813168

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Bug

Blocked on:
issue 820077




Make it easy to load perf regression testcases locally for debugging

Project Member Reported by chrishtr@chromium.org, Feb 16 2018

Issue description

Right now, it is very hard to figure out how to load testcases reported on chromeperf.appspot.com. See the "Details" section below for the pain I
had to go through.

I think the right flow should be something like:
[existing] 1. Receive bug that a regression happened.

[existing] 2. Click on bug to open, click from bug to chromeperf.appspot.com, see regression graphs.

[new] 3. Click on a 'local repro commandlines' button in the UI. This will spit out two commandlines: one to
start a local server that serves the content, and a second to start a local chrome
that sets the cert override magic and loads the page, including the de-obfuscation step from (c) below.

This way the only work to do for a developer is to cut and paste two strings into terminal windows
in a Chromium checkout, and build Chrome. I think this will result in a lot more people being willing
to look at regressions.


Another thing that should be done, IMO:

* Add a script at tools/perf/load_wprgo that always works. Running it with no args will say that there
should be one arg: the location of a wprgo file. When you run it with the wprgo file, it will start the
local server (i.e. equivalent of step (a) below), and *also* print out the exact commandline to run a
local Chrome with the cert overrides (i.e. the other URL specified at
https://github.com/catapult-project/catapult/blob/master/web_page_replay_go/README.md#replay-mode)
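A minimal sketch of what such a load_wprgo helper could look like (this script does not exist yet; the wpr binary path, cert/key locations, and Chrome flags are taken from the commandlines quoted elsewhere in this bug, and the default port numbers are assumptions):

```python
import subprocess
import sys

# Paths below are the ones quoted in this bug; adjust for your checkout/OS.
WPR_BIN = 'third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/wpr'
WPR_GO_DIR = 'third_party/catapult/web_page_replay_go'


def replay_command(wprgo_file, http_port=8080, https_port=8081):
    """Builds the `wpr replay` commandline serving the given archive."""
    return [
        WPR_BIN, 'replay', wprgo_file,
        '--http_port=%d' % http_port,
        '--https_port=%d' % https_port,
        '--https_key_file=%s/wpr_key.pem' % WPR_GO_DIR,
        '--https_cert_file=%s/wpr_cert.pem' % WPR_GO_DIR,
        '--inject_scripts=%s/deterministic.js' % WPR_GO_DIR,
    ]


def chrome_command(http_port=8080, https_port=8081):
    """Builds a local-Chrome commandline with the cert override magic."""
    return [
        './out/Release/chrome',
        '--host-resolver-rules=MAP *:80 127.0.0.1:%d,'
        'MAP *:443 127.0.0.1:%d,EXCLUDE localhost' % (http_port, https_port),
        '--ignore-certificate-errors-spki-list='
        'PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=',
    ]


if __name__ == '__main__':
    if len(sys.argv) == 2:
        print('Run a local Chrome against the replay with:')
        print(' '.join(chrome_command()))
        subprocess.call(replay_command(sys.argv[1]))
    else:
        print('usage: load_wprgo <path/to/archive.wprgo>')
```

That would make the whole flow "run one script, paste one printed commandline".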


*******************

Details of what I had to do to make it work:

a. Figure out how to load the data in wprgo files. The instructions at https://github.com/catapult-project/catapult/blob/master/web_page_replay_go/README.md#replay-mode were incomplete; I had to ask Ned
for lots of details, leading to a commandline such as:

./third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/wpr replay
  tools/perf/page_sets/data/key_silk_cases_010.wprgo --http_port=8080
  --https_key_file=third_party/catapult/web_page_replay_go/wpr_key.pem
  --https_cert_file=third_party/catapult/web_page_replay_go/wpr_cert.pem
  --inject_scripts=third_party/catapult/web_page_replay_go/deterministic.js

b. Find the right wpr go files for various testcases. I was able to find the wprgo via code search
with the name of the testcase, in all cases except for pathological_mobile_sites_000.wprgo. That file
is not in my checkout, still not sure where to find it.

c. De-obfuscate the URLs, guessing what character to put for underscores. Is it a slash? A question mark?

Example #1: frame_times/css_value_type_transform_complex.html?api_css_animations_N_0316
Example #2: memory:webview:all_processes:reported_by_chrome:cc:effective_size_avg/background/after_http_yandex_ru_touchsearch_text_science

d. When running the testcases via run_benchmark, I had to hack the story runner to force it to run a test
which was not deemed compatible with my Linux desktop (mobile site). It should be easy to run such a test
with an override parameter.
 
Cc: simonhatch@chromium.org dtu@chromium.org
+dtu, simonhatch: some examples of why it's important to be able to run locally even if the regression doesn't reproduce on your workstation.

Look at some of chrishtr's recent comments to see why it's so important to be able to look at the actual pages:
https://bugs.chromium.org/p/chromium/issues/detail?id=811608#c9
https://bugs.chromium.org/p/chromium/issues/detail?id=811468#c4
https://bugs.chromium.org/p/chromium/issues/detail?id=811449#c4

Chris, looking at the details you posted here and the comments in the perf bug, it seems like understanding/inspecting the contents of the web page was a big part of your use case, and would have been useful regardless of whether you were able to actually reproduce the regression locally?

Chris: may I know why you want to start your own wprgo server & load the page manually?

Usually people just re-run the benchmark command with "--story-filter=css_value_type_transform_complex" to reproduce the regression locally. It's also recommended to use the same class of device (mobile vs desktop) the test is run on, to get an accurate reproduction.
Re comment 1: understanding/inspecting the contents of the page was a big part of the use-case for sure.

Re comment 2: running the testcase in the benchmark command is not enough to debug root causes, or is
unnecessarily clumsy. It's often necessary to dig deep into the code, look at log output, and attach a
debugger. For this, running a custom chrome via well-known methods like ./out/Debug/chrome is extremely useful.

Regarding using the same class of device: to reproduce the actual values of the perf job, it's true that
running on the same class of device is necessary. However, in a lot of cases it's not that hard to figure
out what might have caused it even when running on a different device, because the platform hardware
is not the real cause. Examples:
 a. often it's a regression on all platforms, but only noticeable in the perf job on one platform.
 b. often the regression has to do with a couple of platform parameters which we know about and can tweak (e.g.
prefer-compositing-to-lcd-text, ganesh mode) in local builds
 c. DevTools emulation can help you figure out the way the page is rendering.

The above three items were sufficient for me to debug the regressions mentioned in comment 1. Nevertheless
I am quite confident in my conclusions, because I know a lot about the specifics of how Chrome functions
on various configs/platforms with given content. It's always the case that we as developers have to use some
heuristics to judge how deep to go on "verifying" that perf regressions are acceptable/expected tradeoffs.

Finally, I didn't actually run any tests on Android. One reason I did not is that it's a lot harder to read
debug output and iterate on possible solutions on Android. This is yet another reason to make it easier to
load a test site on your local desktop.
Cc: perezju@chromium.org sullivan@chromium.org
Owner: ----
Status: Available (was: Untriaged)
#3: Running a Chrome with an attached debugger is a legit use case. 

I think we can add a --serving-test-content-only flag to Telemetry's run_benchmark command that would only launch the wpr/local server. E.g.:

./tools/perf/run_benchmark smoothness.pathological_mobile_sites --story-filter=css_value_type_transform_complex --serving-test-content-only 

Upon executing this command, it would also print out what flags you should start Chrome with to connect with the local servers we have launched.

+Chris: do you think this would make it easier for you?
Components: -Tests>Telemetry Speed>Telemetry
I really like the idea in #4. Minor tweak: could the command line look like:

./tools/perf/run_benchmark smoothness.pathological_mobile_sites --serving-test-content-only 

And then the output has both the command to run chrome, and the list of urls for stories?
#6: that's totally doable. Multi-tab cases might need a bit more thinking, but the general case should be simple.
Sounds good to me. Bonus points if there is a consistent and easy-to-remember way to
start Chrome?  e.g. "./out/Release/chrome --perf-mode" as an alias for

./out/Release/chrome
 --host-resolver-rules="MAP *:80 127.0.0.1:8080,MAP *:443 127.0.0.1:8081,EXCLUDE localhost"
 --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=

Also, there needs to be a flag to force a testcase to load even if it's not supposed to
run on that platform. Can print out a warning to this effect.

Re comment 6: this is actually necessary, since it solves the problem of (c): de-obfuscation.
For whoever has the bandwidth to take this, some starting points are:

1) Starting the network servers is done in the shared_state classes. The one that 90% of Telemetry tests use is shared_page_state.

It starts the local network server in: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/shared_page_state.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=290

Starts the wprgo server in: 
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/shared_page_state.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=230

Some refactoring to bring those two together in the same place would be nice.

2) One way to implement this is to pause right after launching the servers, with something like:

<launch local file server/wpr replay>
raw_input("<print relevant info>")

3) The required chrome command line to hook up with those server conveniently can be figured out with chrome_startup_args.GetReplayArgs (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_startup_args.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=92 - thanks to Juan)

4) Print out the page's URL:
For the wprgo server, printing out page.url should just work.
For the local file server, the correct URL needs to be augmented with the local server's port. One just needs to call http_server.UrlOf(page.file_path_url)
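As a rough sketch of step (4), assuming `page` and `http_server` behave like the Telemetry objects named above (the attribute names here are the ones mentioned in this thread, but should be double-checked against shared_page_state):

```python
def url_to_load(page, http_server):
    """Returns the URL a developer should paste into their local Chrome.

    `page.is_file`, `page.url`, `page.file_path_url` and `http_server.UrlOf()`
    are the Telemetry names referenced in this bug.
    """
    if page.is_file:
        # Local file server: the URL must be rewritten to point at the
        # server's host:port.
        return http_server.UrlOf(page.file_path_url)
    # wprgo-served pages: the recorded URL works as-is.
    return page.url
```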

5) Where should the --serving-test-content-only flag be added? While the flag space is still messy, I would just add it right after https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=55 for now.
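For step (5), the flag registration itself could be as simple as the following sketch (optparse, since that is what Telemetry's flag handling used; the flag name is the one proposed in this bug, and the helper name is made up):

```python
import optparse


def AddServingTestContentOnlyArg(parser):
    """Registers the proposed flag on a Telemetry-style option parser."""
    parser.add_option(
        '--serving-test-content-only', action='store_true', default=False,
        help='Only launch the local file/WPR servers, print the flags a '
             'local Chrome needs to connect to them, and pause instead of '
             'running the stories.')


parser = optparse.OptionParser()
AddServingTestContentOnlyArg(parser)
options, _ = parser.parse_args(['--serving-test-content-only'])
```

story_runner would then check `options.serving_test_content_only` right after the servers come up.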

6) What about the browser option?
When the --browser flag is not specified, Telemetry has some convoluted logic to find the default browser. If one is unlucky and the default happens to be an Android browser, Telemetry will fire off the forwarder.

To keep it easy, one may want to bypass the step of determining the possible browser and just use platform.GetHostPlatform(), so as to have access to the platform object (which is required for API calls like platform.network_controller...).

This would mean it will not work for the case of manually launching an Android mobile browser against servers started by Telemetry, but that's probably not a popular use case.
Just wanted to register that this happened again to me today. I followed the same set of steps
as in comment #0, but was for some reason unable to load the page under test.
Juan: can you help Chris with running the wprgo server & Chrome directly while I am gone? 
Sure, I'm glad to help. Chris, what was the problem you encountered this time?
I am trying to load https://www.google.com/search?hl=en&q=define%3Aboogie from key_silk_cases_017.wprgo. I can't get it to work with similar techniques
to what I posted in comment 0.
Right, not exactly sure what happens; it sounds like when trying to launch the browser, Chrome makes some requests to google.com, so later when you try to load the URL the state of the WPR replay is borked. It's probably missing some flags.

Anyway, I found this the easiest to do:

- Replace the condition on this if branch to True:
  https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=6fbfa7cb20f44fdf0a0336137a33febc44f6fd84&l=312

- Then run, e.g.:

  $ ./tools/perf/run_benchmark smoothness.key_silk_cases --story-filter boogie --browser beta --pause before-run-story

This will set up wpr server for you, launch Chrome with the correct flags, and stop *just* before loading the page.

You can then manually navigate to https://www.google.com/search?hl=en&q=define%3Aboogie which this time should work.

Let me know if that helps.
I've filed issue 820077 to provide a flag so you don't have to do my suggested first hacky step above.
The instructions in comment 14 worked well, thanks!

I didn't know about --pause before-run-story. The old telemetry code
a few years ago supported it; I thought it had been removed. Glad to
see it back.
Blockedon: 820077
Owner: perezju@chromium.org
Status: Assigned (was: Available)
Assigning to self and blocking on issue 820077.

After that is resolved, perhaps the only thing missing would be to document somewhere these techniques from #14 for local debugging.

Anything else you think might be needed?
tools/perf/run_tests also has this problem, but does not have the --pause argument.
chrishtr@, the control flow in unittest varies a lot, so it doesn't make sense to add one "--pause" argument for all of them. I recommend just adding a pdb breakpoint for those.
I hit issue 18 again today. I was able to hack around it by checking _AllowInteractionForStage
against a pretend argument. I still think it would be useful to add --pause.

I'm also trying to figure out how to save off a copy of the next website I'm debugging.

Commandline:

tools/perf/run_tests --browser=release SystemHealthBenchmarkSmokeTest.system_health.memory_mobile/browse:shopping:amazon

(hacked to stop before story starts)

I then navigate to http://www.amazon.in/s/?field-keywords=Mobile and it displays
the URL as expected, but saving the URL via Chrome UI downloads a different page (a product
detail for one of the search results).

I need the raw HTML of the page, so I can edit to create a reduced testcase for a crash
reproduced at http://www.amazon.in/s/?field-keywords=Mobile on this test (but not the
live version of the site).

The test seems to be starting chrome with a custom --user-data-dir and a proxy, but then I can't install the
extensions I usually use to save off pages.
Cc: cbruni@chromium.org u...@chromium.org mythria@chromium.org

Comment 23 by benhenry@google.com, Jan 16

Components: Test>Telemetry

Comment 24 by benhenry@google.com, Jan 16

Components: -Speed>Telemetry
