Make it easy to load perf regression testcases locally for debugging
Issue description

Right now, it is very hard to figure out how to load testcases reported on chromeperf.appspot.com. See the "Details" section below for the pain I had to go through.

I think the right flow should be something like:

[existing] 1. Receive a bug saying that a regression happened.
[existing] 2. Click on the bug to open it, click through from the bug to chromeperf.appspot.com, and see the regression graphs.
[new] 3. Click on a 'local repro commandlines' button in the UI. This will spit out two commandlines: one to start a local server that serves the content, and a second to start a local Chrome that sets the cert override magic and loads the page, including the de-obfuscation step from (c) below.

This way the only work for a developer is to cut and paste two strings into terminal windows in a Chromium checkout, and build Chrome. I think this will result in a lot more people being willing to look at regressions.

Another thing that should be done, IMO:

* Add a script at tools/perf/load_wprgo that always works. Running it with no args will say that there should be one arg: the location of a wprgo file. When you run it with the wprgo file, it will start the local server (i.e. the equivalent of step (a) below), and *also* print out the exact commandline to run a local Chrome with the cert overrides (i.e. the other URL specified at https://github.com/catapult-project/catapult/blob/master/web_page_replay_go/README.md#replay-mode)

*******************
Details of what I had to do to make it work

I had to:

a. Figure out how to load the data in wprgo files. The instructions at https://github.com/catapult-project/catapult/blob/master/web_page_replay_go/README.md#replay-mode were incomplete, and I had to ask Ned for lots of details, leading to a commandline such as:

    ./third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/wpr replay tools/perf/page_sets/data/key_silk_cases_010.wprgo --http_port=8080 --https_key_file=third_party/catapult/web_page_replay_go/wpr_key.pem --https_cert_file=third_party/catapult/web_page_replay_go/wpr_cert.pem --inject_scripts=third_party/catapult/web_page_replay_go/deterministic.js

b. Find the right wprgo files for various testcases. I was able to find the wprgo file via code search with the name of the testcase in all cases except for pathological_mobile_sites_000.wprgo. That file is not in my checkout, and I'm still not sure where to find it.

c. De-obfuscate the URLs, guessing what character to put in place of each underscore. Is it a slash? A question mark?
   Example #1: frame_times/css_value_type_transform_complex.html?api_css_animations_N_0316
   Example #2: memory:webview:all_processes:reported_by_chrome:cc:effective_size_avg/background/after_http_yandex_ru_touchsearch_text_science

d. When running the testcases via run_benchmark, I had to hack the story runner to force it to run a test which was not deemed compatible with my Linux desktop (mobile site). It should be easy to run such a test with an override parameter.
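To make step (a) concrete, here is a minimal sketch of what a tools/perf/load_wprgo helper could do to assemble the replay commandline. It is not an existing script: the function name and the usage message are hypothetical, and the binary path and flags are copied verbatim from the commandline quoted in (a), so they assume a macOS x86_64 host and a Chromium checkout as the working directory.

```python
import shlex

# Paths copied from the commandline quoted in the issue description. They are
# relative to a Chromium checkout and assume a macOS x86_64 host.
WPR_BINARY = "./third_party/catapult/telemetry/telemetry/internal/bin/mac/x86_64/wpr"
WPR_GO_DIR = "third_party/catapult/web_page_replay_go"


def build_wpr_replay_command(archive_path, http_port=8080):
    """Return the argv list that replays `archive_path` with the wpr binary.

    Hypothetical helper: it only mirrors the flags shown in the issue.
    """
    return [
        WPR_BINARY,
        "replay",
        archive_path,
        f"--http_port={http_port}",
        f"--https_key_file={WPR_GO_DIR}/wpr_key.pem",
        f"--https_cert_file={WPR_GO_DIR}/wpr_cert.pem",
        f"--inject_scripts={WPR_GO_DIR}/deterministic.js",
    ]


def describe(args):
    """Return the text a load_wprgo-style script might print for `args`."""
    if len(args) != 1:
        # The proposal asks for exactly one argument: the wprgo file.
        return "usage: load_wprgo <path/to/archive.wprgo>"
    # shlex.join (Python 3.8+) quotes each argument for pasting into a shell.
    return shlex.join(build_wpr_replay_command(args[0]))


print(describe(["tools/perf/page_sets/data/key_silk_cases_010.wprgo"]))
```

Returning an argv list rather than a string keeps the command usable with subprocess as well as for printing.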
,
Feb 16 2018
Chris: may I know why you want to start your own wprgo server and load the page manually? Usually people just re-run the benchmark command with "--story-filter=css_value_type_transform_complex" to reproduce the regression locally. It's also recommended to use the same class of device (mobile vs desktop) that the test ran on to get an accurate reproduction.
,
Feb 16 2018
Re comment 1: understanding/inspecting the contents of the page was a big part of the use-case for sure.

Re comment 2: running the testcase in the benchmark command is not enough to debug root causes, or is unnecessarily clumsy. It's often necessary to dig deep into the code, look at log output, and attach a debugger. For this, running a custom chrome via well-known methods like ./out/Debug/chrome is extremely useful.

Regarding using the same class of device: to reproduce the actual values of the perf job, it's true that running on the same class of device is necessary. However, in a lot of cases it's not that hard to figure out what might have caused the regression even when running on a different device, because the platform hardware is not the real cause. Examples:

a. Often it's a regression on all platforms, but only noticeable in the perf job on one platform.
b. Often the regression has to do with a couple of platform parameters which we know about and can tweak (e.g. prefer-compositing-to-lcd-text, ganesh mode) in local builds.
c. DevTools emulation can help you figure out the way the page is rendering.

The above three items were sufficient for me to debug the regressions mentioned in comment 1. Nevertheless I am quite confident in my conclusions, because I know a lot about the specifics of how Chrome functions on various configs/platforms with given content. It's always the case that we as developers have to use some heuristics to judge how deep to go on "verifying" that perf regressions are acceptable/expected tradeoffs.

Finally, I didn't actually run any tests on Android. One reason I did not is that it's a lot harder to read debug output and iterate on possible solutions on Android. This is yet another reason to make it easier to load a test site on your local desktop.
,
Feb 16 2018
#3: Running Chrome with an attached debugger is a legit use case. I think we can add a --serving-test-content-only flag to Telemetry's run_benchmark command that would only launch the wpr/local server. E.g.:

    ./tools/perf/run_benchmark smoothness.pathological_mobile_sites --story-filter=css_value_type_transform_complex --serving-test-content-only

Upon executing this command, it would also print out the flags you should start Chrome with to connect to the local servers we have launched.

+Chris: do you think this would make it easier for you?
,
Feb 16 2018
,
Feb 16 2018
I really like the idea in #4. Minor tweak: could the command line look like:

    ./tools/perf/run_benchmark smoothness.pathological_mobile_sites --serving-test-content-only

And then the output has both the command to run chrome, and the list of urls for stories?
,
Feb 16 2018
#6: that's totally doable. Multi-tab cases might need a bit more thinking, but the general cases should be simple.
,
Feb 16 2018
Sounds good to me. Bonus points if there is a consistent and easy-to-remember way to start Chrome, e.g. "./out/Release/chrome --perf-mode" as an alias for:

    ./out/Release/chrome --host-resolver-rules="MAP *:80 127.0.0.1:8080,MAP *:443 127.0.0.1:8081,EXCLUDE localhost" --ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=

Also, there needs to be a flag to force a testcase to load even if it's not supposed to run on that platform. It can print out a warning to this effect.

Re comment 6: this is necessary actually, since it solves the problem of (c): de-obfuscation.
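Until something like --perf-mode exists, the long commandline can be generated rather than remembered. A rough sketch, assuming the replay server listens on the ports shown above; the function name is hypothetical, and the SPKI hash is the one quoted in this comment, so it only works with the matching wpr certificate.

```python
import shlex

# SPKI hash quoted in the comment above; it has to match the certificate the
# replay server presents.
WPR_SPKI_HASH = "PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I="


def build_chrome_command(chrome="./out/Release/chrome", http_port=8080,
                         https_port=8081, url=None):
    """Return an argv list that points Chrome at a local replay server.

    Hypothetical helper: it reproduces the two flags from the comment and
    optionally appends the page URL to load.
    """
    rules = (f"MAP *:80 127.0.0.1:{http_port},"
             f"MAP *:443 127.0.0.1:{https_port},"
             "EXCLUDE localhost")
    command = [
        chrome,
        f"--host-resolver-rules={rules}",
        f"--ignore-certificate-errors-spki-list={WPR_SPKI_HASH}",
    ]
    if url:
        command.append(url)
    return command


# shlex.join (Python 3.8+) adds the shell quoting the resolver rules need.
print(shlex.join(build_chrome_command()))
```

Keeping the ports as parameters matters because the server may not always be started on 8080/8081.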
,
Feb 16 2018
For whoever has bandwidth to take this, some starting points are:

1) Starting the network is done in the shared_state classes. The one that 90% of Telemetry tests use is shared_page_state. It starts the local network server in:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/shared_page_state.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=290
and starts the wprgo server in:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/shared_page_state.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=230
Some refactoring to put those two in the same place would be nice.

2) One way to implement this is to pause right after launching the servers with something like:
<launch local file server/wpr replay>
raw_input("<print relevant info>")

3) The Chrome command line required to hook up with those servers can conveniently be figured out with chrome_startup_args.GetReplayArgs (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_startup_args.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=92 - thanks to Juan)

4) Print out the page's URL: for the wprgo server, printing page.url should just work. For the local file server, the correct URL needs to be augmented with the local server's port. One just needs to call http_server.UrlOf(page.file_path_url)

5) Where should the --serving-test-content-only flag be added? While the flag space is still messy, I would just add it right after https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=e7298f36f7912f2caa122086cfbe71734d04b73f&l=55 for now.

6) What about the browser option? When the --browser flag is not specified, Telemetry has some convoluted logic to find the default browser. If one is unlucky, the default browser could end up being an Android browser, which means Telemetry will fire off the forwarder. To keep it easy, one may want to bypass the step of determining the possible browser and just use platform.GetHostPlatform() to get access to the platform object (which is required for API calls like platform.network_controller...). This would mean that this will not work for the case of manually launching an Android mobile browser against a server started by Telemetry, but that's probably not a popular use case.
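Points 2) and 4) can be illustrated outside Telemetry with a small standalone sketch. It uses only the Python 3 standard library rather than Telemetry's own servers, so start_file_server, url_of and serve_and_pause are hypothetical stand-ins for the real shared_page_state code and http_server.UrlOf, and input() takes the place of raw_input().

```python
import functools
import http.server
import threading


def start_file_server(directory):
    """Serve `directory` on an OS-chosen localhost port in a daemon thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    # Port 0 asks the OS for any free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def url_of(server, relative_path):
    """Build the URL for a served file, including the server's actual port."""
    host, port = server.server_address[:2]
    return f"http://{host}:{port}/{relative_path.lstrip('/')}"


def serve_and_pause(directory, relative_path, pause=input):
    """Start the server, print where the page lives, and wait for the user."""
    server = start_file_server(directory)
    try:
        # `pause` blocks until the user presses Enter, leaving the server up
        # so a separately launched browser can load the page.
        pause(f"Serving {url_of(server, relative_path)}; press Enter to stop. ")
    finally:
        server.shutdown()
        server.server_close()
```

Calling serve_and_pause(".", "index.html") from a terminal would keep the server alive until Enter is pressed. Passing pause as a parameter is only there so the blocking step can be swapped out in automated runs.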
,
Mar 6 2018
Just wanted to register that this happened again to me today. I followed the same set of steps as in comment #0, but was for some reason unable to load the page under test.
,
Mar 7 2018
Juan: can you help Chris with running the wprgo server & Chrome directly while I am gone?
,
Mar 7 2018
Sure, I'm glad to help. Chris, what was the problem you encountered this time?
,
Mar 7 2018
I am trying to load https://www.google.com/search?hl=en&q=define%3Aboogie from key_silk_cases_017.wprgo. I can't get it to work with similar techniques to what I posted in comment 0.
,
Mar 8 2018
Right, I'm not exactly sure what happens. It sounds like Chrome makes some requests to google.com when the browser launches, so later, when you try to load the URL, the state of the WPR replay is borked. It's probably missing some flags.

Anyway, I found this the easiest to do:

- Replace the condition on this if branch with True: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=6fbfa7cb20f44fdf0a0336137a33febc44f6fd84&l=312
- Then run, e.g.:

    $ ./tools/perf/run_benchmark smoothness.key_silk_cases --story-filter boogie --browser beta --pause before-run-story

This will set up the wpr server for you, launch Chrome with the correct flags, and stop *just* before loading the page. You can then manually navigate to https://www.google.com/search?hl=en&q=define%3Aboogie which this time should work.

Let me know if that helps.
,
Mar 8 2018
I've filed issue 820077 to provide a flag so you don't have to do my suggested first hacky step above.
,
Mar 15 2018
The instructions in comment 14 worked well, thanks! I didn't know about --pause before-run-story. The old telemetry code supported it a few years ago, and I had thought it was since removed. Glad to see it back.
,
Mar 19 2018
Assigning to self and blocking on issue 820077. After that is resolved, perhaps the only thing missing would be to describe these techniques from #14 for local debugging somewhere in the docs. Anything else you think might be needed?
,
Jun 25 2018
tools/perf/run_tests also has this problem, but does not have the --pause argument.
,
Jun 25 2018
chrishtr@, the control flow in the unittests varies a lot, so it doesn't make sense to add one "--pause" argument for all of them. I recommend just adding a pdb breakpoint for those.
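For reference, a pdb breakpoint in a unittest might look roughly like the sketch below. The test class, the state dict, and the PAUSE_BEFORE_STORY environment variable are made up for illustration and are not part of tools/perf/run_tests. Guarding the breakpoint with an environment variable is just one way to leave it in place without stopping every run.

```python
import os
import pdb
import unittest


class ExampleSmokeTest(unittest.TestCase):
    """Stand-in for a real smoke test; the setup here is purely illustrative."""

    def test_story(self):
        # Pretend the servers and browser have been started at this point.
        state = {"servers_started": True}

        # Hypothetical opt-in switch: drop into the debugger just before the
        # story would run, so the browser can be inspected by hand.
        if os.environ.get("PAUSE_BEFORE_STORY"):
            pdb.set_trace()

        self.assertTrue(state["servers_started"])
```

With the variable set, the test stops at the (Pdb) prompt at that line, and typing c continues the run.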
,
Aug 2
I hit issue 18 again today. I was able to hack it by checking _AllowInteractionForStage against a pretend argument. I still think it is useful to add --pause.

I'm also trying to figure out how to save off a copy of the next website I'm debugging. Commandline:

    tools/perf/run_tests --browser=release SystemHealthBenchmarkSmokeTest.system_health.memory_mobile/browse:shopping:amazon

(hacked to stop before the story starts)

I then navigate to http://www.amazon.in/s/?field-keywords=Mobile and it displays the URL as expected, but saving the URL via the Chrome UI downloads a different page (a product detail page for one of the search results). I need the raw HTML of the page so I can edit it to create a reduced testcase for a crash reproduced at http://www.amazon.in/s/?field-keywords=Mobile on this test (but not on the live version of the site).
,
Aug 2
The test seems to be starting chrome with a custom --user-data-dir and a proxy, but then I can't install the extensions I usually use to save off pages.
,
Aug 30
,
Jan 16
,
Jan 16
Comment 1 by sullivan@chromium.org
, Feb 16 2018