16.1% regression in v8.infinite_scroll_tbmv2 at 411850:411870
Issue description: Did the discourse page change?
,
Aug 15 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9004219612618019024
,
Aug 16 2016
===== BISECT JOB RESULTS =====
Status: completed

=== Bisection aborted ===
The bisect was aborted because The metric values for the initial "good" and "bad" revisions do not represent a clear regression.
Please contact the the team (see below) if you believe this is in error.

=== Warnings ===
The following warnings were raised by the bisect job:
 * Bisect failed to reproduce the regression with enough confidence.

===== TESTED REVISIONS =====
Revision         Mean      Std Dev  N   Good?
chromium@411849  76145191  6817540  12  good
chromium@411870  75727971  7010140  18  bad

Bisect job ran on: mac_10_10_perf_bisect
Bug ID: 638012

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests v8.infinite_scroll_tbmv2
Test Metric: memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max/memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max
Relative Change: 6.59%
Score: 0

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_10_perf_bisect/builds/2301
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9004219612618019024

Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5228703013928960

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 16 2016
fmeawad, petr, does this look like a real regression in the discourse page for v8 memory?
,
Aug 16 2016
This looks like a real regression to me. I have also checked the V8 rolls and nothing there is a red flag. The catapult roll does not change the metric, and both of them are still running Yosemite (some bots migrated to El Capitan recently). Adding Ulan.
,
Aug 17 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9004076830297826768
,
Aug 17 2016
Yes, it looks like a genuine regression in the "discourse" story. I kicked off a bisect on that single story.
,
Aug 17 2016
,
Aug 17 2016
===== BISECT JOB RESULTS =====
Status: completed

=== Bisection aborted ===
The bisect was aborted because The metric values for the initial "good" and "bad" revisions do not represent a clear regression.
Please contact the the team (see below) if you believe this is in error.

=== Warnings ===
The following warnings were raised by the bisect job:
 * Bisect failed to reproduce the regression with enough confidence.

===== TESTED REVISIONS =====
Revision         Mean      Std Dev   N   Good?
chromium@411849  53264384  19541121  12  good
chromium@411870  57346341  24853368  14  bad

Bisect job ran on: mac_10_10_perf_bisect
Bug ID: 638012

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests v8.infinite_scroll_tbmv2
Test Metric: memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max/tumblr
Relative Change: 1.57%
Score: 0

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_10_perf_bisect/builds/2312
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9004076830297826768

Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5910730331652096

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 17 2016
,
Aug 19 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9003908029181022976
,
Aug 19 2016
Running a wider bisect ↑
,
Aug 19 2016
===== BISECT JOB RESULTS =====
Status: completed

=== Bisection aborted ===
The bisect was aborted because The metric values for the initial "good" and "bad" revisions do not represent a clear regression.
Please contact the the team (see below) if you believe this is in error.

=== Warnings ===
The following warnings were raised by the bisect job:
 * Bisect failed to reproduce the regression with enough confidence.

===== TESTED REVISIONS =====
Revision         Mean      Std Dev   N   Good?
chromium@411737  44788395  14600410  12  good
chromium@411870  47468089  14103963  18  bad

Bisect job ran on: mac_10_10_perf_bisect
Bug ID: 638012

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests v8.infinite_scroll_tbmv2
Test Metric: memory:chrome:all_processes:reported_by_chrome:v8:effective_size_avg/tumblr
Relative Change: 6.51%
Score: 0

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_10_perf_bisect/builds/2314
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9003908029181022976

Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5318588693479424

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 19 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9003886704654578432
,
Aug 19 2016
The bisect can't reproduce the regression at all (even though it's very clear on the dashboard). I kicked off another bisect (#14) to check whether the bisect can reproduce the recent values shown on the dashboard.
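To illustrate why a bisect might give up here, the sketch below computes Welch's t statistic from the summary statistics in the aborted tumblr bisect above (build 2312). This is only an assumption about the kind of check involved: the thread does not say how the bisect tool actually computes its confidence score, and `welch_t` is a hypothetical helper name.

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic from summary statistics (means, std devs, sample counts)."""
    standard_error = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# Numbers copied from the TESTED REVISIONS table of the aborted bisect (build 2312).
t = welch_t(
    mean_a=53264384, std_a=19541121, n_a=12,   # chromium@411849 (good)
    mean_b=57346341, std_b=24853368, n_b=14,   # chromium@411870 (bad)
)
print(f"t = {t:.2f}")
```

With standard deviations of roughly 20 MB against a 4 MB difference in means, the statistic comes out well under 1, so the two revisions are hard to tell apart from a dozen runs each.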
,
Aug 19 2016
===== BISECT JOB RESULTS =====
Status: completed

=== Bisection aborted ===
The bisect was aborted because The metric values for the initial "good" and "bad" revisions do not represent a clear regression.
Please contact the the team (see below) if you believe this is in error.

=== Warnings ===
The following warnings were raised by the bisect job:
 * Bisect failed to reproduce the regression with enough confidence.

===== TESTED REVISIONS =====
Revision         Mean       Std Dev   N   Good?
chromium@412644  100887211  16740436  12  good
chromium@412670  100013397  14323265  18  bad

Bisect job ran on: mac_10_10_perf_bisect
Bug ID: 638012

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests v8.infinite_scroll_tbmv2
Test Metric: memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max/tumblr
Relative Change: 0.46%
Score: 0

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_10_perf_bisect/builds/2315
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9003886704654578432

Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5906878551293952

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 22 2016
Prasad, Roberto, check comment #15.
,
Aug 22 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9003610503062281328
,
Aug 23 2016
=== Auto-CCing suspected CL author csharrison@chromium.org ===

Hi csharrison@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related.

===== BISECT JOB RESULTS =====
Status: completed

===== SUSPECTED CL(s) =====
Subject : Add testing configs for ParseHTMLOnMainThread experiment
Author  : csharrison
Commit description:
  BUG=623165
  Review-Url: https://codereview.chromium.org/2221193002
  Cr-Commit-Position: refs/heads/master@{#411880}
Commit  : 46be1b831ffec878df1b258a4f26872451d7795e
Date    : Sat Aug 13 06:18:55 2016

===== TESTED REVISIONS =====
Revision         Mean       Std Dev   N   Good?
chromium@411849  69196914   25602722  18  good
chromium@411874  53439147   17197683  12  good
chromium@411878  77863481   26772025  12  good
chromium@411879  52477952   3852714   5   good
chromium@411880  106479616  8606225   12  bad  <--
chromium@411881  100974592  26802496  8   bad
chromium@411887  105955328  9408012   5   bad
chromium@411899  110359347  468937    5   bad
chromium@411949  110411776  741455    8   bad
chromium@412048  106165043  9506832   5   bad
chromium@412247  97217195   11166806  18  bad
chromium@412644  103945557  8628939   12  bad

Bisect job ran on: mac_10_10_perf_bisect
Bug ID: 638012

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests v8.infinite_scroll_tbmv2
Test Metric: memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max/tumblr
Relative Change: 34.45%
Score: 99.5

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_10_perf_bisect/builds/2321
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9003610503062281328

Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5899454129897472

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 23 2016
csharrison: I assume your patch (r411880) only modifies testing code that doesn't run in production, right?
,
Aug 23 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9003543458991093376
,
Aug 23 2016
No, it made the bots use my experimental feature. It is reasonable that that CL caused a regression, but I'm not sure how it could cause a memory regression (overall). The experiment pulls the background parser into the main thread (from the background thread). Do the memory metrics fully account for off-thread memory allocations? +kouhei for any ideas.
,
Aug 23 2016
I see the change in the effective_size metric of the "discourse" story. The story seems to have two modes, and the CL has pinned it to the "regression" mode. This is entirely possible if the memory consumption depends on particular task scheduling, since the patch changes how parser tasks are scheduled. However, I think it is questionable whether we should treat this as a regression.
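A rough way to check the "two modes" reading is to split the per-run values at their largest gap and see what share of runs lands in the upper group before and after the CL. The sketch below does that; the sample values are invented for illustration (they are not taken from the dashboard), and `split_modes` and `share_in_high_mode` are hypothetical helper names.

```python
def split_modes(samples):
    """Split samples into (low, high) groups at the largest gap between sorted values."""
    ordered = sorted(samples)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def share_in_high_mode(samples, threshold):
    """Fraction of samples at or above the threshold."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


# Hypothetical per-run effective_size values in MB, invented for illustration only.
before_cl = [52, 54, 51, 105, 53, 108, 55, 104]
after_cl = [106, 104, 109, 105, 107, 110, 108, 106]

low, high = split_modes(before_cl)
threshold = min(high)  # smallest value that belongs to the upper mode

print("before:", share_in_high_mode(before_cl, threshold))
print("after: ", share_in_high_mode(after_cl, threshold))
```

If the share in the upper mode jumps toward 1.0 while the modes themselves stay put, the mean moves even though no individual run uses more memory than the worst runs did before.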
,
Aug 24 2016
===== BISECT JOB RESULTS =====
Status: completed

===== SUSPECTED CL(s) =====
Subject : Add testing configs for ParseHTMLOnMainThread experiment
Author  : csharrison
Commit description:
  BUG=623165
  Review-Url: https://codereview.chromium.org/2221193002
  Cr-Commit-Position: refs/heads/master@{#411880}
Commit  : 46be1b831ffec878df1b258a4f26872451d7795e
Date    : Sat Aug 13 06:18:55 2016

===== TESTED REVISIONS =====
Revision         Mean      Std Dev  N   Good?
chromium@411849  72745543  3915742  8   good
chromium@411874  74792122  4994849  12  good
chromium@411878  75705528  5481875  18  good
chromium@411879  75785587  6499378  18  good
chromium@411880  82364787  1858700  12  bad  <--
chromium@411881  83434326  1106590  12  bad
chromium@411887  81840653  3043056  8   bad
chromium@411899  83688350  778709   5   bad
chromium@411949  82260598  2075426  5   bad
chromium@412048  81108116  4755766  8   bad
chromium@412247  83686103  3397496  5   bad
chromium@412644  82386654  1778358  5   bad

Bisect job ran on: mac_10_10_perf_bisect
Bug ID: 638012

Test Command: src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --also-run-disabled-tests v8.infinite_scroll_tbmv2
Test Metric: memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max/memory:chrome:all_processes:reported_by_chrome:v8:effective_size_max
Relative Change: 14.12%
Score: 99.9

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/mac_10_10_perf_bisect/builds/2325
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9003543458991093376

Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=6469746598346752

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect. Thank you!
,
Aug 24 2016
Another bisect confirmed that r411880 is the most likely culprit for a +6.3 MiB increase in the maximum effective size of V8. I found that the following V8 values regressed (https://chromeperf.appspot.com/report?sid=3bc3861624431439e2deafdd55e0326d57be182da9baac2bd426ecf6331de593&rev=411870):

memory:chrome:all_processes:reported_by_chrome:v8:effective_size_avg
memory:chrome:all_processes:reported_by_chrome:v8:allocated_objects_size_max
memory:chrome:all_processes:reported_by_chrome:v8:heap:effective_size_avg
memory:chrome:all_processes:reported_by_chrome:v8:heap:allocated_objects_size_max
memory:chrome:all_processes:reported_by_chrome:v8:heap:old_space:effective_size_max
memory:chrome:all_processes:reported_by_chrome:v8:heap:old_space:allocated_objects_size_max
memory:chrome:all_processes:reported_by_chrome:v8:heap:new_space:effective_size_max
memory:chrome:all_processes:reported_by_chrome:v8:heap:map_space:effective_size_max
memory:chrome:all_processes:reported_by_chrome:v8:heap:map_space:allocated_objects_size_max

csharrison, ulan: I leave it up to you to decide if this is a genuine regression and what should be done.
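For reference, the +6.3 MiB figure matches the gap between the last "good" and first "bad" means in the bisect results above. That mapping is my reading of the numbers, not something the comment states; the arithmetic is:

```python
MIB = 1024 * 1024

# Mean effective_size_max values (bytes) from the bisect results above.
last_good_mean = 75785587  # chromium@411879
first_bad_mean = 82364787  # chromium@411880

delta_bytes = first_bad_mean - last_good_mean
delta_mib = delta_bytes / MIB
print(f"{delta_bytes} bytes = {delta_mib:.1f} MiB")  # 6579200 bytes = 6.3 MiB
```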
,
Aug 24 2016
Note that the page that actually regressed is tumblr (https://github.com/catapult-project/catapult/issues/2694).
,
Sep 9 2016
Charles, Ulan, Any updates you can share on this bug?
,
Sep 12 2016
I plan on investigating this soon, I just haven't had the time. I consider investigating this a blocker to launching the experiment. Two TODOs on my end:
1. Repro locally and inspect traces to see what exactly is happening on both versions.
2. Check the V8 memory size for the different experiment variations.
,
Sep 14 2016
I couldn't repro this locally. I started a try job on mac here: https://codereview.chromium.org/2335313005
,
Sep 23 2016
Ping Chris: any update from your tryjob?
,
Sep 23 2016
Here are two tries that worked: http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-09-14_18-58-31 http://storage.googleapis.com/chromium-telemetry/html-results/results-2016-09-14_18-35-33 For one of these I see that my revert caused a gain in effective_size_max and the other I see a loss. Same with allocated_objects_size_max. tumblr doesn't always look to be the odd page out.
,
Oct 5 2016
,
Oct 5 2016
,
Oct 12 2016
Chris, are you the right owner to continue driving this? What are the next steps here?
,
Oct 12 2016
I think you mean me (I'm Charlie :), and yeah I'm the correct driver for this. I'll try to get to it this week. Sorry for the delays.
,
Oct 12 2016
I'm inclined to WontFix this. I can't reproduce the tumblr regression in perf jobs. I see a regression in flickr but I checked the traces and it seems like we're doing a *lot* more js execution in the single-threaded patch. I don't exactly know what to make of that but it doesn't really seem like the parser's fault. I also tested this locally to see what it looks like on Linux and the tumblr test scrolled to different positions with and without the patch. Do we have a metric for "amount scrolled" for these infinite scrolling cases? I imagine a patch that causes the page to scroll more before timing out (after 70s it looks like) would look like a memory regression.
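To make the "amount scrolled" concern concrete, the sketch below normalizes a memory reading by scroll distance. Everything in it is hypothetical: the thread never confirms that such a metric exists, the `pixels_scrolled` field is an assumed measurement, and the numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    v8_effective_size_bytes: int  # memory reading at the end of the story
    pixels_scrolled: int          # hypothetical "amount scrolled" measurement


def bytes_per_pixel(run: Run) -> float:
    """Memory normalized by how far the page scrolled before the timeout."""
    return run.v8_effective_size_bytes / run.pixels_scrolled


# Invented numbers: the second run scrolls 25% further in the same 70s.
runs = [
    Run("run A", v8_effective_size_bytes=80_000_000, pixels_scrolled=40_000),
    Run("run B", v8_effective_size_bytes=100_000_000, pixels_scrolled=50_000),
]

for run in runs:
    print(f"{run.label}: {run.v8_effective_size_bytes} bytes raw, "
          f"{bytes_per_pixel(run):.0f} bytes per pixel scrolled")
```

In this made-up case the raw reading is 25% higher for run B, yet the per-pixel figure is identical, which is the kind of false "regression" the comment is worried about.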
,
Oct 12 2016
I forgot to mention: this would be a bit easier to analyze if we could specify which tracing categories to apply to the bisect. Right now the traces are pretty bare.
,
Oct 13 2016
Charlie, does the ParseHTMLOnMainThread experiment move parsing work to the main thread? If so, then maybe there is less idle time available for garbage collection, which causes the regression. Note that the memory graphs recovered (and improved) because we landed GC optimizations recently.
,
Oct 13 2016
You're correct, we're moving parsing work to the main thread, and it could conceivably result in less idle time for GC, etc. It is really hard to see from traces because we trace so few categories for this metric. The one thing I could tell was that with my experiment on, the flickr page did a lot more V8 work:
Experiment turned off (with patch): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_3-2016-10-12_15-07-29-78148.html
Experiment on (without patch): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_3-2016-10-12_16-01-10-64975.html
The parsing tasks can be seen when the experiment is turned off. It looks like they take up something like 7ms when they run on the background thread. It is very hard to see them when they are mixed in with the main thread though.
,
Dec 3 2016
+rmcilroy as it seems there are some questions about the efficacy of this specific metric in the last few comments.
,
Dec 5 2016
I'm travelling right now so can't look closely. If I read the comments right this isn't a question of the efficacy of the metric, but a question of whether the regression is a side effect of moving more work to the main thread? I'm not sure I can provide much input on how to (or whether we need to) address this metric. +Hannes in case he has any thoughts.
,
Dec 5 2016
I think it's both. In investigating issue 663032 I learned how to selectively enable trace categories for certain tbms so I could try doing that with this one locally and see why we're seeing differences. It would be great to be able to do that from the perf dashboards so I could automatically get those traces on the bots. Do the infinite scroll metrics have any metrics related to how much was scrolled before the 70s timeout?
,
Aug 16 2017
Looks like we haven't made progress on this bug in 8 months. Should we wontfix? We did add the ability to enable trace categories from the perf dashboard, but unfortunately it won't work for regressions this old.
,
Aug 16 2017
Yeah, let's WontFix. Sorry, this regression was very difficult for me to understand since the work we are moving to the main thread was so minimal (7ms out of 70 seconds) and completely unrelated to JavaScript or V8.
Comment 1 by benjhayden@chromium.org
, Aug 15 2016