168.4% regression in system_health.memory_mobile at 510791:510889
Issue description
See the link to graphs below.
Oct 26 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8964676780520069536
Oct 26 2017
=== Auto-CCing suspected CL author shivanisha@chromium.org ===
Hi shivanisha@chromium.org, the bisect results pointed to your CL, please take a look at the results.

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author  : Shivani Sharma
  Commit  : c18f976cdcb74086ce6bab2d2574718166c53517
  Date    : Mon Oct 23 16:43:23 2017
  Subject : Integrate HttpCache::Writers with HttpCache and HttpCache::Transaction layers.

Bisect Details
  Configuration : android_nexus5X_perf_bisect
  Benchmark     : system_health.memory_mobile
  Metric        : memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg/long_running_tools/long_running_tools_gmail-foreground
  Change        : 168.00% | 1248312.0 -> 3345464.0

Revision          Result               N
chromium@510790   1248312 +- 0.0       6   good
chromium@510815   1248312 +- 0.0       9   good
chromium@510817   1481329 +- 1977214   9   good
chromium@510818   1248312 +- 0.0       6   good
chromium@510819   3112447 +- 1977214   9   bad   <--
chromium@510822   3345464 +- 0.0       6   bad
chromium@510828   3345464 +- 0.0       6   bad
chromium@510840   3345464 +- 0.0       6   bad
chromium@510889   3345464 +- 0.0       6   bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=long.running.tools.gmail.foreground system_health.memory_mobile

More information on addressing performance regressions:
http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
https://chromeperf.appspot.com/buildbucket_job_status/8964676780520069536

For feedback, file a bug with component Speed>Bisection
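For reference, the reported change works out to exactly 2 MiB; a quick shell check of the byte counts printed by the bisect bot (a sketch, using only the values quoted above):

  # difference between the "bad" and "good" metric values above
  echo $(( 3345464 - 1248312 ))    # -> 2097152 bytes, i.e. exactly 2 MiB
  # relative change, matching the reported 168%
  awk 'BEGIN { printf "%.0f%%\n", (3345464 - 1248312) / 1248312 * 100 }'    # -> 168%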
Jan 5 2018
shivanisha, any update here? The bisect is reproducing a ~2MiB memory regression on Android.
Jan 5 2018
Looking into this.
Jan 5 2018
It seems unlikely that this CL is the cause of the regression. The CL is not a functionality change: it covers the reduced case of at most one transaction writing to the cache, which was already the existing behavior. It does add a new HttpCache::Writers object for every URLRequest, but that is a lightweight object, so it shouldn't be the root cause.
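To put "lightweight" in perspective, a rough back-of-envelope check with assumed figures (neither number is measured; both are hypothetical for illustration): even a generous per-request overhead would sit far below the ~2 MiB delta reported by the bisect.

  # Hypothetical: 1000 concurrent URLRequests, ~200 bytes per HttpCache::Writers instance
  echo $(( 1000 * 200 ))    # -> 200000 bytes, roughly 0.2 MB, an order of magnitude below 2 MiB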
Jan 5 2018
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8958199865643720528
Jan 5 2018
Starting a bisect again: https://chromeperf.appspot.com/buildbucket_job_status/8958199865643720528
Jan 6 2018
=== Auto-CCing suspected CL author shivanisha@chromium.org ===
Hi shivanisha@chromium.org, the bisect results pointed to your CL, please take a look at the results.

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author  : Shivani Sharma
  Commit  : c18f976cdcb74086ce6bab2d2574718166c53517
  Date    : Mon Oct 23 16:43:23 2017
  Subject : Integrate HttpCache::Writers with HttpCache and HttpCache::Transaction layers.

Bisect Details
  Configuration : android_nexus5X_perf_bisect
  Benchmark     : system_health.memory_mobile
  Metric        : memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg/long_running_tools/long_running_tools_gmail-foreground
  Change        : 168.00% | 1248312.0 -> 3345464.0

Revision          Result               N
chromium@510790   1248312 +- 0.0       6   good
chromium@510815   1248312 +- 0.0       6   good
chromium@510817   1248312 +- 0.0       6   good
chromium@510818   1248312 +- 0.0       6   good
chromium@510819   3345464 +- 0.0       6   bad   <--
chromium@510822   3345464 +- 0.0       6   bad
chromium@510828   2995939 +- 1914429   6   bad
chromium@510840   2995939 +- 1914429   6   bad
chromium@510889   3345464 +- 0.0       6   bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=long.running.tools.gmail.foreground system_health.memory_mobile

More information on addressing performance regressions:
http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
https://chromeperf.appspot.com/buildbucket_job_status/8958199865643720528

For feedback, file a bug with component Speed>Bisection
Jan 8 2018
Those are two rock-solid bisects pointing to the same CL. I would recommend looking at and comparing a couple of traces:

before: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/long_running_tools_gmail_foreground_2017-10-23_10-47-57_36721.html
after: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/long_running_tools_gmail_foreground_2017-10-23_21-07-24_69612.html

In the renderer process, looking at the last memory dump, the "after" GPU memory shows an extra 2MiB. Maybe those traces can help pinpoint the source of the regression?
Jan 8 2018
I compared the "before" and "after" runs, looking at the renderer process's GPU memory. In the "after" trace it shows 2.1 MiB from the very first memory dump through the last, so it is neither a gradual increase (which would suggest a leak) nor an abrupt one. Moreover, the CL pointed to by the bisect is not a functionality change and all of its changes reside entirely in the browser process, so I'm not sure how it could be related. Adding erikchen and primiano as per perezju@'s suggestion. erikchen@, primiano@, could you PTAL to help debug further? Thanks!
Jan 8 2018
Attaching before and after screenshots. ericrk, can you help with interpretation? Does "free_size" mean that the "real" increase is 0?
Jan 8 2018
"Free size" in this case just indicates that the buffer is available for immediate use, not that it's not using up space, so I think this is a real regression. Looking at this, it appears that something has caused us to create a media or raster command buffer, where previously we didn't have one. Things that might cause this are: longer load times resulting in a spinner / progress bar being displayed, etc... so maybe something not directly related to the GPU could have tripped this. On the other hand, this could be a timing issue. Do you know at what point these memory dumps are taken? Is it after a low memory signal etc... or just at periodic intervals? I might suspect a timing change that causes us to grab a dump at a different point from before... however, the consistency with which this extra buffer appears makes me doubt this a bit.
Jan 9 2018
> Do you know at what point these memory dumps are taken?

At periodic intervals. On the "after" trace, all memory dumps (from the very first up to the last one) show the regressed value of ~2.1MiB.
Jan 18 2018
Ran the test locally, and both the "before" and "after" runs show the renderer GPU memory as 2.1MB. Here are the traces:

Before: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/long_running_tools_gmail_foreground_2018-01-18_16-53-51_79436.html
Revision checked out: 7b3d579f00d83dd6403cf8cb2883e1658a1cabcb

After: https://storage.cloud.google.com/chrome-telemetry-output/long_running_tools_gmail_foreground_2018-01-18_16-35-43_3423.html
Revision checked out: c18f976cdcb74086ce6bab2d2574718166c53517
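For context, the local comparison amounts to checking out each of the two revisions above and re-running the benchmark command from the bisect output; a rough outline, assuming a Chromium checkout in src/, an android-chromium build, and a connected device:

  # "before" revision
  git -C src checkout 7b3d579f00d83dd6403cf8cb2883e1658a1cabcb
  # ... rebuild, then run the command quoted in the bisect results ...
  src/tools/perf/run_benchmark -v --browser=android-chromium \
      --output-format=chartjson --upload-results --pageset-repeat=1 \
      --also-run-disabled-tests \
      --story-filter=long.running.tools.gmail.foreground \
      system_health.memory_mobile
  # "after" (suspected) revision: repeat the same steps
  git -C src checkout c18f976cdcb74086ce6bab2d2574718166c53517

Then compare the renderer-process GPU numbers in the two resulting traces.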
Jan 19 2018
Marking as WontFix, as the memory regression cannot be reproduced locally on an Android device (as per comment #15).
Jan 19 2018
I don't think being unable to reproduce locally is a reason for WontFix-ing. Both the dashboards and the bisects *are* reproducing a pretty clear and large regression. Which kind of device did you use? Note that the regression was detected on an N5X. Also have a look at go/telemetry-device-setup to set up your device as similarly as possible to what the bots on the waterfall do. Alternatively, if you don't have access to the right kind of device, you can run a try-job.
Jan 19 2018
Hi perezju, I spoke with shivanisha during this process. Her change is unrelated to the GPU stack; it's a network stack change. If you look at the traces she uploaded locally, both of them have ~2.1MB of GPU buffer in the renderer process, which matches the GPU buffer size in the renderer process on the Nexus 5X post-change. This suggests that this is a timing issue related to how network requests get loaded by WPR on the Nexus 5X device. Looking at the Nexus 5X trace pre-change, there's only 0.1MB reported by the GPU MDP in the renderer process. This is quite suspicious, as we'd expect the command buffer plus some textures to occupy much more than that; again, that points at loading/timing issues.

While it would be possible to further debug this issue using try-jobs, the marginal utility of doing so for what appears to be an unrelated change seems low. In general, we don't have good mechanisms for noticing/debugging subtle loading/timing changes in WPR pages.
Jan 19 2018
That sounds good to me. Thanks for the detailed explanation.