3.3% regression in system_health.memory_desktop at 545505:545650

Issue description: See the link to graphs below.
Mar 26 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/13dfd15d440000
Mar 26 2018
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/13dfd15d440000
RDH: Ensure there's at most one LoadInfoList in flight. by mmenke@chromium.org
https://chromium.googlesource.com/chromium/src/+/d03adf5a2c53f00c4663a93917a9fb0a53f082ad
Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Mar 26 2018
Given that the CL was reverted and the metric hasn't recovered (https://chromium-review.googlesource.com/c/chromium/src/+/979674 / 545616 is the revert, but the regression graph shows it continuing into 545729), either the bisect is wrong or the benchmark isn't measuring what it thinks it is measuring. Punting into the untriaged bin.
Mar 26 2018
The commit position I see on the revert is r545741, and the graphs are still at 545729. Adding a next action item for myself to check the graphs tomorrow, but the bisect looks pretty clear.
Mar 26 2018
Oops, 545741 is the revert. Too many numbers. Re-assigning to myself.
Mar 26 2018
So the benchmark calls itself "memory:chrome:all_processes:reported_by_chrome:malloc:allocated_objects_size_avg". But looking at the Total line under malloc, I see 55.8 MiB in the trace from before and 63.8 MiB after (the difference is mostly in the renderer... which is curious, since this CL doesn't touch the renderer). The benchmark graph shows about 252 MB total size. Am I looking at the right thing?
Mar 26 2018
+benchmark owner erikchen, can you help with mmenke's questions?
Mar 26 2018
The numbers in the overview report "discounted" malloc usage - "effective_size", which discounts memory that is allocated from malloc but is included in other MDPs. allocated_objects_size includes everything - that would be the "size" column. In this case, both "allocated_objects_size" and "effective_size" increase by 9 MB in the renderer process.

Given that this is a WPR test, I'd guess that the original change is a networking-related change that happens to change the resources that get loaded by WPR, thus causing a difference in renderer memory usage.

sullivan: Do we have a list of false positives we suspect are caused by WPR non-determinism? Anecdotally, I feel like I see this a lot - it might be good to quantify this.
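A minimal sketch with made-up numbers (not taken from this trace) to illustrate the discounting; the plain subtraction is a simplification of how memory-infra ownership actually works:

# Illustrative only: "size" counts everything malloc allocated, while
# "effective_size" discounts the portion another memory-dump provider (MDP)
# also reports, so the overview does not double-count it.
malloc_size = 63.8           # MiB, the "size" column
owned_by_other_mdps = 8.0    # MiB, hypothetical amount also reported by other MDPs
malloc_effective_size = malloc_size - owned_by_other_mdps
print('size: %.1f MiB, effective_size: %.1f MiB'
      % (malloc_size, malloc_effective_size))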
Mar 26 2018
The change was rate limiting the data we send to the WebContents that displays the little status bar in the lower-left (the data object was pretty big, and this was OOMing the browser when the UI thread was blocked; csharrison already reduced the size of the object, and my CL just made sure there was only one status update message at a time). I'm not sure why fewer/different status updates would make the renderer consume more memory, but since this is renderer memory, I think that's what we're seeing (if it were a browser process memory change, I could look for leaks, but in the renderer, this CL couldn't have created a new leak).

WontFixing this, per the above analysis. If either of you disagrees, please say so, and we can figure out how to proceed from there.
Mar 26 2018
+Ned, Juan, has either of you looked at regressions caused by WPR non-determinism? I remember something similar with mojo loading in August; see the Times of India investigation here: https://docs.google.com/document/d/1wissv31WUjeHaSaf78uAH0FiFQfSU6sqv14Lcuo1toA/edit#heading=h.voebn3wvz3na But I don't see it that often on bugs I'm cc'd on. Any thoughts?
Mar 27 2018
The NextAction date has arrived: 2018-03-27
Mar 27 2018
#11: No, I haven't looked into regressions caused by WPR non-determinism at scale. It seems hard to capture in theory. I also think it's hard to distinguish those from legitimate regressions caused by changing the ordering of network loads and how resources are fetched. The best tool we have is tracing: look at what's going on and manually evaluate whether the regression is expected.
Mar 27 2018
Theoretically, we could look at which WPR resources get served [and how many times] for each telemetry page run, and then compare that between page runs. This would give us an up-front warning when different resources are getting served.
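A rough sketch of what that comparison could look like, assuming (purely hypothetically) that each page run produced a plain-text log with one served request URL per line; the log format and script below are an illustration, not an existing WPR/Telemetry feature:

# Hypothetical: diff the served-resource counts between two page runs.
import sys
from collections import Counter

def served_counts(path):
    # One served request URL per line in the (assumed) per-run log.
    with open(path) as f:
        return Counter(line.strip() for line in f if line.strip())

def diff_runs(log_a, log_b):
    a, b = served_counts(log_a), served_counts(log_b)
    for url in sorted(set(a) | set(b)):
        if a[url] != b[url]:
            print('%s: run A served %d time(s), run B served %d time(s)'
                  % (url, a[url], b[url]))

if __name__ == '__main__':
    diff_runs(sys.argv[1], sys.argv[2])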
Mar 27 2018
Erik: that's a nice proposal. Probably something like a WPR trace.
Mar 27 2018
#14: I filed issue 826309 to track the WPR tracing. I can't promise when we can staff that project, but it's a good direction moving forward for WPR.
Mar 27 2018
Thanks! I feel like I see issues of this type quite frequently [maybe observation bias due to being the owner of memory benchmarks] - I'll make sure to add more examples when I run across them.