New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 751896 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug



Sign in to add a comment

chrome_child.dll!content::RenderFrameImpl::UpdatePeakMemoryStats is taking a lot of CPU time

Project Member Reported by dcheng@chromium.org, Aug 3 2017

Issue description

Chrome Version       : 61.0.3163.14
OS Version: 10.0
URLs (if applicable) :
Other browsers tested:
  Add OK or FAIL after other browsers where you have tested this issue:
     Safari 5:
  Firefox 4.x:
     IE 7/8/9:

What steps will reproduce the problem?
1. Open Inbox. A corp account will probably help exacerbate the problem.
2. Try typing.

What is the expected result?
Typing should not be janky.

What happens instead of that?
Typing is janky.

Please provide any additional information below. Attach a screenshot if
possible.
brucedawson@ helped debug this. It looks like Inbox results in many calls to CreateURLLoader: ~3500 calls in ~8.5 seconds. As of https://chromium-review.googlesource.com/c/569327, CreateURLLoader() calls ReportPeakMemoryStats(), and it looks like base::ProcessMetrics::GetMallocUsage is accounting for the majority of time in this function, since it's walking the heap.


UserAgentString: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.14 Safari/537.36



 
callee.png
251 KB View Download
Labels: ReleaseBlock-Stable
I think this should probably be RB-S.
To clarify the meaning of the screenshot for non-ETW-experts, it says that 3472 samples (representing ~3,468 ms) were recorded in UpdatePeakMemoryStats and its descendants, over an 8.48 s time period. So that's a lot.
Status: Assigned (was: Unconfirmed)
Thanks for identifying this!!  I think we should probably call this only once for some reasonable interval?
Given the power and performance implications of this it would seem appropriate to measure the CPU cost of this function before deciding how frequently we can run it without regressing. The cost is likely to vary depending on how much data is in the CRT heap, so run tests in a range of scenarios, with the heap well loaded up (I'm not sure how to do that).

Note that if this code runs when the system is under memory pressure then walking the heap will require paging in memory - paging in all of the heap. If this code runs on multiple tabs simultaneously then it could convert a benign low-memory paged-out situation and turn it into a page-storm.

Comment 5 by lfg@chromium.org, Aug 3 2017

Cc: jbroman@chromium.org haraken@chromium.org lfg@chromium.org
FYI, this is a 5% performance regression on the speedometer VanillaJS bechmark.

I'm wondering if the perf trybots didn't catch this, and if it is because the feature is controlled via a finch experiment.

keishi@: Any thoughts on this?

I looked at the trace to see how wide (in ms) the blocks of samples associated with UpdatePeakMemoryStats are. It looks like the are 70-77 ms, followed by a gap with no samples. This is not quite proof but is a very strong indicator that each call to UpdatePeakMemoryStats is taking 70-77 ms.

The actual time will vary depending on the state of the heap (number of allocations, whether it has been paged out) and the system load. Note that this trace was taken on a very beefy machine.

UpdatePeakMemoryStats_cpu_spikes.PNG
48.6 KB View Download
Barring any strong arguments against I think we need to revert crrev.com/c/569327. The cost (~10 calls per second to UpdatePeakMemoryStats when typing in Inbox chat window, each costing ~70 ms on a fast machine) is far too great and I don't think we want to experiment with mitigation changes in the release branch.

Comment 9 by lfg@chromium.org, Aug 3 2017

Re #8: This particular case that gets called on UpdatePeakMemoryStats is behind a finch experiment though, and the experiment is only enabled on Canary/Dev channels. I'm not sure if we need to rush reverting it, but it certainly shouldn't be enabled long-term.

There's another place where we do full heap walks in RenderFrameImpl::DidFinishLoad(), and that's already in the stable channel. Just removing the call to GetRendererMemoryMetrics yields ~2% improvement on the speedometer benchmark.

Cc: tasak@chromium.org
Yes this is controlled with a finch flag (and it can be enabled/disabled independently of the ResourceLoadScheduler experiment itself). Probably we should disable the flag on Windows.
Issue 749405 has been merged into this issue.
We have three users for GetRendererMemoryMetrics
- keishi@ using it for general memory usage UMA
- tasak@ using it for PurgeAndSuspend UMA
- ksakamoto@ using it for RequestThrottling finch

The three of use talked and decided to disable malloc usage reporting on Windows.
The heap walk is also causing crashes and we think it is best to remove it.
Owner: keishi@chromium.org
Status: Started (was: Assigned)
That's odd that the heap walking is crashing. The heap is locked during the HeapWalk (which will, BTW, block other threads during GetMallocUsage) which should make heap walking a safe and robust operation. The crashes suggest that something bad is going on. Either the heap is corrupt or else some threads are doing allocations despite the heap being locked. Either possibility is worrisome.
Project Member

Comment 16 by bugdroid1@chromium.org, Aug 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8cc2c3abcda7a8e3e33c1927ac07afbd218e01e7

commit 8cc2c3abcda7a8e3e33c1927ac07afbd218e01e7
Author: Keishi Hattori <keishi@chromium.org>
Date: Fri Aug 04 20:55:30 2017

Remove support for GetMallocUsage on Windows

Getting malloc usage on Windows requires a heap walk which is very slow and causes random crashes.
This CL disables it.

Bug:  751896 
Change-Id: I71d9edd4d18deef1259f0bd3d78f351968bf650c
Reviewed-on: https://chromium-review.googlesource.com/601810
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Keishi Hattori <keishi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#492118}
[modify] https://crrev.com/8cc2c3abcda7a8e3e33c1927ac07afbd218e01e7/base/process/process_metrics_win.cc

[Bulk Edit]
URGENT - PTAL.
Your bug is labelled as M61 Stable ReleaseBlock, pls make sure to land the fix and get it merged into the release branch ASAP.

Know that this issue shouldn't block the release?  Remove the ReleaseBlock-Stable label.

Thank you.

merge should not be needed because the study is not enabled and won't be enabled on Beta and Stable channels.

I think we can just close this bug with keishi's last CL, IIUC.

keishi: is my understanding correct?

Also, we could not answer brucedawson's question at #c15. If something seems wrong on the heap walk, would we file another dedicated bug for that? Since this bug is filed as ReleaseBlock-Stable, we should not keep it open for a long time.
Yes

I am not a Windows person but according to a Google search other people seem to have crashing issues when using heap walk too.
https://cs.chromium.org/chromium/src/base/trace_event/malloc_dump_provider.cc?q=HeapUnlock&dr=C&l=164

This existing code does not check HeapLock result, and calls HeapUnlock later.
If this code run on a different thread while GetMallocUsage runs, this could unlock the heap handle while the heap walk is running inside GetMallocUsage, and could cause a crash?
Status: Fixed (was: Started)
Anyway, let me close this now. Feel free to reopen of file another bug for unresolved things.
Labels: Merge-TBD
[Auto-generated comment by a script] We noticed that this issue is targeted for M-61; it appears the fix may have landed after branch point, meaning a merge might be required. Please confirm if a merge is required here - if so add Merge-Request-61 label, otherwise remove Merge-TBD label. Thanks.
Labels: -Merge-TBD
merge should not be needed (see comment 18)
> This existing code does not check HeapLock result, and calls HeapUnlock later.

Putting a CHECK on HeapLock seems reasonable to see if it is failing for some reason. However I can't think of any reason why it would fail. Rather, the only reason I can think of it for failing are if the heap was create with the HEAP_NO_SERIALIZATION flag, but the HeapLock documentation says that that gives undefined results, not a HeapLock failure.

That said, I have a theory about why it is causing crashes. The heap's main data structures are protected by the lock and are safe to access while holding the heap lock. However the heap also has a number of look-aside lists that are designed to be locklessly accessed. If HeapWalk is walking all memory that is allocated from the heap and if HeapAlloc and HeapFree are able to continue running on some paths on other threads due to these lockless lists then it seems unavoidable that HeapWalk would be either unreliable or incomplete or both.

I have confirmed that alloc/free continue when the heap is locked (as long as the LFH is used) but I have not proved that this leads to HeapWalk crashes.

https://twitter.com/BruceDawson0xB/status/897154347800289281

Sign in to add a comment