
Issue 872253

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Reading /proc/<pid>/totmaps is very slow under memory pressure

Project Member Reported by chinglinyu@chromium.org, Aug 8

Issue description

Chrome Version: 69.0.3497.0
OS: Chrome OS

What steps will reproduce the problem?
I opened several very memory-hungry tabs and cycled through them constantly to put the system under heavy memory pressure. Sometimes the system froze noticeably (for seconds at a time). I noticed that resource_coordinator::TabManager::OnMemoryPressure() can sometimes block the UI thread for 500+ or even 1000+ ms.

I then profiled the browser process UI thread with perf:
# /usr/bin/perf record -a -g -t 1377 -F 100

In a time interval where TabManager::OnMemoryPressure() took 500+ ms, I got:
	93.45%     0.00%  chrome   chrome                 [.] _start
			|
			---_start
			   __libc_start_main
			   ChromeMain
			   service_manager::Main
			   (deleted for brevity. Refer to the attachment for deleted content.)
			   base::MemoryPressureListener::Notify
			   resource_coordinator::TabManager::OnMemoryPressure
			   resource_coordinator::TabManager::LogMemoryAndDiscardTab
			   resource_coordinator::TabManagerDelegate::LowMemoryKill
			   resource_coordinator::TabManagerDelegate::LowMemoryKillImpl
			   |
			   |--43.73%--resource_coordinator::TabManagerDelegate::KillTab
			   |          resource_coordinator::TabLifecycleUnitSource::TabLifecycleUnit::Discard
			   |          resource_coordinator::TabLifecycleUnitSource::TabLifecycleUnit::FinishDiscard
			   |          |
			   |          (deleted for brevity. Refer to the attachment for deleted content.)
			   |
			   |--42.98%--resource_coordinator::TabLifecycleUnitSource::TabLifecycleUnit::GetEstimatedMemoryFreedOnDiscardKB
			   |          base::ProcessMetrics::GetTotalsSummary
			   |          base::ReadFileToStringWithMaxSize
			   |          __GI___libc_read
			   |          entry_SYSCALL_64_fastpath
			   |          sys_read
			   |          __vfs_read
			   |          seq_read
			   |          totmaps_proc_show
			   |          walk_page_vma
			   |          __walk_page_range
			   |          smaps_pte_range
			   |          |
			   |           --30.82%--swp_swapcount
			   |                     |
			   |                     |--18.78%--swap_info_get
			   |                     |          _raw_spin_lock
			   |                     |          do_raw_spin_lock
			   |                     |
			   |                      --12.04%--_raw_spin_unlock
			   |                                |
			   |                                 --6.08%--do_raw_spin_unlock


The profile shows that to estimate how much memory killing a process would free (USS + swap), we read /proc/<pid>/totmaps, which can be very slow under memory pressure because the UI thread and kswapd contend with each other (about 30% of the time is spent in swp_swapcount, much of it in spin locks). The estimation done to handle memory pressure actually worsens the memory pressure. We could consider a lighter but less accurate estimate for handling memory pressure, so that system performance doesn't fall off a cliff when free memory runs low.
 
Attachment: perf_report.txt (12.1 KB)
Cc: cywang@chromium.org semenzato@chromium.org sonnyrao@chromium.org
That's an interesting result -- 43% of the time in LowMemoryKillImpl is spent reading totmaps? Am I reading it correctly?

I think we're using totmaps (which is a Chrome OS-specific thing; we need to switch to smaps_rollup, which is the upstream version) because we used to rely on RSS from /proc/<pid>/stat, but that wasn't very accurate. We could revisit that and see if RSS from /proc/<pid>/stat is good enough.
Cc: cylee@chromium.org
Yes, LowMemoryKillImpl can spend 43% of its time reading totmaps. Anything that walks the address space of the process, like smaps or smaps_rollup, is expected to give a similar result.

We should consider using RSS + swap as a faster approximation. When a tab process consumes a lot of memory, its shared memory may account for only a small fraction of the total, so RSS is close to USS. I need to run experiments to get some numbers.
Some experiment results with the top 10 web sites on https://moz.com/top500 :
RSS of a tab process ranges from 300 MB to 110 MB.
Corresponding USS numbers are 200 MB to 33 MB.
Shared memory for these processes ranges from 100 to 70 MB.

We need to be careful when using RSS to estimate the memory freed by killing a process: it is an overestimate and could lead to too few processes being killed. The system then doesn't recover from the low-memory condition and has to go through the path from low-memory notification to process kill again, so it stays under memory pressure for longer.

We can apply a negative offset to the freed-memory estimate when it is based on RSS. Adding hysteresis to tab discard ( https://crbug.com/872253 ) also alleviates the precision problem of RSS.
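
A minimal sketch of the negative-offset idea above, written in Python for brevity (the Chrome code itself is C++). The function names, the candidate list, the 250000 kB target, and the 100000 kB offset are all hypothetical. The offset is only loosely motivated by the 70 to 100 MB of shared memory measured above.

```python
def estimate_freed_kb(rss_kb, offset_kb):
    """Estimate memory freed by a kill as RSS minus a fixed offset, never negative."""
    return max(0, rss_kb - offset_kb)


def pick_victims(candidates, target_kb, offset_kb):
    """Walk candidates (lowest priority first) until the estimate meets the target.

    candidates: list of (name, rss_kb) tuples. Returns (victim names, estimated kB freed).
    """
    victims = []
    freed_kb = 0
    for name, rss_kb in candidates:
        if freed_kb >= target_kb:
            break
        victims.append(name)
        freed_kb += estimate_freed_kb(rss_kb, offset_kb)
    return victims, freed_kb


# Hypothetical tab processes and target, all in kB.
CANDIDATES = [("tab_a", 300000), ("tab_b", 110000), ("tab_c", 200000)]
TARGET_KB = 250000

# Raw RSS (offset 0) stops after one kill; with the offset, more tabs are selected.
no_offset = pick_victims(CANDIDATES, TARGET_KB, 0)
with_offset = pick_victims(CANDIDATES, TARGET_KB, 100000)
print(no_offset)
print(with_offset)
```

With raw RSS the loop is satisfied after the first tab, which is the "too few processes killed" failure mode described above. The offset makes the estimate more conservative, so more tabs are selected.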
Cc: yuzhao@chromium.org
Owner: chinglinyu@chromium.org
Status: Assigned (was: Untriaged)
It looks like in 4.14 and later /proc/<pid>/status contains enough information to get a USS number that is accurate for estimating memory freed by discard.

/proc/<pid>/statm reports rss, which is the sum of anon, file, and shmem.
File can be a large amount of memory, which is mostly used for text.

In the 4.14 status file we have anon, file, and shmem broken out:

VmPeak:   614280 kB
VmSize:   612744 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    164448 kB
VmRSS:    134844 kB
RssAnon:          106320 kB
RssFile:           28244 kB
RssShmem:            280 kB
VmData:   189276 kB
VmStk:       132 kB
VmExe:    145020 kB
VmLib:     44712 kB
VmPTE:      1044 kB
VmPMD:       496 kB
VmSwap:        0 kB


So we can just parse RssAnon out of this and use that as our estimate. We will need to backport this stat to older kernels, but that should be relatively easy.
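
A minimal sketch of parsing RssAnon out of the status file, in Python for brevity. The helper names are hypothetical, and the sample text is a subset of the status output quoted above. On kernels without the RssAnon field, `read_rss_anon_kb` returns None.

```python
import re


def parse_status_kb(status_text):
    """Map field name -> value in kB for every 'Name:  N kB' line."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_rss_anon_kb(pid):
    """Read RssAnon for a live process; None if the kernel does not report it."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_kb(f.read()).get("RssAnon")


# Subset of the 4.14 status output quoted above.
SAMPLE = """\
VmRSS:    134844 kB
RssAnon:          106320 kB
RssFile:           28244 kB
RssShmem:            280 kB
VmSwap:        0 kB
"""

fields = parse_status_kb(SAMPLE)
print(fields["RssAnon"])
```

In the sample, RssAnon + RssFile + RssShmem adds up to VmRSS, which is the breakdown the comment above relies on.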

To use RssAnon as an approximation to USS, I backported
* https://chromium.googlesource.com/chromiumos/third_party/kernel/+/eca56ff906bdd0239485e8b47154a6e73dd9a2f3 and
* https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8cee852ec53fb530f10ccabf1596734209ae336b 
to v4.4 and tested on my device with top 10 web sites (search and mail are tested on google.com).

Using RssAnon as our estimate on these sites works pretty well. The deltas between RssAnon and USS of the renderer processes consistently fall within 35 to 40 MB in 9 of 10 sites. The only exception among the top 10 sites is wikipedia.org, where RssAnon - USS is 27 MB (all values in the table are in kB):

Tab             USS     Anon    Delta
=====================================
Facebook.com    127004  166664  39660
Twitter.com     84948   125592  40644
Google search   49536   90324   40788
Gmail           185808  222292  36484
Youtube.com     94664   134620  39956
Instagram.com   44052   84532   40480
Linkedin.com    119060  159332  40272
Wordpress.org   34564   70104   35540
Pinterest.com   77832   118300  40468
Wikipedia.org   57924   85824   27900
Wordpress.com   57860   98140   40280

The delta comes mostly from anonymous shared memory shared between zygote and renderer processes, like the following:

5a673fe00000-5a6741c00000 r-xp 00000000 00:00 0
Size:              30720 kB
Rss:               30720 kB
Pss:                1616 kB
Shared_Clean:          0 kB
Shared_Dirty:      30720 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:        30720 kB
Anonymous:         30720 kB
AnonHugePages:     30720 kB
(omitted for brevity)

And with zygote, we can expect that there won't be many file-mapped private pages. I am going to proceed with backporting to the v3.x kernels and then do the Chrome part.
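
To illustrate why a mapping like the one above shows up in RssAnon but not in USS, here is a small Python sketch. It assumes USS is computed as Private_Clean + Private_Dirty summed over mappings, and the helper names are hypothetical. The sample entry is the smaps excerpt quoted above.

```python
def parse_smaps_entry(entry_text):
    """Map field name -> kB for the 'Name:  N kB' lines of one smaps entry."""
    fields = {}
    for line in entry_text.splitlines():
        parts = line.split()
        # The mapping header line and any non-kB lines are skipped.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


def uss_kb(fields):
    """Private pages only: what would be freed if this process alone went away."""
    return fields.get("Private_Clean", 0) + fields.get("Private_Dirty", 0)


ENTRY = """\
5a673fe00000-5a6741c00000 r-xp 00000000 00:00 0
Size:              30720 kB
Rss:               30720 kB
Pss:                1616 kB
Shared_Clean:          0 kB
Shared_Dirty:      30720 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:        30720 kB
Anonymous:         30720 kB
"""

entry = parse_smaps_entry(ENTRY)
delta_kb = entry["Anonymous"] - uss_kb(entry)
print(uss_kb(entry), delta_kb)
```

This single mapping contributes 30720 kB to the anonymous total and nothing to USS, which accounts for most of the 35 to 40 MB delta in the table.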
re #7 -- that's interesting. The PSS of that process is also tiny, which to me says this shared anon memory VMA is probably shared among all of the renderer processes. I agree that it seems like a good enough approximation for what we need.
Cc: fdoray@chromium.org
Project Member

Comment 10 by bugdroid1@chromium.org, Sep 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/891c27ed82da443e7373152fdef17984852b6eae

commit 891c27ed82da443e7373152fdef17984852b6eae
Author: Chinglin Yu <chinglinyu@chromium.org>
Date: Thu Sep 20 06:38:53 2018

Estimate freed memory in killing a process faster.

Use memory_instrumentation::OSMetrics::FillOSMemoryDump(), which
reads /proc/<pid>/statm, to get private bytes of a process, to avoid
contention with kswapd under heavy memory pressure.

BUG= chromium:872253 
TEST=manual
R=cylee@chromium.org, sonnyrao@chromium.org, fdoray@chromium.org

Change-Id: I4d43933a39c3c89d8ebb81e3ccef20277cedb258
Reviewed-on: https://chromium-review.googlesource.com/1212246
Commit-Queue: Chinglin Yu <chinglinyu@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Cheng-Yu Lee <cylee@chromium.org>
Cr-Commit-Position: refs/heads/master@{#592701}
[modify] https://crrev.com/891c27ed82da443e7373152fdef17984852b6eae/chrome/browser/resource_coordinator/tab_lifecycle_unit.cc
[modify] https://crrev.com/891c27ed82da443e7373152fdef17984852b6eae/chrome/browser/resource_coordinator/tab_manager_delegate_chromeos.cc
[modify] https://crrev.com/891c27ed82da443e7373152fdef17984852b6eae/chrome/browser/resource_coordinator/tab_manager_delegate_chromeos.h
[modify] https://crrev.com/891c27ed82da443e7373152fdef17984852b6eae/chrome/browser/resource_coordinator/utils.cc
[modify] https://crrev.com/891c27ed82da443e7373152fdef17984852b6eae/chrome/browser/resource_coordinator/utils.h
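
For readers unfamiliar with statm, a rough Python sketch of deriving a private-bytes figure from it. This assumes private bytes are computed as (resident - shared) pages times the page size. The commit message states only that FillOSMemoryDump() reads /proc/<pid>/statm, so treat the formula and the helper names as assumptions. The sample line is hypothetical, constructed to match the status output quoted earlier under a 4096-byte page size.

```python
import os


def private_bytes_from_statm(statm_text, page_size):
    """statm fields are in pages: size resident shared text lib data dt."""
    fields = statm_text.split()
    resident_pages = int(fields[1])
    shared_pages = int(fields[2])  # resident file-backed and shmem pages
    return (resident_pages - shared_pages) * page_size


def read_private_bytes(pid):
    """Read statm for a live process using the system page size."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/statm") as f:
        return private_bytes_from_statm(f.read(), page_size)


# Hypothetical statm line (pages), consistent with VmSize 612744 kB,
# VmRSS 134844 kB, and RssFile + RssShmem 28524 kB at 4 kB per page.
SAMPLE_STATM = "153186 33711 7131 36255 0 47352 0\n"

private_bytes = private_bytes_from_statm(SAMPLE_STATM, 4096)
print(private_bytes // 1024)
```

Reading statm is a single short read that does not walk the page tables, which is why it avoids the swp_swapcount contention seen in the profile.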

Status: Fixed (was: Assigned)
