Issue 713463

Starred by 10 users

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 3
Type: Bug

Blocking:
issue 770042




Implement a custom calculation for MemoryPressure on macOS.

Project Member Reported by erikc...@chromium.org, Apr 20 2017

Issue description

The current implementation uses sysctlbyname("kern.memorystatus_vm_pressure_level",...)
https://cs.chromium.org/chromium/src/base/memory/memory_pressure_monitor_mac.cc?q=memorypressure+package:%5Echromium$&dr=Ss&l=105

This is implemented in xnu-37891.32/bsd/kern/kern_memorystatus.c:6612, which just forwards to xnu-37891.32/osfmk/vm/vm_pageout.c:4341 (vm_pressure_response).

In that function, there are two values calculated. An enum [memorystatus_vm_pressure_level], and a percentage [memorystatus_level].

-----------------------------------------
memorystatus_level calculation:
roughly, memorystatus_level = AVAILABLE_NON_COMPRESSED_MEMORY * 100 / total_pages.
AVAILABLE_NON_COMPRESSED_MEMORY = vm_page_active_count + vm_page_inactive_count + vm_page_free_count + vm_page_speculative_count

all memory = AVAILABLE_NON_COMPRESSED_MEMORY + VM_PAGE_COMPRESSOR_COUNT + wired pages

You can run this calculation yourself using the values in vm_stat, and confirm that it produces the "System-wide memory free percentage" from memory_pressure.
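
As a sanity check on the formula above, here is a minimal Python sketch (not the kernel code). It assumes that total_pages equals the "all memory" sum and that the division is integer division. The function and parameter names are made up for illustration, and the inputs are page counts as printed by vm_stat.

```python
def memorystatus_level(active, inactive, free, speculative, compressor, wired):
    """Approximate memorystatus_level as a percentage from vm_stat page counts."""
    # AVAILABLE_NON_COMPRESSED_MEMORY as defined in the description.
    available_non_compressed = active + inactive + free + speculative
    # Assumption: total_pages is the "all memory" sum from the description.
    total_pages = available_non_compressed + compressor + wired
    # Assumption: integer arithmetic, so the result is truncated.
    return available_non_compressed * 100 // total_pages


# Hypothetical page counts, not measurements from a real machine.
print(memorystatus_level(active=400, inactive=300, free=200,
                         speculative=100, compressor=500, wired=500))  # prints 50
```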

-----------------------------------------
memorystatus_vm_pressure_level is determined by a state machine. The transition functions are at xnu-37891.32/osfmk/vm/vm_pageout.c:11120 

VM_PRESSURE_WARNING_TO_CRITICAL() pseudocode:
  if compressed memory > 0.98 * min(64 GB, 16 * physical_memory)
    return true
  if compressor's memory footprint > 0.98 * min(64 GB, 16 * physical_memory)
    return true
  if max_mem < 3GB:
    if AVAILABLE_NON_COMPRESSED_MEMORY < 1.2 * 0.5 * max_mem
      return true
  if max_mem >= 3GB:
    if AVAILABLE_NON_COMPRESSED_MEMORY < 1.2 * 0.28 * max_mem
      return true
  return false
The first two conditions are almost impossible to hit. The latter two conditions...are kind of crazy. Keep in mind that AVAILABLE_NON_COMPRESSED_MEMORY includes memory that's resident [in use]. The only types of memory that are *not* included are compressed and wired memory.

VM_PRESSURE_NORMAL_TO_WARNING() pseudocode:
  if max_mem < 3GB:
    if AVAILABLE_NON_COMPRESSED_MEMORY < 0.9 * max_mem
      return true
  if max_mem >= 3GB:
    if AVAILABLE_NON_COMPRESSED_MEMORY < 0.5 * max_mem
      return true
  return false
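
To make the thresholds concrete, here is a rough Python transcription of the two transition functions as summarized in the pseudocode. It is a sketch of that summary, not of the XNU source: it assumes max_mem and physical_memory are the same quantity, that every input is in bytes, and the function names are invented.

```python
GB = 1 << 30  # bytes


def normal_to_warning(available_non_compressed, max_mem):
    """NORMAL -> WARNING check, per the pseudocode summary."""
    fraction = 0.9 if max_mem < 3 * GB else 0.5
    return available_non_compressed < fraction * max_mem


def warning_to_critical(available_non_compressed, max_mem,
                        compressed=0, compressor_footprint=0):
    """WARNING -> CRITICAL check, per the pseudocode summary."""
    # Assumption: physical_memory in the pseudocode is the same as max_mem.
    compressor_limit = 0.98 * min(64 * GB, 16 * max_mem)
    if compressed > compressor_limit:
        return True
    if compressor_footprint > compressor_limit:
        return True
    fraction = 0.5 if max_mem < 3 * GB else 0.28
    return available_non_compressed < 1.2 * fraction * max_mem


max_mem = 16 * GB  # hypothetical machine
print(normal_to_warning(7 * GB, max_mem))    # 7 GB is below the 8 GB threshold
print(warning_to_critical(6 * GB, max_mem))  # 6 GB is above the ~5.4 GB threshold
print(warning_to_critical(5 * GB, max_mem))  # 5 GB is below the ~5.4 GB threshold
```

On this hypothetical 16 GB machine, the critical threshold works out to 1.2 * 0.28 * 16 GB, or about 5.4 GB. Under these rules the level only reaches critical once roughly 10.6 GB is wired or compressed.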

-----------------------------------------
With all of this said, perhaps the most important point is that we had a massive memory leak (56GB of leaked GL textures), and the OS memory pressure levels never left NORMAL.
https://bugs.chromium.org/p/chromium/issues/detail?id=700928#c52

I think we need a new calculation. As a rough starting point, I propose:
total_pages_in_use = host_statistics64.wire_count + host_statistics64.inactive_count + host_statistics64.active_count + host_statistics64.total_uncompressed_pages_in_compressor

GetMemoryPressureLevel:
if pages_free > 0.1 * physical_memory:
  return normal
if total_pages_in_use < 1.0 * physical_memory:
  return normal
if total_pages_in_use < 1.5 * physical_memory:
  return warning
return critical

This is very rough, and I'd appreciate feedback/revisions, but seems like a better starting point than the OS level counters.
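
For anyone who wants to play with the numbers, here is the proposal above as a small Python sketch. It assumes physical_memory is expressed in pages, like the host_statistics64 counters. The function name, parameter names and string return values are placeholders, not Chromium API.

```python
def get_memory_pressure_level(pages_free, wire_count, inactive_count,
                              active_count, uncompressed_pages_in_compressor,
                              physical_pages):
    """Proposed pressure level; all arguments are page counts."""
    total_pages_in_use = (wire_count + inactive_count + active_count
                          + uncompressed_pages_in_compressor)
    if pages_free > 0.1 * physical_pages:
        return "normal"
    if total_pages_in_use < 1.0 * physical_pages:
        return "normal"
    if total_pages_in_use < 1.5 * physical_pages:
        return "warning"
    return "critical"


# Hypothetical system with 1000 physical pages and a large compressed footprint.
print(get_memory_pressure_level(pages_free=50, wire_count=200,
                                inactive_count=100, active_count=600,
                                uncompressed_pages_in_compressor=800,
                                physical_pages=1000))  # prints critical
```

Because total_pages_in_use counts the uncompressed size of what sits in the compressor, it can exceed physical memory. That is what lets a heavily compressed or swapped system register as warning or critical here.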
 

Comment 1 by bashi@chromium.org, Apr 20 2017

This is a great investigation. Very informative!

Leaking is surely a bad thing, but I'm not sure it should be reflected in memory pressure. Does the leak actually affect the system's responsiveness? IMHO memory pressure should represent something like "we are running out of physical memory", regardless of whether there are leaks. We should probably have other ways to detect leaks.

That said having platform-specific calculations for memory pressure itself makes sense to me because we can make use of all information the system provides. In memory coordinator we are trying to unify the calculation but my current feeling is that we may want to build memory coordinator on top of existing MemoryPressureMonitor (mapping MemoryPressureLevel -> MemoryCondition). There are some ongoing efforts on improving MemoryPressureMonitor (macOS and ChromeOS) and memory coordinator would want to take advantage of them.

> Leaking is surely a bad thing, but I'm not sure it should be reflected in memory pressure. Does the leak actually affect the system's responsiveness? IMHO memory pressure should represent something like "we are running out of physical memory", regardless of whether there are leaks. We should probably have other ways to detect leaks.

The system was literally unusable [most applications were suspended, could not continue until chrome was killed]

I'm just using this leak as an example. My point is that a system can be unresponsive because of excessive memory usage, but the memory pressure level might still show "normal". This just means that we shouldn't trust the macOS memory pressure level.
Another real-world example: a user was seeing 5-second page loads due to severe memory pressure, but the memory pressure level was only "warning".

https://bugs.chromium.org/p/chromium/issues/detail?id=700580#c32
I agree with Erik that the memory pressure should be calculated so that a 'critical' state is set before a terrible UX happens.

I'm not familiar with the macOS implementation, but if you think that we cannot trust the OS's pressure level, I agree that we should calculate the pressure level manually.

I'd like to hear shrike@'s thoughts.

Comment 5 by shrike@chromium.org, Apr 20 2017

erikchen@ - can you relay more details about why you want to synthesize a memory pressure value? I'm not saying it doesn't make sense, I'm just wondering where it would be used. Originally I thought this was related to MemoryMonitorMac, which has a similar concept (it outputs an estimate of free memory, which the MemoryCoordinator uses to estimate memory pressure).

shrike: Perhaps I'm mistaken, but I thought that MemoryPressureMonitor was the entity used by Memory Coordinator to decide policy [such as whether to discard tabs]. The two examples I give in c#1 and c#3 show cases where the system is under a significant amount of memory pressure, but the OS thinks the memory pressure level is normal, or warning, but not critical.

The goal of implementing a synthetic memory pressure value is to allow the memory coordinator to make better policy decisions.

Comment 7 by shrike@chromium.org, Apr 20 2017

OK, got it. Yes, if the goal is to create a better MemoryPressureMonitor[Mac], generating the pressure level synthetically seems like a good plan.

The MemoryCoordinator system has changed over time so I'm not exactly sure of its current design, but the original plan was to replace the MemoryPressureMonitors with per-platform "MemoryMonitors", which return an estimate of available free memory. There is currently no MemoryMonitorMac, which is in part why the MemoryCoordinator does not run on the Mac. It occurred to me that what you're describing for the synthetic memory pressure signal could be what we want to return from the MemoryMonitorMac.

Comment 8 by bashi@chromium.org, Apr 20 2017

Re: #2

Thanks for the clarification. If the system slows down but the system still thinks its memory state is normal, that's bad. In that case distrusting the system's pressure level makes sense.

Re: #7

Yes, your explanation is perfect. The original plan was to replace existing MemoryPressureMonitor (located in base/) with MemoryMonitor (located in content/browser/memory). But as I wrote in #1 it may make sense to use MemoryPressureMonitor for memory coordinator. I'll think it over a bit more.
Throwing an idea out there: a UMA / slow-reports study which:
- collects a bunch of memory stats, e.g. the ones from the example above: total_pages_in_use, pages_free, total_pages, wired_pages, etc.
- measures the time that it takes to do: x = malloc(8 MB), memset(x), free(x) ?

Essentially the idea here would be to grab raw data from the field and see if/where there is a correlation with the time it takes to do pure memory-based operations. If so, come up with a model that matches what we measured in the field.
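
A rough Python analog of the timing probe described above, purely to illustrate what would be measured. A real UMA study would do this in C++ inside Chrome. The function name is invented, and ctypes.create_string_buffer / ctypes.memset stand in for malloc / memset.

```python
import ctypes
import time


def time_alloc_touch_free(size=8 * 1024 * 1024):
    """Return the seconds taken to allocate, write to, and release `size` bytes."""
    start = time.perf_counter()
    buf = ctypes.create_string_buffer(size)  # allocate a zero-filled buffer
    ctypes.memset(buf, 1, size)              # touch every byte
    del buf                                  # release the buffer
    return time.perf_counter() - start


print(f"8 MB alloc/touch/free took {time_alloc_touch_free() * 1000:.3f} ms")
```

The idea would be to log this duration next to the page counters and look for the counter combinations that best predict slow samples.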
Cc: vmi...@chromium.org ericrk@chromium.org
Hey ericrk, vmiura: Does CC discardable memory currently rely on these signals? If so, we may want to reconsider how we work with image caches in the near future, since these signals don't work very well.

haraken: Are there other sources of discardable memory that currently rely on these signals?
Discardable memory is used by many things -- CC, Skia, Blink resources etc.

Our plan is to replace the pressure signals with Memory Coordinator (for not only discardable memory but also everything). Can we try to implement a "right" pressure triggerer for Memory Coordinator?


>  Can we try to implement a "right" pressure triggerer for Memory Coordinator?

This sounds good. :)
Cc: w...@chromium.org
Status: Available (was: Untriaged)
erikchen@ - in your definition of AVAILABLE_NON_COMPRESSED_MEMORY, what are vm_page_active_count, vm_page_inactive_count, and vm_page_speculative_count (I assume it's clear what vm_page_free_count is)?


This definition for AVAILABLE_NON_COMPRESSED_MEMORY comes from ./osfmk/vm/vm_compressor.h. The terms used in the definition are directly available via host_statistics64 as well as vm_stat. 

Note that in host_statistics64, free_count = vm_page_free_count + vm_page_speculative_count, and speculative_count = vm_page_speculative_count. Whereas in vm_stat, "Pages free" is just vm_page_free_count.

Also note that for vm_stat, the [free + active + inactive + speculative + wired + compressor] pages will add up to the number of pages of physical memory. :)
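
Since the free_count difference is easy to trip over, here is a small Python sketch of the bookkeeping described in the two notes above. The helper names are hypothetical. The inputs are the page counts reported by host_statistics64 and by vm_stat respectively.

```python
def vm_stat_pages_free(host_free_count, host_speculative_count):
    """vm_stat "Pages free", derived from host_statistics64 counters."""
    # host_statistics64 free_count already includes speculative pages.
    return host_free_count - host_speculative_count


def available_non_compressed(host_free_count, host_active_count,
                             host_inactive_count):
    """AVAILABLE_NON_COMPRESSED_MEMORY from host_statistics64 counters."""
    # Speculative pages are not added again; they are inside free_count.
    return host_active_count + host_inactive_count + host_free_count


def physical_pages(free, active, inactive, speculative, wired, compressor):
    """Total physical pages from vm_stat-style counters."""
    return free + active + inactive + speculative + wired + compressor


# Hypothetical counters: 300 "free" per host_statistics64, 100 of them speculative.
print(vm_stat_pages_free(300, 100))              # prints 200
print(available_non_compressed(300, 400, 300))   # prints 1000
```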

Meanings of these numbers:
free: pages available for use by the system.
speculative: pages that have been speculatively paged in, but are available for use by the system.
active: Page is in use [referenced by a pmap].
inactive: Page *was* in use, but has not been faulted recently, and is ready to be compressed.
wired: Page is *always* in use [e.g. cannot be compressed/paged out].
compressor: Pages used by the compressor.

AVAILABLE_NON_COMPRESSED_MEMORY includes active and inactive pages because those pages *could* be used by the compressor to compress more memory, which in turn frees up more pages to be used.

Comment 16 by zh...@chromium.org, Dec 16 2017

Blocking: 770042
Another potentially useful syscall:

"""
 * mach_vm_pressure_monitor() collects past statistics about memory pressure.
 * The caller provides the number of seconds ("nsecs") worth of statistics
 * it wants, up to 30 seconds.
 * It computes the number of pages reclaimed in the past "nsecs" seconds and
 * also returns the number of pages the system still needs to reclaim at this
 * moment in time.
"""
Status: Untriaged (was: Available)
Available, but no owner or component? Please find a component, as no one will ever find this without one.
Owner: erikc...@chromium.org
Status: Assigned (was: Untriaged)
Erik, can you route?
+ chrisha -- what are next steps here? Does Seb have a framework/ideas for how to best do this on Windows? It would be easy to whip up a simple implementation for macOS, but we'd need to be careful to confirm that we're causing net improvements.
FWIW we're experimenting with manually dispatching critical memory pressures on Android when the memory usage reaches 95%-tile numbers (e.g., 150 MB on 1 GB Android devices). The rationale is that tasak's ablation study has demonstrated that there's a high correlation between high memory usage and regressions on user-facing performance metrics [we'll publish a doc soon]. The assumption is that dispatching critical memory pressures on 95%-tile cases will lead to a better user experience. We'll send a design doc soon :)


