
Issue 723767

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug
Hotlist-MemoryInfra

Blocking:
issue 783513



measure hard faults in chrome

Project Member Reported by wfh@chromium.org, May 17 2017

Issue description

Hard faults per second are the MS recommended measure of memory pressure - they are highlighted at the top level in the MS resource monitor alongside physical memory used and MSDN articles recommend using this as a primary measure of memory causing slowdowns, as a hard fault hits disk.

https://blogs.technet.microsoft.com/askperf/2008/01/25/an-overview-of-troubleshooting-memory-issues/

We should measure this. It's available in SYSTEM_PROCESS_INFORMATION_EX, and there's already code doing this in GetHardFaultCountForCurrentProcess.

1. Add to the task manager
2. Add UMA metrics

 

Comment 1 by wfh@chromium.org, May 17 2017

Attached is what the default Resource Monitor view looks like on Win10: resource_monitor.png
Owner: erikc...@chromium.org
Status: Assigned (was: Untriaged)
Assigning to erikchen@, who owns memory work right now. Potentially could be handed off to bashi@, who wants to implement a better memory monitor on each platform?
Cc: erikc...@chromium.org bashi@chromium.org
Owner: w...@chromium.org
Over to wez@, who's working on coming up with a consistent story for memory pressure signals.

Comment 4 by w...@chromium.org, Jun 12 2017

Labels: M-61
wfh: Thanks for pointing this out; excitingly that is exactly the conclusion we'd been moving toward, which is encouraging. :)

Comment 5 by w...@chromium.org, Jun 12 2017

Components: Internals>Instrumentation>Memory UI>TaskManager

Comment 6 by w...@chromium.org, Jun 14 2017

Cc: borisv@chromium.org

Comment 7 by borisv@chromium.org, Jun 14 2017

Isn't this also a good measure of memory fragmentation? I assume that in such cases we will start thrashing the disk, even if the system shows plenty of memory.

Comment 8 by w...@chromium.org, Jun 14 2017

We won't start seeing hard faults until the system actually has a reason to
remove pages from physical memory; until then we can be accumulating
fragmentation without the hard fault count rising.

The simplest measure I can see for fragmentation would be a calculation
based on the committed size (i.e. pages) of the process, and the allocated
size (i.e. bytes) and cache size (i.e. committed pages the allocator is
holding to serve new allocations cheaply) for each heap in the process.
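
A minimal sketch of the calculation described above, under stated assumptions: the `HeapStats` struct, its field names, and `EstimateFragmentation` are hypothetical and are not taken from Chromium's allocators. It treats committed memory that is neither live nor held in an allocator cache as fragmentation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-heap statistics; the names are illustrative only.
struct HeapStats {
  std::size_t allocated_bytes;  // Bytes handed out to live allocations.
  std::size_t cached_bytes;     // Committed bytes held to serve new allocations.
};

// Returns the fraction of the process's committed memory that is neither
// allocated nor cached by any heap. Returns 0 when nothing is committed or
// when the heaps account for at least the whole committed size.
double EstimateFragmentation(std::size_t committed_bytes,
                             const std::vector<HeapStats>& heaps) {
  if (committed_bytes == 0)
    return 0.0;
  std::size_t accounted = 0;
  for (const HeapStats& heap : heaps)
    accounted += heap.allocated_bytes + heap.cached_bytes;
  if (accounted >= committed_bytes)
    return 0.0;
  return static_cast<double>(committed_bytes - accounted) / committed_bytes;
}
```

For example, 1000 committed bytes with 750 bytes accounted for across all heaps gives an estimate of 0.25. Committed memory outside any heap (stacks, mapped images) would inflate this number, so it is at best a rough indicator.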

Comment 9 by borisv@chromium.org, Jun 14 2017

Cc: awong@chromium.org
I see. I don't want to hijack the bug; do we have another bug for tracking memory fragmentation?

For instance, I am wondering if we can measure the CPU time used by the allocations themselves. There should be CPU counters that we can pick to detect those. The more fragmented the memory is, the more work the heap/allocators will have to do to find a suitable block.

Comment 10 by w...@chromium.org, Jun 15 2017

We have a fragmentation measure of sorts, in the form of the allocator
"metadata_fragmentation_caches" numbers, but I agree that breaking off a
separate bug to evaluate that and perhaps come up with a more useful
measure sounds like a grand plan.

Comment 11 by w...@chromium.org, Jun 18 2017

Status: Started (was: Assigned)

Comment 12 by w...@chromium.org, Jul 19 2017

Labels: -M-61 M-62

Comment 13 by w...@chromium.org, Nov 10 2017

Blocking: 783513

Comment 14 by w...@chromium.org, Nov 10 2017

Labels: -M-62 M-65
We have a CL ready-to-land to add a Hard Faults column under Windows, which we'll land after the M64 branch, for M65.

Linux provides a similar metric, so I've filed issue 783513 to track implementing that.
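
As an illustration of the similar metric on Linux, the POSIX `getrusage` call reports a cumulative major-fault count in `ru_majflt`. This is only a sketch for the calling process; the function name is hypothetical, and it is not necessarily how issue 783513 was or will be implemented.

```cpp
#include <sys/resource.h>

// Returns the cumulative number of major (hard) faults incurred by the
// calling process, or -1 if getrusage() fails.
long GetMajorFaultCountForCurrentProcess() {
  struct rusage usage;
  if (getrusage(RUSAGE_SELF, &usage) != 0)
    return -1;
  return usage.ru_majflt;
}
```

The value is a running total since process start, so a rate has to be derived by differencing successive samples.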

Comment 15

> MSDN articles recommend using this as a primary measure of memory
> causing slowdowns, as a hard fault hits disk.

Yep, this is true.

> Hard faults per second are the MS recommended measure of memory pressure

Be careful with that extrapolation. Hard faults do not indicate that there is currently memory pressure. They indicate that at some point in the past there was memory pressure. This is very different. This can still be useful, but not always.

For instance, if something (a huge page, a set of ten pages, a misbehaving driver, Photoshop) allocates 75% of system RAM then many of Chrome's pages will get paged out to disk. Later on, after the user stops using that page/page-set/device/application and switches to their gmail tab they may get a lot of page faults as the private bytes are pulled back into memory. This will cause slowdowns, as the hard faults hit disk, but they do not indicate that the system is currently under memory pressure.

Writing pages to disk is a measure of current memory pressure. Reading pages from disk (hard faults) is a measure of *past* memory pressure, with no indication whatsoever about whether that memory pressure is still present.

> We won't start seeing hard faults until the system actually has a reason to
> remove pages from physical memory

I may just be being nitpicky here, but again, hard faults are when pages are *returned* to physical memory, not *removed* from physical memory. A counter for pages being removed from physical memory would be most excellent but I am not aware of such a counter.



Also, we call NtQuerySystemInformation from quite a few places, including two places that are getting HardFaultCount (GetHardFaultCountForCurrentProcess and GetHardFaultCountForChromeProcesse) and some that are getting private memory data (QuerySystemProcessInformation). This is a problem for both code duplication and performance. We should try to call this function once, in a single place, and then share the information.
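
One possible shape for "query once and share" is a small cache in front of the expensive call. This is a sketch only: `ProcessSnapshot`, `SharedSnapshotCache`, and the injected query function are hypothetical stand-ins, not existing Chromium classes, and the sketch is not thread-safe.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical result of one expensive system-wide query.
struct ProcessSnapshot {
  std::uint64_t hard_fault_count;
  std::uint64_t private_bytes;
};

// Serves all consumers from one cached snapshot and re-runs the expensive
// query only when the cached copy is at least max_age old.
class SharedSnapshotCache {
 public:
  using Clock = std::chrono::steady_clock;
  using QueryFn = std::function<ProcessSnapshot()>;

  SharedSnapshotCache(QueryFn query, Clock::duration max_age)
      : query_(std::move(query)), max_age_(max_age) {}

  // `now` is a parameter so callers (and tests) can control time.
  const ProcessSnapshot& Get(Clock::time_point now = Clock::now()) {
    if (!valid_ || now - last_query_ >= max_age_) {
      snapshot_ = query_();
      last_query_ = now;
      valid_ = true;
    }
    return snapshot_;
  }

 private:
  QueryFn query_;
  Clock::duration max_age_;
  Clock::time_point last_query_{};
  ProcessSnapshot snapshot_{};
  bool valid_ = false;
};
```

Consumers that need the hard fault count and consumers that need private memory data would both call `Get()`, so the underlying query runs at most once per `max_age` interval regardless of how many callers there are.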

Comment 16 by w...@chromium.org, Nov 10 2017

Re #15: Yes, agreed; hard-faults really measure potential slow-down due to past pressure causing page-outs (although it seems that activities other than page-in are counted toward hard-faults, so we should also be careful about that).

(My point re not seeing hard faults until the system actually has a reason to remove pages from physical memory was just that by definition, you can't have page-ins until you've paged something out - however, based on anecdata I think other activity is counted, as noted above)

And yes, we're calling a load of these sampling APIs too often (from Task Manager, UMA/UKM things, as part of pressure and other monitors, etc); that problem is not confined to Windows.  At one point there was an intention that those use-cases should be served by the TaskManager, the back-end impl of which should serve the various internal consumers as well as the UI - I still think that's a reasonable goal, but it'll take some further work, and we'll need to streamline the implementation, which has a lot of overhead right now.

Finally, as regards this particular bug, TaskManager already calls QuerySystemProcessInformation as soon as any of the fields served by the SharedSampler are requested, which includes e.g. CPU time, so in practice pulling the hard fault data out of that isn't adding an extra call, thankfully.

Comment 17 by bugdroid1@chromium.org, Nov 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f4a0b3223965110f96dc583f9e48871a220165d0

commit f4a0b3223965110f96dc583f9e48871a220165d0
Author: Wez <wez@chromium.org>
Date: Sat Nov 11 02:21:22 2017

Add per-process "hard fault" column to TaskManager

Hard faults (or "major" faults) are attempts to access memory pages
which are not resident in physical RAM, and therefore incur substantial
cost to make available (e.g. paging-in from disk, decompressing from
in-memory compressed swap, etc).

Task Manager will now offer an optional column showing the hard-fault
rate in each sampling cycle. 

Bug:  723767 
Change-Id: If3e77759e7725f6c70b2e4f163adbca600d00ba7
Reviewed-on: https://chromium-review.googlesource.com/538464
Reviewed-by: Nick Carter <nick@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515799}
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/app/generated_resources.grd
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/sampling/shared_sampler.h
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/sampling/shared_sampler_win.cc
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/sampling/task_group.cc
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/sampling/task_group.h
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/sampling/task_manager_impl.cc
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/sampling/task_manager_impl.h
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/task_manager_interface.h
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/task_manager_observer.h
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/test_task_manager.cc
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/task_manager/test_task_manager.h
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/ui/task_manager/task_manager_columns.cc
[modify] https://crrev.com/f4a0b3223965110f96dc583f9e48871a220165d0/chrome/browser/ui/task_manager/task_manager_table_model.cc
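
For readers unfamiliar with how a per-cycle rate is derived from a cumulative counter, the idea can be sketched as below. The struct and function names are hypothetical, and the actual code in shared_sampler_win.cc may differ.

```cpp
#include <cstdint>

// Hypothetical sample: a cumulative hard-fault count and when it was taken.
struct HardFaultSample {
  std::uint64_t cumulative_count;
  double timestamp_seconds;
};

// Returns hard faults per second between two samples. Returns 0 if no time
// has elapsed or if the counter appears to have gone backwards (for example
// because the process was restarted between samples).
double HardFaultsPerSecond(const HardFaultSample& previous,
                           const HardFaultSample& current) {
  const double elapsed = current.timestamp_seconds - previous.timestamp_seconds;
  if (elapsed <= 0.0 || current.cumulative_count < previous.cumulative_count)
    return 0.0;
  return static_cast<double>(current.cumulative_count -
                             previous.cumulative_count) /
         elapsed;
}
```

A count that rises from 100 to 160 over a two-second sampling cycle yields 30 hard faults per second.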

Comment 18 by w...@chromium.org, Nov 11 2017

Status: Fixed (was: Started)
