Issue 879346

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug




Fix our usage of NtQuerySystemInformation

Project Member Reported by sebmarchand@chromium.org, Aug 30

Issue description

Calls to NtQuerySystemInformation have been notoriously slow: when we shipped the Swap Thrashing Monitor, we had to disable it because of the execution cost of this function (see issue 780597).

In the SwapThrashingMonitor (and in the task manager's sampler) we call this function like this:
  NtQuerySystemInformation(::SystemProcessInformation, buffer.data(),
        static_cast<ULONG>(buffer.size()), &return_length);
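For context on how such a call is usually sized: the caller doesn't know the required buffer size up front, so it retries while the function reports STATUS_INFO_LENGTH_MISMATCH (0xC0000004). The sketch below shows that retry shape only. It assumes a hypothetical injected `QueryFn` standing in for the real NtQuerySystemInformation call so it builds without <windows.h>; `QueryWithRetry`, the 25% headroom and the attempt limit are illustrative choices, not what Chromium's code does.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// NTSTATUS codes, redefined locally so the sketch builds without <windows.h>.
using NtStatus = int32_t;
constexpr NtStatus kStatusSuccess = 0;
constexpr NtStatus kStatusInfoLengthMismatch =
    static_cast<NtStatus>(0xC0000004u);

// Hypothetical stand-in for
//   NtQuerySystemInformation(::SystemProcessInformation, buffer, size, &len)
// injected so the retry logic can be exercised off Windows.
using QueryFn =
    std::function<NtStatus(void* buffer, uint32_t size, uint32_t* needed)>;

// Calls |query| until the result fits in |buffer|. Processes and threads can
// be created between two calls, so the size reported by one call may already
// be too small for the next one; hence the headroom and the attempt limit.
bool QueryWithRetry(const QueryFn& query, std::vector<uint8_t>* buffer,
                    int max_attempts = 5) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    uint32_t needed = 0;
    const NtStatus status = query(
        buffer->data(), static_cast<uint32_t>(buffer->size()), &needed);
    if (status == kStatusSuccess)
      return true;
    if (status != kStatusInfoLengthMismatch)
      return false;  // A bigger buffer won't fix any other failure.
    // Grow to the reported size plus 25% headroom; if the hint is unusable,
    // fall back to doubling the current buffer.
    size_t grown = static_cast<size_t>(needed) + needed / 4;
    if (grown <= buffer->size())
      grown = buffer->size() * 2 + 1;
    buffer->resize(grown);
  }
  return false;
}
```

Note that this only deals with buffer sizing. Every attempt still pays the full kernel-side cost of producing the SystemProcessInformation data, so retries make the slowness described in this issue worse rather than better.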

The first argument is the class of system information to retrieve. Some of the allowed values are described here: https://docs.microsoft.com/en-us/windows/desktop/api/winternl/nf-winternl-ntquerysysteminformation but most of them are undocumented, so we decided to use ::SystemProcessInformation, which is somewhat documented (here: https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ex/sysinfo/process.htm?ts=0,736). It turns out that using this information class might be what makes the function slow: it returns a LOT of data, including a lot of per-thread info and the handle count for each process (which is apparently computed by iterating over all of the process's handles).
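To illustrate why the payload grows with the number of threads: the SystemProcessInformation buffer is a chain of variable-length process entries linked by a byte offset, and each entry is followed by one record per thread. The sketch below walks a synthetic buffer. Only the first two fields are modeled (NextEntryOffset and NumberOfThreads, which lead SYSTEM_PROCESS_INFORMATION as declared in winternl.h); `ProcessEntryHeader`, `WalkProcessEntries` and the entry sizes used in any test data are hypothetical, and the real entries carry many more fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified model of the head of a SYSTEM_PROCESS_INFORMATION entry. The real
// structure continues with many more fields and is followed by per-thread
// records; only what is needed to walk the chain is modeled here.
struct ProcessEntryHeader {
  uint32_t next_entry_offset;  // Bytes to the next entry, 0 for the last one.
  uint32_t number_of_threads;  // Per-thread records attached to this entry.
};

struct WalkResult {
  size_t process_count = 0;
  size_t thread_count = 0;
};

// Follows next_entry_offset from the start of |buffer|, counting processes and
// threads. Returns false if the chain runs past the end of the buffer.
bool WalkProcessEntries(const std::vector<uint8_t>& buffer, WalkResult* out) {
  size_t offset = 0;
  while (offset + sizeof(ProcessEntryHeader) <= buffer.size()) {
    ProcessEntryHeader header;
    // memcpy avoids alignment and aliasing problems when reading a byte buffer.
    std::memcpy(&header, buffer.data() + offset, sizeof(header));
    ++out->process_count;
    out->thread_count += header.number_of_threads;
    if (header.next_entry_offset == 0)
      return true;  // Last entry reached.
    offset += header.next_entry_offset;
  }
  return false;
}
```

With hundreds of processes and thousands of threads on a typical machine, the kernel has to fill in every one of these records (plus the per-process handle counts) on each call, even if the caller only needs a single system-wide number.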

In "Windows NT/2000 Native API Reference" (by Gary Nebbett) there's a description of another information class, undocumented on MSDN, that could give us the numbers we want without the high execution cost: SystemPerformanceInformation. Here are some of the fields that look promising:

PageFaults : The number of page faults (both soft and hard).
TransitionFaults : The number of soft page faults (excluding demand zero faults).
DemandZeroFaults : The number of demand zero faults.
PagesRead : The number of pages read from disk to resolve page faults.
PageReadIos : The number of read operations initiated to resolve page faults.
PagefilePagesWritten : The number of pages written to the system’s pagefiles.
PagefilePageWriteIos : The number of write operations performed on the system’s pagefiles.
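These counters are cumulative, so the useful signal comes from the difference between two snapshots. The sketch below computes that difference and two ratios that follow directly from the definitions above (pages per read I/O and pages per pagefile write I/O). `PagingCounters` is a hypothetical subset of the fields listed above, NOT the real SYSTEM_PERFORMANCE_INFORMATION layout (which has many more fields in a different order), and the counters are assumed to be 32-bit.

```cpp
#include <cstdint>

// Hypothetical subset of the SystemPerformanceInformation counters listed
// above. Not the real structure layout. Values are assumed to be 32-bit
// counters that only grow (modulo wraparound).
struct PagingCounters {
  uint32_t page_faults;
  uint32_t transition_faults;
  uint32_t demand_zero_faults;
  uint32_t pages_read;
  uint32_t page_read_ios;
  uint32_t pagefile_pages_written;
  uint32_t pagefile_page_write_ios;
};

// Per-field difference between two snapshots. Unsigned subtraction is modulo
// 2^32, so a single wraparound between snapshots still yields the right delta.
PagingCounters Delta(const PagingCounters& before, const PagingCounters& after) {
  return {after.page_faults - before.page_faults,
          after.transition_faults - before.transition_faults,
          after.demand_zero_faults - before.demand_zero_faults,
          after.pages_read - before.pages_read,
          after.page_read_ios - before.page_read_ios,
          after.pagefile_pages_written - before.pagefile_pages_written,
          after.pagefile_page_write_ios - before.pagefile_page_write_ios};
}

// Average pages read from disk per read operation in the interval (0 if none).
double PagesPerReadIo(const PagingCounters& d) {
  return d.page_read_ios == 0
             ? 0.0
             : static_cast<double>(d.pages_read) / d.page_read_ios;
}

// Average pages written to the pagefiles per write operation (0 if none).
double PagesPerPagefileWriteIo(const PagingCounters& d) {
  return d.pagefile_page_write_ios == 0
             ? 0.0
             : static_cast<double>(d.pagefile_pages_written) /
                   d.pagefile_page_write_ios;
}
```

For example, going from a snapshot with PagesRead = 50, PageReadIos = 10 to one with PagesRead = 90, PageReadIos = 15 gives 40 pages over 5 reads, i.e. 8 pages per read I/O in that interval.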

We should update the code that calls NtQuerySystemInformation to use this information class instead; it's already being used in https://cs.chromium.org/chromium/src/base/process/process_metrics_win.cc
Hmm, looking at this more closely, it doesn't seem to give us per-process data. It might still be useful, though, and the per-process information might be available via another of the undocumented information classes.
Cc: -chrishall@chromium.org chrisha@chromium.org
Status: Assigned (was: Untriaged)
The memory manager effectively implements a servo system to try and keep a reserve of free/available pages. The important control value used to be the write-out rate, and I'm sure there's internal state to the servo loop.
I wonder if it'd be sufficient for our purposes to simply look at the modified/mapped page writes per time unit.
This won't give us the internal state of the control function, but it should give us its output value.
Here's a nice picture of what I'm talking about: https://en.wikipedia.org/wiki/Control_theory#Open-loop_and_closed-loop_(feedback)_control. The system tries to keep <Reference> available memory at all times. The <Controller> is probably quite stateful, but the <System input> coming out of it that's readily observable is the write-out rate of the two MPW threads (assuming those are still the same as they were back in the NT/XP days).
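Observing the controller's output as suggested means turning a cumulative write counter into a rate. A minimal sketch, assuming a 32-bit cumulative counter sampled periodically with a monotonic timestamp; which counter best reflects MPW activity (e.g. PagefilePagesWritten, which may not include mapped-file writes) is an open assumption here. `WriteOutRateTracker` is hypothetical, and the exponentially weighted moving average is an added smoothing choice, not something proposed in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: converts a cumulative "pages written" counter into a
// smoothed pages-per-second rate.
class WriteOutRateTracker {
 public:
  // |smoothing| in (0, 1] is the weight of the newest interval in an
  // exponentially weighted moving average; 1.0 disables smoothing.
  explicit WriteOutRateTracker(double smoothing) : smoothing_(smoothing) {}

  // Feeds one sample and returns the current smoothed rate (0 until two
  // samples with increasing timestamps have been seen).
  double AddSample(uint32_t counter, double time_seconds) {
    if (!has_previous_) {
      previous_counter_ = counter;
      previous_time_ = time_seconds;
      has_previous_ = true;
      return rate_;
    }
    if (time_seconds <= previous_time_)
      return rate_;  // Ignore samples that don't advance time; keep baseline.
    const uint32_t delta = counter - previous_counter_;  // Modulo 2^32.
    const double instant = delta / (time_seconds - previous_time_);
    rate_ = has_rate_ ? smoothing_ * instant + (1.0 - smoothing_) * rate_
                      : instant;
    has_rate_ = true;
    previous_counter_ = counter;
    previous_time_ = time_seconds;
    return rate_;
  }

 private:
  double smoothing_;
  double rate_ = 0.0;
  bool has_rate_ = false;
  bool has_previous_ = false;
  uint32_t previous_counter_ = 0;
  double previous_time_ = 0.0;
};
```

For example, with smoothing 0.5 and samples (1000 pages, 0 s), (1200, 1 s), (1600, 2 s), the tracker reports 200 and then 0.5 × 400 + 0.5 × 200 = 300 pages/s. Smoothing trades responsiveness for stability; a thrashing detector would likely threshold on this rate over a window rather than on single samples.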
Thanks for the info Siggi, this is quite useful.

FYI for those not familiar with this: MPW stands for Modified Page Writer (there's some info about this here: https://support.microsoft.com/en-us/help/2713398/storage-developer-may-experience-what-appears-as-data-corruption-on-io)
