Vmlog reports large page swap-in without many major faults.
Issue description

While debugging user reports of jank, I started looking at vmlog data to try to correlate swap activity with slowness logged by Chrome. I was able to match high swap activity to "Slow method" log lines in the M61 feedback, and to roughly match the timing to tab discarding in the M60 feedback. Two recent feedback reports show significant swapping activity, and a surprising ratio of pages swapped in to major faults:

https://listnr.corp.google.com/report/69149412199 (M61)
https://listnr.corp.google.com/report/69266491909 (M60)

Here's one surprising excerpt from the M60 feedback:

$ cat vmlog_data | awk 'NR==6 {old_mf=$2; old_in=$5; old_out=$6; next}; {print $1, $2-old_mf, $5-old_in, $6-old_out; old_mf=$2; old_in=$5; old_out=$6}'
time majfault_diff pswpin_diff pswpout_diff
[47612.585877] 5765 7182 22247
[47614.588085] 3261 3068 57060
[47616.590277] 5228 37849 47169
[47618.592463] 5967 63530 88807
[47620.594358] 5305 81497 116868
[47622.596198] 4779 156290 180023
[47624.598345] 5107 66778 138156
[47626.600310] 7222 134962 167160
[47628.601382] 3549 3351 0
[47630.603378] 5961 6681 35
[47632.606104] 5140 60536 85271
[47634.608168] 1543 78701 71555
[47636.610198] 1765 223093 216027
[47638.612186] 3603 193182 201938
[47640.614698] 470 192090 183774
[47642.616766] 3879 200923 208216
[47644.618208] 184 213786 214028
[47646.620372] 856 251920 252174
[47648.622493] 500 276144 277755
[47650.624222] 1044 213318 253818

Those swap-out rates seem extremely high. Each sample interval is 2 seconds and each page is 4096 bytes, so the peak works out to about 542 MB/sec:

>>> 277755.0 * 4096 / 2 / 1024 / 1024
542.490234375

At the same time, there are not many major faults. Looking at the ratio of swap-ins to major faults in that same interval:

>>> 276144.0 / 500
552.288

This seems very suspicious, especially because most of the log shows the expected approximate 1:1 ratio of major faults to swap-ins.
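For cross-checking, here is a minimal Python sketch of the same diff computation, with an arbitrary flag threshold added. It assumes, as the awk output suggests, that data rows begin with a bracketed timestamp and use the same columns ($1 time, $2 pgmajfault, $5 pswpin, $6 pswpout):

prev = None
for line in open("vmlog_data"):
    f = line.split()
    # Skip header rows; the awk one-liner does this by starting at NR==6.
    if not f or not f[0].startswith("["):
        continue
    row = (f[0], int(f[1]), int(f[4]), int(f[5]))
    if prev is not None:
        d_mf = row[1] - prev[1]    # major fault delta
        d_in = row[2] - prev[2]    # pswpin delta
        d_out = row[3] - prev[3]   # pswpout delta
        ratio = float(d_in) / d_mf if d_mf else float("inf")
        # 100:1 is an arbitrary threshold for this illustration; the
        # expected ratio is roughly 1:1.
        flag = "  <== suspicious" if ratio > 100 else ""
        print("%s %d %d %d ratio=%.1f%s" % (row[0], d_mf, d_in, d_out, ratio, flag))
    prev = row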
Jul 27 2017
Doug pointed out that this may be explained by swapping in huge pages. The unit of swap is still the small page, so a single page fault might swap in a large number of them.
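For scale: if these were 2 MiB huge pages over a 4 KiB base page, a single major fault could in principle account for 512 swap-ins:

>>> 2.0 * 1024 * 1024 / 4096
512.0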
Jul 27 2017
It's not huge pages. Chrome OS uses hugepages, but only for the Chrome text segment, which we don't swap.
Jul 27 2017
Could it be from general compaction triggered by higher order allocations?
Jul 27 2017
But why would compaction need to swap pages in? Only the pages present in memory need to be copied.
Jul 27 2017
re #5 -- oh, I just saw the high swap-out rate. Yeah, I'm not sure why there would be a lot of swap-ins, unless compaction could cause swap-out + swap-in when moving pages around.
Jul 28 2017
I'm not sure that compaction could cause this behavior -- it was just a guess; I need to look at the code more. The feedback report of a sudden burst of activity did sort of remind me of an older bug where Chrome switched to Oilpan, which didn't have generational GC, and we started triggering excessive mark-and-sweep-type GC: https://bugs.chromium.org/p/chromium/issues/detail?id=588172#c84 So it's just another random thought, with no data to back it up, but maybe this kind of linear-scan GC activity could cause a lot of prefetching without a lot of page faults, if it were happening on mmapped memory?
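One sizing note on the prefetching idea, assuming the default vm.page-cluster value of 3: ordinary fault-driven swap readahead brings in at most 2^3 = 8 pages per major fault,

>>> 2 ** 3
8

so readahead triggered by faults alone should cap the swap-in to major-fault ratio near 8:1, far below the 552:1 seen above. That hints that whatever is happening brings pages in without faulting at all.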
Jul 28 2017
#7 very nice theory. I wonder if we shouldn't see such GCs more often. Maybe we just haven't looked for them.
I looked more closely at the logs. The "swap storm" starts here:
[47632.606104] 5140 60536 85271
on the syslog, this corresponds to roughly here:
2017-07-27T08:11:46.570297+00:00 NOTICE kernel: [47626.631116] audit: type=1400 audit(1501143106.569:188): avc: denied { read } for pid=8410 comm="Binder:2860_1" name="/" dev="tmpfs" ino=26817 scontext=u:r:priv_app:s0:c512,c768 tcontext=u:object_r:device:s0 tclass=dir permissive=0
(there's nothing interesting from the kernel at that time)
I think the syslog time is off by 7 hours (GMT difference) because the syslog ends here:
2017-07-27T08:14:10.189258+00:00 DEBUG kernel: [47770.249358] SELinux: initialized (dev proc, type proc), uses genfs_contexts
and the chrome user log ends about 7 hours earlier:
[1635:1704:0727/011328.539112:VERBOSE1:ppapi_plugin_process_host.cc(506)] PpapiPluginProcessHost[broker]<IPv6: 5>OnChannelError()
based on this hypothesis, the chrome user log shows that the tab discarder had just started running (about 5 seconds before the storm start):
[1635:1635:0727/011146.346449:WARNING:oom_memory_details.cc(46)] Tab Discards Memory details (274 ms):
but had done nothing (the target below is negative, meaning there was no memory to free):
[1635:1635:0727/011146.355439:ERROR:device_event_log_impl.cc(156)] [01:11:46.355] Memory: tab_manager_delegate_chromeos.cc:577 Target memory to free: -329728 KB
The next tab discard is 90 seconds later.
Shouldn't a large GC event be logged?
Could something be going on on the Android side?
The included bugreport suggests not:
Total PSS by process:
51,160K: com.android.vending (pid 27620 / activities)
41,191K: com.google.android.gms (pid 2888)
36,516K: system (pid 289)
35,198K: com.google.android.gms.unstable (pid 2860)
32,769K: com.google.android.gms.persistent (pid 2731)
27,488K: com.google.android.gm (pid 8300)
15,713K: android.process.media (pid 7348)
8,306K: android.process.acore (pid 6936)
8,276K: com.google.android.gms.ui (pid 26340)
5,426K: com.android.pacprocessor (pid 1906)
5,044K: com.google.android.keep (pid 27171)
4,084K: com.google.process.gapps (pid 783)
3,829K: com.google.process.gapps (pid 28450)
These are all small processes.
I am not super-proficient at reading bugreports (this is the first one I've read), but I don't see anything super-exciting at that time.
Jul 28 2017
> based on this hypothesis, the chrome user log shows that the tab discarder
> had just started running (about 5 seconds before the storm start):

When we were talking about the tab discarder, didn't someone suggest that the process of discarding a tab (especially with a shared renderer?) could trigger a GC? Yeah, Wez said:

> killing a tab and GCing to reclaim memory is a non-starter in that situation.
> Tab-killing from shared renderers really makes most sense if we're going to do
> it before the renderer starts getting swapped-out, though, which I suspect makes
> it not-very-useful in practice.
Jul 28 2017
Re #9: Discarding a tab in a shared renderer will not free up any memory until both V8 and Oilpan have GC'd; I thought we had a fast-path tear-down for single-tab renderers, but I think that depends on details of what the renderer is running - Kentaro should be able to confirm.
Jul 30 2017
If a single-tab renderer does not have unload handlers, shared workers etc, it is killed immediately (and the memory is freed up). If the renderer is shared by multiple tabs, we need V8 and Oilpan GCs to collect the memory of discarded tabs.
Aug 1 2017
There are 4 paths that add to PSWPIN (in kernel 3.18):

1. PGMAJFAULT_A (anonymous):
   do_swap_page() -> swapin_readahead() -> read_swap_cache_async()

2. PGMAJFAULT_S (shmem):
   shmem_fault() -> shmem_getpage_gfp() -> shmem_swapin() -> swapin_readahead() -> read_swap_cache_async()

3. madvise(..., MADV_WILLNEED):
   madvise_willneed() -> force_swapin_readahead() / force_shm_swapin_readahead() -> read_swap_cache_async()

4. swapoff:
   swapoff() -> try_to_unuse() -> read_swap_cache_async()

swapoff is unlikely, so it must have been madvise(..., MADV_WILLNEED) that caused the big mismatch between PSWPIN and PGMAJFAULT.
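As a userspace illustration of path 3 (a minimal sketch, assuming Linux and Python 3.8+ for mmap.madvise; this is not code from Chrome OS, just a demonstration of the mechanism):

import mmap

# Anonymous private mapping; populate it so its pages exist and can be
# swapped out later under memory pressure.
length = 64 * 1024 * 1024
buf = mmap.mmap(-1, length)
buf[:] = b"\x42" * length

# ... later, once memory pressure has pushed these pages out to swap ...

# MADV_WILLNEED asks the kernel to read the whole range back in via
# madvise_willneed() -> force_swapin_readahead() -> read_swap_cache_async(),
# so PSWPIN climbs while PGMAJFAULT stays flat: no page fault occurs.
buf.madvise(mmap.MADV_WILLNEED)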
Aug 2 2017
re #12 -- What about the shmem path? Could it be that graphics buffers see only one fault but the entire object gets paged in? That could certainly result in a very large ratio of swap-ins to faults.
Aug 2 2017
re #13: It could be. The following call path increases PSWPIN without increasing PGMAJFAULT:

i915_gem_object_get_pages_gtt() -> shmem_read_mapping_page_gfp() -> shmem_getpage_gfp() -> shmem_swapin() -> swapin_readahead() -> read_swap_cache_async() -> swap_readpage()

#12 misses some cases; shmem_getpage_gfp() is called by several functions.