[banon]: oom-killer invoked but "refused to die" and eventually hung task panic
Issue description
The dremel query below gives me about 15K crash reports for BANON on the 11021.56.0 release. Of those, 13K are "hung task panics". Of the hung task panics, 11K have a substantial amount of oom-killer activity, which suggests these systems are tripping hung-task detection before they hit an OOM panic.
Banon runs chromeos-3.18 on a Braswell processor. Nearly all of the hung_task reports are from 2GB machines.
SET sql_dialect GoogleSQL;
SET OUTPUT csv;
SELECT ReportID, ReportTime, stable_signature, product.Version, ProductData FROM crash.prod.latest
WHERE product.name='ChromeOS' AND product.Version='11021.56.0' AND
EXISTS (SELECT 1 FROM UNNEST(productdata) WHERE (Key='exec_name' AND Value='kernel' ) ) AND
EXISTS (SELECT 1 FROM UNNEST(productdata) WHERE (Key='hwclass' AND Value LIKE 'BANON %' ) )
ORDER BY StackSignature DESC -> dremel-BANON-kernel-11021.56.0-20181120.csv;
I have a script to fetch the crash reports and another script to summarize:
Total Kernel crash reports found: 15236
Out of Memory Panic: 1
Spinlock magic bad : 15
CRC error on Resume: 0
Corrupt Stack: 11
NULL_pointer : 776
49 i915_gem_obj_to_ggtt_view+0x13/0xe1
75 [snd_intel_sst_core]
99 intel_chv_clip_cursor.isra.42+0x1bd/0x29d
117 i915_gem_do_execbuffer.isra.13+0xacb/0x2645
203 [xpad]
Watchdog detected hard LOCKUP : 53
Page Fault : 243
7 ip_rcv+0x3cf/0x488
8 intel_prepare_plane_fb+0x207/0x46a
11 i915_gem_object_get_pages_gtt+0x352/0x415
21 i915_gem_object_info+0xdb/0x66b
77 clflushopt+0x4/0xa
x86 General Protection Fault : 209
6 zs_malloc+0x3d/0x539
8 i915_gem_object_pin+0x587/0xac3
10 intel_chv_clip_cursor.isra.42+0x156/0x29d
17 intel_commit_cursor_plane+0xfe/0x157
84
Soft lockup : 98
Hung Task panics (not runnable > 120 seconds) : 13417
Top 5 hung tasks:
Other BUG_ON: 291
1 <2>[ 948.079767] kernel BUG at ../../../../../tmp/portage/sys-kernel/chromeos-kernel-3_18-3.18-r2207/work/chromeos-kernel-3_18-3.18/block/blk-core.c:2560!
1 <2>[95454.069624] kernel BUG at ../../../../../tmp/portage/sys-kernel/chromeos-kernel-3_18-3.18-r2207/work/chromeos-kernel-3_18-3.18/drivers/gpu/drm/i915/i915_gem.c:4681!
1 <2>[ 9740.896906] kernel BUG at ../../../../../tmp/portage/sys-kernel/chromeos-kernel-3_18-3.18-r2207/work/chromeos-kernel-3_18-3.18/drivers/gpu/drm/i915/i915_gem.c:4681!
1 <2>[97767.265606] kernel BUG at ../../../../../tmp/portage/sys-kernel/chromeos-kernel-3_18-3.18-r2207/work/chromeos-kernel-3_18-3.18/drivers/gpu/drm/i915/i915_gem.c:4681!
1 <2>[ 9820.310862] kernel BUG at ../../../../../tmp/portage/sys-kernel/chromeos-kernel-3_18-3.18-r2207/work/chromeos-kernel-3_18-3.18/drivers/gpu/drm/i915/i915_gem.c:4681!
Out of Memory Panic: 1
REMAINDER: 122
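As a rough cross-check of the claim in the next paragraph, the fetched reports can be grepped for both signatures in one pass. A minimal sketch over a directory of .kcrash files; the two patterns are the stock kernel messages, not necessarily what the summarizing script keys on:
# hung-task reports, and how many of them also show oom-killer kills
hung=$(grep -l "blocked for more than 120 seconds" *.kcrash | wc -l)
hung_oom=$(grep -l "blocked for more than 120 seconds" *.kcrash | xargs -r grep -l "Out of memory: Kill process" | wc -l)
echo "hung-task reports: $hung, with oom-killer activity: $hung_oom"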
Of the "Hung Task panics", most have substantial oom-killer activity:
for i in * ; do fgrep "refused to die" $i | wc -l | awk '{ print ( int($1 / 100) * 100) } ' ; done | sort -n | uniq -c
4561 0
2819 100
2789 200
1561 300
772 400
355 500
182 600
87 700
56 800
46 900
28 1000
19 1100
18 1200
19 1300
12 1400
12 1500
10 1600
8 1700
4 1800
8 1900
15 2000
26 2100
6 2200
4 2300
[Note that counts below 100 land in the zero bucket of the histogram, so many of the 4561 reports in that bucket still have some oom-killer activity.]
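A quick way to separate the truly-zero reports from the 1-99 part of that bucket is to count files by presence of the string (a sketch, assuming the per-report files are the same .kcrash files used elsewhere in this bug; grep -L lists files with no match at all):
grep -l "refused to die" *.kcrash | wc -l    # reports with at least one occurrence
grep -L "refused to die" *.kcrash | wc -l    # reports with none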
CYAN has a different ratio of hung_task (only 2252 of 12007 crashes). This seems to correlate with the ratio of machines reporting 2GB or 4GB of RAM:
find -type f -name \*.kcrash | xargs fgrep -m 1 "Memory:" | awk '{print $4}' | sort | uniq -c
714 1990356K/2042628K
343 1990596K/2042628K
1 1990740K/2042628K
3 3985532K/4139768K
4 3985536K/4139768K
2312 3985544K/4139780K
2195 3985548K/4139780K
1782 3985784K/4139780K
1771 3985788K/4139780K
1 3985928K/4139780K
while BANON crash reports are mostly 2GB:
find -type f -name \*.kcrash | xargs fgrep -m 1 "Memory:" | awk '{print $4}' | sort | uniq -c
2260 1990536K/2042568K
1 1990596K/2042568K
41 3985484K/4139720K
45 3985488K/4139720K
1 3985716K/4139708K
195 3985724K/4139720K
177 3985728K/4139720K
[Note the numbers don't add up to the total since this info is only printed at boot time and may have been overwritten in the dmesg buffer (circular buffer).]
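For a direct 2GB-vs-4GB tally instead of eyeballing the raw totals, the same pipeline can classify on the total-memory half of the "Memory:" field (a sketch; the 3,000,000 KB threshold is just a cutoff between the two sizes):
find -type f -name \*.kcrash | xargs fgrep -m 1 "Memory:" | \
  awk '{ split($4, a, "/"); if (a[2] + 0 < 3000000) print "2GB"; else print "4GB" }' | sort | uniq -c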
Nov 21
I'm ok with marking this one as a duplicate. The evidence I posted in this bug suggests the problem is not fixed, just deferred, so a different mechanism kicked in to panic the system. Yes, there are fewer OOM panics, but there is a huge number of "hung task panics". I don't know if it's a 1:1 correlation, but the issue certainly isn't fixed from a user's PoV.
Nov 21
Yeah, the OOM leading to hung-task is still likely a problem, but the overall amount of OOM should go down a lot, and the number of users seeing this should also go down a lot. We could revisit it, but like I mentioned in person, it's a known brokenness with the OOM killer prior to the OOM-reaper changes. If you want, try to run your reports for Banon and Celes on R71 and see how much hung-task-oom is seen there. Based on GoldenEye, some possible versions to look at would be 11151.25.0, 11151.29.0, and 11151.33.0.
Nov 24
Re-opening since this still seems to happen in R71.
Nov 27
I was able to reproduce a hang on OOM pretty easily on the 3.18 kernel on Caroline. I used Doug's mmm_donut.py script from src/platform/microbenchmarks. At the Chrome login screen, run: mmm_donut.py -n 200 -z 50. Usually the system completely locks up and I have to use alt-volup-x to reset it.
Nov 27
I also tried kernel 4.4 on Caroline with the same workload, and it seems to handle it much better. I also found that 3.18 isn't _totally_ locked up: it can sometimes take minutes to recover and just seems like it's locked up. Still, the behavior is sufficiently bad compared to 4.4 that I'd like to understand why.
Nov 27
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f468e169a9f9b1fe8a554e8389a077a8a29e797f

commit f468e169a9f9b1fe8a554e8389a077a8a29e797f
Author: Sonny Rao <sonnyrao@chromium.org>
Date: Tue Nov 27 23:19:56 2018

CHROMIUM: oom: dump stack for processes that refuse to die

This will help us get some more information about processes that are
getting stuck during OOM. We limit the amount of stack dumping to one
every two seconds.

BUG=chromium:907329
TEST=build kernel for cyan, boot

Change-Id: I5c9f71e44a3a06cb683555c344d825ba67602659
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1349508
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Vovo Yang <vovoy@chromium.org>

[modify] https://crrev.com/f468e169a9f9b1fe8a554e8389a077a8a29e797f/mm/oom_kill.c
Dec 7
I checked some hung task crash reports with "refused to die" on Banon R72.11307.0.0. The following are the stacks of the "refused to die" processes.

upload_file_kcrash-3c279ad22ee9ac6f.kcrash ( https://crash.corp.google.com/3c279ad22ee9ac6f )
<6>[160144.500733] chrome R running task 0 18134 1280 0x00100104
<4>[160144.500803] Call Trace:
<4>[160144.500820] [<ffffffff88bd5e64>] schedule+0x58/0x5a
<4>[160144.500834] [<ffffffff8842e2b3>] futex_wait+0x864/0xa25
<4>[160144.500847] [<ffffffff884501b6>] ? handle_mm_fault+0x1acb/0x1f42
<4>[160144.500859] [<ffffffff8842975d>] ? hrtimer_init+0xca/0xca
<4>[160144.500869] [<ffffffff88429d23>] ? hrtimer_start_range_ns+0x14/0x16
<4>[160144.500882] [<ffffffff8842f9a6>] SyS_futex+0x1fb/0x14c9
<4>[160144.500894] [<ffffffff88bda0c4>] system_call_fastpath+0x21/0x26

upload_file_kcrash-3ea6a264a4b9a921.kcrash
<6>[17236.537400] chrome R running task 0 29971 1254 0x00100104
<4>[17236.537464] Call Trace:
<4>[17236.537480] [<ffffffffbe9d5df1>] preempt_schedule+0x36/0x51
<4>[17236.537495] [<ffffffffbe5b86a6>] ___preempt_schedule+0x35/0x67
<4>[17236.537508] [<ffffffffbe9d9a1b>] ? _raw_spin_unlock+0x1f/0x21
<4>[17236.537521] [<ffffffffbe24fecf>] handle_mm_fault+0x17e4/0x1f42
<4>[17236.537533] [<ffffffffbe2519b7>] ? find_vma+0x2f/0x64
<4>[17236.537546] [<ffffffffbe2072a3>] __do_page_fault+0x2ff/0x45a
<4>[17236.537558] [<ffffffffbe21edca>] ? put_prev_task_fair+0x978/0xaa6
<4>[17236.537597] [<ffffffffbe9d99be>] ? _raw_spin_unlock_irq+0xe/0x22
<4>[17236.537609] [<ffffffffbe204c04>] ? __switch_to+0x158/0x357
<4>[17236.537621] [<ffffffffbe9d99be>] ? _raw_spin_unlock_irq+0xe/0x22
<4>[17236.537632] [<ffffffffbe9d59d2>] ? __schedule+0x582/0x96b
<4>[17236.537643] [<ffffffffbe20740a>] do_page_fault+0xc/0xe
<4>[17236.537654] [<ffffffffbe9dbdf2>] page_fault+0x22/0x30

upload_file_kcrash-5101ffc48822d5ce.kcrash
<6>[ 1873.503562] chrome R running task 0 8168 1239 0x00100104
<4>[ 1873.503625] Call Trace:
<4>[ 1873.503641] [<ffffffffb25d6894>] ? bit_wait+0x3f/0x3f
<4>[ 1873.503652] [<ffffffffb25d5e64>] schedule+0x58/0x5a
<4>[ 1873.503661] [<ffffffffb25d5efc>] io_schedule+0x96/0xed
<4>[ 1873.503672] [<ffffffffb25d68db>] bit_wait_io+0x47/0x4b
<4>[ 1873.503682] [<ffffffffb25d6563>] __wait_on_bit+0x4f/0x86
<4>[ 1873.503694] [<ffffffffb206b40d>] wait_on_page_bit_killable+0x86/0xa1
<4>[ 1873.503708] [<ffffffffb1e2279d>] ? autoremove_wake_function+0x37/0x37
<4>[ 1873.503720] [<ffffffffb206b46a>] __lock_page_or_retry+0x42/0x8a
<4>[ 1873.503771] [<ffffffffb1e505c6>] handle_mm_fault+0x1edb/0x1f42
<4>[ 1873.503785] [<ffffffffb1e072a3>] __do_page_fault+0x2ff/0x45a
<4>[ 1873.503797] [<ffffffffb1e7771d>] ? fsnotify+0x2a5/0x304
<4>[ 1873.503808] [<ffffffffb1e0740a>] do_page_fault+0xc/0xe
<4>[ 1873.503819] [<ffffffffb25dbdf2>] page_fault+0x22/0x30

upload_file_kcrash-b28a263f802ef974.kcrash
<6>[114394.602584] t_app_installer R running task 0 28312 2357 0x20120004
<4>[114394.602648] Call Trace:
<4>[114394.602667] [<ffffffffbc7d5e64>] schedule+0x58/0x5a
<4>[114394.602679] [<ffffffffbc7d8c27>] schedule_hrtimeout_range_clock+0x5b/0x843
<4>[114394.602692] [<ffffffffbc7d9a0a>] ? _raw_spin_unlock+0xe/0x21
<4>[114394.602706] [<ffffffffbc0501b6>] ? handle_mm_fault+0x1acb/0x1f42
<4>[114394.602718] [<ffffffffbc7d9422>] schedule_hrtimeout_range+0x13/0x15
<4>[114394.602731] [<ffffffffbc0787a5>] ep_poll+0x54a/0x5cf
<4>[114394.602744] [<ffffffffbc0073d8>] ? __do_page_fault+0x434/0x45a
<4>[114394.602756] [<ffffffffbc011730>] ? cpus_share_cache+0x36/0x36
<4>[114394.602768] [<ffffffffbc079589>] SyS_epoll_pwait+0x11a/0x1b7
<4>[114394.602781] [<ffffffffbc7dc5d0>] sysenter_dispatch+0xc/0x23

upload_file_kcrash-c3de591b08011bd1.kcrash
<6>[ 6479.029437] chrome R running task 0 13027 1232 0x00100104
<4>[ 6479.029453] ffff88001c343cf8 0000000000000086 0000000000000050 ffff880050e052b0
<4>[ 6479.029469] ffff88001d82adf0 ffff88001d82adf0 0000000000013b40 ffff88001d82adf0
<4>[ 6479.029484] 00000000000af764 0000000000000000 ffff880078c3e800 ffff88001c343d78
<4>[ 6479.029501] Call Trace:
<4>[ 6479.029515] [<ffffffffad7d5e64>] schedule+0x58/0x5a
<4>[ 6479.029528] [<ffffffffad02e2b3>] futex_wait+0x864/0xa25
<4>[ 6479.029540] [<ffffffffad04fecf>] ? handle_mm_fault+0x17e4/0x1f42
<4>[ 6479.029552] [<ffffffffad02975d>] ? hrtimer_init+0xca/0xca
<4>[ 6479.029563] [<ffffffffad029d23>] ? hrtimer_start_range_ns+0x14/0x16
<4>[ 6479.029575] [<ffffffffad02f9a6>] SyS_futex+0x1fb/0x14c9
<4>[ 6479.029587] [<ffffffffad7da0c4>] system_call_fastpath+0x21/0x26
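To pull these out of a batch of reports in bulk, something like the following works (a sketch; it assumes the stack dump added by the Nov 27 CL lands right after the "refused to die" line, and the context size is arbitrary):
grep -A 25 "refused to die" *.kcrash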
Dec 7
Every "refused to die" process stuck in *schedule() with task status code R (runnable). Maybe the system was too busy and this runnable "refused to die" process never got scheduled. With patch https://crrev.com/c/522955 , the oom killer does not select new victim if current one is runnable. One possible solution is to select new victim if the current runnable "refused to die" process has been stuck too long.
Dec 17
I looked at this with Grant, and it seems very strange that these processes are marked runnable, but apparently they are not running. Most of the stack traces show the processes waiting for something, so the runnable state seems incorrect. At this point, it's hard to say why this is happening, but it seems like it could be a bug of some kind on our 3.18 kernel. Given that we're not sure how to reproduce this or debug it, let's go forward with the idea in #10 -- go ahead and select a new victim even if a process has been marked runnable but didn't die after some time.
Dec 19
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a8a8ec873b8c5010cfeaab25860a84b3e599ef69

commit a8a8ec873b8c5010cfeaab25860a84b3e599ef69
Author: Kuo-Hsin Yang <vovoy@chromium.org>
Date: Wed Dec 19 09:12:48 2018

CHROMIUM: oom: select new victim if current one stuck too long

Sometimes, a runnable oom victim may stuck indefinitely. Select a new
victim if a runnable victim stuck for more than 2 seconds.

BUG=chromium:907329
TEST=build and boot Cave, try to load it to start ooming

Change-Id: I027b0395fd2e741feb9c76405287b56109128753
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1382061
Commit-Ready: Vovo Yang <vovoy@chromium.org>
Tested-by: Vovo Yang <vovoy@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/a8a8ec873b8c5010cfeaab25860a84b3e599ef69/mm/oom_kill.c
Dec 24
There are still Banon hung task panic reports caused by "refused to die". In https://crash.corp.google.com/10f7ac606887f738 :

<6>[16919.062333] e.process.gapps R running task 0 1122 13845 0x20120004
<4>[16919.062349] ffff880029167d08 0000000000000086 ffff880029167ce8 ffff8800018aa4c0
<4>[16919.062365] ffff88003b3a4980 ffff88003b3a4980 0000000000013b40 ffff88003b3a4980
<4>[16919.062380] ffff880029167ce8 ffff88003b3a4980 ffff8800036f38a8 0000000000000001
<4>[16919.062396] Call Trace:
<4>[16919.062411] [<ffffffff993d0182>] schedule+0x58/0x5a
<4>[16919.062423] [<ffffffff993d2ece>] schedule_hrtimeout_range_clock+0x5b/0x846
<4>[16919.062436] [<ffffffff993d3cba>] ? _raw_spin_unlock+0xe/0x21
<4>[16919.062449] [<ffffffff98c50943>] ? handle_mm_fault+0x1663/0x1dec
<4>[16919.062461] [<ffffffff993d36cc>] schedule_hrtimeout_range+0x13/0x15
<4>[16919.062474] [<ffffffff98c793f3>] ep_poll+0x555/0x5da
<4>[16919.062485] [<ffffffff98c089fd>] ? __do_page_fault+0x434/0x45a
<4>[16919.062498] [<ffffffff98c6e435>] ? SyS_fcntl+0x539/0x55f
<4>[16919.062509] [<ffffffff98c11b1f>] ? cpus_share_cache+0x36/0x36
<4>[16919.062520] [<ffffffff98c7a42c>] SyS_epoll_pwait+0x11a/0x1b7
<4>[16919.062532] [<ffffffff993d6890>] sysenter_dispatch+0xc/0x23
...
<6>[16919.606951] sending NMI to all CPUs:
<4>[16919.608047] NMI backtrace for cpu 0
<4>[16919.608055] CPU: 0 PID: 1524 Comm: cras Not tainted 3.18.0-18756-ga8a8ec873b8c #1
<4>[16919.608066] Hardware name: GOOGLE Banon, BIOS Google_Banon.7287.422.0 07/15/2018
<4>[16919.608077] task: ffff880055ddc050 ti: ffff880053ac0000 task.ti: ffff880053ac0000
<4>[16919.608088] RIP: 0010:[<ffffffff98c232e2>] [<ffffffff98c232e2>] do_raw_spin_lock+0x19/0x119
<4>[16919.608100] RSP: 0000:ffff880053ac38e8 EFLAGS: 00000246
<4>[16919.608108] RAX: ffffffff994354c0 RBX: ffff880060086640 RCX: 0000000000000000
<4>[16919.608118] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880060086640
<4>[16919.608129] RBP: ffff880053ac3908 R08: 0000000000000000 R09: 0000000000000001
<4>[16919.608140] R10: 0000000000000400 R11: 0000000000000400 R12: ffff88002362d3f0
<4>[16919.608150] R13: 0000000000000000 R14: ffff880053ac39f8 R15: 0000000000000002
<4>[16919.608161] FS: 00007eeb780e8700(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
<4>[16919.608172] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[16919.608182] CR2: 00000000ff809060 CR3: 000000005388e000 CR4: 0000000000100770
<4>[16919.608192] Stack:
<4>[16919.608197] 0000000000000000 ffff880060086640 ffff88002362d3f0 0000000000000000
<4>[16919.608207] ffff880053ac3918 ffffffff993d3c40 ffff880053ac3938 ffffffff98c4a53e
<4>[16919.608221] 0000000000000000 ffff88002362d3f0 ffff880053ac3968 ffffffff98c6437d
<4>[16919.608232] Call Trace:
<4>[16919.608237] [<ffffffff993d3c40>] _raw_spin_lock+0x16/0x18
<4>[16919.608246] [<ffffffff98c4a53e>] list_lru_count_node+0x1e/0x5a
<4>[16919.608255] [<ffffffff98c6437d>] super_cache_count+0x53/0x9d
<4>[16919.608264] [<ffffffff98c6432a>] ? put_filp+0x44/0x44
<4>[16919.608272] [<ffffffff98c44bee>] try_to_free_pages+0x431/0xba0
<4>[16919.608281] [<ffffffff98c2215a>] ? __wake_up_common+0x4f/0x7e
<4>[16919.608290] [<ffffffff98c3dd20>] __alloc_pages_nodemask+0x1ca5/0x2675
<4>[16919.608300] [<ffffffffc05e7446>] ? zram_decompress_page+0xde/0x109 [zram]
<4>[16919.608310] [<ffffffff98c5a773>] read_swap_cache_async+0x91/0x1e0
<4>[16919.608319] [<ffffffff98c5aa26>] swapin_readahead+0x164/0x187
<4>[16919.608328] [<ffffffffc05e7581>] ? zram_slot_free_notify+0x65/0x70 [zram]
<4>[16919.608338] [<ffffffff98c50198>] handle_mm_fault+0xeb8/0x1dec
<4>[16919.608347] [<ffffffff98c088c8>] __do_page_fault+0x2ff/0x45a
<4>[16919.608356] [<ffffffff98c08a2f>] ? do_page_fault+0xc/0xe
<4>[16919.608365] [<ffffffff993d60b2>] ? page_fault+0x22/0x30
<4>[16919.608373] [<ffffffff98c34396>] ? seccomp_phase1+0x48/0x97
<4>[16919.608382] [<ffffffff98c08a2f>] do_page_fault+0xc/0xe
<4>[16919.608390] [<ffffffff993d60b2>] page_fault+0x22/0x30
<4>[16919.608399] Code: 47 08 ed 1e af de 48 89 e5 48 89 47 10 89 47 0c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 52 48 89 fb 81 7f 04 ad 4e ad de <74> 0c 48 c7 c6 66 11 5d 99 e8 36 28 1d 00 65 48 8b 04 25 80 d5

It seems like the spinlock in list_lru_count_node() is held for too long and prevents other tasks from being scheduled. In kernel 3.18, list_lru_count_node() acquires the nlru->lock spin lock. It's not necessary to acquire the spinlock in list_lru_count_node() because:
1. list_lru_count_node() only reads a long value, nlru->nr_items; this read access should be atomic if nr_items is properly aligned.
2. In kernel 4.4, list_lru_count_node() doesn't acquire the spinlock to get nlru->nr_items.
3. list_lru_count_node() is only used to determine whether some memory shrinker should be executed, so even if list_lru_count_node() returns a wrong value, it's no big deal.
I will send a patch to remove the spinlock acquisition from list_lru_count_node() in kernel 3.18.
Jan 4
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e913b00119632bf590d109c8431e17012ca21c8a

commit e913b00119632bf590d109c8431e17012ca21c8a
Author: Kuo-Hsin Yang <vovoy@chromium.org>
Date: Fri Jan 04 13:46:08 2019

CHROMIUM: remove the spinlock in list_lru_count_node()

In some banon crash reports, we found that some process is trapped in
the spinlock in list_lru_count_node() and prevents other processes from
being scheduled and triggers hung task panic.

Example Call Trace:
[<ffffffff993d3c40>] _raw_spin_lock+0x16/0x18
[<ffffffff98c4a53e>] list_lru_count_node+0x1e/0x5a
[<ffffffff98c6437d>] super_cache_count+0x53/0x9d
[<ffffffff98c44bee>] try_to_free_pages+0x431/0xba0
[<ffffffff98c3dd20>] __alloc_pages_nodemask+0x1ca5/0x2675
[<ffffffff98c5a773>] read_swap_cache_async+0x91/0x1e0
[<ffffffff98c5aa26>] swapin_readahead+0x164/0x187
[<ffffffff98c50198>] handle_mm_fault+0xeb8/0x1dec
[<ffffffff98c088c8>] __do_page_fault+0x2ff/0x45a
[<ffffffff98c08a2f>] do_page_fault+0xc/0xe
[<ffffffff993d60b2>] page_fault+0x22/0x30

In kernel 3.18, list_lru_count_node() acquires the nlru->lock spin
lock. It's not necessary to acquire the spinlock in
list_lru_count_node() because:
1. list_lru_count_node() only read a long value nlru->nr_items, this
   read access should be atomic if nr_items is properly aligned.
2. In kernel 4.4, list_lru_count_node() doesn't acquire the spinlock to
   get nlru->nr_items.
3. list_lru_count_node() is only used to determine if some memory
   shrinker should be executed. So even list_lru_count_node() returns a
   wrong value, it's no big deal.

BUG=chromium:907329
TEST=check system in low memory condition

Change-Id: Id7119466bfe9e710b9db5cc607e71191106658fa
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1394463
Commit-Ready: Vovo Yang <vovoy@chromium.org>
Tested-by: Vovo Yang <vovoy@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/e913b00119632bf590d109c8431e17012ca21c8a/mm/list_lru.c
Comment 1 by sonny...@google.com, Nov 21. Status: Duplicate (was: Untriaged)