Pretty easy to get OOMs on Chromebook Plus
Issue description

I've seen this a lot when freshly installing an app from the Play Store, usually after doing something to eat a bit of memory in my system. I just reproduced it pretty easily, though. In my case I had hardly used my Chromebook since I last rebooted, but I did open some things to stress the system. Here's what I had:

1. 5 pinned tabs open (gmail, calendar, irccloud, ...)
2. A gerrit tab open
3. A buganizer tab open
4. A google drive tab open
5. A (big) Google Doc open
6. A crosh tab open (running "top")
7. mossh (Chrome App) window open
8. carat (Chrome App) window open

...and then I opened Squid (an Android app). My mouse got all janky and I kept waiting; then the browser quit and restarted. Somehow my feedback report has no logs -- maybe it collided with me trying to collect them myself with chrome://net-internals. OK, I took a 2nd feedback report. Try: https://listnr.corp.google.com/product/208/report/85731232360 -- I also have the full logs from net-internals.

Here's the relevant dmesg:

[ 516.170949] TaskSchedulerFo invoked oom-killer: gfp_mask=0x24200ca, order=0, oom_score_adj=300
[ 516.170960] TaskSchedulerFo cpuset=chrome mems_allowed=0
[ 516.170972] CPU: 5 PID: 3546 Comm: TaskSchedulerFo Not tainted 4.4.153-15150-g43781e59fd96 #1
...
[ 516.211085] Mem-Info:
[ 516.211101] active_anon:363371 inactive_anon:77321 isolated_anon:800 active_file:52202 inactive_file:44327 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 slab_reclaimable:10885 slab_unreclaimable:21974 mapped:61250 shmem:80825 pagetables:14860 bounce:0 free:12204 free_pcp:147 free_cma:0
[ 516.211116] DMA free:48816kB min:45056kB low:56320kB high:67584kB active_anon:1453484kB inactive_anon:309284kB active_file:208808kB inactive_file:177308kB unevictable:0kB isolated(anon):3200kB isolated(file):0kB present:4059136kB managed:3904852kB mlocked:0kB dirty:0kB writeback:0kB mapped:245000kB shmem:323300kB slab_reclaimable:43540kB slab_unreclaimable:87896kB kernel_stack:31520kB pagetables:59440kB unstable:0kB bounce:0kB free_pcp:588kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10977908 all_unreclaimable? yes
[ 516.211121] lowmem_reserve[]: 0 0 0
[ 516.211130] DMA: 353*4kB (ME) 3877*8kB (ME) 869*16kB (UME) 56*32kB (UM) 1*64kB (U) 2*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48444kB
[ 516.211168] 212007 total pagecache pages
[ 516.211172] 34654 pages in swap cache
[ 516.211176] Swap cache stats: add 663006, delete 628352, find 16427/133473
[ 516.211179] Free swap = 1928852kB
[ 516.211181] Total swap = 3813332kB
[ 516.211184] 1014784 pages RAM
[ 516.211187] 0 pages HighMem/MovableOnly
[ 516.211190] 38571 pages reserved
...
[ 516.212116] Out of memory: Kill process 4265 (chrome) score 838 or sacrifice child
[ 516.212173] Killed process 4265 (chrome) total-vm:466476kB, anon-rss:44024kB, file-rss:31852kB
[ 521.130977] Task <URL: 20> refused to die (killer <URL: 21> nvcsw=3914, nivcsw=3103)
[ 521.232353] Task <URL: 20> refused to die (killer rs:main Q:Reg:554:535, nvcsw=1809, nivcsw=282)
[ 521.344028] Task <URL: 20> refused to die (killer <URL: 22> nvcsw=9560, nivcsw=59117)
[ 521.476779] Task <URL: 20> refused to die (killer ServiceWorker t:8272:8261, nvcsw=1008, nivcsw=2910)
[ 525.989191] Task <URL: 20> refused to die (killer powerd:1469:1469, nvcsw=1006, nivcsw=4323)
[ 526.091439] Task <URL: 20> refused to die (killer TaskSchedulerFo:8544:8533, nvcsw=29, nivcsw=8)
[ 530.690160] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -110
[ 530.690187] mwifiex_pcie 0000:01:00.0: failed to get signal information
[ 541.193284] TaskSchedulerFo invoked oom-killer: gfp_mask=0x24200ca, order=0, oom_score_adj=578
[ 541.193304] TaskSchedulerFo cpuset=chrome mems_allowed=0
[ 541.193322] CPU: 3 PID: 8548 Comm: TaskSchedulerFo Not tainted 4.4.153-15150-g43781e59fd96 #1
...
[ 542.021201] Out of memory: Kill process 4240 (chrome) score 669 or sacrifice child
[ 542.021362] Killed process 4240 (chrome) total-vm:526536kB, anon-rss:72888kB, file-rss:30436kB
[ 544.723587] Task <URL: 23> refused to die (killer TaskSchedulerFo:8733:8294, nvcsw=5, nivcsw=882)
[ 545.948371] Task <URL: 23> refused to die (killer <URL: 24> nvcsw=3263, nivcsw=5552)
[ 546.049613] Task <URL: 23> refused to die (killer TaskSchedulerFo:1766:1568, nvcsw=74312, nivcsw=24635)
[ 546.156861] Task <URL: 23> refused to die (killer evdev:1838:1568, nvcsw=5775, nivcsw=4442)
[ 546.314229] Out of memory: Kill process 6318 (TaskSchedulerFo) score 669 or sacrifice child
[ 546.314257] Killed process 6318 (TaskSchedulerFo) total-vm:526536kB, anon-rss:73136kB, file-rss:30436kB
...

As you can see, the system was in an unresponsive state for > 30 seconds. Ick.
Comment 1 by sonnyrao@chromium.org, Oct 18
Owner: vovoy@chromium.org

Comment 2, Oct 18
I took a quick look -- it doesn't look like the tab discarder even kicked in. I also see a lot of messages like this:

[8983:8983:1018/112128.666625:ERROR:tab_manager_delegate_chromeos.cc(76)] Set OOM score error:
[8983:8983:1018/112138.613622:ERROR:object_proxy.cc(615)] Failed to call method: org.chromium.debugd.SetOomScoreAdj: object_path= /org/chromium/debugd: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

I'm not sure why that's happening -- maybe debugd is stuck on something? vmlog doesn't indicate a lot of constant swapping, but there's some near the end of the log.

Vovo added kernel logging for low memory notifications in this CL: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1175594 -- but I don't see anything in the kernel log here. How did that happen?
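For context, debugd's SetOomScoreAdj is (as far as I know) just a privileged proxy for writing /proc/<pid>/oom_score_adj, which sandboxed renderers can't do for each other. A minimal userspace sketch of the underlying /proc interface (hypothetical helper, not the actual Chrome/debugd code):

#include <stdio.h>
#include <sys/types.h>

/* Write an OOM score adjustment (-1000..1000) for a process.
 * This is the /proc interface that SetOomScoreAdj ultimately wraps;
 * if debugd is wedged, calls like this never get made and the D-Bus
 * client sees NoReply, as in the errors above. */
static int set_oom_score_adj(pid_t pid, int adj)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)pid);
    f = fopen(path, "w");
    if (!f)
        return -1;  /* e.g. no permission, or the process is gone */
    fprintf(f, "%d\n", adj);
    return fclose(f);
}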
Comment 3, Oct 18
I was able to reproduce this on my Kevin fairly easily. I pinned 4 tabs (gmail, calendar, buganizer, irccloud logged in) and had about 7-8 other tabs, mostly gerrit CLs and a few crbugs, and finally I had one hterm window open. I opened the Play store and tried to install Google Keep -- almost instant lockup with OOM and a Chrome browser crash.

I also didn't have any low mem notifications, and I saw this in one of the stacks:

2018-10-18T22:39:36.762270+00:00 ERR kernel: [96962.176940] Out of memory: Kill process 19326 (chrome) score 976 or sacrifice child
2018-10-18T22:39:36.762284+00:00 ERR kernel: [96962.177028] Killed process 19326 (chrome) total-vm:513336kB, anon-rss:46824kB, file-rss:12128kB
2018-10-18T22:39:37.245756+00:00 WARNING kernel: [96965.279620] Task chrome:19326 refused to die (killer TaskSchedulerFo:23764:22262, nvcsw=2, nivcsw=48)
2018-10-18T22:39:41.340761+00:00 WARNING kernel: [96969.374513] Task CompositorTileW:19336 refused to die (killer chrome:21434:21434, nvcsw=6284, nivcsw=9031)
2018-10-18T22:39:41.443763+00:00 WARNING kernel: [96969.477494] Task CompositorTileW:19336 refused to die (killer chrome:16030:16030, nvcsw=55089, nivcsw=69419)
2018-10-18T22:39:46.252786+00:00 WARNING kernel: [96974.286680] Task TaskSchedulerSe:19327 refused to die (killer watchdog:4081:3426, nvcsw=3474, nivcsw=1765)
2018-10-18T22:39:56.121876+00:00 WARNING kernel: [96984.155588] chrome invoked oom-killer: gfp_mask=0x24280c2, order=0, oom_score_adj=200
2018-10-18T22:39:56.121929+00:00 INFO kernel: [96984.155608] chrome cpuset=urgent mems_allowed=0
2018-10-18T22:39:56.121935+00:00 WARNING kernel: [96984.155628] CPU: 1 PID: 30184 Comm: chrome Not tainted 4.4.159-15314-g7e40e6c2daea #1
2018-10-18T22:39:56.121940+00:00 WARNING kernel: [96984.155633] Hardware name: Google Kevin (DT)
2018-10-18T22:39:56.121944+00:00 EMERG kernel: [96984.155640] Call trace:
2018-10-18T22:39:56.121949+00:00 WARNING kernel: [96984.155653] [<ffffffc000208828>] dump_backtrace+0x0/0x15c
2018-10-18T22:39:56.121954+00:00 WARNING kernel: [96984.155660] [<ffffffc00020881c>] show_stack+0x20/0x2c
2018-10-18T22:39:56.121958+00:00 WARNING kernel: [96984.155669] [<ffffffc0004ec518>] __dump_stack+0x20/0x28
2018-10-18T22:39:56.121963+00:00 WARNING kernel: [96984.155676] [<ffffffc0004ec4d4>] dump_stack+0x74/0x98
2018-10-18T22:39:56.121967+00:00 WARNING kernel: [96984.155686] [<ffffffc00031eaa4>] dump_header+0x50/0x170
2018-10-18T22:39:56.121972+00:00 WARNING kernel: [96984.155694] [<ffffffc00031e768>] oom_kill_process+0xac/0x398
2018-10-18T22:39:56.121977+00:00 WARNING kernel: [96984.155702] [<ffffffc00031eef4>] out_of_memory+0x244/0x290
2018-10-18T22:39:56.121981+00:00 WARNING kernel: [96984.155711] [<ffffffc00035df30>] __alloc_pages_nodemask+0x1190/0x1320
2018-10-18T22:39:56.121986+00:00 WARNING kernel: [96984.155721] [<ffffffc0005fbf14>] kbase_mem_alloc_page+0x50/0x188
2018-10-18T22:39:56.121991+00:00 WARNING kernel: [96984.155730] [<ffffffc0005fcb00>] kbase_mem_pool_alloc_pages+0x130/0x228
2018-10-18T22:39:56.121996+00:00 WARNING kernel: [96984.155738] [<ffffffc0005fcae8>] kbase_mem_pool_alloc_pages+0x118/0x228
2018-10-18T22:39:56.122001+00:00 WARNING kernel: [96984.155747] [<ffffffc0005e0784>] kbase_alloc_phy_pages_helper+0x74/0xe8
2018-10-18T22:39:56.122005+00:00 WARNING kernel: [96984.155755] [<ffffffc0005e0fc0>] kbase_alloc_phy_pages+0x48/0xa4
2018-10-18T22:39:56.122010+00:00 WARNING kernel: [96984.155764] [<ffffffc0005f1f68>] kbase_mem_alloc+0x284/0x56c
2018-10-18T22:39:56.122014+00:00 WARNING kernel: [96984.155772] [<ffffffc0005f7208>] kbase_ioctl+0xdac/0x10c4
2018-10-18T22:39:56.122019+00:00 WARNING kernel: [96984.155782] [<ffffffc0003bb4e8>] compat_SyS_ioctl+0x3b4/0x1b34
2018-10-18T22:39:56.122024+00:00 WARNING kernel: [96984.155791] [<ffffffc000203e60>] __sys_trace_return+0x0/0x4
2018-10-18T22:39:56.122028+00:00 WARNING kernel: [96984.155796] Mem-Info:
2018-10-18T22:39:56.122033+00:00 WARNING kernel: [96984.155810] active_anon:349215 inactive_anon:77174 isolated_anon:416
2018-10-18T22:39:56.122037+00:00 WARNING kernel: [96984.155810] active_file:47548 inactive_file:46928 isolated_file:0
2018-10-18T22:39:56.122041+00:00 WARNING kernel: [96984.155810] unevictable:0 dirty:0 writeback:0 unstable:0
2018-10-18T22:39:56.122046+00:00 WARNING kernel: [96984.155810] slab_reclaimable:13674 slab_unreclaimable:23394
2018-10-18T22:39:56.122051+00:00 WARNING kernel: [96984.155810] mapped:76943 shmem:107336 pagetables:14816 bounce:0
2018-10-18T22:39:56.122055+00:00 WARNING kernel: [96984.155810] free:11260 free_pcp:0 free_cma:0
2018-10-18T22:39:56.122060+00:00 WARNING kernel: [96984.155835] DMA free:45040kB min:45056kB low:56320kB high:67584kB active_anon:1396860kB inactive_anon:308696kB active_file:190192kB inactive_file:187712kB unevictable:0kB isolated(anon):1664kB isolated(file):0kB present:4059136kB managed:3904864kB mlocked:0kB dirty:0kB writeback:0kB mapped:307772kB shmem:429344kB slab_reclaimable:54696kB slab_unreclaimable:93576kB kernel_stack:29968kB pagetables:59264kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10370648 all_unreclaimable? yes
2018-10-18T22:39:56.122066+00:00 WARNING kernel: [96984.155843] lowmem_reserve[]: 0 0 0
2018-10-18T22:39:56.122071+00:00 WARNING kernel: [96984.155858] DMA: 6882*4kB (UME) 1619*8kB (UME) 279*16kB (UME) 3*32kB (ME) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 45040kB
2018-10-18T22:39:56.122076+00:00 WARNING kernel: [96984.155908] 203165 total pagecache pages
2018-10-18T22:39:56.122080+00:00 WARNING kernel: [96984.155917] 1353 pages in swap cache
2018-10-18T22:39:56.122085+00:00 WARNING kernel: [96984.155923] Swap cache stats: add 13419656, delete 13418303, find 1351777/7356263
2018-10-18T22:39:56.122089+00:00 WARNING kernel: [96984.155928] Free swap = 1787640kB
2018-10-18T22:39:56.122093+00:00 WARNING kernel: [96984.155932] Total swap = 3813340kB
2018-10-18T22:39:56.122098+00:00 WARNING kernel: [96984.155938] 1014784 pages RAM
2018-10-18T22:39:56.122103+00:00 WARNING kernel: [96984.155942] 0 pages HighMem/MovableOnly
2018-10-18T22:39:56.122107+00:00 WARNING kernel: [96984.155947] 38568 pages reserved

I'm a little suspicious that ARC++ is triggering something bad, like allocating a ton of graphics memory really quickly, which is causing the OOM.
Comment 4, Oct 19
Here's the feedback report from my reproduction: https://listnr.corp.google.com/report/85731617840 -- the OOM happens around 15:39. There are memd logs, and interestingly they show a low memory notification at 15:38 but no OOM. I'm not sure why the OOM isn't showing up in memd here.
Comment 5, Oct 19
The original OOM reporting used kernel tracing and conflicted with its use from ARC++, so I had to back it out quickly, then replaced it with signals from the anomaly collector. So it was probably not operational in the version you're using.
Comment 6, Oct 19
dmesg in https://listnr.corp.google.com/product/208/report/85731232360:

[ 516.170949] TaskSchedulerFo invoked oom-killer: gfp_mask=0x24200ca, order=0, oom_score_adj=300
...
[ 516.211085] Mem-Info:
[ 516.211116] DMA free:48816kB min:45056kB low:56320kB high:67584kB active_anon:1453484kB inactive_anon:309284kB active_file:208808kB inactive_file:177308kB unevictable:0kB isolated(anon):3200kB isolated(file):0kB present:4059136kB managed:3904852kB mlocked:0kB dirty:0kB writeback:0kB mapped:245000kB shmem:323300kB slab_reclaimable:43540kB slab_unreclaimable:87896kB kernel_stack:31520kB pagetables:59440kB unstable:0kB bounce:0kB free_pcp:588kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10977908 all_unreclaimable? yes

The GPU pins a lot of memory and breaks memory reclaim. The symptom: shmem (323300 kB) > inactive_anon (309284 kB). This issue is related to "Premature OOM on ARM aarch64 device": https://crbug.com/875702
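To spell out why pinned shmem wedges reclaim: shmem pages sit on the anon LRU lists, and the 4.4-era kernel only refills the inactive anon list from the active list when the inactive list looks too small. Roughly the relevant check from mm/vmscan.c (paraphrased from memory, not a verbatim quote):

#include <linux/mmzone.h>
#include <linux/vmstat.h>

/* Paraphrased from 4.4-era mm/vmscan.c. inactive_ratio is derived at
 * boot from the zone size, int_sqrt(10 * zone_size_in_GiB), which
 * gives 5 for a ~3.9 GiB zone like the one in the log above. */
static int inactive_anon_is_low_global(struct zone *zone)
{
    unsigned long active = zone_page_state(zone, NR_ACTIVE_ANON);
    unsigned long inactive = zone_page_state(zone, NR_INACTIVE_ANON);

    /* Here inactive_anon (309284kB) * 5 > active_anon (1453484kB),
     * so the inactive list is judged big enough and nothing is
     * deactivated from the active list -- but when most of that
     * inactive list is pinned shmem, reclaim just rescans the same
     * unreclaimable pages without ever freeing anything. */
    return inactive * zone->inactive_ratio < active;
}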
Comment 7, Oct 19

Comment 8, Oct 19
Re #6: yeah, I suspect somehow ARC++ is allocating a lot of gfx memory very quickly. Do you know if the 323300 kB is pinned gfx memory?
Comment 9, Oct 19
Most of that shmem:323300kB should be pinned gfx memory. A common pattern in Linux gfx drivers: the driver allocates memory via shmem so that it can be shared between multiple processes and swapped out when not in use, and the driver pins whatever shmem it's actively using. I've seen many premature OOM reports (OOM while there is still a lot of anonymous memory and free swap), and in most of them, shmem > inactive_anon.
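For concreteness, here is the allocation pattern described above, roughly as drm_gem_get_pages() does it for shmem-backed GEM objects (a simplified sketch; error paths and the real driver code differ):

#include <linux/err.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <drm/drm_gem.h>

/* Pin the shmem pages backing a GEM object. Each page gets an extra
 * reference, but it stays on the (in)active anon LRU, so vmscan keeps
 * finding it there and failing to reclaim it. */
static int gem_pin_backing_pages(struct drm_gem_object *obj,
                                 struct page **pages, unsigned long npages)
{
    struct address_space *mapping = file_inode(obj->filp)->i_mapping;
    unsigned long i;

    for (i = 0; i < npages; i++) {
        pages[i] = shmem_read_mapping_page(mapping, i);
        if (IS_ERR(pages[i]))
            return PTR_ERR(pages[i]);
    }
    return 0;
}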
Comment 10, Oct 19
+ graphics folks. Any ideas on how we can figure out who is allocating gfx memory and how quickly? I'm wondering if ARC is somehow allocating a lot of gfx memory for later use, and I wanted to get some traces. I see trace events in v4.4/drivers/gpu/arm/midgard/mali_linux_kbase_trace.h, but nothing specifically related to allocating memory.
Comment 11, Oct 19
@9: On ARM platforms, graphics memory sadly cannot be swapped (well, a tiny portion can, but it's not really significant). It also isn't allocated via shmem...

@10: There isn't really a good way to track this; we usually use ad-hoc tools (for example, you can strace the allocation ioctls, or trace at the GLES level with a tool like apitrace). It might be easier to debug this on Intel platforms, where you have object-level reporting in the kernel. But overall it's pretty difficult.
Comment 12, Oct 19
I think it's OK that it cannot be swapped, because we're also not using that much -- 300 MB in the examples above. However, I'm guessing the issue is that the allocation speed is very high, so none of the mechanisms that prevent OOM have time to work. It sounds like mali likes to allocate larger regions of memory and then carve them up later -- does that sound correct to you guys?
Comment 13, Oct 20
@12: Hmm :) Let me re-state that graphics memory on mali-based platforms isn't allocated via shmem. Therefore, looking at shmem as a proxy for graphics memory is incorrect on mali-based platforms.
Comment 14, Oct 20
I will check which module is allocating this shmem memory.
Comment 15, Oct 31
I can reproduce this issue with the following steps on Bob (aarch64):

1. Open many https://edition.cnn.com tabs to consume memory
2. Launch the Play store
3. Press Alt+Tab several times to switch windows
4. The OOM killer is invoked

Feedback report from my reproduction: https://listnr.corp.google.com/product/208/report/85754904997

I applied the following patch to check drm/rockchip shmem allocation: https://crrev.com/c/1308251

In the feedback report, drm/rockchip allocated 34735 shmem pages (138940 kB) and caused the OOM killer invocation:

<6>[ 182.649718] rockchip_gem_get_pages, alloc: 1000, total: 34735, max: 34735
<4>[ 205.628684] memd invoked oom-killer: gfp_mask=0x24200ca, order=0, oom_score_adj=0

Example stack trace for allocating shmem pages:

dump_stack+0x74/0x98
shmem_read_mapping_page_gfp+0x78/0x468
drm_gem_get_pages+0xc0/0x170
rockchip_gem_get_pages (inlined)
rockchip_gem_alloc_iommu (inlined)
rockchip_gem_alloc_buf (inlined)
rockchip_gem_create_object+0x6c/0x2bc
rockchip_gem_create_with_handle+0x34/0x84
rockchip_gem_create_ioctl+0x34/0x4c
drm_ioctl+0x1e8/0x440
drm_compat_ioctl+0x30/0x84
compat_SyS_ioctl+0x3b4/0x1b34
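A debug patch like that presumably boils down to a counter at the allocation site; a hypothetical reconstruction for illustration (the real change is in the CL linked above):

#include <linux/atomic.h>
#include <linux/printk.h>

/* Hypothetical reconstruction of the debug accounting; see
 * crrev.com/c/1308251 for the actual patch. Counts pages handed out
 * by rockchip_gem_get_pages() and logs each allocation. */
static atomic_long_t rk_gem_pages_total = ATOMIC_LONG_INIT(0);
static long rk_gem_pages_max;

static void rk_gem_account_alloc(long npages)
{
    long total = atomic_long_add_return(npages, &rk_gem_pages_total);

    if (total > rk_gem_pages_max)   /* debug aid only; races are fine */
        rk_gem_pages_max = total;
    pr_info("rockchip_gem_get_pages, alloc: %ld, total: %ld, max: %ld\n",
            npages, total, rk_gem_pages_max);
}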
Comment 16, Oct 31

Comment 17, Oct 31
We can:

1. Mark the drm/rockchip-allocated shmem pages as unevictable to avoid the OOM killer invocation (similar to https://crrev.com/c/1290419).
2. Review ARC++ and Alt-Tab preview window graphics buffer usage.
Comment 18, Oct 31
Nice investigation -- any reason not to go ahead and do #1 and mark the shmem pages as unevictable?
Comment 19, Oct 31
The Alt-Tab screen does use a lot of memory at the moment. Along with the UI team, we have started working on optimizing some of these aspects. I am CCing people working on this, both as an FYI and because they may have fixes.
Comment 20, Oct 31
I looked at the feedback report -- I also noticed that you hit the issue where the OOM killer gets stuck trying to kill something, which Doug was also seeing:

<3>[ 209.909066] Killed process 5235 (chrome) total-vm:492896kB, anon-rss:6560kB, file-rss:49336kB
<4>[ 212.236266] Task <URL: 13> refused to die (killer <URL: 14> nvcsw=1454, nivcsw=3721)
<4>[ 213.311265] Task <URL: 13> refused to die (killer powerd:1349:1349, nvcsw=409, nivcsw=375)
<4>[ 213.789207] Task <URL: 13> refused to die (killer powerd:1349:1349, nvcsw=472, nivcsw=379)
<4>[ 213.890298] Task <URL: 13> refused to die (killer Chrome_SyncThre:2839:1366, nvcsw=521, nivcsw=762)
<4>[ 214.728599] Task <URL: 13> refused to die (killer mali-mem-purge:1639:1606, nvcsw=2227, nivcsw=1514)
<4>[ 215.158786] Task <URL: 13> refused to die (killer powerd:1349:1349, nvcsw=522, nivcsw=395)
<4>[ 215.618208] Task <URL: 13> refused to die (killer powerd:1349:1349, nvcsw=554, nivcsw=404)
<4>[ 218.169292] Task ScriptStreamer :5269 refused to die (killer shill:1547:1547, nvcsw=578, nivcsw=1477)
<4>[ 218.377301] Task ScriptStreamer :5269 refused to die (killer TaskSchedulerFo:1585:1366, nvcsw=22372, nivcsw=5696)
<4>[ 220.225357] Task CompositorTileW:5247 refused to die (killer Compositor:5345:5334, nvcsw=203, nivcsw=151)
<6>[ 220.225381] CompositorTileW D ffffffc0002168d4 0 5247 1542 0x00440809
<0>[ 220.225396] Call trace:
<4>[ 220.225412] [<ffffffc0002168d4>] __switch_to+0x94/0xa0
<4>[ 220.225423] [<ffffffc000953a60>] __schedule+0x420/0x9c4
<4>[ 220.225430] [<ffffffc0009535e4>] schedule+0x40/0x9c
<4>[ 220.225438] [<ffffffc000954028>] schedule_preempt_disabled+0x24/0x3c
<4>[ 220.225446] [<ffffffc00095578c>] __mutex_lock_common+0x264/0x4a8
<4>[ 220.225454] [<ffffffc000954f18>] __mutex_lock_slowpath+0x38/0x44
<4>[ 220.225461] [<ffffffc000954edc>] mutex_lock+0x64/0x68
<4>[ 220.225472] [<ffffffc0005f2670>] kbase_mem_evictable_reclaim_count_objects+0x24/0x64
<4>[ 220.225482] [<ffffffc000361998>] shrink_slab+0x148/0x448
<4>[ 220.225490] [<ffffffc000362f84>] shrink_zone+0x4f0/0x624
<4>[ 220.225499] [<ffffffc000328d18>] try_to_free_pages+0x2ac/0x674
<4>[ 220.225507] [<ffffffc00035dd38>] __alloc_pages_nodemask+0xb28/0x1270
<4>[ 220.225515] [<ffffffc000370ec4>] __read_swap_cache_async+0x94/0x1f0
<4>[ 220.225523] [<ffffffc0003491e4>] read_swap_cache_async+0x2c/0x6c
<4>[ 220.225530] [<ffffffc00034936c>] swapin_readahead+0x148/0x17c
<4>[ 220.225539] [<ffffffc00036abf0>] handle_mm_fault+0x868/0x1344
<4>[ 220.225548] [<ffffffc000218444>] do_page_fault+0x148/0x27c
<4>[ 220.225555] [<ffffffc0002182c0>] do_translation_fault+0x50/0x8c
<4>[ 220.225563] [<ffffffc000200304>] do_mem_abort+0x5c/0xd8

Doug has posted a WIP CL that would alleviate this here: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/471933
Comment 21, Nov 9

Comment 22, Nov 13

Comment 23, Nov 13
A thought re #20: the anomaly collector can easily be extended to detect OOM kill failures, if we think it may be worth it. It could either generate a crash report, or produce a memd event. The anomaly collector uses a lex-generated scanner, so adding more patterns will not have a performance impact (except for the size of its state machine, but I don't think that's a problem).
Comment 24, Nov 13
Re #23: it could be interesting to make that a metric for how long the OOM kill takes. I think we know we have issues with the OOM killer on 4.4 and earlier, since we don't have the OOM reaper. You should file a separate bug for that, thanks.
Comment 25, Nov 14
Regarding the discussion about where mali allocates memory -- we have two paths: the mali kernel driver and the rockchip drm driver. All of our shareable graphics memory (i.e. passed between processes or used with KMS) gets allocated through rockchip drm, and the drm_gem_get_pages() fix mentioned here https://bugs.chromium.org/p/chromium/issues/detail?id=903298#c5 should address those allocations. However, graphics memory allocated by the mali driver goes through the kbase kernel driver and uses some other mechanism (not the drm framework, not shmem), as marcheu says. This includes textures and render targets and can also use a lot of memory. I'm not too familiar with the kbase driver myself, but any work on memory pressure on mali should at least account for these allocations too.
Comment 26, Nov 28
Re #25: the patch "mark pinned shmemfs pages as unevictable" [1] can only handle a special case -- marking all pages in a drm object/mapping as unevictable -- which works for i915_gem_object_put_pages_gtt() and drm_gem_get_pages(). The kbase driver uses get_user_pages() to pin memory, which may pin only some pages in a mapping. To handle that case, a general page-pinning API is required. Unfortunately, that RFC patch [2] has been stalled for a long time; it would require changing many files in kernel/mm to make VM_PINNED work. I think we should fix the issue in the rockchip drm driver first.

[1]: https://crrev.com/c/1328423
[2]: "mm: Introduce VM_PINNED and interfaces" https://lore.kernel.org/patchwork/patch/467593/
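To illustrate the kbase problem: get_user_pages() pins an arbitrary sub-range of an existing user mapping, page by page, so there is no single mapping to flag as unevictable. Roughly the 4.4-era call (signature from memory -- later kernels dropped the task/mm arguments and folded write/force into gup_flags):

#include <linux/mm.h>
#include <linux/sched.h>

/* Pin npages starting mid-mapping at user_addr. Only the returned
 * pages are held; the rest of the VMA stays evictable, which is why
 * a per-mapping flag doesn't help here and the VM_PINNED RFC proposed
 * per-range tracking instead. */
static long pin_user_range(unsigned long user_addr, unsigned long npages,
                           struct page **pages)
{
    return get_user_pages(current, current->mm, user_addr, npages,
                          1 /* write */, 0 /* force */, pages, NULL);
}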
Comment 27, Jan 4
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ec1b8026d032b0c643cb94b899e5416585232942

commit ec1b8026d032b0c643cb94b899e5416585232942
Author: Kuo-Hsin Yang <vovoy@chromium.org>
Date: Fri Jan 04 17:08:04 2019

FROMLIST: drm/gem: mark pinned pages as unevictable

The gem drivers use shmemfs to allocate backing storage for gem objects. On Samsung Chromebook Plus, the drm/rockchip driver may call rockchip_gem_get_pages -> drm_gem_get_pages -> shmem_read_mapping_page to pin a lot of pages, breaking the page reclaim mechanism and causing oom-killer invocation.

E.g. when the size of a zone is 3.9 GiB, the inactive_ratio is 5. If active_anon / inactive_anon < 5 and all pages in the inactive_anon lru are pinned, page reclaim would keep scanning inactive_anon lru without reclaiming memory. It breaks page reclaim when the rockchip driver only pins about 1/6 of the anon lru pages.

Mark these pinned pages as unevictable to avoid the premature oom-killer invocation. See also similar patch on i915 driver [1].

[1]: https://patchwork.freedesktop.org/patch/msgid/20181106132324.17390-1-chris@chris-wilson.co.uk

Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(backport from https://patchwork.freedesktop.org/patch/268406/)

BUG=chromium:896805
TEST=check unevictable page count in bob

Change-Id: I86db7a9f01fb52c02034535ae3d07303174bac08
Reviewed-on: https://chromium-review.googlesource.com/1352920
Commit-Ready: Vovo Yang <vovoy@chromium.org>
Tested-by: Vovo Yang <vovoy@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>

[modify] https://crrev.com/ec1b8026d032b0c643cb94b899e5416585232942/drivers/gpu/drm/drm_gem.c
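The heart of that change is small: tell vmscan up front that the mapping's pages can never be reclaimed, so they move to the unevictable LRU instead of being rescanned forever. A paraphrased gist (see the commit above for the real diff; these wrapper names are mine, the actual patch edits drm_gem.c in place):

#include <linux/pagemap.h>
#include <drm/drm_gem.h>

/* Called at the top of drm_gem_get_pages(): shmem pages subsequently
 * read in land on the unevictable LRU, so reclaim stops wasting time
 * rescanning them and can make progress elsewhere. */
static void gem_mark_pages_unevictable(struct drm_gem_object *obj)
{
    mapping_set_unevictable(file_inode(obj->filp)->i_mapping);
}

/* Called from drm_gem_put_pages() once the pages are released, making
 * them reclaimable again. */
static void gem_mark_pages_evictable(struct drm_gem_object *obj)
{
    mapping_clear_unevictable(file_inode(obj->filp)->i_mapping);
}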
Comment 28, Today (8 hours ago)
How is this coming along? Should we see improvement after the patch in #27 landed?
Comment 29, Today (6 hours ago)
Yes, the patch in #27 should have fixed this issue. diander@, can you help check whether this issue is fixed on R73.11524.0.0 or a newer build?