caroline with new memory parameters gets OOM kills too soon
Issue description

User reports that system freezes for several seconds, then the screen blacks out for several more seconds, then comes back with several sad tabs. I asked the user to type a triple alt-volup-X while the system was frozen. This is the report:

https://feedback.corp.google.com/product/208/neutron?lView=rd&lRSort=1&lROrder=2&lRFilter=1&lReport=57282709650

First thing I notice in console-ramoops: OOM kills start way too soon, with lots of swap space available:

[ 91.172977] atmel_mxt_ts i2c-ATML0001:00: Status: 00 Config Checksum: 06cb89
[ 219.468263] entering low_mem (avail RAM = 409584 kB, avail swap 813608 kB) with lowest seen anon mem: 2122648 kB
[ 226.663592] AudioOutputDevi invoked oom-killer: gfp_mask=0x2004d0, order=0, oom_score_adj=519
...
[ 226.664121] Normal free:68908kB min:68960kB low:86200kB high:103440kB active_anon:764688kB inactive_anon:255200kB active_file:101880kB inactive_file:87044kB unevictable:0kB isolated(anon):2432kB isolated(file):0kB present:2080768kB managed:2007916kB mlocked:0kB dirty:4kB writeback:8kB mapped:228616kB shmem:373820kB slab_reclaimable:25684kB slab_unreclaimable:43000kB kernel_stack:12320kB pagetables:31864kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:6174124 all_unreclaimable? yes
[ 226.664156] lowmem_reserve[]: 0 0 0 0
[ 226.664166] DMA: 0*4kB 0*8kB 1*16kB (E) 2*32kB (UE) 1*64kB (E) 2*128kB (UE) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (UE) 2*4096kB (MR) = 15760kB
[ 226.664205] DMA32: 5546*4kB (UEM) 3635*8kB (UM) 883*16kB (UEM) 1*32kB (E) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (R) = 73616kB
[ 226.664238] Normal: 6743*4kB (UEM) 3198*8kB (UEM) 476*16kB (UEM) 28*32kB (UEM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (R) = 69260kB
[ 226.664272] 288140 total pagecache pages
[ 226.664278] 1022 pages in swap cache
[ 226.664283] Swap cache stats: add 1966179, delete 1965157, find 4253/530087
[ 226.664290] Free swap = 820972kB
[ 226.664295] Total swap = 3999996kB
[ 226.664300] 1022443 pages RAM

Also, we had just entered low mem, so tab discarding should have started but there are no discards in the chrome log.

Then we get the alt-volup-X panic, attached below for convenience. Most processes are blocked inside sys_poll(). Some are allocating. Nothing stands out.
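For anyone triaging similar reports, the figures in the OOM report quoted above can be pulled out and compared mechanically. The following is only a minimal sketch: the helper names `parse_oom_report` and `summarize` are hypothetical, and the regular expressions assume the log line layout shown in this report.

```python
import re

# Excerpt copied from the console-ramoops output quoted in this report.
LOG = """\
[ 226.664121] Normal free:68908kB min:68960kB low:86200kB high:103440kB
[ 226.664290] Free swap = 820972kB
[ 226.664295] Total swap = 3999996kB
"""


def parse_oom_report(text):
    """Extract Normal-zone free/min and swap totals (all in kB).

    Hypothetical helper; assumes the line format seen in this report.
    """
    zone = re.search(r"Normal free:(\d+)kB min:(\d+)kB", text)
    free_swap = re.search(r"Free swap\s*=\s*(\d+)kB", text)
    total_swap = re.search(r"Total swap\s*=\s*(\d+)kB", text)
    if not (zone and free_swap and total_swap):
        raise ValueError("log excerpt does not contain the expected lines")
    return {
        "normal_free_kb": int(zone.group(1)),
        "normal_min_kb": int(zone.group(2)),
        "free_swap_kb": int(free_swap.group(1)),
        "total_swap_kb": int(total_swap.group(1)),
    }


def summarize(report):
    """Flag whether the Normal zone is under its min watermark and
    compute the fraction of swap that is still free."""
    return {
        "below_min_watermark": report["normal_free_kb"] < report["normal_min_kb"],
        "swap_free_fraction": report["free_swap_kb"] / report["total_swap_kb"],
    }


if __name__ == "__main__":
    print(summarize(parse_oom_report(LOG)))
```

On this excerpt the Normal zone is just under its min watermark while roughly a fifth of the swap is still free, which is the combination the description calls out.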
,
Apr 11 2017
It seems like the discarder should be working according to crbug.com/705185 -- and this was run on 59.0.3065.0 canary -- so I believe it should have the fixes from that bug
,
Apr 11 2017
Another example of premature OOM kills is in this report also from Chris: https://feedback.corp.google.com/#/Report/57280301703
,
Apr 11 2017
I wonder if the thread which is listening for the low memory notify can get blocked such that it doesn't get the notification immediately?
,
Apr 11 2017
I don't think that there is a "listening" thread any longer, it's been replaced by a polling thread. But the polling should be fairly frequent, about once a second.
,
Apr 11 2017
re #5 -- hmm I wonder if that is really frequent enough or not -- or even why it's better to poll rather than listen for the signal in the kernel...
,
Apr 11 2017
Polling is certainly worse for two reasons: 1. latency; 2. it prevents the system from quiescing. We used to wait on a select(). The change to polling was made a few years back by skuhne. I think tab discarding was added to other OSes as well, and those OSes don't have a low-memory notifier, so we lost this feature for the sake of unifying the code.
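To make the latency difference concrete, here is a rough sketch of the two approaches. It is not Chrome's code: `wait_until_readable` and `poll_until` are hypothetical names, and a pipe stands in for whatever file descriptor the kernel low-memory notifier exposes.

```python
import os
import select
import time


def wait_until_readable(fd, timeout=None):
    """Block in select() until fd becomes readable.

    Returns True when fd is readable, False if the timeout expires.
    With timeout=None the caller sleeps until the event arrives, so the
    reaction latency is not tied to any polling interval.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


def poll_until(check, interval_s, max_checks):
    """Call check() up to max_checks times, sleeping interval_s between
    failed checks.

    Returns the 1-based attempt on which check() first returned True, or
    None if it never did. An event can go unnoticed for up to interval_s,
    and the periodic wakeups keep the system from going fully idle.
    """
    for attempt in range(1, max_checks + 1):
        if check():
            return attempt
        time.sleep(interval_s)
    return None


if __name__ == "__main__":
    # A pipe stands in for the notifier descriptor (assumption for the demo).
    read_fd, write_fd = os.pipe()
    print(wait_until_readable(read_fd, timeout=0.01))  # nothing written yet
    os.write(write_fd, b"x")
    print(wait_until_readable(read_fd, timeout=0.01))  # data is pending
    os.close(read_fd)
    os.close(write_fd)
```

With a one-second polling interval, the worst-case delay before the discarder even notices the low-memory condition is about a second, whereas the blocking wait returns as soon as the descriptor becomes readable.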
,
Apr 11 2017
,
Apr 12 2017
Looking at the original report, I see dm_bufio stuff all over the place. I'll make the same comments I did in bug #710857, comment #2. Maybe someone can test and land:

https://chromium-review.googlesource.com/c/423253/ - UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count
https://chromium-review.googlesource.com/c/423252/ - UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation

...but I guess maybe we should keep this bug about the fact that the tab discarder isn't running properly in this case...
,
Apr 13 2017
Oh amazing, we've seen bufio deadlocks since 2013 (issue 248606). I can test these on a caroline. You don't have a specific test in mind, do you? I haven't reproduced this on my caroline, but maybe Chris can help me do it.
,
Apr 13 2017
Sure. I can set up the original unit tomorrow and see if we can repro.
,
Jun 13 2017
Ben, we can probably close this, right?
,
Jun 15 2017
Marked as fixed since R61-9635.
,
Jun 27 2017
Not reproducible in Chrome OS 9690.0.0, 61.0.3138.0.
Comment 1 by semenzato@chromium.org, Apr 11 2017
Labels: OS-Chrome