When memory is low and things get OOM killed, sometimes the system locks up (maybe ARC++ related)
Issue description (credit to Doug for noticing this): Android processes (for instance, the binder) run with the default oom_score_adj value of 0. This means they are all killable by the kernel OOM killer (a value of -1000 makes a process unkillable). However, the critical Android processes are likely to be killed only after all Chrome renderers and Android apps have been killed. I have to say "likely" because oom_score_adj is only one factor in choosing the kill candidate (see issue 701848); another is process size (RSS + swap + page table pages). Chrome renderers and Android apps have oom_score_adj values between 300 and 1000, which is a lot: each unit of oom_score_adj is worth 0.1% of total RAM, so on a 4GB system each unit is equivalent to 4MB. The lowest adj of 300 thus corresponds to 1.2GB, meaning an Android system process would have to be at least that size (or 600MB on a 2GB system) to be kernel-OOM-killed before a tab or an app. The question here is whether we want to make them OOM-unkillable, i.e. we'd rather panic, or whether it's OK to have the kernel OOM-kill Android when we're out of memory. Note that no Android apps are running, and no Chrome tabs are open, but extensions are still loaded and running. (We should really start killing those too...) I am in favor of killing Android before panicking.
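To make the arithmetic above concrete, here's a tiny userspace sketch. It is not kernel code; it only mirrors the weighting the kernel's oom_badness() applies, where each adj unit adds totalpages/1000 pages to a process's badness. The file name and output wording are made up for illustration.

/* adj_weight.c: show how much extra "RSS weight" an oom_score_adj value
 * is worth on a system with the given amount of RAM.  Mirrors the
 * kernel's rule of oom_score_adj * totalpages / 1000 (0.1% of RAM per
 * adj unit); this is a sketch, not the actual oom_badness() code. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <total_ram_mb> <oom_score_adj>\n", argv[0]);
        return 1;
    }

    double total_mb = atof(argv[1]);
    long adj = atol(argv[2]);

    /* Each adj unit counts the same as 0.1% of total RAM worth of RSS. */
    double bonus_mb = (double)adj * total_mb / 1000.0;

    printf("oom_score_adj %ld on a %.0f MB system outweighs roughly %.1f MB of RSS\n",
           adj, total_mb, bonus_mb);
    return 0;
}

For example, "./adj_weight 4096 300" prints about 1228.8 MB, which matches the 1.2GB figure above.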
,
Mar 17 2017
Hmm, very difficult question. I think the answer is yes, we should prevent the OOM killer from taking out Android system-critical processes. Android will either restart the process anyway or behave erratically (erratic behavior seems to be the common failure mode here). However, if we are out of Android apps to kill, I'd be in favor of killing the framework as a whole. Turn off the container, because at that point it isn't doing anything useful and it's eating hundreds of megs of RAM. We'd have to be smart about restarting it when memory pressure relaxed so notification apps would work again.
,
Mar 17 2017
Better unkillable than hanging the system like what seems to happen today when binder gets OOM killed. :-P ...so for the short term, making critical system processes unkillable seems to make sense...
,
Mar 20 2017
Since I just hit this personally, I'd be very curious to know whether we can at least get some of this handled in short order. It seems that OOM kill of Android stuff is pretty broken right now because of the bad crash when binder is killed. Even if you're hardly using Android at all, simply having the Play Store enabled on your system means you're in an unhappy state.
,
Mar 21 2017
#2: agreed. I hadn't thought about notifications; that complicates things, because it makes for really bad UX if notifications aren't reliable.
,
Mar 21 2017
I'd be worried about UX when the oom-killer starts killing Android processes. Shouldn't Android be handling what to kill or GC for anything that it spawns to free up memory? It would be good to have a way to notify Android about a low-memory situation so it could handle it in a more deterministic manner. Short of that, in the near term I also think we should make at least Android system-critical processes unkillable by the oom-killer. And if these critical processes become so bloated that they prevent users from doing what they want (e.g. loading a new tab), then turning off the container without relying on the oom-killer seems reasonable. Hopefully that'll never be necessary with a reasonable DRAM spec.
,
Mar 22 2017
#8: Partly my fault, but we have to be careful in distinguishing normal and emergency behavior, although the two should be as close as possible. Emergency behavior is when there is a memory allocation request and no memory left. The kernel has two choices: it can kill some process, or it can panic. Panicking kills all processes, so it's seldom better than killing *some* processes. Right now we panic rather than killing, hence this bug. I am changing the title to make this clearer.

The immediate consequence of this reasoning is that, for consistency, we should shut down the entire Android framework when short on memory. Unfortunately this doesn't only introduce Android restart latency, it also destroys Android notifications. Maybe the strategy should then be:

1. A "discard candidate" can be an Android app, a Chrome tab, or the Android framework.
2. Use LRU to discard Android apps and Chrome tabs, but treat the framework separately.
3. When the framework is LRU, and there is only one Chrome tab left (only two? only "visible" tabs?), discard the framework, but pop up this dialog: "Android is being shut down due to a RAM shortage and will stop sending notifications."

This doesn't necessarily work well, but I am not sure what else we can do.
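To make the proposal a bit more concrete, here is a rough sketch of the selection rule. Everything in it (names, types, the threshold) is made up for illustration; the real logic would live in Chrome's tab discard / ARC memory management code, and this is just the shape of it:

#include <stddef.h>
#include <stdio.h>

enum kind { CHROME_TAB, ANDROID_APP, ANDROID_FRAMEWORK };

struct candidate {
    enum kind kind;
    long last_used;     /* smaller == older (LRU) */
};

/* Pick the LRU app/tab; only pick the Android framework once it is the
 * LRU candidate AND at most one tab remains.  The caller is expected to
 * show the "Android is being shut down" warning before acting on
 * ANDROID_FRAMEWORK. */
static const struct candidate *choose_discard(const struct candidate *c,
                                              size_t n, size_t open_tabs)
{
    const struct candidate *lru = NULL, *lru_non_fw = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!lru || c[i].last_used < lru->last_used)
            lru = &c[i];
        if (c[i].kind != ANDROID_FRAMEWORK &&
            (!lru_non_fw || c[i].last_used < lru_non_fw->last_used))
            lru_non_fw = &c[i];
    }
    /* Don't shut Android down while plenty of tabs remain; fall back to
     * the oldest app or tab instead. */
    if (lru && lru->kind == ANDROID_FRAMEWORK && open_tabs > 1 && lru_non_fw)
        return lru_non_fw;
    return lru;
}

int main(void)
{
    struct candidate c[] = {
        { ANDROID_FRAMEWORK, 10 },   /* oldest */
        { CHROME_TAB,        20 },
        { CHROME_TAB,        30 },
    };
    const struct candidate *pick = choose_discard(c, 3, 2);

    printf("picked kind %d (expect a tab, since 2 tabs are still open)\n",
           pick ? (int)pick->kind : -1);
    return 0;
}

The "open_tabs > 1" check is just a stand-in for whatever "only one tab left / only visible tabs" rule we settle on.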
,
Mar 22 2017
Hmmm, so maybe there are several bugs to address here. The problem I was seeing (that I thought this bug was about) is:

1. Memory starts getting low.
2. For whatever reason, the Chrome Tab Discarder isn't able to react fast enough. Maybe we have it tuned wrong. Maybe there's a bug. Maybe memory was just allocated too quickly. We can discuss ways to make this better, but there will always be some times the tab discarder doesn't run. We can discuss that elsewhere (not here!). In my case, assume the tab discarder didn't free memory.
3. We get to the kernel OOM killer.
4. The OOM killer chooses a VIAP (very important android process) as the one to kill. Like "binder", for instance.
5. When the VIAP dies, the system is doomed. It might not crash right away, but within a finite amount of time (like 120 seconds) the kernel will detect a hung task and reboot. Possibly some parts of the system might freeze/hang during this waiting period, or possibly not.

---

So, my assertion is that rather than kill the VIAP, the following would have been preferable:
A) Kill a Chrome process.
B) Kill an unimportant android process.
C) Kill the whole android container.
D) Just panic.

-> In cases I've looked at I'm pretty sure that A) was possible, but we chose not to do it because we thought that the VIAP was a better candidate to kill.
-> If it's too hard to figure out B) or C) right now, I'd advocate that A) + D) is still better than what we have today.

---

As per dgreid, possibly we should also try to fix the system so that it doesn't fall on its face if a VIAP crashes, but even so it still probably makes sense not to purposefully kill one.

===

I'm going to try some experiments to actually show how the system behaves when a VIAP dies.
,
Mar 22 2017
My current efforts on this are related to trying to actually kill a "Binder_X" process and watch the system die. I think that's been relatively common with feedback reports I've seen. As per dgreid these are short-lived processes and are spawned for binder to communicate back to something, so maybe that explains why I don't see them normally in "ps aux".

---

Looking in "messages" for a machine where I was having problems (attached), I notice:

2017-03-20T15:55:32.534852-07:00 INFO kernel: [18962.589904] [ 6041] 665367 6041 238332 0 106 4 4165 965 d.process.acore
...
2017-03-20T15:55:32.534912-07:00 ERR kernel: [18962.590018] Out of memory: Kill process 6041 (d.process.acore) score 966 or sacrifice child
...
2017-03-20T15:55:32.544008-07:00 INFO kernel: [18964.056499] [18856] 665367 6041 238332 15 106 4 4151 965 Binder_7
...
2017-03-20T15:55:32.544070-07:00 ERR kernel: [18964.057098] Out of memory: Kill process 18856 (Binder_7) score 966 or sacrifice child

So this "Binder_7" was spawned _after_ we killed "d.process.acore", but has the same "uid" and same OOM score, so it seems like maybe it was created for "d.process.acore" and maybe inherited its OOM score?

---
,
Mar 22 2017
Killing "Binder" processes alone doesn't seem to make this happen all the time... Let's see if I can find a pattern... --- b/35647865, "OoM-killer_chrome_crash.tar.gz", "messages": $ grep sacrifice messages 2017-02-05T22:29:02.500816-08:00 ERR kernel: [24286.111920] Out of memory: Kill process 7916 (e.process.gapps) score 980 or sacrifice child 2017-02-05T22:29:02.509046-08:00 ERR kernel: [24286.373930] Out of memory: Kill process 11860 (Binder_10) score 980 or sacrifice child 2017-02-05T22:29:08.290024-08:00 ERR kernel: [24292.813157] Out of memory: Kill process 26777 (rk.clouddpc.arc) score 960 or sacrifice child 2017-02-05T22:29:10.327657-08:00 ERR kernel: [24294.851191] Out of memory: Kill process 26791 (Binder_2) score 960 or sacrifice child 2017-02-05T22:29:20.294308-08:00 ERR kernel: [24304.032281] Out of memory: Kill process 12014 (gle.android.gms) score 941 or sacrifice child 2017-02-05T22:29:50.909243-08:00 ERR kernel: [24335.432407] Out of memory: Kill process 8110 (c.intent_helper) score 919 or sacrifice child 2017-02-05T22:29:51.154379-08:00 ERR kernel: [24335.441583] Out of memory: Kill process 8121 (Binder_2) score 919 or sacrifice child 2017-02-05T22:29:59.202457-08:00 ERR kernel: [24343.726336] Out of memory: Kill process 7835 (arc.applauncher) score 898 or sacrifice child 2017-02-05T22:30:02.553861-08:00 ERR kernel: [24347.046614] Out of memory: Kill process 8272 (rprisereporting) score 877 or sacrifice child ... ... 2017-02-05T22:30:28.409704-08:00 WARNING session_manager[1488]: [WARNING:liveness_checker_impl.cc(68)] Browser hang detected! 2017-02-05T22:30:28.410279-08:00 WARNING session_manager[1488]: [WARNING:liveness_checker_impl.cc(72)] Aborting browser process. 2017-02-05T22:30:28.410309-08:00 INFO session_manager[1488]: [INFO:browser_job.cc(124)] Terminating process: Browser did not respond to DBus liveness check. 2017-02-05T22:30:28.410319-08:00 INFO session_manager[1488]: [INFO:system_utils_impl.cc(110)] Sending 6 to 1515 as 1000 2017-02-05T22:30:30.341316-08:00 WARNING session_manager[1488]: [WARNING:browser_job.cc(132)] Aborting child process 1515's process group 3 seconds after sending signal 2017-02-05T22:30:30.341376-08:00 INFO session_manager[1488]: [INFO:browser_job.cc(116)] Terminating process group: Browser took more than 3 seconds to exit after signal. 
2017-02-05T22:30:30.341386-08:00 INFO session_manager[1488]: [INFO:system_utils_impl.cc(110)] Sending 6 to -1515 as 1000 2017-02-05T22:30:31.046142-08:00 WARNING crash_reporter[27313]: [user] Received crash notification for chrome[1615] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-02-05T22:30:31.046646-08:00 WARNING crash_reporter[27313]: [ARC] Received crash notification for chrome[1615] sig 6, user 1000 (ignoring - crash origin is not ARC) 2017-02-05T22:30:31.049006-08:00 WARNING crash_reporter[27312]: [user] Received crash notification for chrome[1733] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-02-05T22:30:31.049086-08:00 WARNING crash_reporter[27314]: [user] Received crash notification for chrome[1710] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-02-05T22:30:31.049107-08:00 WARNING crash_reporter[27312]: [ARC] Received crash notification for chrome[1733] sig 6, user 1000 (ignoring - crash origin is not ARC) 2017-02-05T22:30:31.050571-08:00 WARNING crash_reporter[27314]: [ARC] Received crash notification for chrome[1710] sig 6, user 1000 (ignoring - crash origin is not ARC) 2017-02-05T22:30:31.164928-08:00 INFO kernel: [24375.646032] binder: 7404:7480 transaction failed 29189, size 328-0 2017-02-05T22:30:32.131740-08:00 WARNING crash_reporter[27328]: Received crash notification for chrome[1515] user 1000 (called directly) 2017-02-05T22:30:32.481673-08:00 WARNING crash_reporter[27354]: [user] Received crash notification for chrome[1515] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-02-05T22:30:32.482366-08:00 WARNING crash_reporter[27354]: [ARC] Received crash notification for chrome[1515] sig 6, user 1000 (ignoring - crash origin is not ARC) IIUC kHangDetectionIntervalDefaultSeconds is 60, so I think that means that the browser has been hung for between 60 seconds and 120 seconds. So any of those could have been it.
,
Mar 22 2017
Looking more closely at my crash (messages.2 from comment 11): $ grep sacrifice ~/Downloads/messages.2.txt 2017-03-20T15:55:32.534125-07:00 ERR kernel: [18957.450399] Out of memory: Kill process 9556 (ndroid.gms:snet) score 985 or sacrifice child 2017-03-20T15:55:32.534912-07:00 ERR kernel: [18962.590018] Out of memory: Kill process 6041 (d.process.acore) score 966 or sacrifice child 2017-03-20T15:55:32.544070-07:00 ERR kernel: [18964.057098] Out of memory: Kill process 18856 (Binder_7) score 966 or sacrifice child 2017-03-20T15:55:32.544931-07:00 ERR kernel: [18967.234454] Out of memory: Kill process 5976 (e.process.gapps) score 949 or sacrifice child 2017-03-20T15:55:32.545729-07:00 ERR kernel: [18972.399221] Out of memory: Kill process 9974 (.android.gms.ui) score 931 or sacrifice child 2017-03-20T15:55:32.546457-07:00 ERR kernel: [18977.249886] Out of memory: Kill process 6190 (id.gms.unstable) score 919 or sacrifice child 2017-03-20T15:55:41.821899-07:00 ERR kernel: [18988.743190] Out of memory: Kill process 6569 (ndroid.settings) score 896 or sacrifice child 2017-03-20T15:55:46.146568-07:00 ERR kernel: [18992.756865] Out of memory: Kill process 5702 (d.process.media) score 879 or sacrifice child 2017-03-20T15:55:52.548938-07:00 ERR kernel: [18999.470414] Out of memory: Kill process 5585 (e.process.gapps) score 861 or sacrifice child 2017-03-20T15:55:54.920177-07:00 ERR kernel: [18999.473870] Out of memory: Kill process 15358 (Binder_F) score 861 or sacrifice child 2017-03-20T15:56:06.439886-07:00 ERR kernel: [19013.361888] Out of memory: Kill process 5707 (gle.android.gms) score 847 or sacrifice child 2017-03-20T15:56:20.526220-07:00 ERR kernel: [19027.447412] Out of memory: Kill process 5730 (Binder_2) score 847 or sacrifice child ... ... 2017-03-20T15:57:37.543721-07:00 WARNING session_manager[1302]: [WARNING:liveness_checker_impl.cc(68)] Browser hang detected! 2017-03-20T15:57:37.554394-07:00 WARNING session_manager[1302]: [WARNING:liveness_checker_impl.cc(72)] Aborting browser process. 2017-03-20T15:57:37.555689-07:00 INFO session_manager[1302]: [INFO:browser_job.cc(157)] Terminating process: Browser did not respond to DBus liveness check. 2017-03-20T15:57:37.555815-07:00 INFO session_manager[1302]: [INFO:system_utils_impl.cc(110)] Sending 6 to 1370 as 1000 2017-03-20T15:57:40.436358-07:00 WARNING kernel: [19107.357368] udevd[189]: seq 2719 '/devices/virtual/xt_idletimer/timers' is taking a long time 2017-03-20T15:57:41.036557-07:00 WARNING session_manager[1302]: [WARNING:browser_job.cc(165)] Aborting child process 1370's process group 3 seconds after sending signal 2017-03-20T15:57:41.036714-07:00 INFO session_manager[1302]: [INFO:browser_job.cc(149)] Terminating process group: Browser took more than 3 seconds to exit after signal. 
2017-03-20T15:57:41.036778-07:00 INFO session_manager[1302]: [INFO:system_utils_impl.cc(110)] Sending 6 to -1370 as 1000 2017-03-20T15:57:41.410436-07:00 WARNING crash_reporter[29916]: [user] Received crash notification for chrome[1633] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-03-20T15:57:41.410977-07:00 WARNING crash_reporter[29916]: [ARC] Received crash notification for chrome[1633] sig 6, user 1000 (ignoring - crash origin is not ARC) 2017-03-20T15:57:42.106383-07:00 WARNING crash_reporter[29927]: Received crash notification for chrome[1370] user 1000 (called directly) 2017-03-20T15:57:42.149559-07:00 INFO kernel: [19109.070859] binder: 4168:5160 transaction failed 29189, size 328-0 2017-03-20T15:57:42.551664-07:00 WARNING crash_reporter[29948]: [user] Received crash notification for chrome[1370] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-03-20T15:57:42.552048-07:00 WARNING crash_reporter[29948]: [ARC] Received crash notification for chrome[1370] sig 6, user 1000 (ignoring - crash origin is not ARC) 2017-03-20T15:57:42.561921-07:00 WARNING crash_reporter[29949]: Received crash notification for chrome-crash-unknown-process[29890] user 1000 (called directly) 2017-03-20T15:57:42.571654-07:00 INFO kernel: [19109.492945] Watchdog[1707]: unhandled level 1 translation fault (11) at 0x00000000, esr 0x92000045 2017-03-20T15:57:42.571710-07:00 ALERT kernel: [19109.492955] pgd = ffffffc0e7998000 2017-03-20T15:57:42.571718-07:00 ALERT kernel: [19109.492959] [00000000] *pgd=0000000000000000, *pud=0000000000000000 2017-03-20T15:57:42.571724-07:00 WARNING kernel: [19109.492966] 2017-03-20T15:57:42.571730-07:00 WARNING kernel: [19109.492972] CPU: 4 PID: 1707 Comm: Watchdog Not tainted 4.4.35-06794-g75ce794 #1 2017-03-20T15:57:42.571736-07:00 WARNING kernel: [19109.492976] Hardware name: Google Kevin (DT) 2017-03-20T15:57:42.571741-07:00 WARNING kernel: [19109.492980] task: ffffffc0ea7b2700 ti: ffffffc0e514c000 task.ti: ffffffc0e514c000 2017-03-20T15:57:42.571748-07:00 WARNING kernel: [19109.492985] PC is at 0xaebf8cf0 2017-03-20T15:57:42.571753-07:00 WARNING kernel: [19109.492989] LR is at 0xee2886d3 2017-03-20T15:57:42.571760-07:00 WARNING kernel: [19109.492992] pc : [<00000000aebf8cf0>] lr : [<00000000ee2886d3>] pstate: 600f0030 2017-03-20T15:57:42.571766-07:00 WARNING kernel: [19109.492996] sp : 00000000ea4e5a48 2017-03-20T15:57:42.571771-07:00 WARNING kernel: [19109.492999] x12: 00000000ee2d8714 2017-03-20T15:57:42.571813-07:00 WARNING kernel: [19109.493004] x11: 0000000000000000 x10: 00000000ea4e5bb0 2017-03-20T15:57:42.571820-07:00 WARNING kernel: [19109.493011] x9 : 00000000ea4e5bbc x8 : 00000000ea4e5ba8 2017-03-20T15:57:42.571826-07:00 WARNING kernel: [19109.493017] x7 : 00000000ea4e5a48 x6 : 00000000ea4e5a64 2017-03-20T15:57:42.571832-07:00 WARNING kernel: [19109.493023] x5 : 00000000b3320dd0 x4 : 00000000ea4e5a60 2017-03-20T15:57:42.571838-07:00 WARNING kernel: [19109.493029] x3 : 00000000ea4e5ab4 x2 : 0000000000000000 2017-03-20T15:57:42.571844-07:00 WARNING kernel: [19109.493036] x1 : 0000000000000000 x0 : 00000000ea4e5a60 2017-03-20T15:57:42.571850-07:00 WARNING kernel: [19109.493042] 2017-03-20T15:57:42.624555-07:00 WARNING crash_reporter[29957]: [user] Received crash notification for chrome[1699] sig 6, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-03-20T15:57:42.625951-07:00 WARNING crash_reporter[29950]: Received 
crash notification for chrome-crash-unknown-process[18799] user 1000 (called directly) 2017-03-20T15:57:42.626255-07:00 WARNING crash_reporter[29957]: [ARC] Received crash notification for chrome[1699] sig 6, user 1000 (ignoring - crash origin is not ARC) 2017-03-20T15:57:42.628515-07:00 WARNING crash_reporter[29954]: [user] Received crash notification for chrome[1676] sig 11, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2017-03-20T15:57:42.628965-07:00 WARNING crash_reporter[29954]: [ARC] Received crash notification for chrome[1676] sig 11, user 1000 (ignoring - crash origin is not ARC) 2017-03-20T15:57:42.708806-07:00 INFO session_manager[1302]: [INFO:child_exit_handler.cc(77)] Handling 1370 exit. 2017-03-20T15:57:42.708860-07:00 ERR session_manager[1302]: [ERROR:child_exit_handler.cc(85)] Exited with signal 6 --- Based on my analysis above, the hang must have happened _after_ 15:55:37. Assuming the crash from above shares the same root cause, then maybe it's not binder but instead "e.process.gapps" or "gle.android.gms"? ...or maybe something else is important? I also tracked down the crash report for the browser when it was killed due to the missing liveness check. It is 3c2edd6640000000. Nothing jumps out at me, though...
,
Mar 23 2017
Moving ownership to myself since:

1. I seem to be able to reproduce this (or something similar) using a combination of starting up ARC++ and running "balloon" to chew through memory.
2. When I reproduce the problem, the system is in a clearly bad state where pretty much nothing is happening (other than interrupts firing) for ~2 minutes until the hung task detector comes along.
3. I'm not really sure that binder is involved here, or at least not sure that it's always involved here.

--

At the moment I'm hesitant to speculate too much on details of what I've "figured out" since much of it doesn't make lots of sense. Certainly there are lots of tasks that are sitting there waiting to run and the system isn't doing anything to help make them run. I haven't yet been able to figure out why, but many things are blocked waiting for memory to become available or waiting on various disk / squashfs operations...

Happy for any help folks want to give, or else I'll just plod along trying to understand this...
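In case it helps anyone else reproduce: below is a minimal stand-in for the memory eater, just allocating and dirtying chunks until the allocator gives up or the OOM killer takes it out. This is not the actual "balloon" tool, just the same idea.

/* eat_mem.c: allocate and touch memory in chunks forever to force
 * memory pressure.  Sketch only; the real "balloon" tool may differ. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK_BYTES (64UL * 1024 * 1024)   /* 64 MB per step */

int main(void)
{
    unsigned long total_mb = 0;

    for (;;) {
        char *p = malloc(CHUNK_BYTES);
        if (!p) {
            fprintf(stderr, "malloc failed after %lu MB\n", total_mb);
            break;
        }
        /* Touch every byte so the pages are actually committed. */
        memset(p, 0xa5, CHUNK_BYTES);
        total_mb += CHUNK_BYTES >> 20;
        printf("ate %lu MB\n", total_mb);
    }
    pause();   /* hold on to whatever we got until killed */
    return 0;
}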
,
Mar 23 2017
Hi Doug, since you are on it, could you play with increasing the low memory margin by writing a larger number to /sys/kernel/mm/chromeos-low_mem/margin and see if it makes any difference?
,
Mar 24 2017
@15: I'd rather not. IMHO the fact that the OOM killer can get us into this awful hung state is a bug and that bug needs to be addressed. We already have open bugs about adjusting the low memory margin and I think we can focus on tweaks and tests there.
,
Mar 24 2017
I made a bit of progress. At the moment I'm convinced that our OOM killer is often churning: it runs and then fails to pick a process to kill, but elsewhere the kernel thinks a kill is already in progress, so it keeps trying. I've got a set of steps to reproduce this problem and I've been slowly churning through the code to debug. More Monday.
,
Mar 27 2017
OK, here's where I'm at today. I got a bit blocked at the end because I was trying to hack up debugd and got into version mismatch hell w/ it, libchromeos, and the system that was on my device. :( I'm trying to rebuild everything from scratch now...

---

Top priority: From go/arc-memory, some Android processes are supposed to have negative OOM scores. I'm fairly certain that we're ignoring these and killing these "important" Android tasks. I believe these processes aren't great candidates for killing, and that's why we're triggering such problems. Specifically, when I see OOM kills, all sorts of important-sounding Android processes have OOM scores like ~800 or ~900, not negative. ...and I've never seen an important Android process with a negative score. cylee@: can you help figure out how to fix this? (A quick way to audit the current scores is sketched at the end of this comment.)

--

That being said, it looks like we're tickling a bunch of corner cases in the kernel where the OOM killer really doesn't work super well. The main problem is that the OOM killer generally doesn't work well when the task to be killed is blocked on a mutex (or similar). I'm still poking, prodding, and gathering logs, but I'm pretty sure we're in this situation:

A) Memory gets low and pretty much everyone is starved.
B) Process 1000 holds a mutex named "mutex" and is waiting to allocate memory.
C) Process 1001 wants "mutex".
D) We choose to kill process 1001, for whatever reason.

...when the above happens we mark process 1001 as "MEMDIE" and assume it will die soon. In fact, we avoid OOM killing any other process _until_ it does, because this process gets special access to reserved memory to help it die and we don't want to give that special access to lots of processes at the same time. See the comment in oom_scan_process_thread().

...while we're waiting for the process to die, we spin like crazy across many CPUs, just for the fun of it. The out_of_memory() function will silently return "true", indicating that it made progress, as long as any process is still marked as MEMDIE, which will cause us to loop and loop and loop in the allocation functions. Even better, we'll be joined by other processes looping because they'll notice we have the oom_lock (they assume we're making progress for them). See __alloc_pages_may_oom().

So basically we've got a classic deadlock.
-> Process 1000 can't make progress (which it needs to do before it releases the mutex) because it can't get any memory and nobody's allowed to OOM kill.
-> Process 1001 can't finish dying (giving up its memory and then letting other things get killed) because it's waiting on the mutex held by process 1000.

...presumably we have to relax the restriction of only allowing one process to have MEMDIE at once, and we just need to make sure that there's enough reserved space for everyone.

NOTE: with Android processes we seem to end up in this state much more often than we did for Chrome.
One place where we seem to get stuck on mutexes a lot:

[ 119.172054] DOUG: select_bad_process 333: OOM_SCAN_ABORT: Binder_1-4149 (0, 0x00404040, 0x00000004)
[ 119.172055] Call trace:
[ 119.172064] [<ffffffc000204ff0>] __switch_to+0x9c/0xa8
[ 119.172069] [<ffffffc00093bd10>] __schedule+0x504/0x740
[ 119.172073] [<ffffffc00093bfd4>] schedule+0x88/0xa8
[ 119.172077] [<ffffffc00093c358>] schedule_preempt_disabled+0x28/0x44
[ 119.172081] [<ffffffc00093ddd4>] __mutex_lock_slowpath+0xf8/0x1a4
[ 119.172085] [<ffffffc00093decc>] mutex_lock+0x4c/0x68
[ 119.172091] [<ffffffc0007c0d30>] binder_thread_read+0x694/0x11bc
[ 119.172095] [<ffffffc0007c4210>] binder_ioctl_write_read.constprop.46+0x1e8/0x318
[ 119.172099] [<ffffffc0007c46a0>] binder_ioctl+0x360/0x768
[ 119.172104] [<ffffffc0003b89c4>] compat_SyS_ioctl+0x134/0x10ac
[ 119.173642] [<ffffffc000203e34>] el0_svc_naked+0x24/0x28
[ 119.173644] DOUG: end of stack }

--

It's possible there are some other related bugs here, all related to tasks not dying fast enough:

1. Possibly there are cases where we're not properly giving priority to a dying process. In oom_scan_process_thread() we stop scanning with OOM_SCAN_ABORT when we see task_will_free_mem(). ...but I don't see where we tag that process with MEMDIE. In many other cases we _do_ mark the task with mark_oom_victim(). I'm pretty sure I found one deadlock where we were looping waiting for a process to finish. Note that alternatively we may be able to patch page_alloc.c to pay more attention to PF_EXITING. It seems to pay attention to it in some places, but not others.
2. In oom_kill_process() I think there's a bug when we choose a child to die in our place (such good parents we are!). If that child fails to die (see the mutex discussion above) then we'll keep choosing it over and over again. I think we need a call to oom_scan_process_thread() before the call to oom_badness() (actually, I think it's not super valid to ever call oom_badness() without first calling oom_scan_process_thread()).
3. In the OOM killing code you see "schedule_timeout_killable(1)". That will wait a different amount of time depending on the value of HZ. We have HZ=1000, so that will only give 1 ms. Possibly this should be a little longer?
4. We only do the "schedule_timeout_killable(1)" in the case where we actually tried to kill the process. ...but it seems like we'd want to do that in other cases too, like if we notice another process is dying. Basically any time we claim that we "made progress" from out_of_memory() it seems like we should have some sort of delay since we can't exactly kill things instantly (we mark them to die and then wait for them to get to userspace so they can be cleanly killed).
5. I'm less certain about this one, but in __alloc_pages_may_oom() it sure seems like the oom_lock should be moved to be only around the block of code commented as "Exhausted what can be done so it's blamo time".

--

A few other interesting findings:

* I believe I also found OOM killed tasks sitting waiting in squashfs_cache_get() a lot of the time. That pointed out CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU, which we probably want (need to benchmark). That's on my back burner to dig into.
* I also found OOM killed tasks sitting waiting in kbase_mem_evictable_reclaim_count_objects() on a mutex. As per a previous investigation I did, grabbing a mutex in count_objects isn't ideal. This can easily be optimized and we can see if it helps our behavior when memory is low.
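--

For the oom_score_adj question above, here's a quick audit helper (hypothetical, not something checked in anywhere) that walks /proc and prints each task's oom_score_adj next to its name, so we can see whether any Android system process ever actually ends up negative:

/* oom_adj_audit.c: dump pid, oom_score_adj, and comm for every process.
 * Sketch only; run it on the DUT while ARC++ is up. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

int main(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    char path[256], comm[64], adj[32];

    if (!proc) {
        perror("opendir /proc");
        return 1;
    }
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;               /* only pid directories */
        snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
        if (read_first_line(path, comm, sizeof(comm)) < 0)
            continue;
        snprintf(path, sizeof(path), "/proc/%s/oom_score_adj", de->d_name);
        if (read_first_line(path, adj, sizeof(adj)) < 0)
            continue;
        printf("%6s  %5s  %s\n", de->d_name, adj, comm);
    }
    closedir(proc);
    return 0;
}

Running that while ARC++ is up should make it obvious whether the go/arc-memory scores are being applied at all.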
,
Mar 28 2017
Forked ARC++ issue to bug #706048
,
Mar 31 2017
Patches posted:

4.4:
remote: https://chromium-review.googlesource.com/465186 UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks
remote: https://chromium-review.googlesource.com/465187 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still ...
remote: https://chromium-review.googlesource.com/465188 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place
remote: https://chromium-review.googlesource.com/465189 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

3.14:
remote: https://chromium-review.googlesource.com/465468 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks
remote: https://chromium-review.googlesource.com/465469 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still ...
remote: https://chromium-review.googlesource.com/465470 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place
remote: https://chromium-review.googlesource.com/465471 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

---

I'll try to port to some of our other kernels, but I will likely need some help testing on them...
,
Mar 31 2017
These are untested:

---

3.18:
remote: https://chromium-review.googlesource.com/465472 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks
remote: https://chromium-review.googlesource.com/465473 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still ...
remote: https://chromium-review.googlesource.com/465474 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place
remote: https://chromium-review.googlesource.com/465475 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

We probably want to test and apply that since ARC++ seems to make this worse...

---

3.10 (only 3 patches needed):
remote: https://chromium-review.googlesource.com/465411 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks
remote: https://chromium-review.googlesource.com/465412 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place
remote: https://chromium-review.googlesource.com/465413 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

---

3.8:
remote: https://chromium-review.googlesource.com/465586 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks
remote: https://chromium-review.googlesource.com/465587 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place
remote: https://chromium-review.googlesource.com/465588 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

We could confirm basic OOMing here and if we're good then we can maybe just mark this and 3.10 to go in?

====
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88bdca832d35cb1a28436ee41626da92c2f3251e commit 88bdca832d35cb1a28436ee41626da92c2f3251e Author: Johannes Weiner <hannes@cmpxchg.org> Date: Fri Mar 31 22:16:21 2017 UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465186 Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/88bdca832d35cb1a28436ee41626da92c2f3251e/mm/oom_kill.c
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/222d652a0495a51be1d799772ec669e169ca9496 commit 222d652a0495a51be1d799772ec669e169ca9496 Author: Douglas Anderson <dianders@chromium.org> Date: Fri Mar 31 22:16:22 2017 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- In the function find_lock_task_mm() the stated purpose is to find a subthread in the case that the original task no longer has an "->mm". Implied in the description is that we should pick the original process "p" if it still has "->mm" and we should only pick a subthread if needed. However, that's not what happens with the current implementation. Let's change the code to explicitly make sure that we return the original process "p" if it has a valid "->mm". Why do we care? A future patch will improve the OOM killer to sometimes allow more than one OOM victim to exist at once. If we do that without making find_lock_task_mm() prefer the original process then we can get into this loop: 1. Pick a process as a victim. 2. Don't kill the chosen process, but instead kill some other task that (perhaps) has already been previously marked as a victim. 3. Go back to step #1 and pick the same process as a victim (since it wasn't actually killed before). Even without the future patch, though, it still makes sense to pick the original process if possible. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465187 [modify] https://crrev.com/222d652a0495a51be1d799772ec669e169ca9496/mm/oom_kill.c
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe commit d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe Author: Douglas Anderson <dianders@chromium.org> Date: Fri Mar 31 22:16:23 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465188 [modify] https://crrev.com/d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe/mm/oom_kill.c
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5 commit ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5 Author: Douglas Anderson <dianders@chromium.org> Date: Fri Mar 31 22:16:24 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465189 Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fcdc42d2699d037a399f655571637e093e9fe09e commit fcdc42d2699d037a399f655571637e093e9fe09e Author: Johannes Weiner <hannes@cmpxchg.org> Date: Sat Apr 01 01:56:57 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465586 [modify] https://crrev.com/fcdc42d2699d037a399f655571637e093e9fe09e/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6d287f9bc9e24b57f938155368784a755204e64c commit 6d287f9bc9e24b57f938155368784a755204e64c Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 01:56:58 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465587 [modify] https://crrev.com/6d287f9bc9e24b57f938155368784a755204e64c/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ea803c877bd53ab8a15e7ff6ed8a20b766b247e5 commit ea803c877bd53ab8a15e7ff6ed8a20b766b247e5 Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 01:56:59 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465588 Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/ea803c877bd53ab8a15e7ff6ed8a20b766b247e5/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/60728a964498e1200265b94d414cc169b1cdb840 commit 60728a964498e1200265b94d414cc169b1cdb840 Author: Johannes Weiner <hannes@cmpxchg.org> Date: Sat Apr 01 05:33:30 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465472 Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/60728a964498e1200265b94d414cc169b1cdb840/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b74ed4c3bfcdf15339adf67e28e2b2525d8270c7 commit b74ed4c3bfcdf15339adf67e28e2b2525d8270c7 Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 05:33:31 2017 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- In the function find_lock_task_mm() the stated purpose is to find a subthread in the case that the original task no longer has an "->mm". Implied in the description is that we should pick the original process "p" if it still has "->mm" and we should only pick a subthread if needed. However, that's not what happens with the current implementation. Let's change the code to explicitly make sure that we return the original process "p" if it has a valid "->mm". Why do we care? A future patch will improve the OOM killer to sometimes allow more than one OOM victim to exist at once. If we do that without making find_lock_task_mm() prefer the original process then we can get into this loop: 1. Pick a process as a victim. 2. Don't kill the chosen process, but instead kill some other task that (perhaps) has already been previously marked as a victim. 3. Go back to step #1 and pick the same process as a victim (since it wasn't actually killed before). Even without the future patch, though, it still makes sense to pick the original process if possible. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465473 [modify] https://crrev.com/b74ed4c3bfcdf15339adf67e28e2b2525d8270c7/mm/oom_kill.c
,
Apr 1 2017
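For readers who don't want to open the review above, here is a minimal sketch of the idea in that commit message: check the original task "p" for a valid ->mm before falling back to scanning its subthreads. This is an illustrative reconstruction, not the actual patch.

/*
 * Illustrative sketch only -- not the patch at
 * https://chromium-review.googlesource.com/465473. It shows the idea from
 * the commit message: prefer the original task "p" while it still owns an
 * mm, and only fall back to scanning its subthreads otherwise.
 */
#include <linux/sched.h>

struct task_struct *find_lock_task_mm(struct task_struct *p)
{
        struct task_struct *t;

        rcu_read_lock();

        /* Prefer the original process if it still has a valid ->mm. */
        task_lock(p);
        if (likely(p->mm)) {
                t = p;
                goto found;
        }
        task_unlock(p);

        /* Otherwise pick any subthread that still has an mm. */
        for_each_thread(p, t) {
                task_lock(t);
                if (t->mm)
                        goto found;
                task_unlock(t);
        }
        t = NULL;
found:
        rcu_read_unlock();
        return t;       /* on success, the returned task is left task_lock()ed */
}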
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e8278e9f12d0f82e39c46452420cf2e74cb5a228 commit e8278e9f12d0f82e39c46452420cf2e74cb5a228 Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 05:33:32 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465474 [modify] https://crrev.com/e8278e9f12d0f82e39c46452420cf2e74cb5a228/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/25d68b7fa97d3b331e4ac5e1ff0c8399574569bf commit 25d68b7fa97d3b331e4ac5e1ff0c8399574569bf Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 05:33:33 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465475 Tested-by: Dylan Reid <dgreid@chromium.org> Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/25d68b7fa97d3b331e4ac5e1ff0c8399574569bf/mm/oom_kill.c
,
Apr 1 2017
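The "toy code" reproducer mentioned in the commit message above is easy to sketch. The module below is illustrative only (the names oom_toy and hang are invented for this example): a read on the debugfs file blocks forever in the kernel on a double-locked mutex, so a SIGKILL from the OOM killer never takes effect on the reader. Raise that process's /proc/<pid>/oom_score_adj to 1000 and exhaust memory; the pre-series OOM killer keeps selecting the stuck reader and makes no progress, while with the series applied it should give up after the 100 ms window and move on to another victim.

/*
 * Toy reproducer sketched from the commit message above; it is not part of
 * the patch series. Reading /sys/kernel/debug/oom_toy/hang leaves the caller
 * blocked in the kernel forever, making it immune to SIGKILL.
 */
#include <linux/module.h>
#include <linux/debugfs.h>
#include <linux/mutex.h>
#include <linux/fs.h>

static DEFINE_MUTEX(toy_lock);
static struct dentry *toy_dir;

static ssize_t toy_hang_read(struct file *file, char __user *buf,
                             size_t count, loff_t *ppos)
{
        mutex_lock(&toy_lock);
        mutex_lock(&toy_lock);  /* self-deadlock: never returns */
        return 0;
}

static const struct file_operations toy_fops = {
        .owner = THIS_MODULE,
        .read  = toy_hang_read,
};

static int __init toy_init(void)
{
        toy_dir = debugfs_create_dir("oom_toy", NULL);
        debugfs_create_file("hang", 0400, toy_dir, NULL, &toy_fops);
        return 0;
}

static void __exit toy_exit(void)
{
        debugfs_remove_recursive(toy_dir);
}

module_init(toy_init);
module_exit(toy_exit);
MODULE_LICENSE("GPL");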
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8b4a16f82833c8b9f8c3d9cf67898c38354c65c0 commit 8b4a16f82833c8b9f8c3d9cf67898c38354c65c0 Author: Johannes Weiner <hannes@cmpxchg.org> Date: Sat Apr 01 05:33:25 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465468 Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/8b4a16f82833c8b9f8c3d9cf67898c38354c65c0/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e792e36e5d689f3daf93a70741ca1d407deb4513 commit e792e36e5d689f3daf93a70741ca1d407deb4513 Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 05:33:27 2017 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- In the function find_lock_task_mm() the stated purpose is to find a subthread in the case that the original task no longer has an "->mm". Implied in the description is that we should pick the original process "p" if it still has "->mm" and we should only pick a subthread if needed. However, that's not what happens with the current implementation. Let's change the code to explicitly make sure that we return the original process "p" if it has a valid "->mm". Why do we care? A future patch will improve the OOM killer to sometimes allow more than one OOM victim to exist at once. If we do that without making find_lock_task_mm() prefer the original process then we can get into this loop: 1. Pick a process as a victim. 2. Don't kill the chosen process, but instead kill some other task that (perhaps) has already been previously marked as a victim. 3. Go back to step #1 and pick the same process as a victim (since it wasn't actually killed before). Even without the future patch, though, it still makes sense to pick the original process if possible. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465469 [modify] https://crrev.com/e792e36e5d689f3daf93a70741ca1d407deb4513/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/195db82069300f8440a507faa20a4649ad25c022 commit 195db82069300f8440a507faa20a4649ad25c022 Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 05:33:28 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465470 [modify] https://crrev.com/195db82069300f8440a507faa20a4649ad25c022/mm/oom_kill.c
,
Apr 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fc300ccd3b96963b018fbcf1bd90570b7eb512b9 commit fc300ccd3b96963b018fbcf1bd90570b7eb512b9 Author: Douglas Anderson <dianders@chromium.org> Date: Sat Apr 01 05:33:29 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465471 Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/fc300ccd3b96963b018fbcf1bd90570b7eb512b9/mm/oom_kill.c
,
Apr 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b4b86c11b400271aaec13c9b56f50e0509d7aaec commit b4b86c11b400271aaec13c9b56f50e0509d7aaec Author: Johannes Weiner <hannes@cmpxchg.org> Date: Sun Apr 02 04:52:26 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465411 Reviewed-by: Guenter Roeck <groeck@chromium.org> [modify] https://crrev.com/b4b86c11b400271aaec13c9b56f50e0509d7aaec/mm/oom_kill.c
,
Apr 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6e3f395bb36e20d04fb83ac945bab47410e12e86 commit 6e3f395bb36e20d04fb83ac945bab47410e12e86 Author: Douglas Anderson <dianders@chromium.org> Date: Sun Apr 02 04:52:27 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465412 Reviewed-by: Guenter Roeck <groeck@chromium.org> [modify] https://crrev.com/6e3f395bb36e20d04fb83ac945bab47410e12e86/mm/oom_kill.c
,
Apr 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/aa071ca4b372e1e30701c56fe50823f34b3bc9f6 commit aa071ca4b372e1e30701c56fe50823f34b3bc9f6 Author: Douglas Anderson <dianders@chromium.org> Date: Sun Apr 02 04:52:28 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465413 Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Dylan Reid <dgreid@chromium.org> [modify] https://crrev.com/aa071ca4b372e1e30701c56fe50823f34b3bc9f6/mm/oom_kill.c
,
Apr 3 2017
Landed in all kernels in M59. Bernie: I'd love to hear your thoughts on this for M58.
---
1. This is not a new issue and not a regression. However, we're hitting it a lot more because:
1a) ARC++ triggers this much more easily, and ARC++ usage is increasing on kernels 3.14, 3.18, and 4.4.
1b) Possibly our adjustments of OOM scores are making us trigger this more.
1c) Possibly a change to the browser and how it works with subprocesses is making us trigger this more (bug #649116).
2. The change landed across all our kernels between Friday and Saturday, so it hasn't had much bake time yet.
3. I have done the most testing on kernel 4.4 and the second most on 3.14. I backported this to the other kernels (with some small changes) and did light testing there. Dylan did a bit of testing on 3.18. I did a bit of testing on 3.8 (no ARC++ though). I didn't test 3.10, but the code is nearly the same as 3.8's.
4. The change only affects the OOM killer, so it should have zero impact when memory isn't tight; when memory is tight, the new code runs.
5. The change has mostly been tested synthetically. It's hard to quickly get a device into a low-memory situation where the OOM killer runs during normal usage.
---
To me it seems like this would be good to get into R58, but I value other opinions.
---
It would be reasonable to have QA stress memory usage on at least one device per kernel version, ideally with ARC++ involved where possible. Searching "file:///var/log/messages" for "refused to die" is a good way to confirm that my code ran. A few notes for QA:
A) When stressing memory, if you allocate slowly enough, Chrome is supposed to free memory gracefully. I'm not sure how often this happens, but that's the idea.
B) To really stress things, you need something on the web that allocates very quickly. The best I've found is Facebook infinite scrolling with autoplay videos: find a feed with lots of videos, make sure high-def autoplay is on, start scrolling, and see how many videos you can get playing at once.
C) Possibly you could find some Android apps that use lots of memory, but I'm not personally aware of any.
D) If my code is _working_, you _expect_ to see "tab killed". In other words, seeing a killed tab is a really, really good thing as far as this fix is concerned. Said another way:
D-1. Ideally, other parts of Chrome OS notice that memory is low and shut things down cleanly. The kernel OOM killer (what we're fixing here) is the code that runs when the rest of Chrome OS fails to free memory fast enough. It is known and expected that the kernel OOM killer will sometimes need to run, since Chrome can't react as fast as the kernel can.
D-2. When the kernel OOM killer runs, the side effect is "tab killed". That's much better than the alternative of freezing or crashing the whole system.
E) What you really _don't_ want to see is a full system hang (1-2 minutes of no responsiveness followed by a reboot). In fact, as long as you're not synthetically stressing the system with a command-line tool (like balloon.arm), you shouldn't see _any_ full reboots of the system.
F) Even if you can't manage to reproduce "refused to die", hopefully you can at least see the OOM killer running. Look for "OOM" in file:///var/log/messages.
,
Apr 3 2017
If you are confident in it, and nothing bad has happened so far with these on ToT, let's go for it for 58. We should be doing another beta by Wednesday, so these should land before then if possible. I'll defer to you on whether you want to see more action from ToT before merging.
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/eed40556449ebee13af39c8308ac02bfb252cf40 commit eed40556449ebee13af39c8308ac02bfb252cf40 Author: Johannes Weiner <hannes@cmpxchg.org> Date: Mon Apr 03 23:28:10 2017 UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465186 Reviewed-by: Dylan Reid <dgreid@chromium.org> (cherry picked from commit 88bdca832d35cb1a28436ee41626da92c2f3251e) Reviewed-on: https://chromium-review.googlesource.com/466559 [modify] https://crrev.com/eed40556449ebee13af39c8308ac02bfb252cf40/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3d3748ee6c0a5cac059e1fa65adf10c85870c890 commit 3d3748ee6c0a5cac059e1fa65adf10c85870c890 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:28:29 2017 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- In the function find_lock_task_mm() the stated purpose is to find a subthread in the case that the original task no longer has an "->mm". Implied in the description is that we should pick the original process "p" if it still has "->mm" and we should only pick a subthread if needed. However, that's not what happens with the current implementation. Let's change the code to explicitly make sure that we return the original process "p" if it has a valid "->mm". Why do we care? A future patch will improve the OOM killer to sometimes allow more than one OOM victim to exist at once. If we do that without making find_lock_task_mm() prefer the original process then we can get into this loop: 1. Pick a process as a victim. 2. Don't kill the chosen process, but instead kill some other task that (perhaps) has already been previously marked as a victim. 3. Go back to step #1 and pick the same process as a victim (since it wasn't actually killed before). Even without the future patch, though, it still makes sense to pick the original process if possible. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465187 (cherry picked from commit 222d652a0495a51be1d799772ec669e169ca9496) Reviewed-on: https://chromium-review.googlesource.com/466560 [modify] https://crrev.com/3d3748ee6c0a5cac059e1fa65adf10c85870c890/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b4253e4b60fb6903693aa137bf2f2334970e03df commit b4253e4b60fb6903693aa137bf2f2334970e03df Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:28:57 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465188 (cherry picked from commit d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe) Reviewed-on: https://chromium-review.googlesource.com/466561 [modify] https://crrev.com/b4253e4b60fb6903693aa137bf2f2334970e03df/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7271dacb3f1f17aa698a9831c941bcb7e4b4a8a2 commit 7271dacb3f1f17aa698a9831c941bcb7e4b4a8a2 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:29:25 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465189 Reviewed-by: Dylan Reid <dgreid@chromium.org> (cherry picked from commit ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5) Reviewed-on: https://chromium-review.googlesource.com/466562 [modify] https://crrev.com/7271dacb3f1f17aa698a9831c941bcb7e4b4a8a2/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d40d12a0baa2776d91908c74f37a0d1c46eecb37 commit d40d12a0baa2776d91908c74f37a0d1c46eecb37 Author: Johannes Weiner <hannes@cmpxchg.org> Date: Mon Apr 03 23:30:15 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465472 Reviewed-by: Dylan Reid <dgreid@chromium.org> (cherry picked from commit 60728a964498e1200265b94d414cc169b1cdb840) Reviewed-on: https://chromium-review.googlesource.com/466563 [modify] https://crrev.com/d40d12a0baa2776d91908c74f37a0d1c46eecb37/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/bb3ab0662a46da9b135df2917f80da5b1e47c9ee commit bb3ab0662a46da9b135df2917f80da5b1e47c9ee Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:30:38 2017 CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- In the function find_lock_task_mm() the stated purpose is to find a subthread in the case that the original task no longer has an "->mm". Implied in the description is that we should pick the original process "p" if it still has "->mm" and we should only pick a subthread if needed. However, that's not what happens with the current implementation. Let's change the code to explicitly make sure that we return the original process "p" if it has a valid "->mm". Why do we care? A future patch will improve the OOM killer to sometimes allow more than one OOM victim to exist at once. If we do that without making find_lock_task_mm() prefer the original process then we can get into this loop: 1. Pick a process as a victim. 2. Don't kill the chosen process, but instead kill some other task that (perhaps) has already been previously marked as a victim. 3. Go back to step #1 and pick the same process as a victim (since it wasn't actually killed before). Even without the future patch, though, it still makes sense to pick the original process if possible. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465473 (cherry picked from commit b74ed4c3bfcdf15339adf67e28e2b2525d8270c7) Reviewed-on: https://chromium-review.googlesource.com/466564 [modify] https://crrev.com/bb3ab0662a46da9b135df2917f80da5b1e47c9ee/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f70e18011ebfe8f16d22b86ba34dea84aa1c3ef9 commit f70e18011ebfe8f16d22b86ba34dea84aa1c3ef9 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:31:20 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465474 (cherry picked from commit e8278e9f12d0f82e39c46452420cf2e74cb5a228) Reviewed-on: https://chromium-review.googlesource.com/466565 [modify] https://crrev.com/f70e18011ebfe8f16d22b86ba34dea84aa1c3ef9/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2014267785d257ebc84d3f9da30ed0c5df41e2b0

commit 2014267785d257ebc84d3f9da30ed0c5df41e2b0
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:31:46 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined, and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.
---
It turns out that there's a pretty big flaw in the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working: once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system.

This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice, then set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to, because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops.

This is easy to argue about in non-toy code too. Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 holds a mutex named "mutex" and is waiting to allocate memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an attractive-looking OOM score.

OOM kill is the last resort of the memory manager. That means it has no other way to get memory to give to process 1000 in the example above, so process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock.

This is seen in real life. Specifically, while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process that is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case.

This is seen in real life in non-binder code as well. Specifically, I've seen the victim process sitting waiting like this:
schedule+0x88/0xa8
schedule_timeout+0x44/0x27c
io_schedule_timeout+0x7c/0xbc
bt_get+0x174/0x278
blk_mq_get_tag+0x4c/0xd4
__blk_mq_alloc_request+0x28/0x158
blk_mq_map_request+0x340/0x39c
blk_sq_make_request+0xb0/0x3b8
generic_make_request+0xcc/0x174
submit_bio+0xe4/0x1d4
submit_bh_wbc.isra.35+0x124/0x138
submit_bh+0x2c/0x38
ll_rw_block+0xac/0xd8
squashfs_read_data+0x354/0x510
squashfs_readpage_block+0x2d8/0x490
squashfs_readpage+0x48c/0x600
__do_page_cache_readahead+0x1e4/0x248
filemap_fault+0x14c/0x3b8
__do_fault+0x78/0xf0
handle_mm_fault+0x744/0x1050
do_page_fault+0x140/0x2dc
do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory.

Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away, per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to use a heuristic here: wait a bit of time (100 ms) for a process to die, and if it doesn't, give up and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465475
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit 25d68b7fa97d3b331e4ac5e1ff0c8399574569bf)
Reviewed-on: https://chromium-review.googlesource.com/466846

[modify] https://crrev.com/2014267785d257ebc84d3f9da30ed0c5df41e2b0/mm/oom_kill.c
,
Apr 3 2017
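The "toy code" mentioned in the commit message can be pictured as something like this (a hypothetical sketch, not code from the tree): a kernel function reachable from userspace, say via a debugfs write handler, that takes the same mutex twice. The second mutex_lock() sleeps uninterruptibly, so the task never acts on the SIGKILL the OOM killer sends it, and with the pre-patch behavior no further victim is ever chosen.

    #include <linux/mutex.h>

    static DEFINE_MUTEX(toy_mutex);

    /* Hypothetical function invoked from userspace (e.g. through a debugfs
     * write).  The second lock can never be acquired, so the calling task
     * sits in uninterruptible sleep forever and cannot honor SIGKILL. */
    static void toy_unkillable(void)
    {
            mutex_lock(&toy_mutex);
            mutex_lock(&toy_mutex);         /* self-deadlock: never returns */
            mutex_unlock(&toy_mutex);
            mutex_unlock(&toy_mutex);
    }

With that process's oom_score_adj raised so it is the preferred victim, the first OOM kill targets it and fails; before this series, every later OOM kill is then a no-op.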
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/facb2dedf039748d0c736e4b00893e2923d83179

commit facb2dedf039748d0c736e4b00893e2923d83179
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon Apr 03 23:33:04 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465468
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit 8b4a16f82833c8b9f8c3d9cf67898c38354c65c0)
Reviewed-on: https://chromium-review.googlesource.com/466847

[modify] https://crrev.com/facb2dedf039748d0c736e4b00893e2923d83179/mm/oom_kill.c
,
Apr 3 2017
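The hunk this backport removes has roughly the following shape (an approximation for illustration; the exact code differs between the 3.14, 3.18, and 4.4 branches): the victim scan used to bail out as soon as it saw a task that was already exiting, on the theory that its memory was about to be freed anyway.

    /* Approximate shape of the check removed from the victim-selection scan
     * (not the literal diff). */
    if (task->flags & PF_EXITING)
            return OOM_SCAN_ABORT;  /* wait for the exiting task instead of killing */

After the change, an exiting task is scored like any other, so a task stuck on its way out can no longer block the selection of a better victim.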
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a0a5e011f72a347ab78e1ac2381830d91b50f6cd

commit a0a5e011f72a347ab78e1ac2381830d91b50f6cd
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:33:29 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined, and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.
---
The stated purpose of find_lock_task_mm() is to find a subthread in the case that the original task no longer has an "->mm". Implied in the description is that we should pick the original process "p" if it still has "->mm" and should only pick a subthread if needed. However, that's not what happens with the current implementation. Let's change the code to explicitly make sure that we return the original process "p" if it has a valid "->mm".

Why do we care? A future patch will improve the OOM killer to sometimes allow more than one OOM victim to exist at once. If we do that without making find_lock_task_mm() prefer the original process then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick the original process if possible.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465469
(cherry picked from commit e792e36e5d689f3daf93a70741ca1d407deb4513)
Reviewed-on: https://chromium-review.googlesource.com/466848

[modify] https://crrev.com/a0a5e011f72a347ab78e1ac2381830d91b50f6cd/mm/oom_kill.c
,
Apr 3 2017
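The intent of that change can be sketched as follows (a rough illustration, not the actual patch; the locking rules of the real find_lock_task_mm() are simplified here): check "p" itself first, and only walk its threads when "p" has already lost its "->mm".

    #include <linux/sched.h>

    /* Rough sketch of the preferred ordering (not the real function body). */
    static struct task_struct *find_lock_task_mm_sketch(struct task_struct *p)
    {
            struct task_struct *t;

            task_lock(p);
            if (likely(p->mm))
                    return p;               /* prefer the original process */
            task_unlock(p);

            for_each_thread(p, t) {
                    if (t == p)
                            continue;
                    task_lock(t);
                    if (t->mm)
                            return t;       /* fall back to a live subthread */
                    task_unlock(t);
            }
            return NULL;
    }

Without this ordering, a later patch that allows multiple victims could keep "killing" an already-marked subthread while the originally chosen process survives and gets picked again.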
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b3ab459eb659ad748cdfd9b8f5074eed42c8b535 commit b3ab459eb659ad748cdfd9b8f5074eed42c8b535 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:33:52 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465470 (cherry picked from commit 195db82069300f8440a507faa20a4649ad25c022) Reviewed-on: https://chromium-review.googlesource.com/466849 [modify] https://crrev.com/b3ab459eb659ad748cdfd9b8f5074eed42c8b535/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/54435bd3425984b4ea5e6d8eaa8cbc59055b96e7 commit 54435bd3425984b4ea5e6d8eaa8cbc59055b96e7 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:34:18 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465471 Reviewed-by: Dylan Reid <dgreid@chromium.org> (cherry picked from commit fc300ccd3b96963b018fbcf1bd90570b7eb512b9) Reviewed-on: https://chromium-review.googlesource.com/466850 [modify] https://crrev.com/54435bd3425984b4ea5e6d8eaa8cbc59055b96e7/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/392aa7079e1b6ac13458ee46a01b9676e221a146 commit 392aa7079e1b6ac13458ee46a01b9676e221a146 Author: Johannes Weiner <hannes@cmpxchg.org> Date: Mon Apr 03 23:35:05 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465411 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit b4b86c11b400271aaec13c9b56f50e0509d7aaec) Reviewed-on: https://chromium-review.googlesource.com/466851 [modify] https://crrev.com/392aa7079e1b6ac13458ee46a01b9676e221a146/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/9d5640372c747c33e57b8e6a9172854334921c61 commit 9d5640372c747c33e57b8e6a9172854334921c61 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:35:32 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465412 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 6e3f395bb36e20d04fb83ac945bab47410e12e86) Reviewed-on: https://chromium-review.googlesource.com/466852 [modify] https://crrev.com/9d5640372c747c33e57b8e6a9172854334921c61/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/58aa31ac416443efae4ff24085a55e143891c64d commit 58aa31ac416443efae4ff24085a55e143891c64d Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:35:52 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465413 Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Dylan Reid <dgreid@chromium.org> (cherry picked from commit aa071ca4b372e1e30701c56fe50823f34b3bc9f6) Reviewed-on: https://chromium-review.googlesource.com/466853 [modify] https://crrev.com/58aa31ac416443efae4ff24085a55e143891c64d/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ae0debcbe000ccc368b801d24d8d80ddff4d708f commit ae0debcbe000ccc368b801d24d8d80ddff4d708f Author: Johannes Weiner <hannes@cmpxchg.org> Date: Mon Apr 03 23:36:37 2017 BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks When the OOM killer scans tasks and encounters a PF_EXITING one, it force-selects that task regardless of the score. The problem is that if that task got stuck waiting for some state the allocation site is holding, the OOM reaper can not move on to the next best victim. Frankly, I don't even know why we check for exiting tasks in the OOM killer. We've tried direct reclaim at least 15 times by the time we decide the system is OOM, there was plenty of time to exit and free memory; and a task might exit voluntarily right after we issue a kill. This is testing pure noise. Remove it. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were extra checked in 3.14, but the concept is still the same. Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Argangeli <andrea@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176) Reviewed-on: https://chromium-review.googlesource.com/465586 (cherry picked from commit fcdc42d2699d037a399f655571637e093e9fe09e) Reviewed-on: https://chromium-review.googlesource.com/466854 [modify] https://crrev.com/ae0debcbe000ccc368b801d24d8d80ddff4d708f/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/21f9394b508ab427db3ad9bf2504a1c045541796 commit 21f9394b508ab427db3ad9bf2504a1c045541796 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:37:01 2017 CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- Of the child of an OOM victim has a different "->mm" element than the victim, we may choose to kill the child in place of the victim. This is sane because if we killed the victim the child would die anyway, so why not see if we can save the parent? ...but, before we kill the child let's double-check that it's OK by calling oom_scan_process_thread(). I'll admit that as the code stands right now this is a totally useless thing to do. The only time oom_scan_process_thread() will tell us not to kill the child is if: 1. The child is already marked with MEMDIE. However, that's not possible because the code (as it stands today) won't let more than one process have MEMDIE. 2. The child has no "mm". ...but that shouldn't happen because the parent and child shouldn't share an "mm" if the child's "mm" is NULL. Now that I've explained why this patch is stupid and useless, here's why we want it anyway: in a future patch we'll make it (sometimes) possible for more than one process to be marked with MEMDIE. With that in mind, point #1 above is moot and suddenly this patch has merit. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system NOTE: this file is slightly different than its 4.4 counterpart (where the patch was originally written) due to different parameters in oom_scan_process_thread(). Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465587 (cherry picked from commit 6d287f9bc9e24b57f938155368784a755204e64c) Reviewed-on: https://chromium-review.googlesource.com/466855 [modify] https://crrev.com/21f9394b508ab427db3ad9bf2504a1c045541796/mm/oom_kill.c
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b879df4c109d88fbc3b3f3b255c62dafe97acab0 commit b879df4c109d88fbc3b3f3b255c62dafe97acab0 Author: Douglas Anderson <dianders@chromium.org> Date: Mon Apr 03 23:37:23 2017 CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal. --- It turns out that there's a pretty big flaw with the way the OOM killer works. If the chosen OOM kill victim doesn't die then the OOM killer totally stops working. Specifically once we've picked an OOM victim we won't ever pick another until the first victim dies. If that victim refuses to die then we'll silently loop until eventually some sort of watchdog reboots the system. This is easy to see in toy code. Just write a function that userspace can invoke that grabs a mutex twice. Now set the OOM score so that this is the preferred process to kill. The first invocation of the OOM killer will try to kill this process but won't be able to because the process will never get the mutex. All subsequent OOM killer invocations will be complete no-ops. This is easy to argue about in non-toy code too. Just imagine: A) Memory gets low and pretty much everyone is starved. B) Process 1000 has a mutex named "mutex" and is waiting to allocate memory. C) Process 1001 wants "mutex". D) We choose to OOM kill process 1001, maybe because it has an attractive looking OOM score. OOM kill is the last resort of the memory manager. That means it has no other ways to get memory to give to process 1000 in the example above and process 1000 will loop waiting forever, never releasing its mutex. Process 1001 will never get the mutex and thus can never die to either release memory or allow an additional OOM kill. Classic deadlock. This is seen in real life. Specifically while stressing the OOM killer on a Chromebook with Android processes running, the OOM killer likes to kill Android processes. Often these processes are sitting in binder_thread_read(). When we try to kill them they wake up and try to grab the binder mutex. Often the binder mutex is held by another process who is stuck trying to allocate memory. Ugh. We probably want to fix Binder to make this specific case behave better, but we can't fix every case. This is seen in real life in non-binder code as well. 
Specifically I've seen the victim process sitting waiting like this: schedule+0x88/0xa8 schedule_timeout+0x44/0x27c io_schedule_timeout+0x7c/0xbc bt_get+0x174/0x278 blk_mq_get_tag+0x4c/0xd4 __blk_mq_alloc_request+0x28/0x158 blk_mq_map_request+0x340/0x39c blk_sq_make_request+0xb0/0x3b8 generic_make_request+0xcc/0x174 submit_bio+0xe4/0x1d4 submit_bh_wbc.isra.35+0x124/0x138 submit_bh+0x2c/0x38 ll_rw_block+0xac/0xd8 squashfs_read_data+0x354/0x510 squashfs_readpage_block+0x2d8/0x490 squashfs_readpage+0x48c/0x600 __do_page_cache_readahead+0x1e4/0x248 filemap_fault+0x14c/0x3b8 __do_fault+0x78/0xf0 handle_mm_fault+0x744/0x1050 do_page_fault+0x140/0x2dc do_mem_abort+0x64/0xd8 While I didn't dig into why the process couldn't get unblocked in the squashfs case, it seems fairly sane to assume some other process is waiting on memory. Now that you (hopefully) believe we have a problem, we need a solution. One sane solution is to eventually let the OOM killer kill more than one process. We don't want to kill more than one process right away as per the dire warnings in the OOM killer code about exhausting our memory reserves and ending up in a livelock. While imperfect, it seems sane to perhaps use a heuristic here where we wait a bit of time (100 ms) for a process to die. If it fails, we give up and try another process. BUG=chromium:706048, chromium:702707 TEST=With whole series, eating memory doesn't hang system Conflicts: mm/oom_kill.c There were conflicts compared to the 4.4 version because of lots of churn in the oom kill code. Conceptually this is the same thing, though. Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962 Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/465588 Reviewed-by: Dylan Reid <dgreid@chromium.org> (cherry picked from commit ea803c877bd53ab8a15e7ff6ed8a20b766b247e5) Reviewed-on: https://chromium-review.googlesource.com/466856 [modify] https://crrev.com/b879df4c109d88fbc3b3f3b255c62dafe97acab0/mm/oom_kill.c
,
Apr 3 2017
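The "100 ms" heuristic these patches keep referring to can be pictured like this (a hypothetical illustration only; oom_mark_jiffies is an invented per-task field, not something in the tree): a freshly marked victim is still given a chance to exit, but once the grace period has passed it no longer blocks the selection of a new victim.

    #include <linux/sched.h>
    #include <linux/jiffies.h>

    #define OOM_VICTIM_GRACE        msecs_to_jiffies(100)

    /* Illustration only: decide whether an existing victim should still hold
     * off further OOM kills.  oom_mark_jiffies is a hypothetical timestamp
     * recorded when the task was marked TIF_MEMDIE. */
    static bool victim_still_pending(struct task_struct *tsk)
    {
            if (!test_tsk_thread_flag(tsk, TIF_MEMDIE))
                    return false;
            return time_before(jiffies, tsk->oom_mark_jiffies + OOM_VICTIM_GRACE);
    }

If no existing victim is still "pending", the scan is allowed to pick and kill another task instead of looping forever on a victim that refuses to die.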
QA: please see comment 39 for testing notes
,
Apr 27 2017
Issue 649116 has been merged into this issue.
,
May 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/94f9ee2d35f666bced955ea4b4ee25beb1a18aea

commit 94f9ee2d35f666bced955ea4b4ee25beb1a18aea
Author: Sonny Rao <sonnyrao@chromium.org>
Date: Wed May 10 06:20:10 2017

CHROMIUM: mm: lower oom dump burst limit to 1

Dumping tasks during OOM seems to cause noticeable UI jank. The kernel has a mechanism for disabling the dump altogether, but the dump can still be useful to see in feedback reports. So this lowers the default ratelimit burst from 10 to 1, so that we still get one report of what was consuming memory when we hit an OOM.

BUG=chromium:702707
TEST=induce an OOM and check for UI jank

Change-Id: Ia0e73640ad706d8d4f0d6593a4d3ec36e5cd8728
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/501248
Reviewed-by: Douglas Anderson <dianders@chromium.org>

[modify] https://crrev.com/94f9ee2d35f666bced955ea4b4ee25beb1a18aea/mm/oom_kill.c
,
May 10 2017
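The change itself is tiny. Roughly (paraphrased, not the literal diff), the ratelimit that already guards the OOM task dump in mm/oom_kill.c keeps its default 5-second interval but has its burst reduced from the default of 10 to 1:

    #include <linux/ratelimit.h>

    /* Before: up to DEFAULT_RATELIMIT_BURST (10) dumps per interval.
     * After:  a single dump per interval -- still enough to see what was
     *         eating memory in feedback reports, without janking the UI. */
    static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, 1);

The dump call site is unchanged; it still only runs when __ratelimit(&oom_rs) allows it.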
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/df33c0ef46b833a96930500a54a0e729bb8ed642 commit df33c0ef46b833a96930500a54a0e729bb8ed642 Author: Sonny Rao <sonnyrao@chromium.org> Date: Wed May 10 06:20:09 2017 CHROMIUM: mm: lower oom dump burst limit to 1 Dumping tasks during oom seems to cause noticeable UI jank. The kernel has a mechanism for disabling the dump altogether, but the dump can still be useful to see in feedback reports. So, this lowers the default ratelimit burst from 10 to 1 so that we still get one report of what was consuming memory when we get OOMs. BUG=chromium:702707 TEST=induce an OOM and check for UI jank Change-Id: Ia0e73640ad706d8d4f0d6593a4d3ec36e5cd8728 Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/501228 Reviewed-by: Douglas Anderson <dianders@chromium.org> [modify] https://crrev.com/df33c0ef46b833a96930500a54a0e729bb8ed642/mm/oom_kill.c
,
May 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/41088d1fcb294668f992627cbeee4c9d06f2e61f commit 41088d1fcb294668f992627cbeee4c9d06f2e61f Author: Sonny Rao <sonnyrao@chromium.org> Date: Wed May 10 18:58:24 2017 CHROMIUM: mm: lower oom dump burst limit to 1 Dumping tasks during oom seems to cause noticeable UI jank. The kernel has a mechanism for disabling the dump altogether, but the dump can still be useful to see in feedback reports. So, this lowers the default ratelimit burst from 10 to 1 so that we still get one report of what was consuming memory when we get OOMs. BUG=chromium:702707 TEST=induce an OOM and check for UI jank Change-Id: Ia0e73640ad706d8d4f0d6593a4d3ec36e5cd8728 Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/501408 Commit-Ready: Douglas Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> [modify] https://crrev.com/41088d1fcb294668f992627cbeee4c9d06f2e61f/mm/oom_kill.c
,
May 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7fbd9fcc0499ef438d22b5af12e53cb846ff7722 commit 7fbd9fcc0499ef438d22b5af12e53cb846ff7722 Author: Sonny Rao <sonnyrao@chromium.org> Date: Wed May 10 18:58:26 2017 CHROMIUM: mm: lower oom dump burst limit to 1 Dumping tasks during oom seems to cause noticeable UI jank. The kernel has a mechanism for disabling the dump altogether, but the dump can still be useful to see in feedback reports. So, this lowers the default ratelimit burst from 10 to 1 so that we still get one report of what was consuming memory when we get OOMs. BUG=chromium:702707 TEST=induce an OOM and check for UI jank Change-Id: Ia0e73640ad706d8d4f0d6593a4d3ec36e5cd8728 Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/501310 Commit-Ready: Douglas Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> [modify] https://crrev.com/7fbd9fcc0499ef438d22b5af12e53cb846ff7722/mm/oom_kill.c
,
May 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/735c96dc4cb5f73cc9ee62228336a72446f89b5d commit 735c96dc4cb5f73cc9ee62228336a72446f89b5d Author: Sonny Rao <sonnyrao@chromium.org> Date: Wed May 10 18:58:29 2017 CHROMIUM: mm: lower oom dump burst limit to 1 Dumping tasks during oom seems to cause noticeable UI jank. The kernel has a mechanism for disabling the dump altogether, but the dump can still be useful to see in feedback reports. So, this lowers the default ratelimit burst from 10 to 1 so that we still get one report of what was consuming memory when we get OOMs. BUG=chromium:702707 TEST=induce an OOM and check for UI jank Change-Id: Ia0e73640ad706d8d4f0d6593a4d3ec36e5cd8728 Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/501287 Commit-Ready: Douglas Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> [modify] https://crrev.com/735c96dc4cb5f73cc9ee62228336a72446f89b5d/mm/oom_kill.c
,
Jun 15 2017
Issue 728194 has been merged into this issue.
,
Jan 22 2018