
Can we do better w/ OOM scores of ARC++ processes?

Project Member Reported by diand...@chromium.org, Mar 28 2017

Issue description

As per go/arc-memory:

Some Android processes should have negative OOM adjust scores (whether you use oom_adj or oom_score_adj, the end result is that it should be negative):
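(As a reminder: oom_adj is the legacy knob with range -17..15; the kernel scales writes to it into oom_score_adj's range of -1000..1000.  A quick sanity check of that scaling, assuming a root shell on the device:

  echo -17 > /proc/self/oom_adj   # legacy interface; -17 == OOM_DISABLE
  cat /proc/self/oom_score_adj    # prints -1000, the kernel's "never kill" value
)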

---

-11
PERSISTENT_SERVICE_ADJ
This is a process that the system or a persistent process has bound to, and indicated it is important.

-12
PERSISTENT_PROC_ADJ
This is a system persistent process, such as telephony.  Definitely don't want to kill it, but doing so is not completely fatal.

-16
SYSTEM_ADJ
The system process runs at the default adjustment.

---

Processes marked like this are supposed to be _more important_ than the foreground Android app.  The same doc says foreground Android apps should be marked like:

0
FOREGROUND_APP_ADJ
This is the process running the current foreground app.  We'd really rather not kill it!

---

Right now it's pretty clear that this isn't happening.  If I look at the OOM scores of the Android processes running when ARC++ is active, none of them have a negative OOM score.  In fact, many of them are marked as the things to kill first.

This needs to change because:

* It appears that killing these persistent Android processes is more likely to trigger kernel corner cases (bug #702707) and hang the whole system.

* These processes will likely just be respawned.

* Killing these processes will likely make Android stop working properly, so it's silly to kill them before killing an app.

---

IMHO this is the root cause of why we're getting so many crazy OOM failures.  Please treat this as very high priority to fix.
 

Comment 5 by uekawa@google.com, Mar 28 2017

Cc: -dgreid@chromium.org nya@chromium.org

Comment 6 by uekawa@google.com, Mar 28 2017

Cc: dgreid@chromium.org
oops.
Here's a printout from an OOM kill:

[  367.463287] [ 4015] 665370  4015   235096       11      87       4     2890           975 externalstorage
[  367.473227] [ 4020] 665369  4020   236994      153      96       4     3236           800 d.process.media
[  367.493039] [ 4081] 665383  4081   236219        0      91       4     2993           700 hromium.arc.ime
[  367.512821] [ 4149] 665396  4149   235751       86      86       4     2904           825 id.printspooler
[  367.522814] [ 4168] 665373  4168   275404      443     175       4     6597           650 .gms.persistent
[  367.532896] [ 4202] 656360  4202   236711        0      93       4     3005           775 arc.applauncher
[  367.552125] [ 4271] 665373  4271   238760       30      96       4     3248           388 e.process.gapps
[  367.562182] [ 4289] 665389  4289   240652        0     100       4     3162           950 rk.clouddpc.arc
[  367.572242] [ 4333] 665373  4333   314894      123     208       4     7096           475 gle.android.gms
[  367.582292] [ 4401] 665376  4401   262716     9375     151       4     8908           300 android.vending
[  367.592317] [ 4496] 656360  4496   235134      154      85       4     2827           900 ndroid.keychain
[  367.602346] [ 4501] 665398  4501   237496      105      87       4     2904           925 marvin.talkback
[  367.612436] [ 4522] 665363  4522   236308      167      94       4     3040           750 c.intent_helper
[  367.622508] [ 4536] 665364  4536   235380      128      86       4     2871           875 rovisioning.arc
[  367.632586] [ 4549] 665382  4549   236169      128      87       4     2930           850 hromium.arc.gms
[  367.642585] [ 4562] 665368  4562   235377        5      83       4     2966           675 id.defcontainer
[  367.652976] [ 4578] 665366  4578   235778       63      94       4     2980           725 viders.calendar

===

Most of these don't seem like processes that ought to be OOM killed.  (The second-to-last column is each process's oom_score_adj; note that none are negative and most are quite high.)

===

You can easily see OOM scores by booting up ARC++ and forcing an OOM kill (echo f > /proc/sysrq-trigger).
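If you'd rather not force a kill, you can also dump the same scores directly.  A rough sketch (run as root over SSH; racy against processes exiting, but fine for eyeballing):

for p in /proc/[0-9]*; do
  echo "$(cat $p/oom_score_adj 2>/dev/null) $(basename $p) $(cat $p/comm 2>/dev/null)"
done | sort -n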
All right!  I've got a "fix"!  :)

---

dianders@tictac:platform2 ((47da9097c913...))$ git diff
diff --git a/debugd/src/oom_adj_tool.cc b/debugd/src/oom_adj_tool.cc
index aac70b8d2b05..e87eb8e62737 100644
--- a/debugd/src/oom_adj_tool.cc
+++ b/debugd/src/oom_adj_tool.cc
@@ -25,6 +25,7 @@ namespace {
 
 constexpr char kProcfsDirFormat[] = "/proc/%d";
 constexpr char kOomScoreAdjFileFormat[] = "/proc/%d/oom_score_adj";
+constexpr char kCommFileFormat[] = "/proc/%d/comm";
 constexpr char kUidMapFileFormat[] = "/proc/%d/uid_map";
 
 // Though uid might be an unsigned int, some system call like setreuid() still
@@ -171,6 +172,13 @@ void OomScoreSetter::SetOne(
     return;
   }
 
+  // Only adjust Chrome
+  base::FilePath comm_file(base::StringPrintf(kCommFileFormat, pid));
+  std::string comm;
+  if (base::ReadFileToString(comm_file, &comm) && (comm != "chrome")) {
+    return;
+  }
+
   std::string score_str = std::to_string(score);
   const size_t len = score_str.length();
   base::FilePath oom_file(base::StringPrintf(kOomScoreAdjFileFormat, pid));

Just as a heads up, here's my testing on kevin:

1. Copy these two attachments to /usr/local/bin
2. Login to an account w/ ARC++ enabled.
3. Open a few tabs.  I opened "The Verge" and started streaming 3 youtube videos at once.  I also started the Play Store.
4. SSH over and type: "cd /usr/local/bin; nice ./launchBalloons.sh 6 950 100000"


Without my change to "debugd" you end up with a hang and the system reboots after 2 minutes or so.

With my change, a few tabs die and the system keeps running normally.

===

Certainly we could land the debugd change, but it would be better if we could do something a little less gross...
Attachments:
balloon.arm (22.4 KB)
launchBalloons.sh (296 bytes)

Comment 10 by cylee@chromium.org, Mar 29 2017

It's hard to identify which Android apps can be killed and which can't. For example, here's the app list from running 'dumpsys activity':

    PERS #24: sys   F/ /P  trm: 0 285:system/1000 (fixed)
    PERS #23: pers  F/ /P  trm: 0 480:com.android.systemui/u0a18 (fixed)
    Proc # 0: fore  F/A/T  trm: 0 654:org.chromium.arc.home/u0a1 (top-activity)
    Proc # 5: vis   F/ /IF trm: 0 768:com.google.android.gms.persistent/u0a13 (service)
        com.google.android.gms/com.google.android.location.fused.FusedLocationService<=Proc{285:system/1000}
    Proc # 3: vis   F/ /SB trm: 0 2008:com.google.android.music:main/u0a40 (service)
        com.google.android.music/.preferences.MusicPreferenceService$MusicPreferenceServiceBinder<=Proc{2579:com.google.android.music:ui/u0a40}
    Proc # 1: vis   F/A/T  trm: 0 2579:com.google.android.music:ui/u0a40 (visible)
    Proc #22: prcp  B/ /IB trm: 0 592:org.chromium.arc.ime/u0a23 (service)
        org.chromium.arc.ime/.ArcInputMethodService<=Proc{285:system/1000}
    Proc # 8: svc   B/ /S  trm: 0 1361:org.chromium.arc.crash_collector/u0a0 (started-services)
    Proc # 2: prev  B/ /S  trm: 0 2703:com.spotify.music/u0a39 (cch-started-ui-services)
    Proc # 7: cch   B/ /S  trm: 0 2166:com.android.vending/u0a16 (cch-started-services)
    Proc # 6: cch   B/ /CE trm: 0 741:com.google.process.gapps/u0a13 (cch-empty)
    Proc # 4: cch   B/ /S  trm: 0 828:com.google.android.gms/u0a13 (cch-started-services)
    Proc #11: cch+2 B/ /CE trm: 0 822:android.process.media/u0a9 (cch-empty)
    Proc #10: cch+2 B/ /CE trm: 0 2643:com.google.android.packageinstaller/u0a14 (cch-empty)
    Proc # 9: cch+2 B/ /CE trm: 0 1041:android.process.acore/u0a7 (cch-empty)
    Proc #14: cch+4 B/ /CE trm: 0 1149:com.google.android.apps.work.clouddpc.arc/u0a29 (cch-empty)
    Proc #13: cch+4 B/ /CE trm: 0 2282:com.google.android.gms.ui/u0a13 (cch-empty)
    Proc #12: cch+4 B/ /CE trm: 0 2503:com.android.printspooler/u0a36 (cch-empty)
    Proc #21: cch+6 B/ /S  trm: 0 1007:org.chromium.arc.intent_helper/u0a3 (cch-started-services)
    Proc #20: cch+6 B/ /S  trm: 0 709:org.chromium.arc.applauncher/1000 (cch-started-services)
    Proc #19: cch+6 B/ /S  trm: 0 1183:org.chromium.arc.file_system/u0a3 (cch-started-services)
    Proc #18: cch+6 B/ /S  trm: 0 1210:org.chromium.arc.tts/u0a24 (cch-started-services)
    Proc #17: cch+6 B/ /S  trm: 0 1243:com.google.android.tts/u0a33 (cch-empty)
    Proc #16: cch+6 B/ /CE trm: 0 1932:com.android.musicfx/u0a15 (cch-empty)
    Proc #15: cch+6 B/ /CE trm: 0 1658:com.google.android.apps.cloudprint/u0a30 (cch-empty)


Android has a lot of signals for process importance and process state, but none of them fit our need. For example, the column with cch+6 is a human-readable oom adj score returned by ProcessList.makeOomAdjString(), the column after the second slash (the one with IB, S, CE) is the process state returned by ProcessList.makeProcStateString(), and the trailing string in parentheses (like cch-empty) is ProcessRecord.adjType.

Currently the main signal used in Chrome is process state. But it's not a good signal for separating "important" processes from "non-important" processes either.
For example, you can see the process "com.spotify.music" has process state "S" (which stands for PROCESS_STATE_SERVICE) when Spotify is in the background, so you might think it can be killed when memory is low. However, there are other processes, like org.chromium.arc.intent_helper and org.chromium.arc.file_system, that are also in process state "S" and that you might consider non-killable. And they have even lower priority in terms of oom_adj (here oom_adj refers to an importance score calculated internally by Android, not related to the system-wide oom_score_adj, oom_adj, or oom_score).

So maybe important processes like org.chromium.arc.intent_helper and org.chromium.arc.file_system should have some way to let Android know their importance and be marked with something like "PROCESS_STATE_PERSISTENT"? I think we need somebody more familiar with Android to answer this.

A quick fix is to tighten the restriction: for example, only kill processes with process state "PROCESS_STATE_CACHED_EMPTY". I'm not so sure it's safe, because you can see that processes like com.google.android.gms.ui and com.android.printspooler have that process state. I guess they'll respawn when needed, though.
If you have something you want me to try, let me know.  You also have my steps above to reproduce this problem.

Note that even though my steps are very synthetic, this is a real problem that I've reproduced in the real world (and I've also seen a number of feedback reports).  IMHO until we do _something_, our low memory handling in Chrome OS will often lead to 2-minute hangs + crashes rather than just a few sad tabs.

Comment 12 by cylee@chromium.org, Mar 29 2017

I'm working on a CL to kill only apps in process state "PROCESS_STATE_CACHED_EMPTY". 

What I'm trying to say is that it's almost impossible to distinguish "potentially killable apps" from "important un-killable apps" nicely, given the app info I can get from Android. So I'll be very conservative for now - never kill any "potentially visible" Android apps and maybe some not-that-important processes (note that even a focused Chrome tab is killable in the current implementation). This may be a feasible short-term solution, though.
Is the information about Android processes available anywhere?
If not, how does Android set the OOM scores on a native Android system?
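(For what it's worth: on stock Android the framework itself maintains these - ActivityManagerService recomputes each app's oom_adj as its state changes, and the result ends up in /proc/<pid>/oom_score_adj.  You can spot-check what it assigned on a device, e.g.:

  adb shell 'cat /proc/$(pidof com.android.systemui)/oom_score_adj'

pidof availability depends on the Android version; any pid taken from ps works too.)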

Comment 14 by cylee@chromium.org, Mar 29 2017

Cc: elijahtaylor@chromium.org mitsuji@chromium.org
I guess the problem is that some of the processes you mentioned are NOT important from Android's point of view. They're only important in the context of ARC++ (or say, when ChromeOS is involved).

For example, looking at the list of processes in #10: native Android may kill any of these processes when needed (maybe except "system" or "persistent" processes; I need to check). Killing some of these processes may have a bad impact from ChromeOS's point of view (e.g., the filesystem? I guess), but Android has no knowledge of that. Android seems to treat them as just normal apps.

Perhaps we'll need a whitelist of "important apps to ChromeOS", I don't know. Could we have some ARC++ folks chime in? I don't really work in the core ARC++ team, so my knowledge is limited.

elijahtaylor ? mitsuji ?

Comment 15 by cylee@chromium.org, Mar 29 2017

Some random thought: maybe I just shouldn't kill any process with ".arc." in its name? :)
One note is that I have made some progress in making Android processes play nicer with the OOM killer.  I'm still poking at it at the moment, but it might help alleviate these problems?

Presumably, though, normal Android systems that aren't ARC++ don't have my patch (or some sort of equivalent) and don't run into hangs.  Thus I still assert that we are somehow setting OOM scores differently than normal Android, and that's why they don't hit this issue.  Maybe it makes sense to get a normal Android device and check out the OOM scores there?

Comment 17 by cylee@chromium.org, Mar 30 2017

@16

I've checked on a production Android build. Except for processes marked as "system" and "persistent", all others are killable (having a non-zero /proc/<pid>/oom_score).

For example, 
PERS #44: pers  F/ /P  trm: 0 6959:com.android.phone/1001 (fixed)
PERS #45: pers  T/ /PU trm: 0 6821:com.android.systemui/u0a33 (pers-top-ui)
are non-killable processes on Android. They're also non-killable on ChromeOS.

On the other hand, all the following processes are killable from Android's point of view, and they're also killable on ChromeOS:
    Proc # 1: vis   F/ /T  trm: 0 7395:com.google.android.gms.persistent/u0a17 (service)
    Proc #11: svc   F/ /SB trm: 0 3480:com.google.android.gms/u0a17 (started-services)
    Proc #19: svcb  B/ /S  trm: 0 31031:android.process.media/u0a12 (started-services)
    
    Proc #10: cch   F/ /T  trm: 0 1892:com.android.vending/u0a28 (cch-empty)
    
    Proc #14: cch   B/ /CE trm: 0 9702:android.process.acore/u0a1 (cch-empty)
    
    Proc #20: cch+2 B/ /CE trm: 0 28002:com.android.printspooler/u0a79 (cch-empty)
    
I don't have a Kevin right now, so I used a Samus for the experiment. I tried to manually kill all Android apps with
$ pgrep -f org.chromium.arc.\|com.android\|com.google.android\|android.process | xargs kill -9
but the system is still alive. So I doubt merely killing those processes is sufficient to reproduce the problem. Maybe the problem only occurs under memory pressure? Maybe it's some kind of kernel-level deadlock as you mentioned in the other thread? Could you find anything suspicious in the log?
Cc: lhchavez@chromium.org yusukes@chromium.org
+some ARC eng to weigh in

How are processes marked "persistent"?  I can imagine there are several we want to persist, and if there is a mechanism we can use, we should do that instead of some sort of whitelist.
Probably we can mark our processes via the android:persistent attribute in AndroidManifest.xml. Checking...

https://developer.android.com/guide/topics/manifest/application-element.html
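(For reference, the attribute goes on the manifest's <application> element, i.e. android:persistent="true", and IIRC the framework only honors it for apps on the system image - which these are.  A quick way to confirm a process actually got pinned is the same process dump shown below:

  dumpsys activity processes | grep 'PERS #'
)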

I tried to add android:persistent="true" to ArcCrashCollector/AndroidManifest.xml, ArcHome/AndroidManifest.xml, and ArcIntentHelper/AndroidManifest.xml. It changed the LRU list as expected:

* Before

  Process LRU list (sorted by oom_adj, 21 total, non-act at 1, non-svc at 1):
    PERS #20: sys   F/ /P  trm: 0 116:system/1000 (fixed)
    PERS #19: pers  F/ /P  trm: 0 354:com.android.systemui/u0a23 (fixed)
    Proc # 0: fore  T/A/T  trm: 0 402:org.chromium.arc.home/u0a1 (top-activity)
    Proc # 1: vis   F/ /SB trm: 0 490:com.google.android.gms.persistent/u0a15 (service)
        com.google.android.gms/.backup.BackupTransportService<=Proc{116:system/1000}
    Proc # 3: svc   B/ /S  trm: 0 1450:org.chromium.arc.crash_collector/u0a0 (started-services)
    Proc # 2: cch   B/ /S  trm: 0 780:com.facebook.orca/u0a58 (cch-started-services)
    Proc # 5: cch   B/ /CE trm: 0 565:com.google.process.gapps/u0a15 (cch-empty)
    Proc # 4: cch   B/ /CE trm: 0 638:com.google.android.gms/u0a15 (cch-empty)
    Proc # 8: cch+2 B/ /CE trm: 0 5876:com.google.android.apps.photos/u0a61 (cch-empty)
    Proc # 7: cch+2 B/ /CE trm: 0 4581:com.facebook.katana/u0a56 (cch-empty)
    Proc # 6: cch+2 B/ /CE trm: 0 6305:com.instagram.android/u0a60 (cch-empty)
    Proc # 9: cch+4 B/ /S  trm: 0 875:com.android.vending/u0a19 (cch-empty)
    Proc #11: cch+4 B/ /CE trm: 0 6085:android.process.media/u0a10 (cch-empty)
    Proc #10: cch+4 B/ /CE trm: 0 3262:com.google.android.gms.unstable/u0a15 (cch-empty)
    Proc #18: cch+6 B/ /S  trm: 0 983:org.chromium.arc.applauncher/1000 (cch-started-services)
    Proc #17: cch+6 B/ /S  trm: 0 1019:org.chromium.arc.file_system/u0a3 (cch-started-services)
    Proc #16: cch+6 B/ /S  trm: 0 1061:org.chromium.arc.intent_helper/u0a3 (cch-started-services)
    Proc #15: cch+6 B/ /S  trm: 0 1109:org.chromium.arc.tts/u0a30 (cch-started-services)
    Proc #14: cch+6 B/ /S  trm: 0 1160:com.google.android.tts/u0a41 (cch-empty)
    Proc #13: cch+6 B/ /CE trm: 0 5900:com.android.printspooler/u0a45 (cch-empty)
    Proc #12: cch+6 B/ /CE trm: 0 4558:com.google.android.gms:snet/u0a15 (cch-empty)

* After

  Process LRU list (sorted by oom_adj, 31 total, non-act at 0, non-svc at 0):
    PERS #30: sys   F/ /P  trm: 0 116:system/1000 (fixed)
    PERS #29: pers  F/ /P  trm: 0 222:com.android.systemui/u0a23 (fixed)
    PERS #27: pers  F/ /P  trm: 0 378:org.chromium.arc.crash_collector/u0a0 (fixed)
    PERS #26: pers  F/ /P  trm: 0 384:org.chromium.arc.intent_helper/u0a3 (fixed)
    PERS #28: pers  T/ /PU trm: 0 272:org.chromium.arc.home/u0a1 (pers-top-activity)
    Proc # 6: vis   F/ /IF trm: 0 1494:com.android.defcontainer/u0a8 (service)
        com.android.defcontainer/.DefaultContainerService<=Proc{116:system/1000}
    Proc # 2: vis   F/ /IF trm: 0 508:com.google.android.gms/u0a15 (started-services)
    Proc # 0: vis   F/ /IF trm: 0 357:com.google.android.gms.persistent/u0a15 (service)
        com.google.android.gms/com.google.android.location.geocode.GeocodeService<=Proc{116:system/1000}
    Proc #23: svc   B/ /S  trm: 0 919:org.chromium.arc.applauncher/1000 (started-services)
    Proc # 5: svc   B/ /S  trm: 0 756:com.android.vending/u0a19 (service)
        com.android.vending/com.google.android.finsky.setup.PlaySetupService<=Proc{919:org.chromium.arc.applauncher/1000}
    Proc # 3: prev  B/ /LA trm: 0 529:com.google.process.gapps/u0a15 (provider)
    Proc #22: svcb  B/ /S  trm: 0 953:org.chromium.arc.file_system/u0a3 (started-services)
    Proc #20: svcb  B/ /S  trm: 0 1002:org.chromium.arc.tts/u0a30 (started-services)
    Proc #19: svcb  B/ /S  trm: 0 1072:com.google.android.tts/u0a41 (service)
        com.google.android.tts/.service.GoogleTTSService<=Proc{1002:org.chromium.arc.tts/u0a30}
    Proc # 8: svcb  B/ /S  trm: 0 655:com.facebook.orca/u0a58 (started-services)
    Proc #10: cch   B/ /CE trm: 0 495:com.facebook.katana/u0a56 (cch-empty)
    Proc # 9: cch   B/ /CE trm: 0 402:com.android.printspooler/u0a45 (cch-empty)
    Proc # 7: cch   B/ /CE trm: 0 1150:com.google.android.gms.ui/u0a15 (cch-empty)
    Proc # 4: cch   B/ /CE trm: 0 1453:com.google.android.gms:car/u0a15 (cch-empty)
    Proc # 1: cch   B/ /CE trm: 0 958:com.google.android.gms.unstable/u0a15 (cch-empty)
    Proc #15: cch+2 B/ /CE trm: 0 1304:com.microsoft.office.excel/u0a53 (cch-empty)
    Proc #14: cch+2 B/ /CE trm: 0 736:com.android.providers.calendar/u0a5 (cch-empty)
    Proc #13: cch+2 B/ /CE trm: 0 1084:com.google.android.apps.photos/u0a61 (cch-empty)
    Proc #12: cch+2 B/ /CE trm: 0 805:com.google.android.apps.work.clouddpc.arc/u0a35 (cch-empty)
    Proc #11: cch+2 B/ /CE trm: 0 1207:com.google.android.gm/u0a52 (cch-empty)
    Proc #24: cch+4 B/ /CE trm: 0 875:com.google.process.gapps/u0a38 (cch-empty)
    Proc #21: cch+4 B/ /CE trm: 0 1029:org.chromium.arc.gms/u0a28 (cch-empty)
    Proc #18: cch+4 B/ /CE trm: 0 460:android.process.media/u0a10 (cch-empty)
    Proc #17: cch+4 B/ /CE trm: 0 1037:com.android.phone/1001 (cch-empty)
    Proc #16: cch+4 B/ /CE trm: 0 615:android.process.acore/u0a6 (cch-empty)
    Proc #25: cch+6 B/ /CE trm: 0 720:com.android.managedprovisioning/u0a17 (cch-empty)

All three of these .arc. processes are at the top of the list with "pers".

Which one(s) should we protect from the OOM killer? Maybe org.chromium.arc.home (only)? Also, processes like zygote, vold, arcbridgeservice are not listed here. Are they properly protected?
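(A quick way to spot-check daemons that don't show up in the LRU list, reusing the pgrep trick from #17 - a sketch:

for name in zygote vold arcbridgeservice; do
  for pid in $(pgrep -f "$name"); do
    echo "$name $pid $(cat /proc/$pid/oom_score_adj 2>/dev/null)"
  done
done
)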

Labels: -Pri-1 Pri-2
Summary: Can we do better w/ OOM scores of ARC++ processes? (was: ARC++ isn't honoring important Android processes)
So it turns out that I've got a pretty good solution for making the OOM killer robust, and I'm trying to clean it up now.  That makes this bug a bit less urgent.

It still seems like we should take good care with our OOM scores, but maybe we don't need to smash something in quickly?

I'll confirm when I actually have a clean set of patches and continued testing is going well.

Comment 22 by cylee@chromium.org, Mar 31 2017

I grabbed a Minnie (also ARM) and did some memory pressure testing:

1. Launch ARC++, open several tabs and some Android apps
2. Run a memory-consuming binary, "memory-eater" (I use it for memory pressure testing; it simply allocates a specified amount of memory).
3. Repeat step 2 carefully - do not add lots of memory pressure at once. Then watch how Android apps are killed with
$ watch -n 1 'ps aux | grep -E \(org.chromium.arc.\|com.android\|com.google.android\|android.process\)'

In the end, all Android "apps" were killed except com.android.systemui and org.chromium.arc.home. Those two are specified as non-killable by Android (-1000 oom_score_adj) so they remain intact.

I think the test is very similar to what dianders mentioned above, but still my system neither hung nor crashed.

So my tentative conclusion is that either:
1) it's a Kevin-specific problem, or
2) killing any of the Android apps won't cause a system hang or crash by itself, and it's a kernel OOM issue that causes the problem.

That said, I think we should mark apps "persistent" if we don't want them to be killed by the tab discarder and the kernel OOM killer.  Thoughts? yusukes@?

Attachment:
memory-eater (5.6 KB)
@22: I had no problem reproducing this on my jerry (same family as minnie).  I had a 2GB jerry, so I adjusted the parameters, but the idea is identical:

1. Copy these two attachments to /usr/local/bin
2. Login to an account w/ ARC++ enabled.
3. Open a few tabs.  I opened "The Verge" and started streaming 3 youtube videos at once.  I also started the Play Store.
4. SSH over and type: "cd /usr/local/bin; nice ./launchBalloons.sh 4 700 100000"

I don't think your test is the same as the one I'm doing.  I'm specifically adding lots of pressure at once to get the kernel OOM killer to run, and that's very important here.  If you look at all the patches I've been posting, they specifically talk about deadlocks that only happen when the OOM killer is running and Android processes are the ones being killed.  ...and here we're talking about setting kernel OOM scores that only affect the kernel OOM killer, so testing other methods of killing processes probably isn't too relevant?


I backported my fixes to 3.14 and they seem to fix it just like they do for 4.4.

---

So basically: I think the crashing in general is gone but, as you said, we still might want to play with the scores.  I'm not at all convinced that the current processes that the OOM killer is killing are the best ones.
Project Member

Comment 24 by bugdroid1@chromium.org, Mar 31 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88bdca832d35cb1a28436ee41626da92c2f3251e

commit 88bdca832d35cb1a28436ee41626da92c2f3251e
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri Mar 31 22:16:21 2017

UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465186
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/88bdca832d35cb1a28436ee41626da92c2f3251e/mm/oom_kill.c

Project Member

Comment 25 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/222d652a0495a51be1d799772ec669e169ca9496

commit 222d652a0495a51be1d799772ec669e169ca9496
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Mar 31 22:16:22 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

In the function find_lock_task_mm() the stated purpose is to find a
subthread in the case that the original task no longer has an "->mm".

Implied in the description is that we should pick the original process
"p" if it still has "->mm" and we should only pick a subthread if
needed.  However, that's not what happens with the current
implementation.  Let's change the code to explicitly make sure that we
return the original process "p" if it has a valid "->mm".

Why do we care?  A future patch will improve the OOM killer to
sometimes allow more than one OOM victim to exist at once.  If we do
that without making find_lock_task_mm() prefer the original process
then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task
   that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it
   wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick
the original process if possible.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465187

[modify] https://crrev.com/222d652a0495a51be1d799772ec669e169ca9496/mm/oom_kill.c

Project Member

Comment 26 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe

commit d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Mar 31 22:16:23 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465188

[modify] https://crrev.com/d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe/mm/oom_kill.c

Project Member

Comment 27 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5

commit ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Mar 31 22:16:24 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process who is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465189
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5/mm/oom_kill.c

Project Member

Comment 28 by bugdroid1@chromium.org, Apr 1 2017

Labels: merge-merged-chromeos-3.8
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fcdc42d2699d037a399f655571637e093e9fe09e

commit fcdc42d2699d037a399f655571637e093e9fe09e
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Sat Apr 01 01:56:57 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465586

[modify] https://crrev.com/fcdc42d2699d037a399f655571637e093e9fe09e/mm/oom_kill.c

Project Member

Comment 29 by bugdroid1@chromium.org, Apr 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6d287f9bc9e24b57f938155368784a755204e64c

commit 6d287f9bc9e24b57f938155368784a755204e64c
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 01:56:58 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465587

[modify] https://crrev.com/6d287f9bc9e24b57f938155368784a755204e64c/mm/oom_kill.c

Project Member

Comment 30 by bugdroid1@chromium.org, Apr 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ea803c877bd53ab8a15e7ff6ed8a20b766b247e5

commit ea803c877bd53ab8a15e7ff6ed8a20b766b247e5
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 01:56:59 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process who is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465588
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/ea803c877bd53ab8a15e7ff6ed8a20b766b247e5/mm/oom_kill.c

Project Member

Comment 31 by bugdroid1@chromium.org, Apr 1 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/60728a964498e1200265b94d414cc169b1cdb840

commit 60728a964498e1200265b94d414cc169b1cdb840
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Sat Apr 01 05:33:30 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465472
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/60728a964498e1200265b94d414cc169b1cdb840/mm/oom_kill.c

Project Member

Comment 32 by bugdroid1@chromium.org, Apr 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b74ed4c3bfcdf15339adf67e28e2b2525d8270c7

commit b74ed4c3bfcdf15339adf67e28e2b2525d8270c7
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 05:33:31 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

In the function find_lock_task_mm() the stated purpose is to find a
subthread in the case that the original task no longer has an "->mm".

Implied in the description is that we should pick the original process
"p" if it still has "->mm" and we should only pick a subthread if
needed.  However, that's not what happens with the current
implementation.  Let's change the code to explicitly make sure that we
return the original process "p" if it has a valid "->mm".

Why do we care?  A future patch will improve the OOM killer to
sometimes allow more than one OOM victim to exist at once.  If we do
that without making find_lock_task_mm() prefer the original process
then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task
   that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it
   wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick
the original process if possible.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465473

[modify] https://crrev.com/b74ed4c3bfcdf15339adf67e28e2b2525d8270c7/mm/oom_kill.c

Project Member

Comment 33 by bugdroid1@chromium.org, Apr 1 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e8278e9f12d0f82e39c46452420cf2e74cb5a228

commit e8278e9f12d0f82e39c46452420cf2e74cb5a228
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 05:33:32 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465474

[modify] https://crrev.com/e8278e9f12d0f82e39c46452420cf2e74cb5a228/mm/oom_kill.c

Project Member

Comment 34 by bugdroid1@chromium.org, Apr 1 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/25d68b7fa97d3b331e4ac5e1ff0c8399574569bf

commit 25d68b7fa97d3b331e4ac5e1ff0c8399574569bf
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 05:33:33 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined to the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process that is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.
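
In pseudo-kernel terms the heuristic looks something like the sketch
below; "memdie_jiffies" is a hypothetical per-task timestamp recorded
when the victim is marked, not a real field:

  /* Sketch of the 100 ms heuristic (names are illustrative).  When a
   * victim was marked MEMDIE more than 100 ms ago and is still alive,
   * stop waiting on it and let victim selection pick another task.
   */
  static bool oom_victim_is_stuck(struct task_struct *victim)
  {
          return test_tsk_thread_flag(victim, TIF_MEMDIE) &&
                 time_after(jiffies,
                            victim->memdie_jiffies + msecs_to_jiffies(100));
  }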

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465475
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/25d68b7fa97d3b331e4ac5e1ff0c8399574569bf/mm/oom_kill.c

Project Member Comment 35 by bugdroid1@chromium.org, Apr 1 2017

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8b4a16f82833c8b9f8c3d9cf67898c38354c65c0

commit 8b4a16f82833c8b9f8c3d9cf67898c38354c65c0
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Sat Apr 01 05:33:25 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.
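
The special case being deleted is essentially the following (shape
only; the exact code around it varies by kernel version):

  /* Removed: a PF_EXITING task was assumed to be about to free its
   * memory, so the scan aborted without choosing any victim.  If the
   * exiting task is stuck (e.g. on a lock held by the allocator), the
   * OOM killer then never makes progress.
   */
  if (task->flags & PF_EXITING)
          return OOM_SCAN_ABORT;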

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465468
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/8b4a16f82833c8b9f8c3d9cf67898c38354c65c0/mm/oom_kill.c

Project Member Comment 36 by bugdroid1@chromium.org, Apr 1 2017

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e792e36e5d689f3daf93a70741ca1d407deb4513

commit e792e36e5d689f3daf93a70741ca1d407deb4513
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 05:33:27 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

In the function find_lock_task_mm() the stated purpose is to find a
subthread in the case that the original task no longer has an "->mm".

Implied in the description is that we should pick the original process
"p" if it still has "->mm" and we should only pick a subthread if
needed.  However, that's not what happens with the current
implementation.  Let's change the code to explicitly make sure that we
return the original process "p" if it has a valid "->mm".

Why do we care?  A future patch will improve the OOM killer to
sometimes allow more than one OOM victim to exist at once.  If we do
that without making find_lock_task_mm() prefer the original process
then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task
   that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it
   wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick
the original process if possible.
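
The intent, with locking elided, is roughly this (a sketch;
pick_task_with_mm() is an illustrative stand-in for the fixed
find_lock_task_mm()):

  /* Sketch (task/RCU locking elided): check the original process
   * first instead of relying on thread-iteration order to happen to
   * visit "p" before its subthreads.
   */
  static struct task_struct *pick_task_with_mm(struct task_struct *p)
  {
          struct task_struct *t;

          if (p->mm)
                  return p;               /* prefer the original task */

          for_each_thread(p, t)
                  if (t->mm)
                          return t;       /* fall back to a subthread */

          return NULL;
  }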

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465469

[modify] https://crrev.com/e792e36e5d689f3daf93a70741ca1d407deb4513/mm/oom_kill.c

Project Member Comment 37 by bugdroid1@chromium.org, Apr 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/195db82069300f8440a507faa20a4649ad25c022

commit 195db82069300f8440a507faa20a4649ad25c022
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 05:33:28 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465470

[modify] https://crrev.com/195db82069300f8440a507faa20a4649ad25c022/mm/oom_kill.c

Project Member Comment 38 by bugdroid1@chromium.org, Apr 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fc300ccd3b96963b018fbcf1bd90570b7eb512b9

commit fc300ccd3b96963b018fbcf1bd90570b7eb512b9
Author: Douglas Anderson <dianders@chromium.org>
Date: Sat Apr 01 05:33:29 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process that is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465471
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/fc300ccd3b96963b018fbcf1bd90570b7eb512b9/mm/oom_kill.c

Project Member Comment 39 by bugdroid1@chromium.org, Apr 2 2017

Labels: merge-merged-chromeos-3.10
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b4b86c11b400271aaec13c9b56f50e0509d7aaec

commit b4b86c11b400271aaec13c9b56f50e0509d7aaec
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Sun Apr 02 04:52:26 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465411
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/b4b86c11b400271aaec13c9b56f50e0509d7aaec/mm/oom_kill.c

Project Member Comment 40 by bugdroid1@chromium.org, Apr 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6e3f395bb36e20d04fb83ac945bab47410e12e86

commit 6e3f395bb36e20d04fb83ac945bab47410e12e86
Author: Douglas Anderson <dianders@chromium.org>
Date: Sun Apr 02 04:52:27 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465412
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/6e3f395bb36e20d04fb83ac945bab47410e12e86/mm/oom_kill.c

Project Member Comment 41 by bugdroid1@chromium.org, Apr 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/aa071ca4b372e1e30701c56fe50823f34b3bc9f6

commit aa071ca4b372e1e30701c56fe50823f34b3bc9f6
Author: Douglas Anderson <dianders@chromium.org>
Date: Sun Apr 02 04:52:28 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process that is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465413
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/aa071ca4b372e1e30701c56fe50823f34b3bc9f6/mm/oom_kill.c

Project Member Comment 42 by bugdroid1@chromium.org, Apr 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/eed40556449ebee13af39c8308ac02bfb252cf40

commit eed40556449ebee13af39c8308ac02bfb252cf40
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon Apr 03 23:28:10 2017

UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465186
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit 88bdca832d35cb1a28436ee41626da92c2f3251e)
Reviewed-on: https://chromium-review.googlesource.com/466559

[modify] https://crrev.com/eed40556449ebee13af39c8308ac02bfb252cf40/mm/oom_kill.c

Project Member Comment 43 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3d3748ee6c0a5cac059e1fa65adf10c85870c890

commit 3d3748ee6c0a5cac059e1fa65adf10c85870c890
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:28:29 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

In the function find_lock_task_mm() the stated purpose is to find a
subthread in the case that the original task no longer has an "->mm".

Implied in the description is that we should pick the original process
"p" if it still has "->mm" and we should only pick a subthread if
needed.  However, that's not what happens with the current
implementation.  Let's change the code to explicitly make sure that we
return the original process "p" if it has a valid "->mm".

Why do we care?  A future patch will improve the OOM killer to
sometimes allow more than one OOM victim to exist at once.  If we do
that without making find_lock_task_mm() prefer the original process
then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task
   that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it
   wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick
the original process if possible.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465187
(cherry picked from commit 222d652a0495a51be1d799772ec669e169ca9496)
Reviewed-on: https://chromium-review.googlesource.com/466560

[modify] https://crrev.com/3d3748ee6c0a5cac059e1fa65adf10c85870c890/mm/oom_kill.c

Project Member Comment 44 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b4253e4b60fb6903693aa137bf2f2334970e03df

commit b4253e4b60fb6903693aa137bf2f2334970e03df
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:28:57 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465188
(cherry picked from commit d7cd7d859ad92dac0f9f4511a2bc1cf2208e8cbe)
Reviewed-on: https://chromium-review.googlesource.com/466561

[modify] https://crrev.com/b4253e4b60fb6903693aa137bf2f2334970e03df/mm/oom_kill.c

Project Member Comment 45 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7271dacb3f1f17aa698a9831c941bcb7e4b4a8a2

commit 7271dacb3f1f17aa698a9831c941bcb7e4b4a8a2
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:29:25 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process that is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465189
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit ed758deb4a6ef27d56e95da9d1a73be0cbd09ac5)
Reviewed-on: https://chromium-review.googlesource.com/466562

[modify] https://crrev.com/7271dacb3f1f17aa698a9831c941bcb7e4b4a8a2/mm/oom_kill.c

Project Member Comment 46 by bugdroid1@chromium.org, Apr 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d40d12a0baa2776d91908c74f37a0d1c46eecb37

commit d40d12a0baa2776d91908c74f37a0d1c46eecb37
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon Apr 03 23:30:15 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465472
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit 60728a964498e1200265b94d414cc169b1cdb840)
Reviewed-on: https://chromium-review.googlesource.com/466563

[modify] https://crrev.com/d40d12a0baa2776d91908c74f37a0d1c46eecb37/mm/oom_kill.c

Project Member Comment 47 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/bb3ab0662a46da9b135df2917f80da5b1e47c9ee

commit bb3ab0662a46da9b135df2917f80da5b1e47c9ee
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:30:38 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

In the function find_lock_task_mm() the stated purpose is to find a
subthread in the case that the original task no longer has an "->mm".

Implied in the description is that we should pick the original process
"p" if it still has "->mm" and we should only pick a subthread if
needed.  However, that's not what happens with the current
implementation.  Let's change the code to explicitly make sure that we
return the original process "p" if it has a valid "->mm".

Why do we care?  A future patch will improve the OOM killer to
sometimes allow more than one OOM victim to exist at once.  If we do
that without making find_lock_task_mm() prefer the original process
then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task
   that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it
   wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick
the original process if possible.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465473
(cherry picked from commit b74ed4c3bfcdf15339adf67e28e2b2525d8270c7)
Reviewed-on: https://chromium-review.googlesource.com/466564

[modify] https://crrev.com/bb3ab0662a46da9b135df2917f80da5b1e47c9ee/mm/oom_kill.c

Project Member Comment 48 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f70e18011ebfe8f16d22b86ba34dea84aa1c3ef9

commit f70e18011ebfe8f16d22b86ba34dea84aa1c3ef9
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:31:20 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465474
(cherry picked from commit e8278e9f12d0f82e39c46452420cf2e74cb5a228)
Reviewed-on: https://chromium-review.googlesource.com/466565

[modify] https://crrev.com/f70e18011ebfe8f16d22b86ba34dea84aa1c3ef9/mm/oom_kill.c

Project Member Comment 49 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2014267785d257ebc84d3f9da30ed0c5df41e2b0

commit 2014267785d257ebc84d3f9da30ed0c5df41e2b0
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:31:46 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process that is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465475
Tested-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit 25d68b7fa97d3b331e4ac5e1ff0c8399574569bf)
Reviewed-on: https://chromium-review.googlesource.com/466846

[modify] https://crrev.com/2014267785d257ebc84d3f9da30ed0c5df41e2b0/mm/oom_kill.c

Project Member Comment 50 by bugdroid1@chromium.org, Apr 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/facb2dedf039748d0c736e4b00893e2923d83179

commit facb2dedf039748d0c736e4b00893e2923d83179
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon Apr 03 23:33:04 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465468
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit 8b4a16f82833c8b9f8c3d9cf67898c38354c65c0)
Reviewed-on: https://chromium-review.googlesource.com/466847

[modify] https://crrev.com/facb2dedf039748d0c736e4b00893e2923d83179/mm/oom_kill.c

Project Member Comment 51 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a0a5e011f72a347ab78e1ac2381830d91b50f6cd

commit a0a5e011f72a347ab78e1ac2381830d91b50f6cd
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:33:29 2017

CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if we still have mm

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper frees anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

In the function find_lock_task_mm() the stated purpose is to find a
subthread in the case that the original task no longer has an "->mm".

Implied in the description is that we should pick the original process
"p" if it still has "->mm" and we should only pick a subthread if
needed.  However, that's not what happens with the current
implementation.  Let's change the code to explicitly make sure that we
return the original process "p" if it has a valid "->mm".

Why do we care?  A future patch will improve the OOM killer to
sometimes allow more than one OOM victim to exist at once.  If we do
that without making find_lock_task_mm() prefer the original process
then we can get into this loop:
1. Pick a process as a victim.
2. Don't kill the chosen process, but instead kill some other task
   that (perhaps) has already been previously marked as a victim.
3. Go back to step #1 and pick the same process as a victim (since it
   wasn't actually killed before).

Even without the future patch, though, it still makes sense to pick
the original process if possible.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Change-Id: I3b59bbff74bd5fed1fa7790dd9696e29030fba02
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465469
(cherry picked from commit e792e36e5d689f3daf93a70741ca1d407deb4513)
Reviewed-on: https://chromium-review.googlesource.com/466848

[modify] https://crrev.com/a0a5e011f72a347ab78e1ac2381830d91b50f6cd/mm/oom_kill.c
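
A sketch of the behavior the patch makes explicit, again with hypothetical simplified types (the real find_lock_task_mm() also takes task_lock() on the task it returns, which is omitted here):

#include <iostream>
#include <vector>

// Hypothetical stand-ins: an address space and a task that may own one.
struct Mm {};
struct Task {
    Mm* mm;                      // null once the task has released its mm
    std::vector<Task*> threads;  // subthreads in the same thread group
};

// Prefer the original process "p" while it still has an mm; only fall
// back to scanning subthreads once p->mm is gone. Without this, step 1
// of the loop described above can keep picking the wrong thread.
Task* find_task_with_mm(Task* p) {
    if (p->mm)
        return p;  // the fix: always return p itself when possible
    for (Task* t : p->threads)
        if (t->mm)
            return t;
    return nullptr;
}

int main() {
    Mm mm;
    Task sub{&mm, {}};
    Task p{&mm, {&sub}};
    std::cout << (find_task_with_mm(&p) == &p) << "\n";  // 1: p, not the subthread
}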

Project Member

Comment 52 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b3ab459eb659ad748cdfd9b8f5074eed42c8b535

commit b3ab459eb659ad748cdfd9b8f5074eed42c8b535
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:33:52 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465470
(cherry picked from commit 195db82069300f8440a507faa20a4649ad25c022)
Reviewed-on: https://chromium-review.googlesource.com/466849

[modify] https://crrev.com/b3ab459eb659ad748cdfd9b8f5074eed42c8b535/mm/oom_kill.c
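
A sketch of the double-check, with hypothetical names and a logging stub in place of the actual kill (not the kernel's oom_kill_process() code):

#include <cstdio>

// Hypothetical, simplified stand-ins for kernel structures.
struct Mm {};
struct Task {
    int pid;
    Mm* mm;
    bool memdie;  // already chosen as an OOM victim
};

// Stand-in for oom_scan_process_thread(): a child is only a valid
// target if it isn't already a victim and still owns an mm.
bool child_is_killable(const Task& child) {
    return !child.memdie && child.mm != nullptr;
}

// Logging stub standing in for the real SIGKILL delivery.
void send_kill(const Task& t) { std::printf("SIGKILL -> pid %d\n", t.pid); }

// When a child has its own mm, we may kill it in the victim's place --
// but only after the double-check this patch adds. Once multiple MEMDIE
// tasks are allowed, the check stops being a no-op.
bool try_kill_child_instead(const Task& victim, const Task& child) {
    if (child.mm == victim.mm)
        return false;  // same address space: killing the child frees nothing
    if (!child_is_killable(child))
        return false;  // the new double-check
    send_kill(child);
    return true;
}

int main() {
    Mm parent_mm, child_mm;
    Task victim{100, &parent_mm, true};
    Task child{101, &child_mm, false};
    std::printf("killed child instead: %d\n",
                try_kill_child_instead(victim, child));  // 1
}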

Project Member

Comment 53 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/54435bd3425984b4ea5e6d8eaa8cbc59055b96e7

commit 54435bd3425984b4ea5e6d8eaa8cbc59055b96e7
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:34:18 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process who is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465471
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit fc300ccd3b96963b018fbcf1bd90570b7eb512b9)
Reviewed-on: https://chromium-review.googlesource.com/466850

[modify] https://crrev.com/54435bd3425984b4ea5e6d8eaa8cbc59055b96e7/mm/oom_kill.c
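
The "grabs a mutex twice" toy from the commit message has a direct userspace analogue. Formally this is undefined behavior in C++ (std::mutex is non-recursive), but with the usual pthread implementation the second lock() simply blocks forever, which is the shape of the kernel-side hang: a victim that can never make progress, and so can never exit and free memory. In the kernel version the block happens inside a syscall, which is why even SIGKILL can't get the task to die.

#include <iostream>
#include <mutex>

std::mutex m;

int main() {
    std::lock_guard<std::mutex> first(m);
    std::cout << "got the mutex once, taking it again...\n";
    m.lock();  // blocks forever: we already hold the (non-recursive) mutex
    std::cout << "never printed\n";
}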

Project Member

Comment 54 by bugdroid1@chromium.org, Apr 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.10
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/392aa7079e1b6ac13458ee46a01b9676e221a146

commit 392aa7079e1b6ac13458ee46a01b9676e221a146
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon Apr 03 23:35:05 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465411
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit b4b86c11b400271aaec13c9b56f50e0509d7aaec)
Reviewed-on: https://chromium-review.googlesource.com/466851

[modify] https://crrev.com/392aa7079e1b6ac13458ee46a01b9676e221a146/mm/oom_kill.c

Project Member

Comment 55 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/9d5640372c747c33e57b8e6a9172854334921c61

commit 9d5640372c747c33e57b8e6a9172854334921c61
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:35:32 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465412
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 6e3f395bb36e20d04fb83ac945bab47410e12e86)
Reviewed-on: https://chromium-review.googlesource.com/466852

[modify] https://crrev.com/9d5640372c747c33e57b8e6a9172854334921c61/mm/oom_kill.c

Project Member

Comment 56 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/58aa31ac416443efae4ff24085a55e143891c64d

commit 58aa31ac416443efae4ff24085a55e143891c64d
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:35:52 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process who is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465413
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit aa071ca4b372e1e30701c56fe50823f34b3bc9f6)
Reviewed-on: https://chromium-review.googlesource.com/466853

[modify] https://crrev.com/58aa31ac416443efae4ff24085a55e143891c64d/mm/oom_kill.c

Project Member

Comment 57 by bugdroid1@chromium.org, Apr 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.8
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ae0debcbe000ccc368b801d24d8d80ddff4d708f

commit ae0debcbe000ccc368b801d24d8d80ddff4d708f
Author: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon Apr 03 23:36:37 2017

BACKPORT: mm: oom_kill: don't ignore oom score on exiting tasks

When the OOM killer scans tasks and encounters a PF_EXITING one, it
force-selects that task regardless of the score.  The problem is that if
that task got stuck waiting for some state the allocation site is
holding, the OOM reaper can not move on to the next best victim.

Frankly, I don't even know why we check for exiting tasks in the OOM
killer.  We've tried direct reclaim at least 15 times by the time we
decide the system is OOM, there was plenty of time to exit and free
memory; and a task might exit voluntarily right after we issue a kill.
This is testing pure noise.  Remove it.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were extra checks in 3.14, but the concept is still the same.

Change-Id: Ied8391fd2d8cda4d1728a28dbe6cfea31883b24e
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 6a618957ad17d8f4f4c7eeede752685374b1b176)
Reviewed-on: https://chromium-review.googlesource.com/465586
(cherry picked from commit fcdc42d2699d037a399f655571637e093e9fe09e)
Reviewed-on: https://chromium-review.googlesource.com/466854

[modify] https://crrev.com/ae0debcbe000ccc368b801d24d8d80ddff4d708f/mm/oom_kill.c

Project Member

Comment 58 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/21f9394b508ab427db3ad9bf2504a1c045541796

commit 21f9394b508ab427db3ad9bf2504a1c045541796
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:37:01 2017

CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

If the child of an OOM victim has a different "->mm" element than the
victim, we may choose to kill the child in place of the victim.  This
is sane because if we killed the victim the child would die anyway, so
why not see if we can save the parent?

...but, before we kill the child let's double-check that it's OK by
calling oom_scan_process_thread().

I'll admit that as the code stands right now this is a totally useless
thing to do.  The only time oom_scan_process_thread() will tell us not
to kill the child is if:

1. The child is already marked with MEMDIE.  However, that's not
   possible because the code (as it stands today) won't let more than
   one process have MEMDIE.
2. The child has no "mm".  ...but that shouldn't happen because the
   parent and child shouldn't share an "mm" if the child's "mm" is
   NULL.

Now that I've explained why this patch is stupid and useless, here's
why we want it anyway: in a future patch we'll make it (sometimes)
possible for more than one process to be marked with MEMDIE.  With
that in mind, point #1 above is moot and suddenly this patch has merit.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

NOTE: this file is slightly different than its 4.4 counterpart (where
the patch was originally written) due to different parameters in
oom_scan_process_thread().

Change-Id: I0ea9b5f4a03c1392d52d9d927bd395ed44d7b98c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465587
(cherry picked from commit 6d287f9bc9e24b57f938155368784a755204e64c)
Reviewed-on: https://chromium-review.googlesource.com/466855

[modify] https://crrev.com/21f9394b508ab427db3ad9bf2504a1c045541796/mm/oom_kill.c

Project Member

Comment 59 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b879df4c109d88fbc3b3f3b255c62dafe97acab0

commit b879df4c109d88fbc3b3f3b255c62dafe97acab0
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 03 23:37:23 2017

CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE.

It is part of a patch series that attempts to solve similar problems
to the OOM reaper upstream.  NOTE that the OOM reaper patches weren't
backported because mm patches in general are fairly intertwined and
picking the OOM reaper without an mm expert and lots of careful
testing could cause new mysterious problems.

Compared to the OOM reaper, this patch series:
+ Is simpler to reason about.
+ Is not very intertwined with the rest of the system.
+ Can be fairly easily reasoned about to see that it should do no
  harm, even if it doesn't fix every problem.
- Can more easily get into the state where all memory reserves are
  gone, since it allows more than one task to be in "MEMDIE".
- Doesn't free memory as quickly.  The reaper kills anonymous /
  swapped memory even before the doomed task exits.
- Contains the magic "100 ms" heuristic, which is non-ideal.

---

It turns out that there's a pretty big flaw with the way the OOM
killer works.  If the chosen OOM kill victim doesn't die then the OOM
killer totally stops working.  Specifically once we've picked an OOM
victim we won't ever pick another until the first victim dies.  If
that victim refuses to die then we'll silently loop until eventually
some sort of watchdog reboots the system.

This is easy to see in toy code.  Just write a function that userspace
can invoke that grabs a mutex twice.  Now set the OOM score so that
this is the preferred process to kill.  The first invocation of the
OOM killer will try to kill this process but won't be able to because
the process will never get the mutex.  All subsequent OOM killer
invocations will be complete no-ops.

This is easy to argue about in non-toy code too.  Just imagine:
A) Memory gets low and pretty much everyone is starved.
B) Process 1000 has a mutex named "mutex" and is waiting to allocate
   memory.
C) Process 1001 wants "mutex".
D) We choose to OOM kill process 1001, maybe because it has an
   attractive looking OOM score.

OOM kill is the last resort of the memory manager.  That means it has
no other ways to get memory to give to process 1000 in the example
above and process 1000 will loop waiting forever, never releasing its
mutex.  Process 1001 will never get the mutex and thus can never die
to either release memory or allow an additional OOM kill.  Classic
deadlock.

This is seen in real life.  Specifically while stressing the OOM
killer on a Chromebook with Android processes running, the OOM killer
likes to kill Android processes.  Often these processes are sitting in
binder_thread_read().  When we try to kill them they wake up and try
to grab the binder mutex.  Often the binder mutex is held by another
process who is stuck trying to allocate memory.  Ugh.  We probably
want to fix Binder to make this specific case behave better, but we
can't fix every case.

This is seen in real life in non-binder code as well.  Specifically
I've seen the victim process sitting waiting like this:
 schedule+0x88/0xa8
 schedule_timeout+0x44/0x27c
 io_schedule_timeout+0x7c/0xbc
 bt_get+0x174/0x278
 blk_mq_get_tag+0x4c/0xd4
 __blk_mq_alloc_request+0x28/0x158
 blk_mq_map_request+0x340/0x39c
 blk_sq_make_request+0xb0/0x3b8
 generic_make_request+0xcc/0x174
 submit_bio+0xe4/0x1d4
 submit_bh_wbc.isra.35+0x124/0x138
 submit_bh+0x2c/0x38
 ll_rw_block+0xac/0xd8
 squashfs_read_data+0x354/0x510
 squashfs_readpage_block+0x2d8/0x490
 squashfs_readpage+0x48c/0x600
 __do_page_cache_readahead+0x1e4/0x248
 filemap_fault+0x14c/0x3b8
 __do_fault+0x78/0xf0
 handle_mm_fault+0x744/0x1050
 do_page_fault+0x140/0x2dc
 do_mem_abort+0x64/0xd8

While I didn't dig into why the process couldn't get unblocked in the
squashfs case, it seems fairly sane to assume some other process is
waiting on memory.

Now that you (hopefully) believe we have a problem, we need a
solution.  One sane solution is to eventually let the OOM killer kill
more than one process.  We don't want to kill more than one process
right away as per the dire warnings in the OOM killer code about
exhausting our memory reserves and ending up in a livelock.  While
imperfect, it seems sane to perhaps use a heuristic here where we wait
a bit of time (100 ms) for a process to die.  If it fails, we give up
and try another process.

BUG=chromium:706048, chromium:702707
TEST=With whole series, eating memory doesn't hang system

Conflicts:
	mm/oom_kill.c
There were conflicts compared to the 4.4 version because of lots of
churn in the oom kill code.  Conceptually this is the same thing,
though.

Change-Id: I4099314abed36b8af0680ceb0eb6d85427ba4962
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/465588
Reviewed-by: Dylan Reid <dgreid@chromium.org>
(cherry picked from commit ea803c877bd53ab8a15e7ff6ed8a20b766b247e5)
Reviewed-on: https://chromium-review.googlesource.com/466856

[modify] https://crrev.com/b879df4c109d88fbc3b3f3b255c62dafe97acab0/mm/oom_kill.c

The kernel seems okay now (no deadlock on OOM), but we still need to adjust the Android processes' oom scores because we're still seeing lots of LowMemoryKiller kills on R59-9454.0.0:

https://feedback.corp.google.com/product/208/neutron?lView=rd&lReport=58182494186
> I have gmail running as an android app; but when I try to start the play store I just get a spinney on the icon.  Clearly the container is up.  Why doesn't play store start?  Other apps are also not starting.  But the gmail app is functioning so its not like all Android apps are dead.  Weird.  


Copying and pasting my comment from the email thread:
According to the logcat section of the log, LowMemoryKiller is (repeatedly) killing org.chromium.arc.crash_collector, org.chromium.arc.gms, and org.chromium.arc.intent_helper. The processes immediately restart right after being killed, then LowMemoryKiller kills them again. ActivityManagerService looks very busy doing the kill-relaunch cycle.

Worse, according to the "PROCESS TIMES" section of the log, org.chromium.arc.applauncher, org.chromium.arc.file_system, and org.chromium.arc.tts don't even exist. The lack of org.chromium.arc.applauncher would explain the spinner for the Play Store app. These processes are supposed to relaunch after being killed, but apparently that didn't happen. Perhaps ActivityManagerService gave up after too many relaunches? It's unclear from the log.

The Gmail app (com.google.android.gm) somehow survived because the process was not at the bottom of the "Process LRU" list sorted by oom_adj (see below).

Either way, since org.chromium.arc.* processes are designed to relaunch right after being killed, allowing the low memory killer to kill them seems like just a waste of CPU resources. It also seems to make the system unstable (like the lack of org.chromium.arc.applauncher mentioned above).


Labels: -Pri-2 Pri-1
As I mentioned at crbug.com/706048#c20, I think we can protect org.chromium.arc.* processes by modifying our AndroidManifest.xml files. I'm going to do that on Monday.

cylee@, dianders@, if you have a better idea or concerns, please let me know.

Comment 62 by cylee@google.com, Apr 24 2017

#61
Yes, I think it should be done on the Android side, since Android already provides a way to specify process importance. Hard-coding process names on the Chrome side doesn't sound like a better idea to me.
Thanks for helping!

ag/2148956

Agreed that we should be protecting any processes which would just be relaunched when killed.
One other question: what is the adjusted OOM score (in terms of LRU) for newly created processes?  Presumably any newly created process should be counted as "recently used".  If not, you'll tend to kill the new processes, then just create them again, right?
The oom_score_adj is inherited from the parent.  I am hoping that the spawning process has a score of -1000 (unkillable).
c#65
"dumpsys activity" output says that a newly created Android process (for showing an Android activity) has the "foreground" priority, and therefore is properly protected.

Comment 68 by cylee@google.com, Apr 25 2017

#65
I'm using the LRU maintained by the Android system itself (i.e., "app.lastActivityTime") to sort Android processes, so in theory new processes have higher priority and thus less chance of being killed.
But in practice I do notice the repeated-killing behavior if the system stays under memory pressure for some time, because currently those "invisible background ARC processes" have lower priority than other visible apps and background tabs. So if the system doesn't return to a normal memory level before they're respawned, they'll be killed again.
Maybe we should raise the priority of respawning apps so that background tabs are killed before them?
#68
Killing background ARC processes seems okay to me (unless they are ARC system processes like org.chromium.arc.applauncher, which I've already protected with ag/2148956) since they're not visible to the user. Apps are supposed to handle OOM kills nicely, and in general, not all apps try to restart themselves immediately after being killed.

(Ours, like org.chromium.arc.applauncher, tend to relaunch immediately because they are marked as Service.START_STICKY for a reason, but not all apps do that.)

As I wrote in #67, a newly created _and_ user-visible process is specially handled, and doesn't seem to have the issue.

> Maybe we should raise the priority of respawning apps so that background tabs are killed before them?

So.. not sure. The current behavior (with ag/2148956) might be good enough. WDYT?

BTW, I've landed the fix to nyc-tot, mnc-tot, and nyc-m58.

Cc: khmel@chromium.org
We are seeing a problem where Android icons in the tray disappear even though the apps are still alive, and that's because the applauncher is killed. That problem should be fixed after the Android uprev.
Ben, nyc-arc M58+ will be okay, but I didn't merge the CL to mnc-arc-m58. Do we want to fix mnc-arc-m58 as well? How serious is the applauncher issue?
#71: that's issue 710984.
I just gave R60-9497.0.0 a try. The newly protected processes are not killed, but it seems that we are still thrashing on some Android processes.

In my experiment I opted in to the Play Store but didn't open any apps. Instead, I was solely using Chrome tabs to increase memory pressure. Attached is the filtered output from ui.LATEST, where a bunch of com.google processes are repeatedly killed, restarted, and killed again. For example, com.google.android.tts is one of them:

[1577:1577:0426/140349.052981:ERROR:device_event_log_impl.cc(156)] [14:03:49.052] Memory: tab_manager_delegate_chromeos.cc:614 Killed app 19258 (com.google.android.tts), process_state ProcessState::IMPORTANT_FOREGROUND, is_focused 0, lastActivityTime 1217871, process_type BACKGROUND_APP, estimated 57896 KB freed
[1577:1577:0426/140402.054797:ERROR:device_event_log_impl.cc(156)] [14:04:02.054] Memory: tab_manager_delegate_chromeos.cc:614 Killed app 19849 (com.google.android.tts), process_state ProcessState::IMPORTANT_FOREGROUND, is_focused 0, lastActivityTime 1235026, process_type BACKGROUND_APP, estimated 53704 KB freed
[1577:1577:0426/140428.040320:ERROR:device_event_log_impl.cc(156)] [14:04:28.039] Memory: tab_manager_delegate_chromeos.cc:614 Killed app 19909 (com.google.android.tts), process_state ProcessState::IMPORTANT_FOREGROUND, is_focused 0, lastActivityTime 1237860, process_type BACKGROUND_APP, estimated 56600 KB freed
[1577:1577:0426/140430.038799:ERROR:device_event_log_impl.cc(156)] [14:04:30.038] Memory: tab_manager_delegate_chromeos.cc:614 Killed app 20200 (com.google.android.tts), process_state ProcessState::IMPORTANT_FOREGROUND, is_focused 0, lastActivityTime 1263906, process_type BACKGROUND_APP, estimated 46716 KB freed
[1577:1577:0426/140822.724526:ERROR:device_event_log_impl.cc(156)] [14:08:22.723] Memory: tab_manager_delegate_chromeos.cc:614 Killed app 22816 (com.google.android.tts), process_state ProcessState::IMPORTANT_FOREGROUND, is_focused 0, lastActivityTime 1413870, process_type BACKGROUND_APP, estimated 46664 KB freed


Are they auto-started on kill? I think we should avoid killing them too.

app_tab_kills.txt
69.8 KB
#74
Two questions:

* How did you use Chrome to increase the memory pressure? I'd like to reproduce this locally on my device.
* Did you see any user-visible issues like UI jank while processes are killed and restarted?

> I think we should avoid killing them too.

Some of the processes listed in the log are from pre-built apks and we cannot easily make them persistent.

Here are my steps:

1) remove rootfs verification
2) add the following to /etc/chrome_dev.conf
--enable-logging=stderr
--vmodule=tab_manager_delegate_chromeos=3
3) in crosh, run "swap enable 500" to enable only 500 MB of swap space so the low-memory condition happens sooner

Then keep opening Chrome tabs to fill the swap space and watch /var/log/ui.LATEST. The tab kill messages will show up there. In my experiments, even when the machine is sitting idle, those Android processes are continuously restarted and killed.
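
If opening tabs by hand is too slow, the same pressure can be generated with a small memory eater (run a few copies). This is a generic sketch, not part of the repro above; note that under Linux's default overcommit the process is usually OOM-killed while touching pages rather than ever seeing bad_alloc.

#include <cstddef>
#include <cstring>
#include <iostream>
#include <new>
#include <unistd.h>
#include <vector>

// Allocate anonymous memory in 64 MB chunks and touch every byte so the
// kernel must actually back (or swap) it; hold everything until killed.
int main() {
    constexpr std::size_t kChunk = 64 * 1024 * 1024;
    std::vector<char*> chunks;
    try {
        for (;;) {
            char* p = new char[kChunk];
            std::memset(p, 0xA5, kChunk);  // fault in every page
            chunks.push_back(p);
            std::cout << "ate " << chunks.size() * 64 << " MB\n";
        }
    } catch (const std::bad_alloc&) {
        std::cout << "allocation failed; holding memory until killed\n";
        for (;;) pause();  // sit here and wait for the OOM killer
    }
}
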
Cc: cylee@chromium.org
Owner: yusukes@chromium.org
Status: Started (was: Untriaged)
Ok, thanks. Taking a look to see if there's something the container side can do.
Re #74 -- yes, I've seen this quite a bit, including things like gms, the print app, etc. getting killed, and I think they will just respawn.  I think gms going down also has the potential to disrupt a user who is currently running an Android app, no?
Re #74 etc.
The main reason for the frequent respawning is that Chrome kills IMPORTANT_FOREGROUND ARC processes. According to ActivityManager.java, the user is aware of IMPORTANT_FOREGROUND processes:

    /** @hide Process is a persistent system process. */
    public static final int PROCESS_STATE_PERSISTENT = 0;

    /** @hide Process is a persistent system process and is doing UI. */
    public static final int PROCESS_STATE_PERSISTENT_UI = 1;

    /** @hide Process is hosting the current top activities.  Note that this covers
     * all activities that are visible to the user. */
    public static final int PROCESS_STATE_TOP = 2;

    /** @hide Process is hosting a foreground service due to a system binding. */
    public static final int PROCESS_STATE_BOUND_FOREGROUND_SERVICE = 3;

    /** @hide Process is hosting a foreground service. */
    public static final int PROCESS_STATE_FOREGROUND_SERVICE = 4;

    /** @hide Same as {@link #PROCESS_STATE_TOP} but while device is sleeping. */
    public static final int PROCESS_STATE_TOP_SLEEPING = 5;

    /** @hide Process is important to the user, and something they are aware of. */
    public static final int PROCESS_STATE_IMPORTANT_FOREGROUND = 6;

The kernel may still kill them, but at least the userland code (in Chrome) shouldn't actively do so. I'm going to patch chrome/browser/memory/tab_manager_delegate_chromeos.cc shortly.

Another thing we could do is to avoid killing a process whose uptime is less than N seconds. That would prevent us from repeatedly killing unimportant but sticky (START_STICKY) processes. I'm probably going to implement this too, but I think it is less important.
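
A sketch of the rule the Chrome-side fix implements. The two constant values are copied from the ActivityManager.java excerpt above; the function and everything else here is illustrative, not the actual tab_manager_delegate_chromeos.cc code:

#include <iostream>

// Process states are ordered by importance: lower value = more important
// (values from the ActivityManager.java excerpt above).
constexpr int PROCESS_STATE_PERSISTENT = 0;
constexpr int PROCESS_STATE_IMPORTANT_FOREGROUND = 6;

// Chrome's discard logic should skip anything the user may be aware of:
// IMPORTANT_FOREGROUND or more important.
bool IsArcProcessKillable(int process_state) {
    return process_state > PROCESS_STATE_IMPORTANT_FOREGROUND;
}

int main() {
    std::cout << IsArcProcessKillable(PROCESS_STATE_IMPORTANT_FOREGROUND)      // 0
              << IsArcProcessKillable(PROCESS_STATE_IMPORTANT_FOREGROUND + 1)  // 1
              << "\n";
}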

Project Member

Comment 81 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b9b315ec73b88bebaaa1b45dd85b338392eb8fef

commit b9b315ec73b88bebaaa1b45dd85b338392eb8fef
Author: yusukes <yusukes@chromium.org>
Date: Tue May 02 22:05:09 2017

Don't kill a process if it has IMPORTANT_FOREGROUND or higher priority

Actively killing IMPORTANT_FOREGROUND processes in Chrome doesn't make
much sense because the processes are user-visible to some extent and
because of that, they will likely respawn immediately after being
killed.

BUG=706048
TEST=try
TEST=Follow crbug.com/706048#c76, confirm we no longer see the infinite
 kill-respawn loop.

Review-Url: https://codereview.chromium.org/2852093003
Cr-Commit-Position: refs/heads/master@{#468788}

[modify] https://crrev.com/b9b315ec73b88bebaaa1b45dd85b338392eb8fef/chrome/browser/memory/tab_manager_delegate_chromeos.cc
[modify] https://crrev.com/b9b315ec73b88bebaaa1b45dd85b338392eb8fef/chrome/browser/memory/tab_manager_delegate_chromeos_unittest.cc

Project Member

Comment 82 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b784d0769dce13bd2efc4ae9bf13b0ffe8523a03

commit b784d0769dce13bd2efc4ae9bf13b0ffe8523a03
Author: yusukes <yusukes@chromium.org>
Date: Wed May 03 00:28:16 2017

Do not kill recently killed ARC processes again

ARC processes sometimes respawn right after being killed. In that case,
killing them every time is just a waste of resources.

BUG=706048
TEST=try
TEST=Follow crbug.com/706048#c76, confirm Chrome no longer kills the
 processes too often.

Review-Url: https://codereview.chromium.org/2850203003
Cr-Commit-Position: refs/heads/master@{#468835}

[modify] https://crrev.com/b784d0769dce13bd2efc4ae9bf13b0ffe8523a03/chrome/browser/memory/tab_manager_delegate_chromeos.cc
[modify] https://crrev.com/b784d0769dce13bd2efc4ae9bf13b0ffe8523a03/chrome/browser/memory/tab_manager_delegate_chromeos.h
[modify] https://crrev.com/b784d0769dce13bd2efc4ae9bf13b0ffe8523a03/chrome/browser/memory/tab_manager_delegate_chromeos_unittest.cc
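
A sketch of the "do not kill again too soon" idea from this CL, with hypothetical names and a hypothetical 30-second cooldown; the actual CL's data structure, keying, and constants may differ:

#include <chrono>
#include <iostream>
#include <string>
#include <unordered_map>

// Remember when each ARC process name was last killed and refuse to kill
// the same name again within a cooldown window: sticky services respawn
// immediately, so re-killing them frees nothing and burns CPU.
class RecentKillCache {
 public:
    explicit RecentKillCache(std::chrono::seconds cooldown)
        : cooldown_(cooldown) {}

    bool MayKill(const std::string& process_name) const {
        auto it = last_kill_.find(process_name);
        return it == last_kill_.end() ||
               std::chrono::steady_clock::now() - it->second > cooldown_;
    }

    void RecordKill(const std::string& process_name) {
        last_kill_[process_name] = std::chrono::steady_clock::now();
    }

 private:
    std::chrono::seconds cooldown_;
    std::unordered_map<std::string,
                       std::chrono::steady_clock::time_point> last_kill_;
};

int main() {
    RecentKillCache cache{std::chrono::seconds(30)};
    cache.RecordKill("com.google.android.tts");
    std::cout << cache.MayKill("com.google.android.tts") << "\n";  // 0: cooling down
}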

Cc: igo@chromium.org bhthompson@chromium.org
Is it possible to merge them to R58 too? They are in Chrome so I don't know if that's feasible or not.
I believe these are Chrome OS-specific, if you mean comments 81 and 82, in which case we can merge them.

However, given that these just landed in the last few hours, we should verify they work right on ToT, then merge to 59, and if they work there, then merge to 58.

59 merge approval will be from gkihumba@.
Ben, here's the situation:

* Android change (already live in nyc-m60, nyc-m59, nyc-m58, and mnc-m60):

ag/2148956
This protects org.chromium.arc.* processes from being killed. If they are killed, some Chrome features (like showing ARC icons, launching ARC apps, etc.) will be lost.

* Chrome changes (live in m60):

https://codereview.chromium.org/2852093003
This protects IMPORTANT_FOREGROUND processes from being killed. This is good because IMPORTANT_FOREGROUND processes tend to relaunch right after being killed.

https://codereview.chromium.org/2850203003
Some of the lowest-priority (CACHED_EMPTY) processes still relaunch right after being killed. This CL detects such processes at runtime and avoids killing them too often.


The Android change, which I think is the most important, is okay. It is already in nyc-m58. As for the Chrome changes, let's try to merge them to m59 at least. I will request a merge later this week, after baking them on the canary channel for a while.

m58 might be too late. Are they super important? Avoiding killing these processes is definitely a nice-to-have feature, but I'm not really sure how important it is.

Sorry I posted without looking at #c84. Ok, let's merge them to m59 first.

The unit test I added had a minor issue. Will reland it soon:
https://codereview.chromium.org/2861613002/

Project Member

Comment 89 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eba2b9c32c0df7536d5884ab0d645b2701815206

commit eba2b9c32c0df7536d5884ab0d645b2701815206
Author: yusukes <yusukes@chromium.org>
Date: Wed May 03 20:02:12 2017

Reland: Do not kill recently killed ARC processes again

ARC processes sometimes respawn right after being killed. In that case,
killing them every time is just a waste of resources.

BUG=706048
TEST=try
TEST=Follow crbug.com/706048#c76, confirm Chrome no longer kills the
 processes too often.
TEST=gn gen out/Debug --args='is_debug=true use_goma=true target_os="chromeos"
 is_component_build=true remove_webcore_debug_symbols=true' &&
 ninja -C out/Debug unit_tests

Review-Url: https://codereview.chromium.org/2861613002
Cr-Commit-Position: refs/heads/master@{#469092}

[modify] https://crrev.com/eba2b9c32c0df7536d5884ab0d645b2701815206/chrome/browser/memory/tab_manager_delegate_chromeos.cc
[modify] https://crrev.com/eba2b9c32c0df7536d5884ab0d645b2701815206/chrome/browser/memory/tab_manager_delegate_chromeos.h
[modify] https://crrev.com/eba2b9c32c0df7536d5884ab0d645b2701815206/chrome/browser/memory/tab_manager_delegate_chromeos_unittest.cc

Re-landed.
Chrome PFQ has been red continuously since May 1, and because of that, my Chromium CLs are not live on canary yet.

Cc: abdulsyed@chromium.org
Labels: Merge-Request-59
+Merge-Request-59 +abdulsyed@
I'd like to merge https://codereview.chromium.org/2852093003 and https://codereview.chromium.org/2861613002 to M59 Chrome. Could you have a look?

They only affect the Chrome OS version of Chrome. The changes have been live in the canary channel since Friday morning.

Project Member

Comment 93 by sheriffbot@chromium.org, May 8 2017

Labels: -Merge-Request-59 Merge-Review-59 Hotlist-Merge-Review
This bug requires manual review: Reverts referenced in bugdroid comments after merge request.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: gkihumba@chromium.org
Hi amineer@, gkihumba@, could you have a look at comment #92?

Owner: amineer@chromium.org
Cc: -abdulsyed@chromium.org -gkihumba@chromium.org
Owner: gkihumba@chromium.org
I'm sorry you've been waiting a week for a review here :( That said, the changes appear to be 100% CrOS-based, and given where we are in the M59 release cycle, the change should be reviewed by the CrOS TPM (gkihumba@).
Labels: Merge-Approved-59
Owner: yusukes@chromium.org
Thanks!
Project Member

Comment 100 by bugdroid1@chromium.org, May 11 2017

Labels: -merge-approved-59 merge-merged-3071
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/22a4c0ee2fa868cb6560730ab8e579ddccb8c09c

commit 22a4c0ee2fa868cb6560730ab8e579ddccb8c09c
Author: yusukes <yusukes@chromium.org>
Date: Thu May 11 04:14:09 2017

Don't kill a process if it has IMPORTANT_FOREGROUND or higher priority

Actively killing IMPORTANT_FOREGROUND processes in Chrome doesn't make
much sense because the processes are user-visible to some extent and
because of that, they will likely respawn immediately after being
killed.

BUG=706048
TEST=try
TEST=Follow crbug.com/706048#c76, confirm we no longer see the infinite
 kill-respawn loop.

Review-Url: https://codereview.chromium.org/2852093003
Cr-Commit-Position: refs/heads/master@{#468788}
(cherry picked from commit b9b315ec73b88bebaaa1b45dd85b338392eb8fef)

Review-Url: https://codereview.chromium.org/2874023003 .
Cr-Commit-Position: refs/branch-heads/3071@{#509}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/22a4c0ee2fa868cb6560730ab8e579ddccb8c09c/chrome/browser/memory/tab_manager_delegate_chromeos.cc
[modify] https://crrev.com/22a4c0ee2fa868cb6560730ab8e579ddccb8c09c/chrome/browser/memory/tab_manager_delegate_chromeos_unittest.cc

Project Member

Comment 101 by bugdroid1@chromium.org, May 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/366a857bd21112ffbb191aa37932cfae74d24c10

commit 366a857bd21112ffbb191aa37932cfae74d24c10
Author: yusukes <yusukes@chromium.org>
Date: Thu May 11 04:23:56 2017

Reland: Do not kill recently killed ARC processes again

ARC processes sometimes respawn right after being killed. In that case,
killing them every time is just a waste of resources.

BUG=706048
TEST=try
TEST=Follow crbug.com/706048#c76, confirm Chrome no longer kills the
 processes too often.
TEST=gn gen out/Debug --args='is_debug=true use_goma=true target_os="chromeos"
 is_component_build=true remove_webcore_debug_symbols=true' &&
 ninja -C out/Debug unit_tests

Review-Url: https://codereview.chromium.org/2861613002
Cr-Commit-Position: refs/heads/master@{#469092}
(cherry picked from commit eba2b9c32c0df7536d5884ab0d645b2701815206)

Review-Url: https://codereview.chromium.org/2877653003 .
Cr-Commit-Position: refs/branch-heads/3071@{#510}
Cr-Branched-From: a106f0abbf69dad349d4aaf4bcc4f5d376dd2377-refs/heads/master@{#464641}

[modify] https://crrev.com/366a857bd21112ffbb191aa37932cfae74d24c10/chrome/browser/memory/tab_manager_delegate_chromeos.cc
[modify] https://crrev.com/366a857bd21112ffbb191aa37932cfae74d24c10/chrome/browser/memory/tab_manager_delegate_chromeos.h
[modify] https://crrev.com/366a857bd21112ffbb191aa37932cfae74d24c10/chrome/browser/memory/tab_manager_delegate_chromeos_unittest.cc

Merged to M59. I think it's too late to merge them to M58 Chrome, so I'll close this now. Please reopen if you strongly think they need to be in M58.

Status: Fixed (was: Started)
Status: Archived (was: Fixed)