android system processes should not get randomly OOM-killed |
||||||
Issue descriptionhttps://feedback.corp.google.com/#/Report/59663784209 There are OOM kills and these lines are reported for one of them: 05-07 22:08:28.950 291 304 W ActivityManager: Killing processes LowMemoryKill at adjustment 900 05-07 22:08:28.957 291 304 I ActivityManager: Killing 16838:com.google.android.gms.unstable/u0a14 (adj 904): LowMemoryKill This looks like an android process, so the main question is why does this process still exist since we're supposed to do all the killing from Chrome, and the second question (which may be moot) is why does a system process has an OOM kill score of 900, it should be -1000.
,
May 8 2017
Thank you Cheng-yu! Further clarification request: is the LowMemoryKill process the entity that receives the IPC and kills the other processes as requested? Is it killing itself in this case? Also, this is a very different situation from standalone Android. In that case, after you kill most or all apps, it's pretty much guaranteed that there will be enough memory for the system processes. Thus it is possible then that these kill priorities are assigned incorrectly, but it doesn't matter because those processes will never be OOM-killed. In the ARC++ case, after all android apps have been killed (either by Chrome or by kernel OOM), there can still be pressure from the Chrome side. We probably don't want to have random android system processes OOM-killed. Therefore their OOM kill priority should be lower than that of all Chrome tabs. In theory this should be enough. However we may want to shut down the entire Android container rather than killing the foreground tab(s)---however unlikely this may be. Then one Android system process should have a higher priority than the rest, and its death should bring down the entire container. (Or course, this assumes that the Chrome tab discarder/app killer was too slow and the kernel OOM killer had to take over, which might be unlikely under these circumstances.)
,
May 8 2017
Ha no, it's a weird legacy log message from upstream Android: https://cs.corp.google.com/aosp-master/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java?rcl=52bc790f616f3f1578f31e7161eb93f670213bdb&l=13049 "LowMemoryKill" is the "reason" of the process kill. I don't know why it's printed in such a funny manner. To my understanding, killing that specific process doesn't bring down Android. As I noted in the other bug, kill -9 all those processes doesn't seems to have a fatal impact to Android. Also in current implementation the tab discarder should have the same killing order as kernel OOM. So the process indeed is prone to be killed by tab discarder. Nevertheless, higher priority of Android system processes sounds fine to me. But as the discussion https://bugs.chromium.org/p/chromium/issues/detail?id=706048 says, there doesn't seem to be an elegant way to identify "important" processes. Probably we'll need to whitelist them by process name ?
,
May 8 2017
WAI probably? As cylee@ wrote, in ARC, /system/bin/lmkd does nothing. Actually I'm currently trying to remove the no-op daemon from the container. Also, ActivityManager is not killing the lmkd daemon. It's just killing com.google.android.gms.unstable process which is low priority. I don't think killing the process affects UX.
,
May 8 2017
> We probably don't want to have random android system processes OOM-killed. This is done in https://bugs.chromium.org/p/chromium/issues/detail?id=706048#c85 . I'm not saying this is perfect, and I do think we should optimize this further (e.g. giving a better score to each ARC process,) but basic protections are already in place.
,
May 8 2017
Yes sorry, I misunderstood what was getting killed. Thank you for explaining. Overall though, I'd be a lot happier if we only killed Android processes connected to user apps, instead of trying to reason that it's fine to kill various system processes. It may be hard to do, but it seems safer and more robust. Android system processes are unlikely to use much RAM anyhow. Feel free to close if we have some other bug tracking this. Thanks!
,
May 8 2017
Depending on a definition of "system process", we might already be doing that.
Chrome only kills processes at the bottom of the process list (sorted by importance) from ActivityManagerService. So at least from ActivityManagerService' point of view, com.google.android.gms for example is in the safe-to-kill category unless the current foreground app is actively using it.
Process LRU list (sorted by oom_adj, 22 total, non-act at 3, non-svc at 3):
PERS #21: sys F/ /P trm: 0 303:system/1000 (fixed)
PERS #20: pers F/ /P trm: 0 412:com.android.systemui/u0a21 (fixed)
PERS #19: pers F/ /P trm: 0 590:org.chromium.arc.crash_collector/u0a0 (fixed)
PERS #18: pers F/ /P trm: 0 597:org.chromium.arc.file_system/u0a1 (fixed)
PERS #17: pers F/ /P trm: 0 618:org.chromium.arc.gms/u0a26 (fixed)
PERS #16: pers F/ /P trm: 0 634:org.chromium.arc.ime/u0a28 (fixed)
PERS #15: pers F/ /P trm: 0 649:org.chromium.arc.tts/u0a29 (fixed)
PERS #14: pers F/ /P trm: 0 665:org.chromium.arc.applauncher/1000 (fixed)
PERS #13: pers F/ /P trm: 0 671:org.chromium.arc.intent_helper/u0a1 (fixed)
Proc # 0: fore T/A/T trm: 0 858:com.android.vending/u0a17 (top-activity)
Proc # 1: vis F/ /SB trm: 0 562:com.google.android.gms.persistent/u0a13 (service)
com.google.android.gms/.backup.BackupTransportService<=Proc{303:system/1000}
Proc #12: vis F/ /IF trm: 0 1167:com.google.android.tts/u0a40 (service)
com.google.android.tts/.service.GoogleTTSService<=Proc{649:org.chromium.arc.tts/u0a29}
Proc # 2: home B/ /HO trm: 0 486:org.chromium.arc.home/u0a27 (home)
Proc # 5: cch B/ /CE trm: 0 1646:com.google.process.gapps/u0a37 (cch-empty)
Proc # 4: cch B/ /CE trm: 0 821:com.google.process.gapps/u0a13 (cch-empty)
Proc # 3: cch B/ /CE trm: 0 802:com.google.android.gms/u0a13 (cch-empty)
Proc # 8: cch+2 B/ /CE trm: 0 1441:com.android.defcontainer/u0a6 (cch-empty)
Proc # 7: cch+2 B/ /CE trm: 0 1114:com.google.android.gms.unstable/u0a13 (cch-empty)
Proc # 6: cch+2 B/ /CE trm: 0 961:android.process.acore/u0a4 (cch-empty)
Proc #11: cch+4 B/ /CE trm: 0 1257:com.android.phone/1001 (cch-empty)
Proc #10: cch+4 B/ /CE trm: 0 690:com.android.printspooler/u0a44 (cch-empty)
Proc # 9: cch+4 B/ /CE trm: 0 1388:com.google.android.gms:car/u0a13 (cch-empty)
Having that said, I'd still keep this open for two reasons:
1) I think the way Chrome uses the process list for selecting a process to kill and for adjusting OOM scores is not consistent with /system/bin/lmkd's. We should at least check what exactly lmkd does, and consider doing the same as needed.
2) Chrome's current OOM adjustment seems to have an issue. Let's say at t=0 Process X was 'cch' (i.e. killable) with OOM score Y (>0), but at t=1 X moved to 'fore' (i.e. not killable). Although Chrome never kills X when it's 'fore', Chrome doesn't adjust X's oom_score_adj to negative. Because of this, the kernel still may kill X even when it's 'fore'.
org.chromium.arc.home, Java processes with 'pers' and 'sys', and C++ processes like netd always have -1000 adjustment, but other Java processes might not. In the example above, com.android.vending with 'fore' might have a positive OOM adjustment which is killable from the kernel POV. I think we should fix the Chrome issue (WDYT cylee@? I'll send you a patch shortly.)
,
May 9 2017
Re 2) in comment#7, I noticed that the current Chrome code uses -1000 oom_adj_score only for PERSISTENT, PERSISTENT_UI, and HOME processes. Chrome does not use -1000 for 'fore' but I guess it's still WAI. For now, let me reorganize some of the Chrome functions so that we can experiment other settings more easily. With the current Chrome code, it's difficult to let a process have -1000 only when it is 'fore', for example.
,
May 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ee315bafcc115470e4b0b994b48602650b417db2 commit ee315bafcc115470e4b0b994b48602650b417db2 Author: yusukes <yusukes@chromium.org> Date: Thu May 11 22:46:38 2017 Refactor ARC OOM handler code This CL moves around some code for Chrome OS OOM handling to make the code a bit easier to follow: * Previously, code for selecting "protected" processes was scattered around the file. One was in TabManagerDelegate::GetSortedCandidates for protecting PERSISTENT processes from *both* Chrome and kernel OOM handler, and the other was in LowMemoryKillImpl for protecting FOCUSED and VISIBLE processes from *only* Chrome. This CL moves the former to AdjustOomPrioritiesImpl to simplify the code. * Previously, tab_manager_delegate_chromeos.cc also directly used arc::mojom::ProcessState enums. This CL moves the code to arc::ArcProcess with unit tests to hide ARC details from the TabManagerDelegate class. * Previously, AdjustOomPrioritiesImpl had an assumption that the processes the function protected from the kernel always had -1000 OOM adjustment scores. The assumption is fine, but will make it difficult for us to e.g. protect arc::mojom::ProcessState::TOP processes from the kernel because unlike PERSISTENT ones, TOP processes change time to time and may have positive OOM adjustment scores. With this CL, AdjustOomPrioritiesImpl explicitly sets -1000 for processes the function protects to remove the assumption. In the future, we can modify ArcProcess::IsKillableForKernel as we like to experiment various OOM scoring policy without any artificial restrictions. * Add more unit tests for better code coverage. This CL does NOT change the current Chrome/kernel OOM handling behavior at all. This is pure code refactoring. Also, this CL does not touch the code for handling Chrome tabs. This is only for ARC. BUG= 719537 TEST=try TEST=just in case, manually confirmed PERSISTENT processes still have -1000 adjustment. Review-Url: https://codereview.chromium.org/2874543002 Cr-Commit-Position: refs/heads/master@{#471099} [modify] https://crrev.com/ee315bafcc115470e4b0b994b48602650b417db2/chrome/browser/chromeos/arc/process/arc_process.cc [modify] https://crrev.com/ee315bafcc115470e4b0b994b48602650b417db2/chrome/browser/chromeos/arc/process/arc_process.h [modify] https://crrev.com/ee315bafcc115470e4b0b994b48602650b417db2/chrome/browser/chromeos/arc/process/arc_process_unittest.cc [modify] https://crrev.com/ee315bafcc115470e4b0b994b48602650b417db2/chrome/browser/memory/tab_manager_delegate_chromeos.cc [modify] https://crrev.com/ee315bafcc115470e4b0b994b48602650b417db2/chrome/browser/memory/tab_manager_delegate_chromeos.h [modify] https://crrev.com/ee315bafcc115470e4b0b994b48602650b417db2/chrome/browser/memory/tab_manager_delegate_chromeos_unittest.cc
,
Sep 11 2017
We already have several, more specific bugs for the issue. Closing this one.
,
Jan 22 2018
,
Jan 23 2018
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by cylee@chromium.org
, May 8 2017