OOM killer should pick processes based only on priority
Issue description

The oom_badness() function in oom_kill.c uses heuristics that include oom_score_adj (controllable from user space) and the process size. In a Chromium OS/ARC++ environment, we want tighter control over which processes to kill and should use oom_score_adj exclusively.
Comment 1 by cylee@google.com, Mar 15 2017
First, let's see if this really makes a big difference. Suppose a typical process uses 100 MB; at 4 KB per page, that's about 25,000 pages.
        adj = (long)p->signal->oom_score_adj;
        if (adj == OOM_SCORE_ADJ_MIN) {
                task_unlock(p);
                return 0;
        }
For renderers, the range of adj is roughly 100-1000. (I see 300, 300, 417, 533 on my cyan.)
Other processes have either 0 or -1000. Let's say adj = 400 here.
        /*
         * The baseline for the badness score is the proportion of RAM that each
         * task's rss, pagetable and swap space use.
         */
        points = get_mm_rss(p->mm) + atomic_long_read(&p->mm->nr_ptes) +
                 get_mm_counter(p->mm, MM_SWAPENTS);
        task_unlock(p);
So here points = 25,000.
        /*
         * Root processes get 3% bonus, just like the __vm_enough_memory()
         * implementation used by LSMs.
         */
        if (has_capability_noaudit(p, CAP_SYS_ADMIN))
                points -= (points * 3) / 100;
Root processes (some daemons) are all tiny and likely already unkillable (adj = -1000).
        /* Normalize to oom_score_adj units */
        adj *= totalpages / 1000;
        points += adj;
Say we're on a 2 GB system with 2 GB of swap. That's 4 GB total, which is roughly 1,000,000 pages.
So adj is multiplied by 1,000 and becomes 400,000. That completely dominates the size-based points (25,000 here), so the result barely depends on the process size. And that's for a small renderer.
        /*
         * Never return 0 for an eligible task regardless of the root bonus and
         * oom_score_adj (oom_score_adj can't be OOM_SCORE_ADJ_MIN here).
         */
        return points > 0 ? points : 1;
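To make this concrete, here's a minimal user-space sketch of the arithmetic above (my own model, not the kernel code): badness() mirrors the computation but skips the root bonus and the unkillable check, and the sizes and adj values are the illustrative ones from this comment.

#include <stdio.h>

/* Simplified user-space model of the oom_badness() arithmetic above. */
static long badness(long size_pages, long adj, long totalpages)
{
        long points = size_pages;               /* stands in for rss + ptes + swap */

        points += adj * (totalpages / 1000);    /* normalize adj to pages */
        return points > 0 ? points : 1;
}

int main(void)
{
        long totalpages = 1000000;      /* 2 GB RAM + 2 GB swap, 4 KB pages */

        /* The 100 MB renderer at adj 400 from above: 25,000 + 400,000. */
        printf("small, adj 400: %ld\n", badness(25000, 400, totalpages));
        /* A renderer 4x the size at adj 300 still scores lower. */
        printf("large, adj 300: %ld\n", badness(100000, 300, totalpages));
        return 0;
}

This prints 425000 for the small renderer and 400000 for the one 4x its size, so adj alone decides the ordering.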
So maybe there isn't a problem, but I'll check the actual range of oom_score_adj values we use.
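To check, something like this throwaway sketch (mine, not part of any patch) can dump the adj of every task:

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

int main(void)
{
        DIR *proc = opendir("/proc");
        struct dirent *de;
        char path[64];

        while (proc && (de = readdir(proc))) {
                FILE *f;
                int adj;

                /* Only numeric entries in /proc are PIDs. */
                if (!isdigit((unsigned char)de->d_name[0]))
                        continue;
                snprintf(path, sizeof(path), "/proc/%s/oom_score_adj",
                         de->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;
                if (fscanf(f, "%d", &adj) == 1)
                        printf("%s\t%d\n", de->d_name, adj);
                fclose(f);
        }
        if (proc)
                closedir(proc);
        return 0;
}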
Mar 15 2017
To answer #2: yes, that could be a concern. However, I have yet to see a case in which the kernel panicked while there were still killable processes. Since it's the kernel, it can always choose between killing a process and panicking. For a user-level killer/discarder, that's different.
Mar 15 2017
However: I have checked the behavior of OOM score adjustments from the Chrome browser. The range is 300-1000, and the scores are spread uniformly across all processes. Suppose we have 20 tabs; that gives a spread of about 35 in adj between adjacent processes (700/20), which translates to 35,000 points, or about 150 MB. So two adjacent processes in the list that differ by more than about 150 MB in size could get swapped; a concrete instance is sketched below.

Basically the Chrome code assumes that only the ordering matters. I realize that killing a larger process reaps a bigger benefit, and if two processes are not actively used, we might as well kill the larger one. Note that in the presence of only a few large processes the oom_score_adj spread is greater, so these calculations come out differently. But still, this makes the edge cases harder to understand.
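Here is a concrete instance of the swap, using the same simplified badness() model as the sketch in comment 1 (the sizes and adj values are illustrative):

#include <stdio.h>

/* Same simplified model as the earlier sketch. */
static long badness(long size_pages, long adj, long totalpages)
{
        return size_pages + adj * (totalpages / 1000);
}

int main(void)
{
        long totalpages = 1000000;      /* 2 GB RAM + 2 GB swap, 4 KB pages */

        /*
         * Two adjacent tabs, 35 apart in adj. Chrome intends the adj-435
         * tab to go first, but the adj-400 tab is 200 MB (51,200 pages)
         * larger, which outweighs the 35,000 points the adj gap is worth.
         */
        printf("adj 435, 100 MB: %ld\n", badness(25600, 435, totalpages));
        printf("adj 400, 300 MB: %ld\n", badness(76800, 400, totalpages));
        return 0;
}

The adj-400 tab scores 476,800 against the adj-435 tab's 460,600, so the kernel would kill the lower-adj tab first, the opposite of what Chrome's ordering intends.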