
Issue 701848 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 702243


Participants' hotlists:
Hotlist-4



OOM killer should pick processes based only on priority

Project Member Reported by semenzato@chromium.org, Mar 15 2017

Issue description

The oom_badness() function in oom_kill.c uses heuristics that include oom_score_adj (controllable from user space) and the process size.  In a Chromium OS/ARC++ environment, we want tighter control over which processes to kill, and should use oom_score_adj exclusively.
 

Comment 1 by cylee@google.com, Mar 15 2017

Just a note: I assume the kernel kills processes that use more memory because it can reclaim memory faster that way. If the kernel can't reclaim memory in time, the system may actually run out of memory and crash or panic. I hope that removing the heuristic wouldn't have a big impact from that perspective?
First let's see if this really makes a big difference.  Suppose a typical process uses 100 MB; that's about 25,000 4 KB pages.

	adj = (long)p->signal->oom_score_adj;
	if (adj == OOM_SCORE_ADJ_MIN) {
		task_unlock(p);
		return 0;
	}

For renderers, the range of adj is roughly 100-1000.  (I see 300, 300, 417, 533 on my cyan.)
Other processes have either 0 or -1000.  Let's say adj = 400 here.

	/*
	 * The baseline for the badness score is the proportion of RAM that each
	 * task's rss, pagetable and swap space use.
	 */
	points = get_mm_rss(p->mm) + atomic_long_read(&p->mm->nr_ptes) +
		 get_mm_counter(p->mm, MM_SWAPENTS);
	task_unlock(p);

So here points = 25,000.

	/*
	 * Root processes get 3% bonus, just like the __vm_enough_memory()
	 * implementation used by LSMs.
	 */
	if (has_capability_noaudit(p, CAP_SYS_ADMIN))
		points -= (points * 3) / 100;

Root processes (some daemons) are all tiny, and likely already unkillable.

	/* Normalize to oom_score_adj units */
	adj *= totalpages / 1000;
	points += adj;

Say we're on a 2GB system with 2GB swap.  That's 4GB total, which is about 1,000,000 pages.
So adj is multiplied by 1,000 and becomes adj = 400,000.  The adj term completely dominates the 25,000-point size term, so the score barely depends on the process size.  And that's for a small renderer.

	/*
	 * Never return 0 for an eligible task regardless of the root bonus and
	 * oom_score_adj (oom_score_adj can't be OOM_SCORE_ADJ_MIN here).
	 */
	return points > 0 ? points : 1;

So maybe there isn't a problem, but I'll check the actual range of oom_score_adj we use.
To answer #2: yes, that could be a concern.  However, I have yet to see a case in which the kernel panicked while killable processes remained.  Since the OOM killer runs in the kernel, it can always choose between killing a process and panicking.  For a user-level killer/discarder, that's different.
Owner: semenzato@chromium.org
However:

I have checked the behavior of oom score adjustments from the Chrome browser.  The range is 300-1000, and the scores are spread uniformly across all processes.

Suppose then that we have 20 tabs, giving a spread of 35 in adj between adjacent processes, which translates to 35,000 points, or about 150 MB.  So two adjacent processes in the list that differ by more than about 150 MB in size could get swapped in the kill order.

Basically the Chrome code assumes that only the ordering is important.

I realize that killing a larger process reaps a bigger benefit, and if two processes are not actively used, we might as well kill the larger one.  Note that in the presence of only a few large processes, the oom_score_adj spread is greater, so these calculations are different.

But still, this makes edge cases harder to reason about.


Blocking: 702243
Status: Assigned (was: Untriaged)
