
Issue 730323


Issue metadata

Status: Assigned
Pri: 2
Type: Bug




Reevaluate min_filelist_kbytes

Project Member Reported by teravest@chromium.org, Jun 7 2017

Issue description

This tunable touches a bunch of things inside mm, and was introduced before swap.

Carrying this patch imposes a cognitive load that should be justified: we should have good explanations for keeping it, and an argument for the value(s) we use.

Cc: dtor@chromium.org
Cc: groeck@chromium.org
When we took min_filelist_kbytes out, a lot of really simple memory pressure tests got REALLY bad, so we need to do something. ...but an out-of-tree solution isn't great either, because we don't have enough "mm" expertise in Chrome OS to know that all the corner cases of our hacky patch still work on each new version of Linux.

Looking at the current state of the world, it seems likely that we're missing a whole bunch of important places where we need to account for "min_filelist_kbytes" and these may be causing the mm code to enter into some nasty thrashing / looping / spinning cases.

---

Upstream has generally considered min_filelist_kbytes a bit of a hack. Folks usually suggest using "swappiness" instead, possibly even extending its maximum from 100 to 200. I think the testing Luigi did showed that this still wasn't quite as good.
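For context on why a swappiness of 200 comes up: in 4.x-era vmscan, the anon/file scan balance starts from anon_prio = swappiness and file_prio = 200 - swappiness. The sketch below is a simplified model of only that starting split (it assumes the real code's weighting by recent_scanned/recent_rotated can be ignored), and it allows values up to 200 purely to illustrate the proposal.

```python
def scan_split(swappiness):
    """Return (anon_fraction, file_fraction) of reclaim scan pressure.

    Simplified model: anon_prio = swappiness, file_prio = 200 - swappiness.
    The kernel additionally scales these by recent scan/rotation statistics,
    which this sketch deliberately omits.
    """
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be in [0, 200] for this model")
    anon_prio = swappiness
    file_prio = 200 - anon_prio
    total = anon_prio + file_prio  # always 200 in this model
    return anon_prio / total, file_prio / total


for s in (60, 100, 200):
    anon, file = scan_split(s)
    print(f"swappiness={s}: anon={anon:.2f} file={file:.2f}")
```

In this model 100 already means equal pressure on anon and file pages, so only values above 100 can bias reclaim toward anon pages; at 200 file pages would get no scan pressure at all, which is roughly the behavior min_filelist_kbytes forces once the file list is below its floor.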

---

The general justification for needing something like min_filelist_kbytes is that on normal Linux systems, swap costs roughly twice as much as throwing out a file-backed page: swapping means writing the page out and reading it back later, whereas a clean file-backed page is simply dropped and only has to be read back if it is used again.

With zram, swap is cheaper (in terms of time) than throwing out a file-backed page. It uses a lot more CPU, but it can be done more quickly.
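The cost argument can be written down as a tiny model. This is a sketch under stated assumptions: costs are in arbitrary relative units, file pages are assumed clean (no writeback), a refault is assumed to actually happen, and the zram numbers are illustrative placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageCosts:
    storage_read: float  # read one evicted file page back from disk/flash
    swap_out: float      # write (or, with zram, compress) one anon page
    swap_in: float       # read (or, with zram, decompress) one anon page


def anon_reclaim_cost(c):
    # Swapping an anonymous page costs a write now plus a read later.
    return c.swap_out + c.swap_in


def file_reclaim_cost(c):
    # A clean file-backed page is dropped for free; it costs one storage
    # read only when it is faulted back in.
    return c.storage_read


# Illustrative relative costs (assumptions, not measured values).
DISK = PageCosts(storage_read=1.0, swap_out=1.0, swap_in=1.0)
ZRAM = PageCosts(storage_read=1.0, swap_out=0.1, swap_in=0.1)

for name, c in (("disk swap", DISK), ("zram", ZRAM)):
    print(f"{name}: anon={anon_reclaim_cost(c):.1f} file={file_reclaim_cost(c):.1f}")
```

With disk swap the anon side costs twice the file side, which is what the kernel's default balance assumes; with zram, as long as a compress/decompress round trip is cheaper than one storage read, the inequality flips.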


Labels: -Pri-3 Pri-2
I'd also consider this to be somewhat of a higher priority.  I think some of the spinning we're doing is contributing to our bad behavior under memory pressure.  ...but probably not a P1 since I think existing patches alleviate the emergency.
Also please note issue 718270 where some of these problems are discussed.

min_filelist_kbytes was introduced to avoid serious thrashing on code pages. By "serious thrashing" I mean that the system would not hang, but would slow down to the point of becoming unusable. As was pointed out, this happened before the introduction of zram, and both the problem and the hacky solution seem orthogonal to zram.

Without this hack, the kernel doesn't appear to have a lower bound on the number of code pages. There is also no "thrash detection"; in fact, "thrashing" may be hard to define, since allowing short spurts of page faults may help optimize RAM use. The problem lies in the frequency and duration of such spurts. Also, when low on RAM/swap, the only alternative (from the kernel's perspective) to such thrashing, once it starts happening, is to OOM-kill, which we know is a can of worms.
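One way to make "frequency and duration" concrete is to flag thrashing only when the refault rate stays high for several consecutive samples, so that short spurts are tolerated. This is a hypothetical sketch, not something the kernel or Chrome OS does; `sustained_thrashing`, the per-second samples, and the threshold values are all made up for illustration.

```python
def sustained_thrashing(samples, rate_threshold, min_duration):
    """Return True if the refault rate exceeds rate_threshold for at least
    min_duration consecutive samples (hypothetical heuristic)."""
    run = 0
    for rate in samples:
        # Extend the current run of high-rate samples, or reset it.
        run = run + 1 if rate > rate_threshold else 0
        if run >= min_duration:
            return True
    return False


# Hypothetical refaults-per-second samples.
spurt = [10, 900, 950, 12, 8, 5]            # brief burst, then recovery
sustained = [10, 900, 950, 920, 980, 990]   # stays high

print(sustained_thrashing(spurt, rate_threshold=500, min_duration=4))      # False
print(sustained_thrashing(sustained, rate_threshold=500, min_duration=4))  # True
```

Both the threshold and the duration would need tuning against real traces; the point is only that a two-parameter rule already separates a spurt from sustained thrashing.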

Thus the general idea of keeping a minimum number of LRU file pages in RAM seems pretty reasonable to me. The problem we're having right now (as pointed out in issue 718270) is that a fixed min_filelist_kbytes no longer works well now that ARC++ has been introduced. I started looking into whether we could make this dynamic, based on the code size of all processes and taking code-page sharing into account, etc., but haven't gotten very far.
Another alternative is to try to mlock some or all of the Chrome text to prevent it from being paged out.
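A rough sketch of the "dynamic, sharing-aware" idea: sum the resident size of executable file-backed mappings across processes, counting each shared file only once, then add some headroom. Everything here is an assumption for illustration: the function name, the `(pid, path, resident_kb)` input (which one might gather from /proc/<pid>/smaps), the headroom factor, and the example paths and sizes. Taking the per-file maximum is a simplification; the true shared footprint is the union of resident pages, which can be larger.

```python
from collections import defaultdict


def dynamic_min_filelist_kb(mappings, headroom=1.25, floor_kb=0):
    """Estimate a file-LRU floor (in kB) from resident executable mappings.

    mappings: iterable of (pid, path, resident_kb) for executable,
    file-backed mappings. Shared code is counted once per path, using the
    largest resident size observed for that path (hypothetical heuristic).
    """
    per_file = defaultdict(int)
    for _pid, path, resident_kb in mappings:
        per_file[path] = max(per_file[path], resident_kb)
    return max(floor_kb, int(sum(per_file.values()) * headroom))


# Hypothetical snapshot: two Chrome processes plus one ARC++ process.
mappings = [
    (101, "/opt/google/chrome/chrome", 80_000),
    (102, "/opt/google/chrome/chrome", 60_000),
    (101, "/lib/libc.so.6", 1_500),
    (102, "/lib/libc.so.6", 1_500),
    (201, "/system/lib/libart.so", 20_000),
]

naive_kb = sum(kb for _, _, kb in mappings)
print("naive sum:", naive_kb)                         # 163000
print("shared-aware:", dynamic_min_filelist_kb(mappings))  # 126875
```

Counting shared text once keeps the floor from scaling with the number of renderer processes, which matters once ARC++ adds a second large set of code pages.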
I think the design space of possible solutions is large. :)

Fundamentally, min_filelist_kbytes is applied to prevent refaulting of file pages, correct? The number of "file page refaults" seems like a reasonable metric to examine when evaluating changes.
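File refaults can be measured from /proc/vmstat by diffing the workingset refault counter around a test. Depending on kernel version the counter is `workingset_refault` (older kernels, where only file pages were tracked) or `workingset_refault_file` (newer kernels, which split out anon refaults). The sketch below parses vmstat-formatted text; the sample values are made up, and on a real device you would pass in `open("/proc/vmstat").read()` taken before and after the run.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def file_refaults(stats):
    # Newer kernels expose workingset_refault_file; older ones expose a
    # single workingset_refault counter that covers file pages.
    for key in ("workingset_refault_file", "workingset_refault"):
        if key in stats:
            return stats[key]
    raise KeyError("no workingset refault counter found")


def refault_delta(before_text, after_text):
    """File refaults that happened between two vmstat snapshots."""
    return file_refaults(parse_vmstat(after_text)) - file_refaults(parse_vmstat(before_text))


# Made-up snapshots in the older-kernel format.
BEFORE = "pgfault 1000\nworkingset_refault 250\n"
AFTER = "pgfault 5000\nworkingset_refault 1250\n"
print(refault_delta(BEFORE, AFTER))  # 1000
```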

Johannes Weiner uses that as one metric (of many) to evaluate an alternative solution here:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1160326.html
Re #6: I think that's a pretty good metric, but the most important thing for us is really preventing UI jank, and that's related to refaulting of *certain* file pages.

I'm not sure whether the distinction between all refaults and just Chrome text or library refaults is super important, but I feel like we should investigate what is being faulted a bit more.
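To see *what* is being faulted, fault addresses for a process can be attributed to the files backing them using that process's /proc/<pid>/maps. This sketch assumes you already have a list of faulting user addresses from some tracing source (how to collect them is left open here) and a maps snapshot of the same process; the maps text and addresses below are made up.

```python
import bisect
from collections import Counter


def parse_maps(text):
    """Parse /proc/<pid>/maps lines into sorted (start, end, path) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) == 6 else "[anon]"
        regions.append((int(start_s, 16), int(end_s, 16), path))
    regions.sort()
    return regions


def attribute_faults(regions, addresses):
    """Count fault addresses per backing mapping path."""
    starts = [r[0] for r in regions]
    counts = Counter()
    for addr in addresses:
        i = bisect.bisect_right(starts, addr) - 1
        if i >= 0 and regions[i][0] <= addr < regions[i][1]:
            counts[regions[i][2]] += 1
        else:
            counts["[unmapped]"] += 1
    return counts


MAPS = """\
00400000-00500000 r-xp 00000000 b3:03 1001 /opt/google/chrome/chrome
7f0000000000-7f0000100000 r-xp 00000000 b3:03 2002 /lib/libc.so.6
7f0000200000-7f0000300000 rw-p 00000000 00:00 0
"""
faults = [0x400123, 0x4FF000, 0x7F0000000800, 0x7F0000250000, 0x10]
print(attribute_faults(parse_maps(MAPS), faults))
```

Splitting the counts into Chrome text, shared libraries, and everything else would tell us whether total refaults are a good enough proxy for the jank-relevant ones.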
Status: Assigned (was: Untriaged)
