
Issue 709999

Starred by 4 users

Issue metadata

Status: Archived
Owner:
Closed: Apr 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug


Participants' hotlists:
Hotlist-3
Hotlist-1



x86: When memory is low, 5-second delay in i915 can lead to sluggishness or hung task reboot

Project Member Reported by diand...@chromium.org, Apr 10 2017

Issue description

Forked from b/36197895

Basically when memory is super tight on x86 devices (Caroline, in particular, was tested), we can end up in the kernel OOM killer.  That's OK, but...

In the i915 driver (for graphics on x86 devices) there's code that makes a last-ditch attempt to free graphics memory to avoid an OOM.

This code has a few issues:

1. It keeps trying for up to 5 seconds to get the lock so it can try to free memory.  This can block an OOM kill from happening for 5 seconds.  Due to mutex contention (especially with ARC++ and binder) and the fact that each OOM kill might not give back that much memory, we might need quite a few OOM kills to actually free up memory in the system, so this can potentially end up blocking the system for a long time.  See bug #702707 for some details about why we might need more than one OOM kill and b/36197895 for some examples of the 5 second delay killing us.


2. At least on kernel 3.18 on Caroline, it appears that we often end up waiting the whole 5 seconds.  See printouts in b/36197895 like:

[ 5136.215108] Unable to purge GPU memory due lock contention.

...probably other processes in the system are spinning trying to free up memory themselves and are constantly calling the shrinker functions, which themselves are all trying to grab the mutex and free up memory.

...it's possible that in later upstream kernels the rest of the memory subsystem doesn't spin quite as aggressively so maybe this isn't such a big deal there?  I haven't personally confirmed.


3. By the time this code runs the system has already had lots of time to try freeing memory through the normal count/scan methods (as proof, those are actually the ones blocking the OOM handler from running).


4. The code assumes that OOM is 1 step away from catastrophe (hence waiting 5 seconds in a last-ditch effort to free memory).  On Chrome OS this isn't actually true.  OOM isn't the end of the world, and it's actually worse to kill system performance than it is to OOM.
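
To make the cost of issues 1 and 2 concrete, here is a rough user-space sketch in Python. It is not the i915 code: the function names (`try_purge_with_deadline`, `worst_case_stall`) and the use of `threading.Lock` are assumptions made purely for illustration. It models a purge path that waits up to a timeout for a contended lock before giving up, and the total stall if several OOM kills in a row each pay the full timeout.

```python
import threading
import time


def try_purge_with_deadline(lock, purge, timeout_s):
    """Wait up to timeout_s for lock, then run purge() while holding it.

    Hypothetical stand-in for a "last-ditch" purge path. Returns a tuple
    (purged, waited_seconds). If the lock stays contended for the whole
    timeout, nothing is purged but the caller has still been blocked.
    """
    start = time.monotonic()
    got_lock = lock.acquire(timeout=timeout_s)
    waited = time.monotonic() - start
    if not got_lock:
        # Analogous to the "Unable to purge ... lock contention" case:
        # the full timeout was spent and no memory was freed.
        return False, waited
    try:
        purge()
        return True, waited
    finally:
        lock.release()


def worst_case_stall(oom_kills, timeout_s):
    """Total time spent waiting if every OOM kill pays the full timeout."""
    return oom_kills * timeout_s
```

Under these assumptions, if it takes 4 OOM kills to free enough memory and each one waits the full 5 seconds, `worst_case_stall(4, 5.0)` gives 20 seconds of stall on top of the time the kills themselves take. The number of kills here is made up for the example; the point is only that the delay multiplies.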

---

In any case, marcheu@ killed this code in 3.18 and 4.4.  See, for instance:

https://chromium-review.googlesource.com/c/471880/
https://chromium-review.googlesource.com/c/471881/

---

Question: do we want this on M-58?

...adding Merge-Requested to start the discussion.
 
Labels: -Merge-Request-58 Merge-Approved-58
If Doug and Stephane think this is a good idea to bring into 58, SGTM.
Cc: bhthompson@chromium.org
Owner: marc...@chromium.org
@1: it seems good to me, so will wait for marcheu@ to confirm.
Sure, let's merge.
Labels: -Merge-Approved-58 Merge-Merged
Status: Fixed (was: Untriaged)

Comment 5 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
