x86: When memory is low, 5 second delay in i915 can lead to sluggishness or hung task reboot
Issue description

Forked from b/36197895

Basically, when memory is super tight on x86 devices (Caroline, in particular, was tested), we can end up in the kernel OOM killer. That's OK, but...

In the i915 driver (for graphics on x86 devices) there's code that makes a last-ditch attempt to free graphics memory to avoid an OOM. This code has a few issues:

1. It keeps trying for up to 5 seconds to get the lock so it can try to free memory. This can block an OOM from happening for 5 seconds. Due to mutex contention (especially with ARC++ and binder) and the fact that each OOM kill might not give back that much memory, we might need quite a few OOMs to actually free up memory in the system, so this can potentially end up blocking the system for a long time. See bug #702707 for some details about why we might need more than one OOM kill and b/36197895 for some examples of the 5 second delay killing us.

2. At least on kernel 3.18 on Caroline, it appears that we often end up waiting the whole 5 seconds. See printouts in b/36197895 like:

[ 5136.215108] Unable to purge GPU memory due lock contention.

...probably other processes in the system are spinning trying to free up memory themselves and are constantly calling the shrinker functions, which themselves are all trying to grab the mutex and free up memory.

...it's possible that in later upstream kernels the rest of the memory subsystem doesn't spin quite as aggressively, so maybe this isn't such a big deal there? I haven't personally confirmed.

3. By the time this code runs, the system has already had lots of time to try freeing memory through the normal count/scan methods (as proof, those are actually the ones blocking the OOM handler from running).

4. The code assumes that OOM is 1 step away from catastrophe (hence waiting 5 seconds to try a last-ditch effort to free memory). On Chrome OS this isn't actually true. OOM isn't the end of the world, and it's actually worse to kill the system performance than it is to OOM.
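To make issue 1 concrete, here is a minimal user-space sketch of the pattern described above: poll for a contended lock until a deadline passes, then give up without freeing anything. This is not the i915 code. The names `now_ms`, `lock_with_timeout`, and `oom_purge`, and the use of a pthread mutex, are assumptions made purely for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: poll for the lock until timeout_ms has elapsed.
 * Returns true with the lock held, false if it gave up.
 */
static bool lock_with_timeout(pthread_mutex_t *lock, long timeout_ms)
{
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
    long long deadline = now_ms() + timeout_ms;

    do {
        if (pthread_mutex_trylock(lock) == 0)
            return true;
        nanosleep(&pause, NULL);
    } while (now_ms() < deadline);

    return false;
}

/*
 * Hypothetical stand-in for the last-ditch purge. Returns the number of
 * pages "freed", or 0 if the lock could not be taken in time. The caller
 * is stalled for the full timeout in the contended case.
 */
static unsigned long oom_purge(pthread_mutex_t *lock, long timeout_ms,
                               unsigned long purgeable_pages)
{
    if (!lock_with_timeout(lock, timeout_ms))
        return 0; /* corresponds to the "Unable to purge" case */

    /* A real driver would release memory here; the sketch only reports it. */
    pthread_mutex_unlock(lock);
    return purgeable_pages;
}
```

With a 5000 ms timeout and a lock that stays contended, each call to `oom_purge` stalls its caller for the whole 5 seconds and frees nothing, which is the behavior the printout in issue 2 suggests. If several OOM kills are needed, those stalls add up.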
---

In any case, marcheu@ killed this code in 3.18 and 4.4. See, for instance:

https://chromium-review.googlesource.com/c/471880/
https://chromium-review.googlesource.com/c/471881/

---

Question: do we want this on M-58? ...adding Merge-Requested to start the discussion.
Apr 10 2017
@1: it seems good to me, so will wait for marcheu@ to confirm.
Apr 11 2017
Sure, let's merge.
Apr 11 2017
Jan 22 2018
Comment 1 by bhthompson@google.com, Apr 10 2017