New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 675021 link

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug


Participants' hotlists:
Hotlist-3
Hotlist-1


Sign in to add a comment

dm_bufio improvements

Project Member Reported by diand...@chromium.org, Dec 16 2016

Issue description

In http://crosbug.com/p/59819 it can be seen that the patch we landed:

https://chromium-review.googlesource.com/411946

...had some feedback upstream.  In response to that, upstream landed two other patches:

  41c73a49df31 dm bufio: drop the lock when doing GFP_NOIO allocation
  d12067f428c0 dm bufio: don't take the lock in dm_bufio_shrink_count

We should probably take both of those into any kernels that the original fix landed in.
 
remote:   https://chromium-review.googlesource.com/419680 UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation        
remote:   https://chromium-review.googlesource.com/421129 UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count        

Project Member

Comment 2 by bugdroid1@chromium.org, Dec 22 2016

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/21eafe1899ac3ed441fbc5ea9908a5788f4bbd72

commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed Nov 23 22:04:00 2016

UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation

If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).

Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.

BUG= chromium:675021 
TEST=See bug

Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c)
Reviewed-on: https://chromium-review.googlesource.com/419680
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/21eafe1899ac3ed441fbc5ea9908a5788f4bbd72/drivers/md/dm-bufio.c

Project Member

Comment 3 by bugdroid1@chromium.org, Dec 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f1a7c2e30c46db262972d7e4d608d2bf77f375da

commit f1a7c2e30c46db262972d7e4d608d2bf77f375da
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed Nov 23 21:52:01 2016

UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count

dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects are there. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.

BUG= chromium:675021 
TEST=See bug

Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a)
Reviewed-on: https://chromium-review.googlesource.com/421129
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/f1a7c2e30c46db262972d7e4d608d2bf77f375da/drivers/md/dm-bufio.c

Project Member

Comment 4 by bugdroid1@chromium.org, Dec 22 2016

Labels: merge-merged-release-R56-9000.B-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/01c78142cf558105a5d3036a809b7a67f29d39bb

commit 01c78142cf558105a5d3036a809b7a67f29d39bb
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed Nov 23 21:52:01 2016

UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count

dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects are there. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.

BUG= chromium:675021 
TEST=See bug

Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a)
Reviewed-on: https://chromium-review.googlesource.com/421129
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da)
Reviewed-on: https://chromium-review.googlesource.com/423251

[modify] https://crrev.com/01c78142cf558105a5d3036a809b7a67f29d39bb/drivers/md/dm-bufio.c

Project Member

Comment 5 by bugdroid1@chromium.org, Dec 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3a6653fd33b3d8b56fdbbefa974537c8e18c01b9

commit 3a6653fd33b3d8b56fdbbefa974537c8e18c01b9
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed Nov 23 22:04:00 2016

UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation

If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).

Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.

BUG= chromium:675021 
TEST=See bug

Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c)
Reviewed-on: https://chromium-review.googlesource.com/419680
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72)
Reviewed-on: https://chromium-review.googlesource.com/423250

[modify] https://crrev.com/3a6653fd33b3d8b56fdbbefa974537c8e18c01b9/drivers/md/dm-bufio.c

OK, landed on 4.4 on ToT and R56.

Dan: I picked to 3.18 too and am leaving it to you to test.

I haven't landed the original version on 3.14 and older yet since the problem wasn't as obvious there and I haven't had enough time to test.  Hopefully we can get that eventually...
Cc: igo@chromium.org diand...@chromium.org
Components: OS>Kernel
Labels: -M-56 Merge-Request-58 Arch-MIPS
Owner: semenzato@chromium.org
I tested 3.18 ToT (caroline) with and without these patches.  Bottom line: I could not reproduce the hang with the patches, and I could reproduce it pretty quickly without the patches.

The hang is pretty easy to achieve without the patch so it is a serious bug.  I recommend merging this to 3.18 both into ToT and the next release branch (R58 I think).
Project Member

Comment 8 by sheriffbot@chromium.org, Apr 13 2017

Labels: -Merge-Request-58 Merge-Review-58 Hotlist-Merge-Review
This bug requires manual review: We are only 11 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Arch-MIPS
Owner: diand...@chromium.org
By the way, the merge request is a bit premature.  We want to test for a few days on canaries first.  Note that these patches have been in 4.4 for a while, although their impact may be different there because that code (the memory manager) keeps changing.

Either Doug or I can merge when the time comes.

(I am removing the Arch-MIPS tag... where did that come from???)

Comment 10 by bccheng@google.com, Apr 14 2017

Cc: bccheng@chromium.org
Project Member

Comment 12 by bugdroid1@chromium.org, Apr 14 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f

commit caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Fri Apr 14 06:54:11 2017

UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation

If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).

Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.

BUG= chromium:675021 
TEST=See bug

Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c)
Reviewed-on: https://chromium-review.googlesource.com/419680
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72)
Reviewed-on: https://chromium-review.googlesource.com/423252
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f/drivers/md/dm-bufio.c

Project Member

Comment 13 by bugdroid1@chromium.org, Apr 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88c2b3e81afceeb741cbcefb290906b7103a9809

commit 88c2b3e81afceeb741cbcefb290906b7103a9809
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Fri Apr 14 06:54:14 2017

UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count

dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects are there. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.

BUG= chromium:675021 
TEST=See bug

Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a)
Reviewed-on: https://chromium-review.googlesource.com/421129
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da)
Reviewed-on: https://chromium-review.googlesource.com/423253
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/88c2b3e81afceeb741cbcefb290906b7103a9809/drivers/md/dm-bufio.c

Comment 14 by bccheng@google.com, Apr 14 2017

I cherry-picked the CLs to my build but still got kernel panics (see attached file). The good thing is dm_bufio_shrink_count is no longer present in the stack track, but hung_task watchdog still fires for some reason.
ramoops.txt
128 KB View Download
Good job Ben.

I think this is a new OOM kill deadlock, so I am opening another bug for it.

I am also wondering how many of these deadlocks exist.  We need to figure out why we're not sufficiently aggressive with tab discards and app kills.  We should try to stay away from OOM kills for this and other reasons.


@14: yes, please open a new bug.  BTW: I've never heard of jdb2, but one of the top hits when I search is <https://access.redhat.com/solutions/96783>.
Cc: semenzato@chromium.org
#16 thank you, I opened issue 711673 and added your comments there.
... and by the way, it's not necessarily related to OOM kills, although the load increase from high swap activity is likely a factor.

Comment 19 by bccheng@google.com, Apr 17 2017

I am seeing occasional stalls on ToT R60 which contains these CLs. The machine will freeze for a few seconds but can recover. Feedback is at 

https://feedback.corp.google.com/product/208/neutron?lView=rd&lRSort=1&lROrder=2&lRFilter=1&lReportSearch=bccheng&lReport=57782171053

And I also collected a perf profile. It shows heavy use of CPU cycles in swapper:

# Overhead  Command          Shared Object                 Symbol
# ........  ...............  ............................  .....................................
#
    18.79%  swapper          [kernel.kallsyms]             [k] mwait_idle_with_hints.constprop.3
     1.92%  kswapd0          [kernel.kallsyms]             [k] lzo1x_1_do_compress
     0.83%  swapper          [unknown]                     [k] 0000000000000000

I'm fairly certain the stall is introduced in the past two weeks or so. I will roll back to an older version to see when the regression happens.
swapper_mwait_with_hints.txt
4.1 MB View Download
I am wondering if mwait_idle_with_hints isn't just the CPU idle handler.  Using 2% of CPU for compression should not be an issue.


Labels: Merge-Request-59 M-59
Project Member

Comment 22 by bugdroid1@chromium.org, Apr 18 2017

Labels: merge-merged-release-R59-9460.B-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d3dcaa7ca2338901b03584095faf22e7a2c3ae69

commit d3dcaa7ca2338901b03584095faf22e7a2c3ae69
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue Apr 18 00:29:13 2017

UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation

If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).

Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.

BUG= chromium:675021 
TEST=See bug

Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c)
Reviewed-on: https://chromium-review.googlesource.com/419680
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72)
Reviewed-on: https://chromium-review.googlesource.com/423252
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
(cherry picked from commit caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f)
Reviewed-on: https://chromium-review.googlesource.com/479797
Commit-Queue: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/d3dcaa7ca2338901b03584095faf22e7a2c3ae69/drivers/md/dm-bufio.c

Project Member

Comment 23 by bugdroid1@chromium.org, Apr 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3c2b9fbcd29b95809db4147760865b113a045c3c

commit 3c2b9fbcd29b95809db4147760865b113a045c3c
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue Apr 18 00:29:17 2017

UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count

dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects are there. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.

BUG= chromium:675021 
TEST=See bug

Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a)
Reviewed-on: https://chromium-review.googlesource.com/421129
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da)
Reviewed-on: https://chromium-review.googlesource.com/423253
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
(cherry picked from commit 88c2b3e81afceeb741cbcefb290906b7103a9809)
Reviewed-on: https://chromium-review.googlesource.com/479798
Commit-Queue: Luigi Semenzato <semenzato@chromium.org>

[modify] https://crrev.com/3c2b9fbcd29b95809db4147760865b113a045c3c/drivers/md/dm-bufio.c

#19 should we open a separate bug for this?

Also, if it is easy to reproduce, do you have enough time for a triple alt-volup-X and then send feedback, so we can see if it's stuck in the kernel and what it's waiting for?

Thanks!

Project Member

Comment 25 by sheriffbot@chromium.org, Apr 18 2017

Labels: -Merge-Request-59 Hotlist-Merge-Approved Merge-Approved-59
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-59 Merge-Merged
Sorry, I didn't wait for Merge-Approved after discussing with Chris, since there was some urgency to get this in before the nightly build.  Hope it's OK.

Are we still looking at this for 58? 

Comment 28 by igo@chromium.org, May 2 2017

I will like the big changes for memory to be in 58 and I think we need this as well for stability. Luigi?
Cc: sonnyrao@chromium.org
If we take these into R58 they could affect anyone on a 3.18 kernel.  At this point they're pretty well tested, but it's also very late for R58.

People thought that they made a difference to how things performed at low memory time, so I guess it depends on how critical it is to make low memory work better in R58 timeframe.
fyi Doug and I talked about doing this for 3.14 as well -- I was initially enthusiastic about pulling it to older kernels, however I looked through a bunch of kcrash reports for minnie and couldn't find a single instance so I'm inclined to just leave them alone now.
@31: That's definitely good data and matches what I saw when I was originally working on that patch.  One thing I was never 100% sure about was how much it mattered that tasks were stuck in dm_bufio_shrink_count().  I guess that would prevent them from getting into the direct reclaim path, right?  ...but I also always figured that maybe there were enough tasks working on that?

In any case, it still seems good to get the fix...
Project Member

Comment 33 by bugdroid1@chromium.org, May 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/0d40ed0d87c400807553474b78105af577cb8b8e

commit 0d40ed0d87c400807553474b78105af577cb8b8e
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed May 03 03:53:28 2017

UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation

If the first allocation attempt using GFP_NOWAIT fails, drop the lock
and retry using GFP_NOIO allocation (lock is dropped because the
allocation can take some time).

Note that we won't do GFP_NOIO allocation when we loop for the second
time, because the lock shouldn't be dropped between __wait_for_free_buffer
and __get_unclaimed_buffer.

BUG= chromium:675021 
TEST=See bug

Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c)
Reviewed-on: https://chromium-review.googlesource.com/419680
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72)
Reviewed-on: https://chromium-review.googlesource.com/493918

[modify] https://crrev.com/0d40ed0d87c400807553474b78105af577cb8b8e/drivers/md/dm-bufio.c

Project Member

Comment 34 by bugdroid1@chromium.org, May 3 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88c4c4ef0638af91e9fce1c498c88be26b67c3ed

commit 88c4c4ef0638af91e9fce1c498c88be26b67c3ed
Author: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed May 03 03:53:39 2017

UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count

dm_bufio_shrink_count() is called from do_shrink_slab to find out how many
freeable objects are there. The reported value doesn't have to be precise,
so we don't need to take the dm-bufio lock.

BUG= chromium:675021 
TEST=See bug

Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a)
Reviewed-on: https://chromium-review.googlesource.com/421129
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da)
Reviewed-on: https://chromium-review.googlesource.com/493919

[modify] https://crrev.com/88c4c4ef0638af91e9fce1c498c88be26b67c3ed/drivers/md/dm-bufio.c

Labels: -Hotlist-Merge-Review -Hotlist-Merge-Approved -Merge-Review-58 M-58
Status: Fixed (was: Started)

Comment 36 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment