dm_bufio improvements |
|||||||||||||||||
Issue descriptionIn http://crosbug.com/p/59819 it can be seen that the patch we landed: https://chromium-review.googlesource.com/411946 ...had some feedback upstream. In response to that, upstream landed two other patches: 41c73a49df31 dm bufio: drop the lock when doing GFP_NOIO allocation d12067f428c0 dm bufio: don't take the lock in dm_bufio_shrink_count We should probably take both of those into any kernels that the original fix landed in.
,
Dec 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/21eafe1899ac3ed441fbc5ea9908a5788f4bbd72 commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Wed Nov 23 22:04:00 2016 UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using GFP_NOIO allocation (lock is dropped because the allocation can take some time). Note that we won't do GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. BUG= chromium:675021 TEST=See bug Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c) Reviewed-on: https://chromium-review.googlesource.com/419680 Reviewed-by: Guenter Roeck <groeck@chromium.org> [modify] https://crrev.com/21eafe1899ac3ed441fbc5ea9908a5788f4bbd72/drivers/md/dm-bufio.c
,
Dec 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f1a7c2e30c46db262972d7e4d608d2bf77f375da commit f1a7c2e30c46db262972d7e4d608d2bf77f375da Author: Mikulas Patocka <mpatocka@redhat.com> Date: Wed Nov 23 21:52:01 2016 UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects are there. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. BUG= chromium:675021 TEST=See bug Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3 Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a) Reviewed-on: https://chromium-review.googlesource.com/421129 Reviewed-by: Guenter Roeck <groeck@chromium.org> [modify] https://crrev.com/f1a7c2e30c46db262972d7e4d608d2bf77f375da/drivers/md/dm-bufio.c
,
Dec 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/01c78142cf558105a5d3036a809b7a67f29d39bb commit 01c78142cf558105a5d3036a809b7a67f29d39bb Author: Mikulas Patocka <mpatocka@redhat.com> Date: Wed Nov 23 21:52:01 2016 UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects are there. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. BUG= chromium:675021 TEST=See bug Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3 Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a) Reviewed-on: https://chromium-review.googlesource.com/421129 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da) Reviewed-on: https://chromium-review.googlesource.com/423251 [modify] https://crrev.com/01c78142cf558105a5d3036a809b7a67f29d39bb/drivers/md/dm-bufio.c
,
Dec 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3a6653fd33b3d8b56fdbbefa974537c8e18c01b9 commit 3a6653fd33b3d8b56fdbbefa974537c8e18c01b9 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Wed Nov 23 22:04:00 2016 UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using GFP_NOIO allocation (lock is dropped because the allocation can take some time). Note that we won't do GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. BUG= chromium:675021 TEST=See bug Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c) Reviewed-on: https://chromium-review.googlesource.com/419680 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72) Reviewed-on: https://chromium-review.googlesource.com/423250 [modify] https://crrev.com/3a6653fd33b3d8b56fdbbefa974537c8e18c01b9/drivers/md/dm-bufio.c
,
Dec 22 2016
OK, landed on 4.4 on ToT and R56. Dan: I picked to 3.18 too and am leaving it to you to test. I haven't landed the original version on 3.14 and older yet since the problem wasn't as obvious there and I haven't had enough time to test. Hopefully we can get that eventually...
,
Apr 13 2017
I tested 3.18 ToT (caroline) with and without these patches. Bottom line: I could not reproduce the hang with the patches, and I could reproduce it pretty quickly without the patches. The hang is pretty easy to achieve without the patch so it is a serious bug. I recommend merging this to 3.18 both into ToT and the next release branch (R58 I think).
,
Apr 13 2017
This bug requires manual review: We are only 11 days from stable. Please contact the milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 13 2017
By the way, the merge request is a bit premature. We want to test for a few days on canaries first. Note that these patches have been in 4.4 for a while, although their impact may be different there because that code (the memory manager) keeps changing. Either Doug or I can merge when the time comes. (I am removing the Arch-MIPS tag... where did that come from???)
,
Apr 14 2017
,
Apr 14 2017
CLs for 3.18 in progress here: https://chromium-review.googlesource.com/c/423253/ https://chromium-review.googlesource.com/c/423252/
,
Apr 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f commit caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f Author: Mikulas Patocka <mpatocka@redhat.com> Date: Fri Apr 14 06:54:11 2017 UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using GFP_NOIO allocation (lock is dropped because the allocation can take some time). Note that we won't do GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. BUG= chromium:675021 TEST=See bug Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c) Reviewed-on: https://chromium-review.googlesource.com/419680 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72) Reviewed-on: https://chromium-review.googlesource.com/423252 Tested-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f/drivers/md/dm-bufio.c
,
Apr 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88c2b3e81afceeb741cbcefb290906b7103a9809 commit 88c2b3e81afceeb741cbcefb290906b7103a9809 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Fri Apr 14 06:54:14 2017 UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects are there. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. BUG= chromium:675021 TEST=See bug Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3 Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a) Reviewed-on: https://chromium-review.googlesource.com/421129 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da) Reviewed-on: https://chromium-review.googlesource.com/423253 Tested-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/88c2b3e81afceeb741cbcefb290906b7103a9809/drivers/md/dm-bufio.c
,
Apr 14 2017
I cherry-picked the CLs to my build but still got kernel panics (see attached file). The good thing is dm_bufio_shrink_count is no longer present in the stack track, but hung_task watchdog still fires for some reason.
,
Apr 14 2017
Good job Ben. I think this is a new OOM kill deadlock, so I am opening another bug for it. I am also wondering how many of these deadlocks exist. We need to figure out why we're not sufficiently aggressive with tab discards and app kills. We should try to stay away from OOM kills for this and other reasons.
,
Apr 14 2017
@14: yes, please open a new bug. BTW: I've never heard of jdb2, but one of the top hits when I search is <https://access.redhat.com/solutions/96783>.
,
Apr 14 2017
,
Apr 14 2017
... and by the way, it's not necessarily related to OOM kills, although the load increase from high swap activity is likely a factor.
,
Apr 17 2017
I am seeing occasional stalls on ToT R60 which contains these CLs. The machine will freeze for a few seconds but can recover. Feedback is at https://feedback.corp.google.com/product/208/neutron?lView=rd&lRSort=1&lROrder=2&lRFilter=1&lReportSearch=bccheng&lReport=57782171053 And I also collected a perf profile. It shows heavy use of CPU cycles in swapper: # Overhead Command Shared Object Symbol # ........ ............... ............................ ..................................... # 18.79% swapper [kernel.kallsyms] [k] mwait_idle_with_hints.constprop.3 1.92% kswapd0 [kernel.kallsyms] [k] lzo1x_1_do_compress 0.83% swapper [unknown] [k] 0000000000000000 I'm fairly certain the stall is introduced in the past two weeks or so. I will roll back to an older version to see when the regression happens.
,
Apr 17 2017
I am wondering if mwait_idle_with_hints isn't just the CPU idle handler. Using 2% of CPU for compression should not be an issue.
,
Apr 17 2017
,
Apr 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/d3dcaa7ca2338901b03584095faf22e7a2c3ae69 commit d3dcaa7ca2338901b03584095faf22e7a2c3ae69 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Tue Apr 18 00:29:13 2017 UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using GFP_NOIO allocation (lock is dropped because the allocation can take some time). Note that we won't do GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. BUG= chromium:675021 TEST=See bug Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c) Reviewed-on: https://chromium-review.googlesource.com/419680 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72) Reviewed-on: https://chromium-review.googlesource.com/423252 Tested-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Luigi Semenzato <semenzato@chromium.org> (cherry picked from commit caaf42f1e98e1f3c8cda1bec20089c2b7f5ae59f) Reviewed-on: https://chromium-review.googlesource.com/479797 Commit-Queue: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/d3dcaa7ca2338901b03584095faf22e7a2c3ae69/drivers/md/dm-bufio.c
,
Apr 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3c2b9fbcd29b95809db4147760865b113a045c3c commit 3c2b9fbcd29b95809db4147760865b113a045c3c Author: Mikulas Patocka <mpatocka@redhat.com> Date: Tue Apr 18 00:29:17 2017 UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects are there. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. BUG= chromium:675021 TEST=See bug Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3 Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a) Reviewed-on: https://chromium-review.googlesource.com/421129 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da) Reviewed-on: https://chromium-review.googlesource.com/423253 Tested-by: Luigi Semenzato <semenzato@chromium.org> Reviewed-by: Luigi Semenzato <semenzato@chromium.org> (cherry picked from commit 88c2b3e81afceeb741cbcefb290906b7103a9809) Reviewed-on: https://chromium-review.googlesource.com/479798 Commit-Queue: Luigi Semenzato <semenzato@chromium.org> [modify] https://crrev.com/3c2b9fbcd29b95809db4147760865b113a045c3c/drivers/md/dm-bufio.c
,
Apr 18 2017
#19 should we open a separate bug for this? Also, if it is easy to reproduce, do you have enough time for a triple alt-volup-X and then send feedback, so we can see if it's stuck in the kernel and what it's waiting for? Thanks!
,
Apr 18 2017
Your change meets the bar and is auto-approved for M59. Please go ahead and merge the CL to branch 3071 manually. Please contact milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), gkihumba@(ChromeOS), Abdul Syed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 18 2017
Sorry, I didn't wait for Merge-Approved after discussing with Chris, since there was some urgency to get this in before the nightly build. Hope it's OK.
,
May 2 2017
Are we still looking at this for 58?
,
May 2 2017
I will like the big changes for memory to be in 58 and I think we need this as well for stability. Luigi?
,
May 2 2017
If we take these into R58 they could affect anyone on a 3.18 kernel. At this point they're pretty well tested, but it's also very late for R58. People thought that they made a difference to how things performed at low memory time, so I guess it depends on how critical it is to make low memory work better in R58 timeframe.
,
May 2 2017
fyi Doug and I talked about doing this for 3.14 as well -- I was initially enthusiastic about pulling it to older kernels, however I looked through a bunch of kcrash reports for minnie and couldn't find a single instance so I'm inclined to just leave them alone now.
,
May 3 2017
As another data point, I looked at Caroline kernel crashes here: https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27%20OMIT%20RECORD%20IF%20SOME(ProductData.Key%3D%27exec_name%27%20AND%20ProductData.Value%3D%27kernel%27)%20%3D%200%20OR%20SOME(ProductData.Key%3D%27hwclass%27%20AND%20ProductData.Value%20contains%20%27CAROLINE%27)%20%3D%200&ignore_case=false&enable_rewrite=false&omit_field_name=&omit_field_value=&omit_field_opt= 33% of them are categorized as hung_tasks and of a random sampling of those, 4 of 14 showed some task in dm_bufio_shrink_count
,
May 3 2017
@31: That's definitely good data and matches what I saw when I was originally working on that patch. One thing I was never 100% sure about was how much it mattered that tasks were stuck in dm_bufio_shrink_count(). I guess that would prevent them from getting into the direct reclaim path, right? ...but I also always figured that maybe there were enough tasks working on that? In any case, it still seems good to get the fix...
,
May 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/0d40ed0d87c400807553474b78105af577cb8b8e commit 0d40ed0d87c400807553474b78105af577cb8b8e Author: Mikulas Patocka <mpatocka@redhat.com> Date: Wed May 03 03:53:28 2017 UPSTREAM: dm bufio: drop the lock when doing GFP_NOIO allocation If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using GFP_NOIO allocation (lock is dropped because the allocation can take some time). Note that we won't do GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. BUG= chromium:675021 TEST=See bug Change-Id: Ifa33001b9c34d45dab2c82bbae1e21fe44a9fb06 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit 41c73a49df31151f4ff868f28fe4f129f113fa2c) Reviewed-on: https://chromium-review.googlesource.com/419680 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 21eafe1899ac3ed441fbc5ea9908a5788f4bbd72) Reviewed-on: https://chromium-review.googlesource.com/493918 [modify] https://crrev.com/0d40ed0d87c400807553474b78105af577cb8b8e/drivers/md/dm-bufio.c
,
May 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88c4c4ef0638af91e9fce1c498c88be26b67c3ed commit 88c4c4ef0638af91e9fce1c498c88be26b67c3ed Author: Mikulas Patocka <mpatocka@redhat.com> Date: Wed May 03 03:53:39 2017 UPSTREAM: dm bufio: don't take the lock in dm_bufio_shrink_count dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects are there. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. BUG= chromium:675021 TEST=See bug Change-Id: If5902126ec3728de5b5a38eeb81dc82704d5ebc3 Suggested-by: David Rientjes <rientjes@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> (cherry picked from commit d12067f428c037b4575aaeb2be00847fc214c24a) Reviewed-on: https://chromium-review.googlesource.com/421129 Reviewed-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit f1a7c2e30c46db262972d7e4d608d2bf77f375da) Reviewed-on: https://chromium-review.googlesource.com/493919 [modify] https://crrev.com/88c4c4ef0638af91e9fce1c498c88be26b67c3ed/drivers/md/dm-bufio.c
,
May 3 2017
,
Jan 22 2018
|
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by diand...@chromium.org
, Dec 16 2016