Issue 770980

Issue metadata

Status: Fixed
Owner: dbehr@chromium.org (last visit > 30 days ago)
Closed: Oct 2017
Cc: tfiga@chromium.org
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug

lockdep splat on 4.4 in mali kbase

Reported by sonnyrao@chromium.org (Project Member), Oct 2 2017

Issue description

R63

Build the kernel for kevin with USE=lockdebug.

Lockdep produces a splat on boot:
[    6.156290] ======================================================
[    6.156296] [ INFO: possible circular locking dependency detected ]
[    6.156308] 4.4.86 #5 Not tainted
[    6.156314] -------------------------------------------------------
[    6.156321] kworker/u13:0/102 is trying to acquire lock:
[    6.156328]  ((&rcb->work)){+.+...}, at: [<ffffffc00023879c>] flush_work+0x28/0xb0
[    6.156361] 
               but task is already holding lock:
[    6.156368]  (&kctx->jctx.lock){+.+.+.}, at: [<ffffffc0005c5fac>] kbase_jd_done_worker+0x68/0x354
[    6.156394] 
               which lock already depends on the new lock.

[    6.156403] 
               the existing dependency chain (in reverse order) is:
[    6.156409] 
               -> #1 (&kctx->jctx.lock){+.+.+.}:
[    6.156427]        [<ffffffc000272a4c>] __lock_acquire+0xa80/0xc98
[    6.156441]        [<ffffffc000271fa0>] lock_acquire+0x20c/0x238
[    6.156451]        [<ffffffc000925d58>] __mutex_lock_common+0x98/0xa68
[    6.156466]        [<ffffffc000925cb0>] mutex_lock_nested+0x54/0x64
[    6.156476]        [<ffffffc0005c67c4>] resv_resource_dep_clear+0x30/0xc0
[    6.156490]        [<ffffffc0005802a0>] reservation_cb_work+0x28/0x34
[    6.156502]        [<ffffffc00023ce70>] process_one_work+0x310/0x5b8
[    6.156516]        [<ffffffc00023c790>] worker_thread+0x190/0x2b0
[    6.156527]        [<ffffffc000241da8>] kthread+0xe4/0xf4
[    6.156539]        [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
[    6.156552] 
               -> #0 ((&rcb->work)){+.+...}:
[    6.156571]        [<ffffffc0002751dc>] print_circular_bug+0x50/0x1f0
[    6.156582]        [<ffffffc000272c54>] __lock_acquire+0xc88/0xc98
[    6.156592]        [<ffffffc000271fa0>] lock_acquire+0x20c/0x238
[    6.156602]        [<ffffffc0002387c0>] flush_work+0x4c/0xb0
[    6.156613]        [<ffffffc000238b30>] __cancel_work_timer+0x108/0x180
[    6.156623]        [<ffffffc000238a18>] cancel_work_sync+0x20/0x30
[    6.156634]        [<ffffffc000580674>] drm_reservation_cb_fini+0x20/0x34
[    6.156644]        [<ffffffc0005c4ea0>] jd_done_nolock+0x2b0/0x458
[    6.156655]        [<ffffffc0005c6130>] kbase_jd_done_worker+0x1ec/0x354
[    6.156665]        [<ffffffc00023ce70>] process_one_work+0x310/0x5b8
[    6.156676]        [<ffffffc00023c790>] worker_thread+0x190/0x2b0
[    6.156686]        [<ffffffc000241da8>] kthread+0xe4/0xf4
[    6.156695]        [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
[    6.156706] 
               other info that might help us debug this:

[    6.156716]  Possible unsafe locking scenario:

[    6.156723]        CPU0                    CPU1
[    6.156728]        ----                    ----
[    6.156733]   lock(&kctx->jctx.lock);
[    6.156745]                                lock((&rcb->work));
[    6.156756]                                lock(&kctx->jctx.lock);
[    6.156767]   lock((&rcb->work));
[    6.156777] 
                *** DEADLOCK ***

[    6.156786] 3 locks held by kworker/u13:0/102:
[    6.156792]  #0:  ("mali_jd"){.+.+..}, at: [<ffffffc00023ccbc>] process_one_work+0x15c/0x5b8
[    6.156817]  #1:  ((&katom->work)){+.+...}, at: [<ffffffc00023cce4>] process_one_work+0x184/0x5b8
[    6.156842]  #2:  (&kctx->jctx.lock){+.+.+.}, at: [<ffffffc0005c5fac>] kbase_jd_done_worker+0x68/0x354
[    6.156867] 
               stack backtrace:
[    6.156879] CPU: 1 PID: 102 Comm: kworker/u13:0 Not tainted 4.4.86 #5
[    6.156886] Hardware name: Google Kevin (DT)
[    6.156898] Workqueue: mali_jd kbase_jd_done_worker
[    6.156906] Call trace:
[    6.156916] [<ffffffc000208874>] dump_backtrace+0x0/0x16c
[    6.156924] [<ffffffc000208868>] show_stack+0x20/0x2c
[    6.156936] [<ffffffc0004c21dc>] __dump_stack+0x20/0x28
[    6.156943] [<ffffffc0004c2180>] dump_stack+0x8c/0xc8
[    6.156952] [<ffffffc000275364>] print_circular_bug+0x1d8/0x1f0
[    6.156961] [<ffffffc000272c54>] __lock_acquire+0xc88/0xc98
[    6.156969] [<ffffffc000271fa0>] lock_acquire+0x20c/0x238
[    6.156977] [<ffffffc0002387c0>] flush_work+0x4c/0xb0
[    6.156985] [<ffffffc000238b30>] __cancel_work_timer+0x108/0x180
[    6.156992] [<ffffffc000238a18>] cancel_work_sync+0x20/0x30
[    6.157000] [<ffffffc000580674>] drm_reservation_cb_fini+0x20/0x34
[    6.157009] [<ffffffc0005c4ea0>] jd_done_nolock+0x2b0/0x458
[    6.157017] [<ffffffc0005c6130>] kbase_jd_done_worker+0x1ec/0x354
[    6.157025] [<ffffffc00023ce70>] process_one_work+0x310/0x5b8
[    6.157033] [<ffffffc00023c790>] worker_thread+0x190/0x2b0
[    6.157041] [<ffffffc000241da8>] kthread+0xe4/0xf4
[    6.157048] [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
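
To make the inversion above concrete, here is a minimal sketch of the same pattern in kernel C. All of the names (ctx_lock, rcb_work, rcb_work_fn, cleanup_path, demo_init) are illustrative stand-ins for the symbols in the trace, not the driver's actual code:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(ctx_lock);        /* stands in for kctx->jctx.lock */
static struct work_struct rcb_work;   /* stands in for rcb->work */

/* CPU1 in the scenario: the work item takes the ctx lock,
 * as resv_resource_dep_clear() does in the trace. */
static void rcb_work_fn(struct work_struct *work)
{
	mutex_lock(&ctx_lock);
	/* ... clear the reservation dependency ... */
	mutex_unlock(&ctx_lock);
}

/* CPU0: holds the ctx lock (as kbase_jd_done_worker() does) and then
 * waits for the work item.  flush_work() cannot finish because
 * rcb_work_fn() is blocked on ctx_lock: a circular dependency. */
static void cleanup_path(void)
{
	mutex_lock(&ctx_lock);
	/* deadlocks if rcb_work_fn() has started and is waiting on ctx_lock */
	cancel_work_sync(&rcb_work);
	mutex_unlock(&ctx_lock);
}

static int __init demo_init(void)
{
	INIT_WORK(&rcb_work, rcb_work_fn);
	schedule_work(&rcb_work);
	cleanup_path();
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");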


 
Owner: dbehr@chromium.org
Status: Assigned (was: Untriaged)
Cc: tfiga@chromium.org

Yeah, I sure wish this would get fixed. This is 100% a duplicate of b/35586182. From March 29th:

> Will get fixed when we will get new kbase (kernel) mali driver 
> from ARM. Not blocking


Comment 3 by dbehr@chromium.org, Oct 2 2017

Duplicate of b/35586182; waiting for the mali kbase update from ARM, which is supposed to fix it.

Comment 4 by bugdroid1@chromium.org (Project Member), Oct 10 2017

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7a165fd2907b259fefe06adcfaa0a8bc84e97f97

commit 7a165fd2907b259fefe06adcfaa0a8bc84e97f97
Author: Dominik Behr <dbehr@chromium.org>
Date: Tue Oct 10 03:07:56 2017

CHROMIUM: mali: run dep_clear from jd_done to avoid deadlock

When jd_done_nolock/post_external_resources cleans up a katom's reservation
callback and waits for the worker to finish, the ctx lock is held, which the
worker needs in order to complete, resulting in a deadlock.

We fix it here by using an atomic variable to synchronize with the worker,
running the worker's payload from cleanup inside the locked section, and
converting the worker to use mutex_trylock. So if the worker has not run its
payload by the time the katom is being cleaned up, the payload is going to
run from cleanup and the worker will do nothing.

BUG=b:35586182, chromium:770980 
TEST=boot kevin with kernel build with USE=lockdebug

Change-Id: I7f67a6521fc59117e58bb61d9d49e329781120a0
Signed-off-by: Dominik Behr <dbehr@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/706104
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>

[modify] https://crrev.com/7a165fd2907b259fefe06adcfaa0a8bc84e97f97/drivers/gpu/arm/midgard/mali_kbase_jd.c
[modify] https://crrev.com/7a165fd2907b259fefe06adcfaa0a8bc84e97f97/drivers/gpu/arm/midgard/mali_kbase_defs.h
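
The scheme the commit message describes can be sketched as follows. This is a hedged reconstruction from the message alone, not the patch itself; every name here (payload_done, do_payload, rcb_work_fn, cleanup_locked) is hypothetical:

#include <linux/atomic.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(ctx_lock);
static atomic_t payload_done = ATOMIC_INIT(0);

static void do_payload(void)
{
	/* clear the reservation dependency; caller must hold ctx_lock */
}

/* Worker: never blocks on the ctx lock.  If the lock is busy, bail out;
 * the cleanup path will run the payload itself. */
static void rcb_work_fn(struct work_struct *work)
{
	if (!mutex_trylock(&ctx_lock))
		return;
	if (atomic_cmpxchg(&payload_done, 0, 1) == 0)
		do_payload();
	mutex_unlock(&ctx_lock);
}

/* Cleanup: called with ctx_lock held (e.g. from jd_done_nolock()).
 * Run the payload here if the worker has not; a late rcb_work_fn()
 * then either fails the trylock or finds payload_done already set. */
static void cleanup_locked(void)
{
	if (atomic_cmpxchg(&payload_done, 0, 1) == 0)
		do_payload();
	/* no cancel_work_sync() under the lock: the work item is now inert */
}

The trylock means the worker never waits on the ctx lock, so the work->lock dependency edge that lockdep flagged above never forms; the atomic cmpxchg guarantees the payload runs exactly once, whichever side gets there first.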

Status: Fixed (was: Assigned)
