lockdep splat on 4.4 in mali kbase
Issue description
Seen on R63.
Build the kernel for kevin with USE=lockdebug.
Lockdep prints this splat on boot:
[ 6.156290] ======================================================
[ 6.156296] [ INFO: possible circular locking dependency detected ]
[ 6.156308] 4.4.86 #5 Not tainted
[ 6.156314] -------------------------------------------------------
[ 6.156321] kworker/u13:0/102 is trying to acquire lock:
[ 6.156328] ((&rcb->work)){+.+...}, at: [<ffffffc00023879c>] flush_work+0x28/0xb0
[ 6.156361]
but task is already holding lock:
[ 6.156368] (&kctx->jctx.lock){+.+.+.}, at: [<ffffffc0005c5fac>] kbase_jd_done_worker+0x68/0x354
[ 6.156394]
which lock already depends on the new lock.
[ 6.156403]
the existing dependency chain (in reverse order) is:
[ 6.156409]
-> #1 (&kctx->jctx.lock){+.+.+.}:
[ 6.156427] [<ffffffc000272a4c>] __lock_acquire+0xa80/0xc98
[ 6.156441] [<ffffffc000271fa0>] lock_acquire+0x20c/0x238
[ 6.156451] [<ffffffc000925d58>] __mutex_lock_common+0x98/0xa68
[ 6.156466] [<ffffffc000925cb0>] mutex_lock_nested+0x54/0x64
[ 6.156476] [<ffffffc0005c67c4>] resv_resource_dep_clear+0x30/0xc0
[ 6.156490] [<ffffffc0005802a0>] reservation_cb_work+0x28/0x34
[ 6.156502] [<ffffffc00023ce70>] process_one_work+0x310/0x5b8
[ 6.156516] [<ffffffc00023c790>] worker_thread+0x190/0x2b0
[ 6.156527] [<ffffffc000241da8>] kthread+0xe4/0xf4
[ 6.156539] [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
[ 6.156552]
-> #0 ((&rcb->work)){+.+...}:
[ 6.156571] [<ffffffc0002751dc>] print_circular_bug+0x50/0x1f0
[ 6.156582] [<ffffffc000272c54>] __lock_acquire+0xc88/0xc98
[ 6.156592] [<ffffffc000271fa0>] lock_acquire+0x20c/0x238
[ 6.156602] [<ffffffc0002387c0>] flush_work+0x4c/0xb0
[ 6.156613] [<ffffffc000238b30>] __cancel_work_timer+0x108/0x180
[ 6.156623] [<ffffffc000238a18>] cancel_work_sync+0x20/0x30
[ 6.156634] [<ffffffc000580674>] drm_reservation_cb_fini+0x20/0x34
[ 6.156644] [<ffffffc0005c4ea0>] jd_done_nolock+0x2b0/0x458
[ 6.156655] [<ffffffc0005c6130>] kbase_jd_done_worker+0x1ec/0x354
[ 6.156665] [<ffffffc00023ce70>] process_one_work+0x310/0x5b8
[ 6.156676] [<ffffffc00023c790>] worker_thread+0x190/0x2b0
[ 6.156686] [<ffffffc000241da8>] kthread+0xe4/0xf4
[ 6.156695] [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
[ 6.156706]
other info that might help us debug this:
[ 6.156716] Possible unsafe locking scenario:
[ 6.156723] CPU0 CPU1
[ 6.156728] ---- ----
[ 6.156733] lock(&kctx->jctx.lock);
[ 6.156745] lock((&rcb->work));
[ 6.156756] lock(&kctx->jctx.lock);
[ 6.156767] lock((&rcb->work));
[ 6.156777]
*** DEADLOCK ***
[ 6.156786] 3 locks held by kworker/u13:0/102:
[ 6.156792] #0: ("mali_jd"){.+.+..}, at: [<ffffffc00023ccbc>] process_one_work+0x15c/0x5b8
[ 6.156817] #1: ((&katom->work)){+.+...}, at: [<ffffffc00023cce4>] process_one_work+0x184/0x5b8
[ 6.156842] #2: (&kctx->jctx.lock){+.+.+.}, at: [<ffffffc0005c5fac>] kbase_jd_done_worker+0x68/0x354
[ 6.156867]
stack backtrace:
[ 6.156879] CPU: 1 PID: 102 Comm: kworker/u13:0 Not tainted 4.4.86 #5
[ 6.156886] Hardware name: Google Kevin (DT)
[ 6.156898] Workqueue: mali_jd kbase_jd_done_worker
[ 6.156906] Call trace:
[ 6.156916] [<ffffffc000208874>] dump_backtrace+0x0/0x16c
[ 6.156924] [<ffffffc000208868>] show_stack+0x20/0x2c
[ 6.156936] [<ffffffc0004c21dc>] __dump_stack+0x20/0x28
[ 6.156943] [<ffffffc0004c2180>] dump_stack+0x8c/0xc8
[ 6.156952] [<ffffffc000275364>] print_circular_bug+0x1d8/0x1f0
[ 6.156961] [<ffffffc000272c54>] __lock_acquire+0xc88/0xc98
[ 6.156969] [<ffffffc000271fa0>] lock_acquire+0x20c/0x238
[ 6.156977] [<ffffffc0002387c0>] flush_work+0x4c/0xb0
[ 6.156985] [<ffffffc000238b30>] __cancel_work_timer+0x108/0x180
[ 6.156992] [<ffffffc000238a18>] cancel_work_sync+0x20/0x30
[ 6.157000] [<ffffffc000580674>] drm_reservation_cb_fini+0x20/0x34
[ 6.157009] [<ffffffc0005c4ea0>] jd_done_nolock+0x2b0/0x458
[ 6.157017] [<ffffffc0005c6130>] kbase_jd_done_worker+0x1ec/0x354
[ 6.157025] [<ffffffc00023ce70>] process_one_work+0x310/0x5b8
[ 6.157033] [<ffffffc00023c790>] worker_thread+0x190/0x2b0
[ 6.157041] [<ffffffc000241da8>] kthread+0xe4/0xf4
[ 6.157048] [<ffffffc000203dd0>] ret_from_fork+0x10/0x40
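The cycle lockdep is complaining about can be modeled in a few lines. This is only a user-space sketch of the idea, not how lockdep is implemented. The enum names and helper functions are made up for illustration. Chain #1 records "jctx.lock taken while the rcb work item is running", chain #0 records "rcb work item flushed while jctx.lock is held", and the second edge closes the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes standing in for the two in the splat. */
enum { JCTX_LOCK, RCB_WORK, NUM_CLASSES };

/* edge[a][b] means "b was acquired while a was held". */
static bool edge[NUM_CLASSES][NUM_CLASSES];

/* Depth-first search: is there a path from 'from' to 'to'? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_CLASSES; next++)
        if (edge[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Record a new dependency; return true if it closes a cycle. */
static bool add_dependency(int held, int acquired)
{
    bool seen[NUM_CLASSES] = { false };
    bool cycle = reachable(acquired, held, seen);

    edge[held][acquired] = true;
    return cycle;
}

/* Replay the two chains from the splat in the order they were seen. */
static bool splat_scenario_has_cycle(void)
{
    memset(edge, 0, sizeof(edge));

    /* #1: reservation_cb_work (rcb->work) takes kctx->jctx.lock */
    bool first = add_dependency(RCB_WORK, JCTX_LOCK);
    assert(!first); /* a single edge cannot form a cycle */

    /* #0: kbase_jd_done_worker holds jctx.lock and flushes rcb->work */
    return add_dependency(JCTX_LOCK, RCB_WORK);
}
```

Calling `splat_scenario_has_cycle()` reports a cycle on the second edge, which matches the point where the kernel prints the warning.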
Oct 2 2017
Yeah, I sure wish this would get fixed. This is 100% a duplicate of b/35586182. From March 29th:
> Will get fixed when we will get new kbase (kernel) mali driver
> from ARM. Not blocking
Oct 2 2017
Duplicate of b/35586182. Waiting for the mali kbase update from ARM, which is supposed to fix it.
Oct 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7a165fd2907b259fefe06adcfaa0a8bc84e97f97

commit 7a165fd2907b259fefe06adcfaa0a8bc84e97f97
Author: Dominik Behr <dbehr@chromium.org>
Date: Tue Oct 10 03:07:56 2017

CHROMIUM: mali: run dep_clear from jd_done to avoid deadlock

When jd_done_nolock/post_external_resources cleans up katom's reservation callback and waits for worker to finish ctx lock is held, which the worker needs to complete resulting in a deadlock.
We fix it here by using an atomic variable to synchronize with the worker and running worker's payload from cleanup inside the locked section and converting the worker to use mutex_trylock.
So if the worker has not run its payload by the time the katom is being cleaned up, the payload is going to run from cleanup and the worker will do nothing.

BUG=b:35586182, chromium:770980
TEST=boot kevin with kernel build with USE=lockdebug

Change-Id: I7f67a6521fc59117e58bb61d9d49e329781120a0
Signed-off-by: Dominik Behr <dbehr@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/706104
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>

[modify] https://crrev.com/7a165fd2907b259fefe06adcfaa0a8bc84e97f97/drivers/gpu/arm/midgard/mali_kbase_jd.c
[modify] https://crrev.com/7a165fd2907b259fefe06adcfaa0a8bc84e97f97/drivers/gpu/arm/midgard/mali_kbase_defs.h
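For anyone who wants to see the shape of that fix outside the kernel, here is a rough user-space sketch using pthreads and C11 atomics. It is not the code from the commit. `struct job`, `run_payload_once`, `worker`, `cleanup` and `run_demo` are hypothetical names, `pthread_join` stands in for `cancel_work_sync()`, and the retry loop in the worker is an assumption about how a trylock-based worker might behave.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these names come from kbase. */
struct job {
    pthread_mutex_t ctx_lock;    /* plays the role of kctx->jctx.lock */
    atomic_bool payload_claimed; /* set by whichever side runs the payload */
    int dep_clear_runs;          /* only touched with ctx_lock held */
};

/* Run the payload at most once. Caller must hold ctx_lock. */
static bool run_payload_once(struct job *j)
{
    if (atomic_exchange(&j->payload_claimed, true))
        return false; /* the other side already ran it */
    j->dep_clear_runs++;
    return true;
}

/* Worker: never blocks on ctx_lock, so waiting for it cannot deadlock. */
static void *worker(void *arg)
{
    struct job *j = arg;

    while (!atomic_load(&j->payload_claimed)) {
        if (pthread_mutex_trylock(&j->ctx_lock) == 0) {
            run_payload_once(j);
            pthread_mutex_unlock(&j->ctx_lock);
            break;
        }
        sched_yield(); /* lock is busy; recheck the flag */
    }
    return NULL;
}

/* Cleanup: runs the payload itself under the lock, then waits for the worker. */
static void cleanup(struct job *j, pthread_t worker_thread)
{
    pthread_mutex_lock(&j->ctx_lock);
    run_payload_once(j);
    pthread_join(worker_thread, NULL); /* stand-in for cancel_work_sync() */
    pthread_mutex_unlock(&j->ctx_lock);
}

/* Returns how many times the payload ran, or -1 if the thread failed to start. */
static int run_demo(void)
{
    struct job j = { .dep_clear_runs = 0 };
    pthread_t t;

    pthread_mutex_init(&j.ctx_lock, NULL);
    atomic_init(&j.payload_claimed, false);
    if (pthread_create(&t, NULL, worker, &j) != 0)
        return -1;
    cleanup(&j, t);
    assert(atomic_load(&j.payload_claimed)); /* one side must have run it */
    pthread_mutex_destroy(&j.ctx_lock);
    return j.dep_clear_runs;
}
```

Call `run_demo()` from a `main` and build with `cc -pthread`. Whichever side wins the race, the payload runs once and with the lock held, and the cleanup path can wait for the worker while holding the lock because the worker only ever uses trylock.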
Oct 11 2017
Comment 1 by marc...@chromium.org, Oct 2 2017
Status: Assigned (was: Untriaged)