Debuggerd in disk-sleep state during engrave_tombstone
Issue description

We are seeing lots of crashes similar to https://code.google.com/p/android/issues/detail?id=69107 on minnie in the hung_tasks bucket: https://goto.google.com/gzjul

75c1c77700000000 is one such report. The backtrace looks like:

<3>[16188.280328] Freezing of tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
<6>[16188.280774] debuggerd D c06e2f4c 0 3622 3211 0x00000001
<5>[16188.281024] [<c06e2f4c>] (__schedule) from [<c06e32a8>] (schedule+0xa4/0xa8)
<5>[16188.281171] [<c06e32a8>] (schedule) from [<c012d114>] (ptrace_trapping_sleep_fn+0x18/0x20)
<5>[16188.281316] [<c012d114>] (ptrace_trapping_sleep_fn) from [<c06e383c>] (__wait_on_bit+0x64/0xb4)
<5>[16188.281461] [<c06e383c>] (__wait_on_bit) from [<c06e391c>] (out_of_line_wait_on_bit+0x90/0xb4)
<5>[16188.281596] [<c06e391c>] (out_of_line_wait_on_bit) from [<c012dad8>] (SyS_ptrace+0x2c4/0x508)
<5>[16188.281733] [<c012dad8>] (SyS_ptrace) from [<c0106460>] (ret_fast_syscall+0x0/0x30)

This may be a partial fix. It's a very simple change -- do you think it is worth applying?

commit 7c3b00e06d731a28fc3d17ed02ba250642b15b81
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jan 20 14:59:55 2016 -0800

    ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable

    ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.

    This doesn't really solve the problem(s) and we probably need to fix the freezer. In particular, note that this means that pm freezer will fail if it races attach-to-stopped-task.

    And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is not clear if we really need to hide this transition from debugger, WNOHANG after PTRACE_ATTACH can fail anyway if it races with SIGCONT.

Another option is to add a crash bucket that matches "Freezing of tasks failed after" or maybe "SyS_ptrace".
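The bucketing idea at the end of the description could be sketched roughly as follows. This is only an illustration in Python, not the crash server's actual bucketing code: the function name `classify`, the bucket names, and the regular expressions are all hypothetical, and the sample log is a few lines copied from the backtrace above.

```python
import re

# Hypothetical signature, built from the string suggested in the report.
FREEZE_FAILED = re.compile(r"Freezing of tasks failed after [\d.]+ seconds")
# Task line as it appears in the report: "] <comm> <state> <pc> ...".
# This layout is an assumption based on the single sample above.
TASK_LINE = re.compile(r"\]\s+(\S+)\s+([A-Z])\s+[0-9a-f]{8}\s")

SAMPLE = """\
<3>[16188.280328] Freezing of tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
<6>[16188.280774] debuggerd D c06e2f4c 0 3622 3211 0x00000001
<5>[16188.281733] [<c012dad8>] (SyS_ptrace) from [<c0106460>] (ret_fast_syscall+0x0/0x30)
"""


def classify(log):
    """Return (bucket, task_name) for a kernel log, or None if no freeze failure."""
    if not FREEZE_FAILED.search(log):
        return None
    task = TASK_LINE.search(log)
    name = task.group(1) if task else None
    # Split ptrace-related freeze failures from everything else.
    bucket = "freeze_failed_ptrace" if "SyS_ptrace" in log else "freeze_failed_other"
    return bucket, name


if __name__ == "__main__":
    print(classify(SAMPLE))
```

Matching on the freeze message first and the `SyS_ptrace` frame second keeps unrelated ptrace backtraces out of the bucket; whether that split is useful in practice is a judgment call for whoever maintains the buckets.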
Nov 23 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fa8a4fe6a7bc677b6bde6721007eb242d6c000f0

commit fa8a4fe6a7bc677b6bde6721007eb242d6c000f0
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jan 20 22:59:55 2016

    BACKPORT: ptrace: make wait_on_bit(JOBCTL_TRAPPING_BIT) in ptrace_attach() killable

    ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.

    This doesn't really solve the problem(s) and we probably need to fix the freezer. In particular, note that this means that pm freezer will fail if it races attach-to-stopped-task.

    And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is not clear if we really need to hide this transition from debugger, WNOHANG after PTRACE_ATTACH can fail anyway if it races with SIGCONT.

    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Roland McGrath <roland@hack.frob.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Pedro Alves <palves@redhat.com>
    Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    (cherry picked from commit 7c3b00e06d731a28fc3d17ed02ba250642b15b81)
    (backport: fixed up conflict due to wait_on_bit function signature change)

    BUG=chromium:667899
    TEST=suspend and resume a couple of times on minnie

    Change-Id: I85b0da4456623891e174a642417c5f40253f5c36
    Reviewed-on: https://chromium-review.googlesource.com/413690
    Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
    Tested-by: Kevin Cernekee <cernekee@chromium.org>
    Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/fa8a4fe6a7bc677b6bde6721007eb242d6c000f0/kernel/ptrace.c
Jan 18 2018
Comment 1 by dgreid@chromium.org, Nov 22 2016