New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 900598 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 8
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

kernel-4.19: make kgdb work better

Project Member Reported by diand...@chromium.org, Oct 31

Issue description

Tracking bug for taking fixes for kgdb to 4.19.  Specifically we want these:

3bd67b37e350 kdb: print real address of pointers instead of hashed addresses
a0ca72c2d1ac kdb: use correct pointer when 'btc' calls 'btt'

...and these (which can be found on lore.kernel.org/patchwork/patch/ID):

| 1004670 | [v2,2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_f...
| 1004669 | [v2,1/2] kgdb: Remove irq flags from roundup
 
In theory we'll get the kgdb fixes that already landed from stable, but I use kgdb all the time.  Hopefully it's OK to land them ahead of time?

remote:   https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1310053 FROMGIT: kdb: use correct pointer when 'btc' calls 'btt'        
remote:   https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1310054 FROMGIT: kdb: print real address of pointers instead of hashed addresses 
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 1

Labels: merge-merged-chromeos-4.19
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a0069d4ff25ca6bb7c93c4e3ffabb1aee7188605

commit a0069d4ff25ca6bb7c93c4e3ffabb1aee7188605
Author: Christophe Leroy <christophe.leroy@c-s.fr>
Date: Thu Nov 01 00:38:22 2018

FROMGIT: kdb: use correct pointer when 'btc' calls 'btt'

On a powerpc 8xx, 'btc' fails as follows:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0

when booting the kernel with 'debug_boot_weak_hash', it fails as well

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

On other platforms, Oopses have been observed too, see
https://github.com/linuxppc/linux/issues/139

This is due to btc calling 'btt' with %p pointer as an argument.

This patch replaces %p by %px to get the real pointer value as
expected by 'btt'

Fixes: ad67b74d2469d9b82 ("printk: hash addresses printed with %p")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: <stable@vger.kernel.org> # 4.15+
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
(cherry picked from commit a0ca72c2d1ac83d0853a23ffde8f3624648b1ee8
 git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git kgdb-next)

BUG= chromium:900598 
TEST=kgdb works better

Change-Id: I5ecb3860244f98d1ae17edcd7d95398b101ca85a
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1310053
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/a0069d4ff25ca6bb7c93c4e3ffabb1aee7188605/kernel/debug/kdb/kdb_bt.c

Project Member

Comment 3 by bugdroid1@chromium.org, Nov 1

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/84d58e0c2625dfc93fb2cd8c75fbd8f46309d03b

commit 84d58e0c2625dfc93fb2cd8c75fbd8f46309d03b
Author: Christophe Leroy <christophe.leroy@c-s.fr>
Date: Thu Nov 01 00:38:24 2018

FROMGIT: kdb: print real address of pointers instead of hashed addresses

Since commit ad67b74d2469 ("printk: hash addresses printed with %p"),
all pointers printed with %p are printed with hashed addresses
instead of real addresses in order to avoid leaking addresses in
dmesg and syslog. But this applies to kdb too, with is unfortunate:

    Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry
    kdb> ps
    15 sleeping system daemon (state M) processes suppressed,
    use 'ps A' to see all.
    Task Addr       Pid   Parent [*] cpu State Thread     Command
    0x(ptrval)      329      328  1    0   R  0x(ptrval) *sh

    0x(ptrval)        1        0  0    0   S  0x(ptrval)  init
    0x(ptrval)        3        2  0    0   D  0x(ptrval)  rcu_gp
    0x(ptrval)        4        2  0    0   D  0x(ptrval)  rcu_par_gp
    0x(ptrval)        5        2  0    0   D  0x(ptrval)  kworker/0:0
    0x(ptrval)        6        2  0    0   D  0x(ptrval)  kworker/0:0H
    0x(ptrval)        7        2  0    0   D  0x(ptrval)  kworker/u2:0
    0x(ptrval)        8        2  0    0   D  0x(ptrval)  mm_percpu_wq
    0x(ptrval)       10        2  0    0   D  0x(ptrval)  rcu_preempt

The whole purpose of kdb is to debug, and for debugging real addresses
need to be known. In addition, data displayed by kdb doesn't go into
dmesg.

This patch replaces all %p by %px in kdb in order to display real
addresses.

Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
(cherry picked from commit 3bd67b37e35069b43cc5de8d138727360c503e8f
 git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git kgdb-next)

BUG= chromium:900598 
TEST=kgdb works better

Change-Id: Ied1e96b6e07b7f6097d77ed1fbb0ec1155261edb
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1310054
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/84d58e0c2625dfc93fb2cd8c75fbd8f46309d03b/kernel/debug/kdb/kdb_support.c
[modify] https://crrev.com/84d58e0c2625dfc93fb2cd8c75fbd8f46309d03b/kernel/debug/kdb/kdb_main.c

Patches on lore.kernel.org/patchwork/patch/ID:

| 1011511 | [v4,4/4] kdb: Don't back trace on a cpu that didn't round up             
| 1011510 | [v4,3/4] kgdb: Don't round up a CPU that failed rounding up before       
| 1011509 | [v4,2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_f... 
| 1011508 | [v4,1/4] kgdb: Remove irq flags from roundup                             

remote:   https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1399182 UPSTREAM: kgdb: Remove irq flags from roundup        
remote:   https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1399183 UPSTREAM: kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()        
remote:   https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1399184 UPSTREAM: kgdb: Don't round up a CPU that failed rounding up before        
remote:   https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1399185 UPSTREAM: kdb: Don't back trace on a cpu that didn't round up        

...if anyone wants to rubberstamp the upstream picks then feel free.  Otherwise I'll throw a dart board and select a volunteer / victim.  ;-P
Project Member

Comment 7 by bugdroid1@chromium.org, Jan 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/0ece7052207e9850410f83742b6de1622e853159

commit 0ece7052207e9850410f83742b6de1622e853159
Author: Douglas Anderson <dianders@chromium.org>
Date: Tue Jan 08 03:40:17 2019

UPSTREAM: kgdb: Remove irq flags from roundup

The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
(cherry picked from commit 9ef7fa507d6b53a96de4da3298c5f01bde603c0a)

BUG= chromium:900598 
TEST=Run kgdb w/ lockdep

Change-Id: Ifb3d236dfcd96e423d63f40a6a4df2f31a941b1c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1399182
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/kernel/debug/debug_core.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/powerpc/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/x86/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/include/linux/kgdb.h
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/sparc/kernel/smp_64.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/hexagon/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/arm/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/mips/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/arc/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/arm64/kernel/kgdb.c
[modify] https://crrev.com/0ece7052207e9850410f83742b6de1622e853159/arch/sh/kernel/kgdb.c

Project Member

Comment 8 by bugdroid1@chromium.org, Jan 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/c8204a0da80093d47eba8c4b245ab8fe9335a63b

commit c8204a0da80093d47eba8c4b245ab8fe9335a63b
Author: Douglas Anderson <dianders@chromium.org>
Date: Tue Jan 08 03:40:18 2019

UPSTREAM: kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system.  Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)

Specifically it looked like this:
  sysrq: SysRq : DEBUG
  ------------[ cut here ]------------
  DEBUG_LOCKS_WARN_ON(current->hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---

Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.

Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs.  Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.

In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.

Looking at all the people who previously had copies of this code,
there were a few variants.  I've attempted to keep the variants
working like they used to.  Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
  get_irq_regs().
* For arch/mips there was a bit of extra code around
  kgdb_nmicallback()

NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before.  We'll try to round it up
again and potentially hang when we try to grab the csd lock.  That's
not new behavior but we'll still try to do better in a future patch.

Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
(cherry picked from commit 3cd99ac3559855f69afbc1d5080e17eaa12394ff)

BUG= chromium:900598 
TEST=Run kgdb w/ lockdep

Change-Id: Ibcc11eaa83810fe0e62ef8dd81b9c3dc69056348
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1399183
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/powerpc/kernel/kgdb.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/include/linux/kgdb.h
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/kernel/debug/debug_core.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/hexagon/kernel/kgdb.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/arm/kernel/kgdb.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/mips/kernel/kgdb.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/arc/kernel/kgdb.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/arm64/kernel/kgdb.c
[modify] https://crrev.com/c8204a0da80093d47eba8c4b245ab8fe9335a63b/arch/sh/kernel/kgdb.c

Project Member

Comment 9 by bugdroid1@chromium.org, Jan 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/c22884316477c90262f4418ea53995e930f11e82

commit c22884316477c90262f4418ea53995e930f11e82
Author: Douglas Anderson <dianders@chromium.org>
Date: Tue Jan 08 03:40:20 2019

UPSTREAM: kgdb: Don't round up a CPU that failed rounding up before

If we're using the default implementation of kgdb_roundup_cpus() that
uses smp_call_function_single_async() we can end up hanging
kgdb_roundup_cpus() if we try to round up a CPU that failed to round
up before.

Specifically smp_call_function_single_async() will try to wait on the
csd lock for the CPU that we're trying to round up.  If the previous
round up never finished then that lock could still be held and we'll
just sit there hanging.

There's not a lot of use trying to round up a CPU that failed to round
up before.  Let's keep a flag that indicates whether the CPU started
but didn't finish to round up before.  If we see that flag set then
we'll skip the next round up.

In general we have a few goals here:
- We never want to end up calling smp_call_function_single_async()
  when the csd is still locked.  This is accomplished because
  flush_smp_call_function_queue() unlocks the csd _before_ invoking
  the callback.  That means that when kgdb_nmicallback() runs we know
  for sure the the csd is no longer locked.  Thus when we set
  "rounding_up = false" we know for sure that the csd is unlocked.
- If there are no timeouts rounding up we should never skip a round
  up.

NOTE #1: In general trying to continue running after failing to round
up CPUs doesn't appear to be supported in the debugger.  When I
simulate this I find that kdb reports "Catastrophic error detected"
when I try to continue.  I can overrule and continue anyway, but it
should be noted that we may be entering the land of dragons here.
Possibly the "Catastrophic error detected" was added _because_ of the
future failure to round up, but even so this is an area of the code
that hasn't been strongly tested.

NOTE #2: I did a bit of testing before and after this change.  I
introduced a 10 second hang in the kernel while holding a spinlock
that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...".

Before this change if I did:
- Invoke hang
- Enter debugger
- g (which warns about Catastrophic error, g again to go anyway)
- g
- Enter debugger

...I'd hang the rest of the 10 seconds without getting a debugger
prompt.  After this change I end up in the debugger the 2nd time after
only 1 second with the standard warning about 'Timed out waiting for
secondary CPUs.'

I'll also note that once the CPU finished waiting I could actually
debug it (aka "btc" worked)

I won't promise that everything works perfectly if the errant CPU
comes back at just the wrong time (like as we're entering or exiting
the debugger) but it certainly seems like an improvement.

NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in
kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some
implementations override kgdb_call_nmi_hook().  It shouldn't hurt to
have it in kgdb_nmicallback() in any case.

NOTE #4: this logic is really only needed because there is no API call
like "smp_try_call_function_single_async()" or "smp_csd_is_locked()".
If such an API existed then we'd use it instead, but it seemed a bit
much to add an API like this just for kgdb.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
(cherry picked from commit 87b095928584da7d5cb3149016f00b0b139c2292)

BUG= chromium:900598 
TEST=Run kgdb w/ lockdep

Change-Id: I31c348ee31972fba9837dea1618f5762abe890f0
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1399184
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/c22884316477c90262f4418ea53995e930f11e82/kernel/debug/debug_core.h
[modify] https://crrev.com/c22884316477c90262f4418ea53995e930f11e82/kernel/debug/debug_core.c

Project Member

Comment 10 by bugdroid1@chromium.org, Jan 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/45ec9741c205df0442b447885c9fe033c3130bf8

commit 45ec9741c205df0442b447885c9fe033c3130bf8
Author: Douglas Anderson <dianders@chromium.org>
Date: Tue Jan 08 03:40:21 2019

UPSTREAM: kdb: Don't back trace on a cpu that didn't round up

If you have a CPU that fails to round up and then run 'btc' you'll end
up crashing in kdb becaue we dereferenced NULL.  Let's add a check.
It's wise to also set the task to NULL when leaving the debugger so
that if we fail to round up on a later entry into the debugger we
won't backtrace a stale task.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
(cherry picked from commit 162bc7f5afd75b72acbe3c5f3488ef7e64a3fe36)

BUG= chromium:900598 
TEST=Run kgdb w/ lockdep

Change-Id: I15301ed573b6d83a3f7e7c5392bc3689396c43d4
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1399185
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/45ec9741c205df0442b447885c9fe033c3130bf8/kernel/debug/kdb/kdb_debugger.c
[modify] https://crrev.com/45ec9741c205df0442b447885c9fe033c3130bf8/kernel/debug/kdb/kdb_bt.c
[modify] https://crrev.com/45ec9741c205df0442b447885c9fe033c3130bf8/kernel/debug/debug_core.c

Status: Fixed (was: Untriaged)

Sign in to add a comment