
Issue 908723


Issue metadata

Status: Untriaged
Owner: ----
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




kgdb: handle NULL IRQ regs in kgdb_call_nmi_hook()

Project Member Reported by dianders@google.com, Nov 27

Issue description

In <https://lkml.kernel.org/r/20181114220744.GB4044@brain-police> Will pointed out that get_irq_regs() might return NULL. This can make certain things go boom in kdb / kgdb. Specifically, if you simulate get_irq_regs() returning NULL, typing "cpu 1" in kdb goes boom.


We should try to find a way to make this more robust.


I _think_ one way would be to just use kgdb_breakpoint() on each of the rounded-up CPUs. This would definitely give us a valid struct pt_regs and would be robust.

...one problem is that having a kgdb_breakpoint() in kgdb_call_nmi_hook() seems to confuse kgdb, but presumably that could be worked around.

...another problem is that due to bug #908721 we can't backtrace past the IPI interrupt.  That's not so great.

---

This isn't a terribly high-priority issue because:


A) It seems very uncommon for smp_call_function_single_async() to call its callback in any context other than the IPI interrupt. As far as I can tell, the only place the callback is called (assuming we aren't targeting the same CPU that called smp_call_function_single_async()) is flush_smp_call_function_queue(), and the only caller of flush_smp_call_function_queue() besides the IPI interrupt is smpcfd_dying_cpu(). CPUs don't go down very often, and kgdb is already in a bit of trouble in that case. See bug #908722.


B) Even if get_irq_regs() returns NULL things don't go boom too fast.  I can still drop in the debugger and backtrace on the crashing CPU.  I can still go over to kgdb and debug the system with gdb.
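The call path described in (A) can be modelled in user space to show why the IRQ regs are only present on the IPI path. The following is a minimal sketch, not kernel code. Every `sim_*` function, the single global standing in for the per-CPU IRQ regs pointer, and the one-field `struct pt_regs` are hypothetical simplifications. The NULL guard with a zeroed fallback frame is only an illustration of "don't dereference NULL". It is not the kgdb_breakpoint() approach proposed above, and it is not an actual kernel fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct pt_regs {
    unsigned long pc;
};

/* Hypothetical model of the pointer get_irq_regs() returns; a single
 * global here instead of a per-CPU variable. Only the simulated IPI
 * path sets it. */
static struct pt_regs *sim_irq_regs;

static struct pt_regs *sim_get_irq_regs(void) { return sim_irq_regs; }

static int hook_saw_null;     /* did the hook see NULL IRQ regs? */
static unsigned long hook_pc; /* pc the hook ended up reporting */

/* Hypothetical guarded roundup hook: if no IRQ regs are available,
 * use a zeroed local frame instead of dereferencing NULL. */
static void sim_kgdb_call_nmi_hook(void)
{
    struct pt_regs fallback = { .pc = 0 }; /* illustrative "no trap frame" */
    struct pt_regs *regs = sim_get_irq_regs();

    hook_saw_null = (regs == NULL);
    if (regs == NULL)
        regs = &fallback;
    assert(regs != NULL);
    hook_pc = regs->pc; /* read before 'fallback' goes out of scope */
}

/* Models flush_smp_call_function_queue(): runs the queued callback in
 * whatever context the flush happens to be called from. */
static void sim_flush_queue(void) { sim_kgdb_call_nmi_hook(); }

/* Models the IPI interrupt path: IRQ regs are set around the flush,
 * in the spirit of the kernel's set_irq_regs() save/restore pattern. */
static void sim_ipi_interrupt(struct pt_regs *trap_regs)
{
    struct pt_regs *old = sim_irq_regs;

    sim_irq_regs = trap_regs;
    sim_flush_queue();
    sim_irq_regs = old;
}

/* Models smpcfd_dying_cpu(): flushes the queue with no IRQ regs set. */
static void sim_smpcfd_dying_cpu(void) { sim_flush_queue(); }
```

Calling `sim_ipi_interrupt()` with a frame lets the hook see that frame's pc. Calling `sim_smpcfd_dying_cpu()` makes the hook see NULL. That second case is the one the issue worries about, and without the guard it would dereference a NULL pointer.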

---

It would still be nice to fix this.
 
