In <https://lkml.kernel.org/r/20181114220744.GB4044@brain-police> Will pointed out that get_irq_regs() might return NULL. This can make certain things go boom in kdb / kgdb. Specifically, if you simulate get_irq_regs() returning NULL, then typing "cpu 1" in kdb goes boom.
We should try to find a way to make this more robust.
I _think_ one way would be to just use kgdb_breakpoint() on each of the rounded up CPUs. This would definitely give us a valid pt_regs and would be robust.
...one problem is that having a kgdb_breakpoint() in kgdb_call_nmi_hook() seems to confuse kgdb, but presumably that could be worked around.
...another problem is that due to bug #908721 we can't backtrace past the IPI interrupt. That's not so great.
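To make the failure mode concrete, here is a minimal user-space sketch of the problem and of a defensive NULL check in the round-up hook. It is a toy model, not kernel code: struct regs_model, irq_regs_model, get_irq_regs_model(), debugger_entry_model() and roundup_hook_model() are all hypothetical stand-ins invented for illustration. It only shows that the hook can detect the missing registers and report that case instead of dereferencing NULL. How the registers would then be obtained (for example via a breakpoint trap as suggested above) is left open.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct regs_model {
	unsigned long pc;
	unsigned long sp;
};

/*
 * Models the pointer that get_irq_regs() hands back.
 * Assumption for this sketch: it is NULL whenever the callback
 * runs outside of interrupt context.
 */
static struct regs_model *irq_regs_model;

static struct regs_model *get_irq_regs_model(void)
{
	return irq_regs_model;
}

/* Models a debugger entry path that reads from regs unconditionally. */
static unsigned long debugger_entry_model(const struct regs_model *regs)
{
	return regs->pc;
}

/*
 * Round-up hook with a NULL guard.
 * Returns 0 and stores the saved pc on success, or -1 when no
 * interrupt registers are available and the caller has to get
 * them some other way.
 */
static int roundup_hook_model(unsigned long *pc_out)
{
	const struct regs_model *regs = get_irq_regs_model();

	if (regs == NULL)
		return -1;

	*pc_out = debugger_entry_model(regs);
	return 0;
}
```

Without the guard, the NULL case would go straight into debugger_entry_model() and fault, which is the toy equivalent of the "boom" described above.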
---
This isn't a terribly high priority issue since:
A) It seems very uncommon for smp_call_function_single_async() to call its callback in a context other than from an IPI interrupt. As far as I can tell the only place we call the callback (assuming we aren't targeting the same CPU that calls smp_call_function_single_async()) is from flush_smp_call_function_queue(). ...and the only place that calls flush_smp_call_function_queue() besides the IPI interrupt is smpcfd_dying_cpu(). CPUs don't seem to go down too often and kgdb is already in a bit of trouble in such a case. See bug #908722.
B) Even if get_irq_regs() returns NULL, things don't go boom too fast. I can still drop into the debugger and backtrace on the crashing CPU. I can still go over to kgdb and debug the system with gdb.
---
It would still be nice to fix this.