Issue 630015

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Can't use kgdb to debug ARM64 kernel

Project Member Reported by diand...@chromium.org, Jul 20 2016

Issue description

I've been seeing this for a few months now and keep hoping to find time to dig more, but I haven't.  I'm hoping someone on the toolchain team can help (pretty please?)

I'm using rk3399-gru and I drop into the kernel debugger (kgdb) by:
* Building the kernel with USE="kgdb"
* Adding "kgdboc=ttyS2" to the kernel command line arguments.
* Dropping into the debugger with "SysRq-g" (echo g > /proc/sysrq-trigger)

I tend to run with agent-proxy to mux between my console and kgdb, like this ($1 is the device number):
  agent-proxy "127.0.0.1:$((5500 + $1 * 10))^127.0.0.1:$((5501 + $1 * 10))" 0 "${DUT_RESULT//*:/}",115200&

...but I can reproduce these same problems without agent-proxy.
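The port arithmetic in that agent-proxy line can be sketched as below. This is only an illustration: the function name is hypothetical, and device number 4 is inferred from the `localhost:5541` used in the GDB command in this report, not stated anywhere. It assumes the first port of the `a^b` pair carries the console and the second carries kgdb, which matches GDB attaching to the second one.

```python
def agent_proxy_ports(device):
    """Return (console_port, kgdb_port) for a device number.

    Mirrors the shell arithmetic $((5500 + $1 * 10)) and $((5501 + $1 * 10)).
    The name and the (console, kgdb) ordering are assumptions for illustration.
    """
    base = 5500 + device * 10
    return base, base + 1


# Device 4 (inferred, see above) gives the port GDB connects to in this report.
console_port, kgdb_port = agent_proxy_ports(4)
print(console_port, kgdb_port)
```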

I connect GDB like:
  aarch64-cros-linux-gnu-gdb \
    /build/${BOARD}/usr/lib/debug/boot/vmlinux \
    -ex "set remotebaud 115200" \
    -ex "target remote localhost:5541"

---

All the above worked with arm32 (though using armv7a-cros-linux-gnueabi-gdb instead of aarch64-cros-linux-gnu-gdb).  When doing it with arm64 I get:

Remote 'g' packet reply is too long: 8889e501c0ffffff010000000000000000000000000000009089e501c0ffffff0000000000000000010000000000000084292800c0ffffff00000000000000002c37e001c0ffffff7f7f7f7f7f7f7f7f6b6b6466606b1f637f7f7f7f7f7f7f7f01010101010101010800000000000000feffffffffffff0f0000000000000000cc223900c0ffffff0000000000000000000000000000000000100a01c0fffffff8960a01c0ffffff00200a01c0ffffff670000000000000001000000000000000700000000000000180000000000000000639cedc0ffffff0050a000c0ffffff0000000000000000a03a0801c0ffffff04cf2d00c0ffffffa03a0801c0ffffff64ce2d00c0ffffffc

---

Anyone on the toolchain team have any ideas?  This would be hugely helpful to get working, since kgdb has proven to be invaluable in the past.

---

If anyone has gotten kgdb working on arm64 before (or failed) in either the kernel or coreboot, I'd be interested to hear about that too...
 
My first blind guess would be... have you tried compiling a newer GDB from upstream sources? IIRC 'g' is the command to dump registers, and if the response is longer than expected maybe there's a protocol version incompatibility between the kernel and your GDB.

We still don't have ARM64 GDB support in firmware (depthcharge) because nobody really had time to look at it yet (and also because despite the original huge fuss about needing a command line and hackability in firmware, I think actual usage of the feature has been nearly non-existent). The few times I've tried to look at it I've been scared away pretty quickly by the lack of protocol documentation from GDB (especially on arch-specific details like the register format, which you seem to be running into here).
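To make the 'g' reply less opaque, here is a minimal sketch of decoding the start of it. It assumes the stub sends registers in target byte order (little-endian on this board) starting with x0, each general-purpose register being 8 bytes; the helper name is made up for illustration. Only the first 16 bytes of the reply quoted in the report are used, since the quoted payload is cut off.

```python
import struct


def decode_g_prefix(hex_payload, count):
    """Decode the first `count` 64-bit little-endian registers of a 'g' reply.

    Hypothetical helper: the register order and width are assumptions.
    """
    raw = bytes.fromhex(hex_payload[: count * 16])  # 16 hex digits per register
    return list(struct.unpack("<%dQ" % count, raw))


# First two registers from the reply quoted in the issue description.
reply_prefix = "8889e501c0ffffff0100000000000000"
for index, value in enumerate(decode_g_prefix(reply_prefix, 2)):
    print("x%d = %#018x" % (index, value))
```

Under those assumptions the first value decodes to a 0xffffffc0... address, which looks like an arm64 kernel pointer, so the payload itself appears sane and only its length is being rejected.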
No, I haven't tried compiling from upstream source and I was hoping to avoid learning how to build gdb.  ;)  I agree that this is probably the problem, though, since searching the web for "Remote 'g' packet reply is too long" shows mostly that people fixed it by properly configuring gdb.

...that's why I'm hoping the toolchain team can come up with a fix since they presumably have all the expertise to build / configure gdb.

If folks don't want to get set up for kgdb, I'm happy to test any binaries that someone wants to provide...
Owner: cmt...@chromium.org
Status: Assigned (was: Untriaged)

Comment 4 by cmt...@chromium.org, Jul 20 2016

Just double checking:

If you try the following (notice the extra line)

  aarch64-cros-linux-gnu-gdb \
    /build/${BOARD}/usr/lib/debug/boot/vmlinux \
    -ex "set remotebaud 115200" \
    -ex "target remote localhost:5541" \
    -ex "set arch aarch64"

do you still see the same problem?
@4: yes

$   aarch64-cros-linux-gnu-gdb \
>     /build/${BOARD}/usr/lib/debug/boot/vmlinux \
>     -ex "set remotebaud 115200" \
>     -ex "target remote localhost:5541" \
>     -ex "set arch aarch64"
GNU gdb (Chromium OS 7.11.20160511 vanilla) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-cros-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://crbug.com/new>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /build/kevin/usr/lib/debug/boot/vmlinux...done.
No symbol "remotebaud" in current context.
Remote debugging using localhost:5541
Remote 'g' packet reply is too long: 8889e501c0ffffff010000000000000000000000000000009089e501c0ffffff0000000000000000010000000000000084292800c0ffffff00000000000000002c37e001c0ffffff7f7f7f7f7f7f7f7f6b6b6466606b1f637f7f7f7f7f7f7f7f01010101010101010800000000000000feffffffffffff0f0000000000000000cc223900c0ffffff0000000000000000000000000000000000100a01c0fffffff8960a01c0ffffff00200a01c0ffffff670000000000000001000000000000000700000000000000180000000000000000639cedc0ffffff0050a000c0ffffff0000000000000000a03a0801c0ffffff04cf2d00c0ffffffa03a0801c0ffffff64ce2d00c0ffffffc
The target architecture is assumed to be aarch64
Also: in case it's helpful:

(gdb) show configuration
This GDB was configured as follows:
   configure --host=x86_64-pc-linux-gnu --target=aarch64-cros-linux-gnu
             --with-auto-load-dir=$debugdir:$datadir/auto-load
             --with-auto-load-safe-path=$debugdir:$datadir/auto-load
             --with-expat
             --with-gdb-datadir=/usr/share/gdb/aarch64-cros-linux-gnu (relocatable)
             --with-jit-reader-dir=/usr/lib64/gdb (relocatable)
             --without-libunwind-ia64
             --without-lzma
             --with-python=/usr (relocatable)
             --without-guile
             --with-separate-debug-dir=/usr/lib/debug (relocatable)
             --with-sysroot=/usr/aarch64-cros-linux-gnu (relocatable)
             --without-babeltrace

Comment 7 by cmt...@chromium.org, Jul 20 2016

Ok, I'll get to work on this.

Comment 8 by cmt...@chromium.org, Jul 21 2016

What are you running on the Chromebook that you are trying to connect to with this (which process or program is gdbserver attached to)?  I'm trying to reproduce your problem...
@8: It's not at all trivial, unfortunately.  Are you set up to build a custom kernel?  Do you have an arm64 device to work on?  Do you have a servo?

The setup here is to use kgdb (the kernel debugger), which presents itself as a gdb server over a serial port.  The way I get a TCP/IP connection is that I have something that reads the serial port, demuxes kgdb / console, and then listens on TCP/IP.

I haven't yet managed to build an arm64 executable that will run on the device to test whether the normal gdbserver will work...
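The console/kgdb demux described above relies on GDB remote-protocol packets being recognizable in the serial stream: each packet is `$`, a payload, `#`, and a two-hex-digit checksum (the payload bytes summed modulo 256). The sketch below illustrates that idea only. It is not how agent-proxy is implemented, the function names are hypothetical, and it ignores acks (`+`/`-`) and `}`-escaped binary payloads.

```python
import re

# '$' payload '#' checksum; the payload may not contain raw '$' or '#'.
_PACKET = re.compile(rb"\$([^#$]*)#([0-9a-fA-F]{2})")


def checksum(payload):
    """Sum of payload bytes modulo 256, as used by the remote protocol."""
    return sum(payload) % 256


def frame(payload):
    """Wrap a payload as a remote-protocol packet."""
    return b"$" + payload + b"#" + b"%02x" % checksum(payload)


def demux(stream):
    """Split a byte stream into (console_bytes, [packet_payloads]).

    Anything that is not a well-formed packet with a valid checksum
    is treated as console output.
    """
    console = bytearray()
    packets = []
    pos = 0
    for match in _PACKET.finditer(stream):
        payload = match.group(1)
        if checksum(payload) != int(match.group(2), 16):
            continue  # bad checksum: leave the bytes in the console stream
        console += stream[pos:match.start()]
        packets.append(payload)
        pos = match.end()
    console += stream[pos:]
    return bytes(console), packets


print(frame(b"g"))
print(demux(b"login: " + frame(b"g") + b"\n"))
```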
I have an oak board I can test with; I don't have a servo.  I can build a custom kernel if I have to.

@10: To test kgdb you'll need a servo since it only works over serial console.

...alternatively if you can find some way to get a tiny arm64 userspace executable running on oak, that would be interesting.  I think I tried that quickly and I couldn't figure out how to get it to run.  If the arm64 userspace app works then that would prove that it's the kernel's fault and then I'll debug it myself.  I presume that the kernel interface is working, though, since I checked the upstream kernel sources and found no relevant patches...
> If the arm64 userspace app works then that would prove that it's the kernel's fault and then I'll debug it myself.

It might just prove that the kernel is expecting a different protocol version than the CrOS toolchain (which would be both the GDB you're running on the host and the server you're running in userspace on the DUT).
I have attached a patched & gzip'ed version of gdb that *might* fix your problem.
Please download & unzip it, and replace the /usr/bin/aarch64-cros-linux-gnu-gdb in your chroot with it, then let me know whether this fixes your problem.  If it does, I will create a CL with the gdb patch in it.

So far I have not been able to create a setup for reproducing your problem (what *board* are you doing this with?), so I can't test the patch myself.
aarch64-cros-linux-gnu-gdb.gz
2.3 MB Download
I will test shortly.  Getting prepped for an interview now.  I am working with kevin and gru.
@13: Yup, at least it attaches now and super quick tests show that it works.  You rock!  Thanks!

What did you do?

===

/b/tip/aarch64-cros-linux-gnu-gdb          /build/${BOARD}/usr/lib/debug/boot/
  -ex "target remote localhost:5541"
GNU gdb (Chromium OS 7.11.20160511 vanilla) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-cros-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://crbug.com/new>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /build/kevin/usr/lib/debug/boot/vmlinux...done.
No symbol "remotebaud" in current context.
Remote debugging using localhost:5541
arch_kgdb_breakpoint () at /mnt/host/source/src/third_party/kernel/v4.4/arch/arm64/include/asm/kgdb.h:32
32              asm ("brk %0" : : "I" (KGDB_COMPILED_DBG_BRK_IMM));
(gdb) bt
#0  arch_kgdb_breakpoint () at /mnt/host/source/src/third_party/kernel/v4.4/arch/arm64/include/asm/kgdb.h:32
#1  kgdb_breakpoint () at /mnt/host/source/src/third_party/kernel/v4.4/kernel/debug/debug_core.c:1071
#2  0xffffffc0002dcee4 in sysrq_handle_dbg (key=<optimized out>) at /mnt/host/source/src/third_party/kernel/v4.4/kernel/debug/debug_core.c:825
#3  0xffffffc00058ee34 in __handle_sysrq (key=103, check_mask=true) at /mnt/host/source/src/third_party/kernel/v4.4/drivers/tty/sysrq.c:606
#4  0xffffffc00058ef78 in handle_sysrq (key=103) at /mnt/host/source/src/third_party/kernel/v4.4/drivers/tty/sysrq.c:635
#5  0xffffffc00059537c in uart_handle_sysrq_char (port=<optimized out>, ch=<optimized out>) at /mnt/host/source/src/third_party/kernel/v4.4/include/l
#6  serial8250_rx_chars (up=0xffffffc001ebca08 <serial8250_ports+1216>, lsr=97 'a') at /mnt/host/source/src/third_party/kernel/v4.4/drivers/tty/seria
#7  0xffffffc000596c40 in serial8250_handle_irq (port=0xffffffc001ebca08 <serial8250_ports+1216>, iir=204) at /mnt/host/source/src/third_party/kernel
#8  0xffffffc00059c5f0 in dw8250_handle_irq (p=0xffffffc001ebca08 <serial8250_ports+1216>) at /mnt/host/source/src/third_party/kernel/v4.4/drivers/tt
#9  0xffffffc000593b9c in serial8250_interrupt (irq=24, dev_id=0xffffffc0e6c27e00) at /mnt/host/source/src/third_party/kernel/v4.4/drivers/tty/serial
#10 0xffffffc000284858 in handle_irq_event_percpu (desc=0xffffffc0ed9e2300) at /mnt/host/source/src/third_party/kernel/v4.4/kernel/irq/handle.c:146
#11 0xffffffc000284b44 in handle_irq_event (desc=0xffffffc0ed9e2300) at /mnt/host/source/src/third_party/kernel/v4.4/kernel/irq/handle.c:194
#12 0xffffffc0002885e0 in handle_fasteoi_irq (desc=0xffffffc0ed9e2300) at /mnt/host/source/src/third_party/kernel/v4.4/kernel/irq/chip.c:551
#13 0xffffffc000283d64 in generic_handle_irq_desc (desc=<optimized out>) at /mnt/host/source/src/third_party/kernel/v4.4/include/linux/irqdesc.h:140
#14 generic_handle_irq (irq=24) at /mnt/host/source/src/third_party/kernel/v4.4/kernel/irq/irqdesc.c:350
#15 0xffffffc0002840fc in __handle_domain_irq (domain=0xffffffc000029800, hwirq=<optimized out>, lookup=true, regs=<optimized out>)
    at /mnt/host/source/src/third_party/kernel/v4.4/kernel/irq/irqdesc.c:387
#16 0xffffffc0002006f4 in handle_domain_irq (regs=<optimized out>, hwirq=<optimized out>, domain=<optimized out>) at /mnt/host/source/src/third_party
#17 gic_handle_irq (regs=0xffffffc00107fdb0 <init_thread_union+15792>) at /mnt/host/source/src/third_party/kernel/v4.4/drivers/irqchip/irq-gic-v3.c:3
#18 0xffffffc0002035ac in el1_irq () at /mnt/host/source/src/third_party/kernel/v4.4/arch/arm64/kernel/entry.S:361

I found a tentative patch that had been submitted upstream (but not committed) and applied it to our gdb.  I don't like the patch very much because it fixes the symptom rather than the root problem, but at least it got you unstuck.  I will create a CL to push it into our GDB, along with creating a new issue to track down the root cause.
Project Member

Comment 17 by bugdroid1@chromium.org, Jul 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/a9f03d140ccd45d285f33dd54607a6b88ddb3840

commit a9f03d140ccd45d285f33dd54607a6b88ddb3840
Author: Caroline Tice <cmtice@google.com>
Date: Mon Jul 25 22:24:50 2016

[gdb] Patch GDB with patch from upstream to fix remote packet issue.

This applies the patch from
https://sourceware.org/bugzilla/show_bug.cgi?id=13984
to fix an issue with remote kernel debugging with aarch64.  The
patch has not been accepted upstream, which is why we apply it
here rather than in the actual gdb source repository.

BUG= chromium:630015 
TEST=Built gdb with this CL; gave patched GDB binary to person
with issue and it fixed the problem.

Change-Id: Ib014858c9d3fbe6de0a67bc13db5445e8251dc0c
Reviewed-on: https://chromium-review.googlesource.com/362984
Commit-Ready: Caroline Tice <cmtice@chromium.org>
Tested-by: Caroline Tice <cmtice@chromium.org>
Reviewed-by: Yunlian Jiang <yunlian@chromium.org>

[add] https://crrev.com/a9f03d140ccd45d285f33dd54607a6b88ddb3840/sys-devel/gdb/files/gdb-7.11-remote-arm64.patch
[rename] https://crrev.com/a9f03d140ccd45d285f33dd54607a6b88ddb3840/sys-devel/gdb/gdb-7.11.20160511-r1.ebuild

Status: Fixed (was: Assigned)
Labels: VerifyIn-54
Status: Verified (was: Fixed)
bulk verified
BTW: I think we're finally going to pick a true fix for this in the kernel.  See

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/959554/

I verified that I can still use the arm64 gdb in the chroot even with that fix applied, so I'm going to land it on 4.4 (since it avoids a merge conflict with a future patch that I want to land).  Yell if you think that's wrong.
Just curious, did you test removing the fix in comment 17 to see if your new fix really fixes that issue?
1. Went to src/third_party/chromiumos-overlay/sys-devel/gdb
2. Renamed:    gdb-8.0.1.20171030-r3.ebuild -> gdb-8.0.1.20171030-r4.ebuild
3. Changed:

diff --git a/sys-devel/gdb/gdb-8.0.1.20171030.ebuild b/sys-devel/gdb/gdb-8.0.1.20171030.ebuild
index b75a8abe7206..4a6e829bf8a3 100644
--- a/sys-devel/gdb/gdb-8.0.1.20171030.ebuild
+++ b/sys-devel/gdb/gdb-8.0.1.20171030.ebuild
@@ -75,7 +75,7 @@ src_unpack() {
 src_prepare() {
        [[ -n ${RPM} ]] && rpm_spec_epatch "${WORKDIR}"/gdb.spec
        ! use vanilla && [[ -n ${PATCH_VER} ]] && EPATCH_SUFFIX="patch" epatch "${WORKDIR}"/patch
-       epatch "${FILESDIR}"/gdb-8.0.1-remote-arm64.patch
+       #epatch "${FILESDIR}"/gdb-8.0.1-remote-arm64.patch
 
        default

=====

In chroot:

sudo emerge cross-aarch64-cros-linux-gnu/gdb

=====

Tried kevin without the above fix (to confirm I properly got rid of the patch).  Got:

Remote 'g' packet reply is too long (expected 788 bytes, got 792 bytes): 00102702c0ffffff01000000000000000000000000000000281c2702c0ffffff00404b01c0ffffff01000000000000000000000000000000107ac391c0ffffffc8a746

=====

Tried kevin with the above fix.  It worked.

=====

...so once we land the above fix, we can probably kill the old gdb workaround.
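The 788 vs. 792 byte figures in that error line up with a 4-byte difference in a single register. A minimal sketch of the arithmetic, assuming GDB's aarch64 layout is 33 eight-byte general registers (x0..x30, sp, pc), a pstate/cpsr slot, 32 sixteen-byte vector registers, and 4-byte fpsr and fpcr; the constant and function names are illustrative, not taken from either code base:

```python
GP_REGS = 33      # x0..x30, sp, pc: 8 bytes each (assumed layout)
FP_REGS = 32      # v0..v31: 16 bytes each (assumed layout)
STATUS_REGS = 2   # fpsr, fpcr: 4 bytes each (assumed layout)


def g_reply_size(pstate_bytes):
    """Total size of a 'g' reply for a given pstate/cpsr width in bytes."""
    return GP_REGS * 8 + pstate_bytes + FP_REGS * 16 + STATUS_REGS * 4


expected_by_gdb = g_reply_size(4)   # pstate as a 32-bit register
sent_by_old_kgdb = g_reply_size(8)  # pstate as a 64-bit register
print(expected_by_gdb, sent_by_old_kgdb)
```

If that layout is right, a stub that sends pstate as 8 bytes overshoots by exactly the 4 bytes GDB complains about, which fits the kernel patch title about matching the pstate size.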
Project Member

Comment 24 by bugdroid1@chromium.org, Mar 14 2018

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e909bac3c55177b670bd370c41ed0f9b858bae99

commit e909bac3c55177b670bd370c41ed0f9b858bae99
Author: Daniel Thompson <daniel.thompson@linaro.org>
Date: Wed Mar 14 21:21:59 2018

UPSTREAM: arm64: kgdb: Match pstate size with gdbserver protocol

Current versions of gdb do not interoperate cleanly with kgdb on arm64
systems because gdb and kgdb do not use the same register description.
This patch modifies kgdb to work with recent releases of gdb (>= 7.8.1).

Compatibility with gdb (after the patch is applied) is as follows:

  gdb-7.6 and earlier  Ok
  gdb-7.7 series       Works if user provides custom target description
  gdb-7.8(.0)          Works if user provides custom target description
  gdb-7.8.1 and later  Ok

When commit 44679a4f142b ("arm64: KGDB: Add step debugging support") was
introduced it was paired with a gdb patch that made an incompatible
change to the gdbserver protocol. This patch was eventually merged into
the gdb sources:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=a4d9ba85ec5597a6a556afe26b712e878374b9dd

The change to the protocol was mostly made to simplify big-endian support
inside the kernel gdb stub. Unfortunately the gdb project released
gdb-7.7.x and gdb-7.8.0 before the protocol incompatibility was identified
and reversed:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=bdc144174bcb11e808b4e73089b850cf9620a7ee

This leaves us in a position where kgdb still uses the no-longer-used
protocol; gdb-7.8.1, which restored the original behaviour, was
released on 2014-10-29.

I don't believe it is possible to detect/correct the protocol
incompatibility which means the kernel must take a view about which
version of the gdb remote protocol is "correct". This patch takes the
view that the original/current version of the protocol is correct
and that version found in gdb-7.7.x and gdb-7.8.0 is anomalous.

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>

BUG= chromium:821174 ,  chromium:630015 
TEST=kgdb still works OK

Change-Id: I80fad7279537c5ee3ec1735a26560a77e9fceb38
Signed-off-by: Douglas Anderson <dianders@chromium.org>
(cherry picked from commit 0d15ef677839dab8313fbb86c007c3175b638d03)
Reviewed-on: https://chromium-review.googlesource.com/959554
Reviewed-by: Caroline Tice <cmtice@chromium.org>

[modify] https://crrev.com/e909bac3c55177b670bd370c41ed0f9b858bae99/arch/arm64/include/asm/kgdb.h
[modify] https://crrev.com/e909bac3c55177b670bd370c41ed0f9b858bae99/arch/arm64/kernel/kgdb.c
