
Issue 765044

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Chameleon hangs on parallel accesses to /dev/ttyUSBs

Project Member Reported by alent@google.com, Sep 14 2017

Issue description

This happens when both an RN-42-EK and a Bluefruit LE Friend (v2) are connected to a hub on Chameleon's USB OTG port (in host mode). When we write to their serial consoles programmatically, Chameleon hangs until we manually unplug one of the dongles. The hang only occurs when we switch from one dongle to the other: if we only ever access a single dongle, everything works, but if we run tests on each dongle one after the other, Chameleon hangs.
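
A minimal sketch of the access pattern described above, assuming the two dongles enumerate as /dev/ttyUSB0 and /dev/ttyUSB1. The device names, the payload and the `alternate_writes` helper are hypothetical and not taken from the Chameleon test code:

```shell
# Hypothetical helper: write a payload to each device in turn, several times.
# Usage: alternate_writes ROUNDS PAYLOAD DEVICE...
alternate_writes() {
    rounds=$1
    payload=$2
    shift 2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for dev in "$@"; do
            # The redirection opens the device, writes, and closes it again,
            # so every iteration switches to a different tty.
            printf '%s\n' "$payload" > "$dev" || return 1
            echo "round $i: wrote to $dev"
        done
        i=$((i + 1))
    done
}
```

If the description holds, something like `alternate_writes 10 ping /dev/ttyUSB0 /dev/ttyUSB1` should be enough to exercise the switch between dongles, while passing a single device should not hang.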
 

Comment 1 by alent@google.com, Sep 14 2017

Owner: alent@google.com

Comment 2 by alent@google.com, Sep 14 2017

The users in the rocketboards thread saw it on socfpga 3.8, but we're using socfpga 4.2.
I've tried both powered and unpowered hubs; neither helps.

Comment 3 by alent@google.com, Sep 14 2017

Summary: Chameleon Bluetooth flows or underlying code hangs on certain accesses to USB serial ttys (was: Chameleon may have a kernel bug which causes lockups when doing certain USB accesses)
rjahagir@chromium.org and I are looking into a workaround.

Comment 4 by alent@google.com, Sep 15 2017

Description updated.

Comment 5 by alent@google.com, Sep 15 2017

Old description for reference:

Chameleon may have a kernel bug which causes lockups when doing certain USB accesses
Additionally, users of the socfpga kernel (on a Zynq instead of a Cyclone V) seem to occasionally have this problem as well.
https://lists.rocketboards.org/pipermail/rfi/2013-October/000711.html

The error dump looks like this, suggesting a hang in some usb-related code:
INFO: rcu_sched self-detected stall on CPU
 0: (2100 ticks this GP) idle=88d/140000000000002/0 softirq=719124/719124 fqs=0 
  (t=2100 jiffies g=172625 c=172624 q=610)
rcu_sched kthread starved for 2100 jiffies! g172625 c172624 f0x0
Task dump for CPU 0:
python          R running      0  3490   3232 0x00000002
[<c0017648>] (unwind_backtrace) from [<c0013424>] (show_stack+0x20/0x24)
[<c0013424>] (show_stack) from [<c004e064>] (sched_show_task+0xa4/0x100)
[<c004e064>] (sched_show_task) from [<c004feec>] (dump_cpu_task+0x48/0x4c)
[<c004feec>] (dump_cpu_task) from [<c006fe14>] (rcu_dump_cpu_stacks+0xa4/0xd0)
[<c006fe14>] (rcu_dump_cpu_stacks) from [<c0072d04>] (rcu_check_callbacks+0x4b4/0x8c0)
[<c0072d04>] (rcu_check_callbacks) from [<c0077564>] (update_process_times+0x40/0x6c)
[<c0077564>] (update_process_times) from [<c0087a70>] (tick_sched_timer+0x70/0x238)
[<c0087a70>] (tick_sched_timer) from [<c0077f88>] (__hrtimer_run_queues+0x144/0x278)
[<c0077f88>] (__hrtimer_run_queues) from [<c0078718>] (hrtimer_interrupt+0xd4/0x234)
[<c0078718>] (hrtimer_interrupt) from [<c0016944>] (twd_handler+0x3c/0x50)
[<c0016944>] (twd_handler) from [<c0069e1c>] (handle_percpu_devid_irq+0x88/0x124)
[<c0069e1c>] (handle_percpu_devid_irq) from [<c0065a60>] (generic_handle_irq+0x30/0x40)
[<c0065a60>] (generic_handle_irq) from [<c0065bac>] (__handle_domain_irq+0x64/0xc4)
[<c0065bac>] (__handle_domain_irq) from [<c0009454>] (gic_handle_irq+0x30/0x6c)
[<c0009454>] (gic_handle_irq) from [<c0014000>] (__irq_svc+0x40/0x54)
Exception stack(0xec2e1a10 to 0xec2e1a58)
1a00:                                     c08085c0 00000100 00000000 00000000
1a20: 00000010 c079f394 00000282 00000000 c08051ac ec3ae800 00000003 ec2e1a9c
1a40: 0000000a ec2e1a58 c0029f18 c0029a18 60070113 ffffffff
[<c0014000>] (__irq_svc) from [<c0029a18>] (__do_softirq+0x8c/0x2e0)
[<c0029a18>] (__do_softirq) from [<c0029f18>] (irq_exit+0x80/0xc0)
[<c0029f18>] (irq_exit) from [<c0065bb0>] (__handle_domain_irq+0x68/0xc4)
[<c0065bb0>] (__handle_domain_irq) from [<c0009454>] (gic_handle_irq+0x30/0x6c)
[<c0009454>] (gic_handle_irq) from [<c0014000>] (__irq_svc+0x40/0x54)
Exception stack(0xec2e1b08 to 0xec2e1b50)
1b00:                   eee30610 eb8e4300 eb8f2000 c08156cc eb8e4300 2b8f2000
1b20: eefcc000 00000000 00000002 ec3ae800 00000003 ec2e1b8c 00000200 ec2e1b50
1b40: c07a916c c03a4830 a0070013 ffffffff
[<c0014000>] (__irq_svc) from [<c03a4830>] (usb_hcd_map_urb_for_dma+0x360/0x400)
[<c03a4830>] (usb_hcd_map_urb_for_dma) from [<c03a5100>] (usb_hcd_submit_urb+0xd4/0x8ac)
[<c03a5100>] (usb_hcd_submit_urb) from [<c03a6de4>] (usb_submit_urb+0x258/0x4ac)
[<c03a6de4>] (usb_submit_urb) from [<bf0022a4>] (usb_serial_generic_submit_read_urb+0x54/0x9c [usbserial])
[<bf0022a4>] (usb_serial_generic_submit_read_urb [usbserial]) from [<bf002524>] (usb_serial_generic_submit_read_urbs+0x40/0x78 [usbserial])
[<bf002524>] (usb_serial_generic_submit_read_urbs [usbserial]) from [<bf002b44>] (usb_serial_generic_open+0x54/0x58 [usbserial])
[<bf002b44>] (usb_serial_generic_open [usbserial]) from [<bf06ca00>] (ftdi_open+0x88/0x90 [ftdi_sio])
[<bf06ca00>] (ftdi_open [ftdi_sio]) from [<bf0003a0>] (serial_port_activate+0x64/0x94 [usbserial])
[<bf0003a0>] (serial_port_activate [usbserial]) from [<c0313474>] (tty_port_open+0x9c/0xe0)
[<c0313474>] (tty_port_open) from [<bf0007a0>] (serial_open+0x28/0x2c [usbserial])
[<bf0007a0>] (serial_open [usbserial]) from [<c030bd18>] (tty_open+0xc0/0x5e4)
[<c030bd18>] (tty_open) from [<c011cf40>] (chrdev_open+0xd4/0x190)
[<c011cf40>] (chrdev_open) from [<c0116ef8>] (do_dentry_open.isra.13+0xf8/0x300)
[<c0116ef8>] (do_dentry_open.isra.13) from [<c0117f0c>] (vfs_open+0x68/0x70)
[<c0117f0c>] (vfs_open) from [<c01259b0>] (path_openat+0x184/0xf24)
[<c01259b0>] (path_openat) from [<c0127540>] (do_filp_open+0x70/0xc4)
[<c0127540>] (do_filp_open) from [<c0118264>] (do_sys_open+0x120/0x1dc)
[<c0118264>] (do_sys_open) from [<c0118348>] (SyS_open+0x28/0x2c)
[<c0118348>] (SyS_open) from [<c000fb40>] (ret_fast_syscall+0x0/0x3c)
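
When reading a dump like this, it can help to pull out the frame that was running when the last interrupt was taken, i.e. the caller recorded on the final `__irq_svc` line. A small sketch, assuming the trace format shown above (`interrupted_frame` is a hypothetical helper name):

```shell
# Hypothetical helper: read a kernel backtrace on stdin and print the
# function that was running when the last interrupt was taken.
interrupted_frame() {
    grep '(__irq_svc) from' |
        tail -n 1 |
        sed 's/.* from \[<[0-9a-f]*>] (\([^)]*\)).*/\1/'
}
```

Fed the dump above, this prints `usb_hcd_map_urb_for_dma+0x360/0x400`, the USB frame that the stall interrupted.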

Comment 6 by alent@google.com, Sep 15 2017

I've reproduced this by running
while true; do date; done
on the chameleon's serial console, and ssh-ing into it.

When I open a minicom session on one of the /dev/ttyUSB* devices in parallel, Chameleon locks up with this message, so I'm starting to suspect a kernel/driver bug:

[  193.361158] INFO: rcu_sched self-detected stall on CPU
[  193.366303]  0: (2099 ticks this GP) idle=c81/140000000000002/0 softirq=73139/73139 fqs=1458 
[  193.374787]   (t=2100 jiffies g=7870 c=7869 q=567)
[  193.379572] Task dump for CPU 0:
[  193.382786] date            R running      0 19120    782 0x00000002
[  193.389169] [<c0017648>] (unwind_backtrace) from [<c0013424>] (show_stack+0x20/0x24)
[  193.396896] [<c0013424>] (show_stack) from [<c004e064>] (sched_show_task+0xa4/0x100)
[  193.404614] [<c004e064>] (sched_show_task) from [<c004feec>] (dump_cpu_task+0x48/0x4c)
[  193.412510] [<c004feec>] (dump_cpu_task) from [<c006fe14>] (rcu_dump_cpu_stacks+0xa4/0xd0)
[  193.420748] [<c006fe14>] (rcu_dump_cpu_stacks) from [<c0072d04>] (rcu_check_callbacks+0x4b4/0x8c0)
[  193.429677] [<c0072d04>] (rcu_check_callbacks) from [<c0077564>] (update_process_times+0x40/0x6c)
[  193.438521] [<c0077564>] (update_process_times) from [<c0087a70>] (tick_sched_timer+0x70/0x238)
[  193.447189] [<c0087a70>] (tick_sched_timer) from [<c0077f88>] (__hrtimer_run_queues+0x144/0x278)
[  193.455941] [<c0077f88>] (__hrtimer_run_queues) from [<c0078718>] (hrtimer_interrupt+0xd4/0x234)
[  193.464694] [<c0078718>] (hrtimer_interrupt) from [<c0016944>] (twd_handler+0x3c/0x50)
[  193.472590] [<c0016944>] (twd_handler) from [<c0069e1c>] (handle_percpu_devid_irq+0x88/0x124)
[  193.481084] [<c0069e1c>] (handle_percpu_devid_irq) from [<c0065a60>] (generic_handle_irq+0x30/0x40)
[  193.490095] [<c0065a60>] (generic_handle_irq) from [<c0065bac>] (__handle_domain_irq+0x64/0xc4)
[  193.498760] [<c0065bac>] (__handle_domain_irq) from [<c0009454>] (gic_handle_irq+0x30/0x6c)
[  193.507077] [<c0009454>] (gic_handle_irq) from [<c0014000>] (__irq_svc+0x40/0x54)
[  193.514526] Exception stack(0xeb88bc20 to 0xeb88bc68)
[  193.519559] bc20: c08085c0 00000100 00000000 00000000 0000001e c079f394 00000008 00000000
[  193.527702] bc40: c08051ac ee9fbd0c eea15288 eb88bcac 0000000a eb88bc68 c0029f18 c0029a18
[  193.535841] bc60: 60000113 ffffffff
[  193.539326] [<c0014000>] (__irq_svc) from [<c0029a18>] (__do_softirq+0x8c/0x2e0)
[  193.546696] [<c0029a18>] (__do_softirq) from [<c0029f18>] (irq_exit+0x80/0xc0)
[  193.553892] [<c0029f18>] (irq_exit) from [<c0065bb0>] (__handle_domain_irq+0x68/0xc4)
[  193.561692] [<c0065bb0>] (__handle_domain_irq) from [<c0009454>] (gic_handle_irq+0x30/0x6c)
[  193.570009] [<c0009454>] (gic_handle_irq) from [<c0014000>] (__irq_svc+0x40/0x54)
[  193.577457] Exception stack(0xeb88bd18 to 0xeb88bd60)
[  193.582486] bd00:                                                       00748000 00000001
[  193.590629] bd20: c0815790 00000000 00000000 c0815790 ec338064 00000000 00000001 ee9fbd0c
[  193.598772] bd40: eea15288 eb88bd74 00000000 eb88bd60 c001e218 c010a0d8 20000113 ffffffff
[  193.606920] [<c0014000>] (__irq_svc) from [<c010a0d8>] (memblock_is_memory+0x28/0x80)
[  193.614724] [<c010a0d8>] (memblock_is_memory) from [<c001e218>] (pfn_valid+0x1c/0x20)
[  193.622524] [<c001e218>] (pfn_valid) from [<c001e670>] (__sync_icache_dcache+0x3c/0x9c)
[  193.630498] [<c001e670>] (__sync_icache_dcache) from [<c00fabe4>] (do_set_pte+0x104/0x118)
[  193.638735] [<c00fabe4>] (do_set_pte) from [<c00d1ae0>] (filemap_map_pages+0x268/0x290)
[  193.646710] [<c00d1ae0>] (filemap_map_pages) from [<c00fb43c>] (handle_mm_fault+0x844/0xd14)
[  193.655114] [<c00fb43c>] (handle_mm_fault) from [<c001df5c>] (do_page_fault+0x264/0x38c)
[  193.663173] [<c001df5c>] (do_page_fault) from [<c0009238>] (do_DataAbort+0x44/0xc8)
[  193.670799] [<c0009238>] (do_DataAbort) from [<c0014260>] (__dabt_usr+0x40/0x60)
[  193.678160] Exception stack(0xeb88bfb0 to 0xeb88bff8)
[  193.683192] bfa0:                                     b6f18c30 00000000 00000000 b6e191b8
[  193.691334] bfc0: b6f18c30 00000000 b6e19000 bec1a458 bec1a390 000e1580 bec1a430 bec1a360
[  193.699476] bfe0: 00000000 bec1a420 b6f001ed b6f02fce 40000030 ffffffff
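
The reproduction in this comment could be wrapped up roughly as follows: keep a `date` loop running in the background while another command opens the tty, then stop the loop. This is only a sketch; `with_date_load` is a hypothetical name, and the device passed to minicom is an assumption.

```shell
# Hypothetical helper: run "$@" while a background loop spawns `date`
# continuously, then stop the loop and return the command's exit status.
with_date_load() {
    ( while true; do date; done > /dev/null ) &
    load_pid=$!
    status=0
    "$@" || status=$?
    # Stop the background loop; ignore errors if it has already exited.
    kill "$load_pid" 2> /dev/null || true
    wait "$load_pid" 2> /dev/null || true
    return "$status"
}
```

For example, `with_date_load minicom -D /dev/ttyUSB0` would open the first dongle under load (`-D` is minicom's device option). Note that in the dump above the stalled task is `date` and the innermost frame is in the page-fault path, not in USB code.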

Comment 7 by alent@google.com, Sep 15 2017

When I pull out a dongle, I get these familiar error messages from the driver:

[  354.591026] usb 1-1.4: USB disconnect, device number 3
[  354.598869] ftdi_sio ttyUSB0: ftdi_set_termios error from disable flowcontrol urb
[  354.606381] ftdi_sio ttyUSB0: urb failed to set to rts/cts flow control
[  354.619647] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0
[  354.629008] ftdi_sio 1-1.4:1.0: device disconnected

Comment 8 by alent@google.com, Sep 15 2017

A quick look at the ftdi_sio commit log suggests that driver isn't the culprit, since very little has changed between socfpga-4.2 and current ToT. I'm not experienced with kernel debugging, though, so the bug could be elsewhere in the kernel, or maybe even in the Cyclone V itself.

I cannot reproduce this on my workstation.
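
One way to make the "very little has changed" check repeatable is to list the commits that touch a single path between two refs. A sketch, assuming it is run inside a kernel checkout and that the driver lives at drivers/usb/serial/ftdi_sio.c as in mainline Linux (`path_commits` is a hypothetical helper):

```shell
# Hypothetical helper: list commits touching PATH between two refs.
# Usage: path_commits OLD_REF NEW_REF PATH
path_commits() {
    git log --oneline "$1..$2" -- "$3"
}
```

With the altera-upstream remote from the build steps in Comment 10 configured, `path_commits altera-upstream/socfpga-4.2 altera-upstream/socfpga-4.12 drivers/usb/serial/ftdi_sio.c` would show the driver changes between those two branches. The same call on drivers/usb/core would cover the `usb_hcd_*` frames seen in the first dump.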

Comment 9 by alent@google.com, Sep 15 2017

Summary: Chameleon hangs on parallel accesses to /dev/ttyUSBs (was: Chameleon Bluetooth flows or underlying code hangs on certain accesses to USB serial ttys)
I tried uprevving the kernel. Seems to work.
Details tomorrow.

Comment 10 by alent@google.com, Sep 15 2017

Steps to build a new kernel:
Set up your Chameleon as usual.
Shut it down, and take out the SD card.

Get the kernel source ready:
git clone https://chromium.googlesource.com/linux-fpga-chameleon ~/linux-fpga-chameleon
cd ~/linux-fpga-chameleon/
git remote add altera-upstream https://github.com/altera-opensource/linux-socfpga.git
git fetch altera-upstream
git checkout -b fpga-chameleon-4.12 altera-upstream/socfpga-4.12
git log --oneline altera-upstream/socfpga-4.2..origin/fpga-chameleon-4.2
# [Looks like we diverge very little, so cherry-pick:]
git cherry-pick 8c0d826f7ab1 607457eebceb 79b862d20232 f17a6391e729 b7dad171ecdc 3e9f310c0536
# [Resolve the conflict in your $EDITOR by choosing the Chameleon-specific changes, deleting the HEAD section.]
git add arch/arm/configs/socfpga_defconfig
git cherry-pick --continue
# [Save the commit message in your $EDITOR.]


Make the kernel:
(I referenced http://www.alterawiki.com/wiki/Compiling_u-boot_and_Linux_Kernel_for_Cyclone_V_SoC and https://github.com/umiddelb/armhf/wiki/How-To-compile-a-custom-Linux-kernel-for-your-ARM-device .)
export ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
make socfpga_defconfig
CORES=$(nproc)  # number of parallel build jobs, here one per CPU
make -j "$CORES" zImage modules dtbs
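
The build assumes a cross toolchain matching CROSS_COMPILE is installed. A small pre-flight check, as a sketch (`have_cross_compiler` is a hypothetical helper, not part of the kernel build system):

```shell
# Hypothetical pre-flight check: succeed only if the cross compiler
# named by the given prefix is on PATH.
have_cross_compiler() {
    command -v "${1}gcc" > /dev/null 2>&1
}
```

For example, `have_cross_compiler "$CROSS_COMPILE" || echo "cross compiler not found"` can be run before `make socfpga_defconfig`.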


Move the images to your SD Card:
[Assuming your Chameleon SD card is at $SD_CARD ("/dev/sda", for example):]
mkdir ~/mountuboot
mkdir ~/mountsystem
sudo mount ${SD_CARD}1 ~/mountuboot
sudo mount ${SD_CARD}2 ~/mountsystem
sudo cp ~/linux-fpga-chameleon/arch/arm/boot/zImage ~/mountuboot/zImage
sudo cp ~/linux-fpga-chameleon/arch/arm/boot/dts/socfpga_cyclone5_sockit.dtb ~/mountuboot/socfpga.dtb
sudo chown -R $USER ~/mountsystem/lib/modules
# [Necessary b/c existing modules take up too much space otherwise.]
sudo rm -rf ~/mountsystem/lib/modules/4.2.0-ga14fbd2
make modules_install INSTALL_MOD_PATH=~/mountsystem
sudo chown -R root ~/mountsystem/lib/modules
sudo umount ~/mountuboot
sudo umount ~/mountsystem
rmdir ~/mountuboot
rmdir ~/mountsystem
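
A check that could run just before the umount steps: confirm that the modules directory on the card matches the kernel that was just built. This is a sketch; `modules_present` is a hypothetical helper, and `make -s kernelrelease` is the kernel build target that prints the release string.

```shell
# Hypothetical helper: succeed if ROOT/lib/modules/RELEASE exists.
# Usage: modules_present ROOT RELEASE
modules_present() {
    [ -d "$1/lib/modules/$2" ]
}
```

Run from the kernel tree with ARCH still exported, for example: `modules_present ~/mountsystem "$(make -s kernelrelease)" || echo "modules for the new kernel are missing"`.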

Now run some tests! (It doesn't seem like all of Chameleon works, but this is enough to do BT tests. Maybe an intermediate kernel version would be better?)

Comment 11 by alent@google.com, Sep 15 2017

Owner: rjahagir@chromium.org
Interesting to note that we still see stalls on my hacked-together fpga-chameleon-4.12 kernel, but they're no longer hanging the system. At the same time various parts of the Chameleon unrelated to Bluetooth may or may not work, as the screen is not backlit and shows no chameleon image...
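
To track whether stalls still occur on a given kernel, the reports can be counted from the kernel log. A sketch, assuming the message text matches the dumps earlier in this issue (`count_stalls` is a hypothetical helper):

```shell
# Hypothetical helper: count RCU stall reports in kernel log text on stdin.
count_stalls() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'rcu_sched self-detected stall' || true
}
```

On the device this could be used as `dmesg | count_stalls` after a test run.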

Reassigning as I am no longer with the Chromium project.
Status: Assigned (was: Untriaged)
