Major issue running the CrOS EC for Cortex-M0/Cortex-M built with recent toolchains
Reported by
cont...@paulk.fr,
Jul 26 2016
|
|||||
Issue description
UserAgent: Mozilla/5.0 (Windows NT 6.1; rv:47.0) Gecko/20100101 Firefox/47.0
Steps to reproduce the problem:
1. Build the CrOS EC for a Cortex-M0 board with a recent GCC version (> 4.8)
2. Run the built binary on the board
3. Experience run-time error:
ASSERTION FAILURE 'tskid < TASK_ID_COUNT' in timer_cancel() at common/timer.c:137
I'm working with a veyron_speedy, that was removed from ToT, so my work is based on the release-R44-7077.B branch that still has support for it. It is using a Cortex-M0-based EC. Sadly, I don't have any board with a Cortex-M0 that is supported in ToT, but the code is still affected.
What is the expected behavior?
There should be no run-time assertion failure.
What went wrong?
The way Cortex processors handle exceptions allows writing exception routines directly in C, as return from exception is handled by providing a special value for the link register.
However, it is not safe to do this when doing context switching. In particular, C handlers may push some general-purpose registers that are used by the handler and pop them later, even when context switch has happened in the meantime. While the processor will restore {r0-r3} from the stack when returning from an exception, the C handler code may push, use and pop another register, such as r4.
It turns out that GCC 4.8 would generally only use r3 in svc_handler and pendsv_handler, but newer versions tend to use r4, thus clobbering r4 that was restored from the context switch and leading up to a fault when r4 is used by the task code.
An occurrence of this behaviour takes place with GCC > 4.8 in __wait_evt, where "me" is stored in r4, which gets clobbered after an exception triggers pendsv_handler. The exception handler uses r4 internally, does a context switch and then restores the previous value of r4, which is not restored by the processor's internal, thus clobbering r4. This ends up with the following assertion failure: 'tskid < TASK_ID_COUNT' in timer_cancel() at common/timer.c:137 For this reason, it is safer to have assembly routines for exception handlers that do context switching.
This bug occurred on a Cortex-M0, but Cortex-M in general is just as prone to it. Only that the code structure tends to keep register use within {r0-r3} for now, but we can't rely on that so this should be fixed on Cortex-M0 and Cortex-M.
Disassembly of the routine that caused the issue on Cortex-M0:
* built with GCC 4.8 (no run-time issue, using r3)
* built with GCC 5.3 (run-time issue, using r4)
Did this work before? N/A
Chrome version: <Copy from: 'about:version'> Channel: n/a
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version:
,
Jul 26 2016
,
Jul 26 2016
Patches fixing this issue were submitted at: * https://chromium-review.googlesource.com/#/c/362830/ * https://chromium-review.googlesource.com/#/c/362831/
,
Jan 3 2017
,
Mar 15 2017
,
Jul 22 2017
Sent out a simpler fix that just takes the instructions generated by GCC and corrects them in the most minimalist way at: https://chromium-review.googlesource.com/c/362830/6 Abandoned the Cortex-M change as described at: https://chromium-review.googlesource.com/c/362831/
,
Aug 18 2017
,
Aug 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/ec/+/e5c69151dfd895162118023b6c591686dd49d4c9 commit e5c69151dfd895162118023b6c591686dd49d4c9 Author: Paul Kocialkowski <contact@paulk.fr> Date: Mon Aug 28 17:55:41 2017 cortex-m0: Use assembly exception handlers for task switching The way Cortex processors handle exceptions allows writing exception routines directly in C, as return from exception is handled by providing a special value for the link register. However, it is not safe to do this when doing context switching. In particular, C handlers may push some general-purpose registers that are used by the handler and pop them later, even when context switch has happened in the meantime. While the processor will restore {r0-r3} from the stack when returning from an exception, the C handler code may push, use and pop another register, such as r4. It turns out that GCC 4.8 would generally only use r3 in svc_handler and pendsv_handler, but newer versions tend to use r4, thus clobbering r4 that was restored from the context switch and leading up to a fault when r4 is used by the task code. An occurrence of this behaviour takes place with GCC > 4.8 in __wait_evt, where "me" is stored in r4, which gets clobbered after an exception triggers pendsv_handler. The exception handler uses r4 internally, does a context switch and then restores the previous value of r4, which is not restored by the processor's internal, thus clobbering r4. This ends up with the following assertion failure: 'tskid < TASK_ID_COUNT' in timer_cancel() at common/timer.c:137 For this reason, it is safer to have assembly routines for exception handlers that do context switching. BUG=chromium:631514 BRANCH=None TEST=Build and run speedy EC with a recent GCC version Change-Id: Ib068bc12ce2204aee3e0f563efcb94f15aa87013 Signed-off-by: Paul Kocialkowski <contact@paulk.fr> Reviewed-on: https://chromium-review.googlesource.com/362830 Commit-Ready: Shawn N <shawnn@chromium.org> Tested-by: Shawn N <shawnn@chromium.org> Reviewed-by: Shawn N <shawnn@chromium.org> [modify] https://crrev.com/e5c69151dfd895162118023b6c591686dd49d4c9/core/cortex-m0/switch.S [modify] https://crrev.com/e5c69151dfd895162118023b6c591686dd49d4c9/core/cortex-m0/task.c |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by dlaurie@chromium.org
, Jul 26 2016Labels: -Via-Wizard Cr-OS-Firmware-EC
Owner: reinauer@chromium.org
Status: Available (was: Unconfirmed)