Unaligned host command params/response structs are inefficient |
|||||||
Issue descriptionCurrently, our host command structs are all __packed. This is important to ensure that they aren't padded on 64-bit host architectures, even though the vast majority of those structs have all their members on size-multiple boundaries. Using __packed by itself causes the ARM compiler to generate inefficient code for accessing structure members, because it can't assume that the structs themselves are aligned on 32-bit boundaries. This causes the compiler to emit very inefficient code to do unaligned reads/writes. We control the host command buffers, so we can ensure that they're on 32-bit boundaries inside the EC. (LM4 does this. STM32 previously did this and might still. NPCX doesn't right now but it's fixable; not sure about MEC1322).) If we do that, we can __attribute__((aligned(4))) those structs and generate much more efficent code. Over all of the host commands, this adds up to a *substantial* code size savings. In my test on kevin, this saves >1600 bytes of binary size. This needs to be done carefully to make sure we don't introduce unaligned 32-bit reads/writes, but the savings is enough that we should fix this. Suggest adding CONFIG_HOST_COMMAND_BUFFERS_ALIGNED to do this on a chip-by-chip basis. If that's not defined (as it won't be on the host), code will default to what it does now.
,
Sep 20 2016
So, this is getting messy. Some of the structs are used both nested and unnested, and some aren't packed now even though they probably should be. I'm going to keep the current behavior the default, and turn it on a struct at a time, a chip at a time as I'm able to verify all the uses of the struct in the EC codebase. It's still worth doing - there's a huge code size win - but it's going to take more work. I'll get the framework set up, and then we can convert commands as we have time and/or need code space.
,
Sep 20 2016
Re#1: I should have compiled with -Os, since that's how the firmware is compiled. that changes the output to:
bar:
push {r4, lr}
ldrb r3, [r0, #5]
ldrb r2, [r0, #1]
lsl r1, r3, #8
ldrb r3, [r0]
lsl r2, r2, #8
orr r2, r3
ldrb r3, [r0, #2]
ldrb r4, [r0, #4]
ldrb r0, [r0, #3]
lsl r3, r3, #16
orr r3, r2
lsl r0, r0, #24
orr r1, r4
orr r0, r3
add r0, r1, r0
@ sp needed
pop {r4}
pop {r1}
bx r1
vs. with alignment:
bar2:
ldrh r2, [r0, #4]
ldr r3, [r0]
@ sp needed
add r0, r2, r3
bx lr
So, 19 instructions vs. 4 total.
,
Sep 20 2016
Looking at the struct packing - particularly for nested structs - I'm not sure how the code currently even works without crashing for some structs.
If you do this:
/* This is unpacked; size=12 */
struct foo6 {
uint32_t a;
uint32_t b;
uint32_t c;
};
/* This includes foo6, and is packed; size=13 */
struct __packed foo7 {
uint8_t cmd;
struct foo6 f;
};
Then foo7.f is at offset 1 in the struct. That will make its members not 32-bit-aligned. But now look at these functions:
int barfoo6(struct foo6 *f)
{
return f->a + f->b;
}
int barfoo7(uint8_t *buf)
{
struct foo7 *f7 = (struct foo7 *)buf;
return barfoo6(&f7->f);
}
barfoo7() generates the unaligned struct foo6 *, and passes it to barfoo6(). But barfoo6() assumes the struct is aligned, so will blow up unpleasantly if it's not aligned.
But we already do this! Maybe we're just getting lucky?
,
Sep 20 2016
Also, there doesn't appear to be a way to hint to the compiler that a pointer is actually aligned. Something like this would be handy: void *x; struct __aligned(4) *foo7 = x; return foo7.f.a + foo7.f.b;
,
Sep 27 2016
I've made a first pass at annotating the command param/response structs here: https://chromium-review.googlesource.com/387126 This reduces binary size by ~2%. I think there's another 1% that will be a bit harder to get due to sub-optimal struct sizes and use of nested structs.
,
Oct 20 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/ec/+/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52 commit 5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52 Author: Randall Spangler <rspangler@chromium.org> Date: Fri Sep 16 17:36:54 2016 Support alignment for EC host command structures The host command parameter and response buffers should be explicitly aligned by the LPC/SPI/I2C drivers. But the host command handlers don't know that, and the structs are all __packed, so the compiler generates horribly inefficient ARM Cortex-M code to cope with unaligned accesses. Add __ec_align{1,2,4} to force the param / response structs to be aligned. Use it in a few structs now which were straightforward to test. It should be added to more structs as space is needed, but that would make this change unwieldy to review and test. Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment. Currently, this is enabled only for LM4 and samus_pd, so that EC code can be tested without affecting other non-samus ToT development (none of which uses LM4). Fix the two handlers that weren't actually aligned (despite one of them having comments to the contrary). Also, add a CHROMIUM_EC define that can be used to determine if a file is being compiled for an EC target. We need that so that we only force structure alignment for EC binaries. On the AP side, buffers may not be aligned, so we should not force alignment. BUG=chromium:647727 BRANCH=none TEST=Flash samus and samus_pd. Boot samus and run a bunch of ectool commands (with and without --dev=1, so it tests both EC and PD). System boots and all commands return expected results. Change-Id: I4537d61a75cf087647e24281288392eb85f22eba Signed-off-by: Randall Spangler <rspangler@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/387126 [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/chip/stm32/i2c-stm32f0.c [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/include/config.h [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/Makefile.toolchain [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/board/samus_pd/board.h [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/include/ec_commands.h [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/chip/lm4/config_chip.h [modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/chip/npcx/shi.c
,
Nov 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/ec/+/a3278005e3f741a153940b74d78ce49e3c19879e commit a3278005e3f741a153940b74d78ce49e3c19879e Author: Randall Spangler <rspangler@chromium.org> Date: Fri Sep 16 17:36:54 2016 CHERRY-PICK: Support alignment for EC host command structures The host command parameter and response buffers should be explicitly aligned by the LPC/SPI/I2C drivers. But the host command handlers don't know that, and the structs are all __packed, so the compiler generates horribly inefficient ARM Cortex-M code to cope with unaligned accesses. Add __ec_align{1,2,4} to force the param / response structs to be aligned. Use it in a few structs now which were straightforward to test. It should be added to more structs as space is needed, but that would make this change unwieldy to review and test. Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment. Currently, this is enabled only for LM4 and samus_pd, so that EC code can be tested without affecting other non-samus ToT development (none of which uses LM4). Fix the two handlers that weren't actually aligned (despite one of them having comments to the contrary). Also, add a CHROMIUM_EC define that can be used to determine if a file is being compiled for an EC target. We need that so that we only force structure alignment for EC binaries. On the AP side, buffers may not be aligned, so we should not force alignment. BUG=chromium:647727 BRANCH=none TEST=Flash samus and samus_pd. Boot samus and run a bunch of ectool commands (with and without --dev=1, so it tests both EC and PD). System boots and all commands return expected results. Change-Id: Ia0fcdea001e3a0350d67b5a42b0236023ce197b8 ORIGIN-Change-Id: I4537d61a75cf087647e24281288392eb85f22eba Signed-off-by: Randall Spangler <rspangler@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/387126 Reviewed-on: https://chromium-review.googlesource.com/411562 Reviewed-by: Shawn N <shawnn@chromium.org> Commit-Queue: BoChao Jhan <james_chao@asus.com> Tested-by: BoChao Jhan <james_chao@asus.com> [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/chip/stm32/i2c-stm32f0.c [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/include/config.h [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/Makefile.toolchain [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/board/samus_pd/board.h [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/include/ec_commands.h [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/chip/lm4/config_chip.h [modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/chip/npcx/shi.c
,
Feb 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/ec/+/3390d1470ab117f6154804c394cffe373a2ca418 commit 3390d1470ab117f6154804c394cffe373a2ca418 Author: Randall Spangler <rspangler@chromium.org> Date: Wed Feb 22 03:05:38 2017 Support alignment for EC host command structures The host command parameter and response buffers should be explicitly aligned by the LPC/SPI/I2C drivers. But the host command handlers don't know that, and the structs are all __packed, so the compiler generates horribly inefficient ARM Cortex-M code to cope with unaligned accesses. Add __ec_align{1,2,4} to force the param / response structs to be aligned. Use it in a few structs now which were straightforward to test. It should be added to more structs as space is needed, but that would make this change unwieldy to review and test. Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment. Currently, this is enabled only for LM4 and samus_pd, so that EC code can be tested without affecting other non-samus ToT development (none of which uses LM4). Fix the two handlers that weren't actually aligned (despite one of them having comments to the contrary). Also, add a CHROMIUM_EC define that can be used to determine if a file is being compiled for an EC target. We need that so that we only force structure alignment for EC binaries. On the AP side, buffers may not be aligned, so we should not force alignment. BUG=chromium:647727 BRANCH=none TEST=Flash samus and samus_pd. Boot samus and run a bunch of ectool commands (with and without --dev=1, so it tests both EC and PD). System boots and all commands return expected results. Change-Id: I4537d61a75cf087647e24281288392eb85f22eba Signed-off-by: Randall Spangler <rspangler@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/387126 Reviewed-on: https://chromium-review.googlesource.com/446066 Commit-Queue: Daisuke Nojiri <dnojiri@chromium.org> Tested-by: Daisuke Nojiri <dnojiri@chromium.org> [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/chip/stm32/i2c-stm32f0.c [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/include/config.h [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/Makefile.toolchain [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/board/samus_pd/board.h [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/include/ec_commands.h [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/chip/lm4/config_chip.h [modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/chip/npcx/shi.c
,
Jun 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/ec/+/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3 commit 5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3 Author: Randall Spangler <rspangler@chromium.org> Date: Sat Jun 03 00:09:48 2017 Support alignment for EC host command structures The host command parameter and response buffers should be explicitly aligned by the LPC/SPI/I2C drivers. But the host command handlers don't know that, and the structs are all __packed, so the compiler generates horribly inefficient ARM Cortex-M code to cope with unaligned accesses. Add __ec_align{1,2,4} to force the param / response structs to be aligned. Use it in a few structs now which were straightforward to test. It should be added to more structs as space is needed, but that would make this change unwieldy to review and test. Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment. Currently, this is enabled only for LM4 and samus_pd, so that EC code can be tested without affecting other non-samus ToT development (none of which uses LM4). Fix the two handlers that weren't actually aligned (despite one of them having comments to the contrary). Also, add a CHROMIUM_EC define that can be used to determine if a file is being compiled for an EC target. We need that so that we only force structure alignment for EC binaries. On the AP side, buffers may not be aligned, so we should not force alignment. BUG=chromium:647727 BRANCH=none TEST=Flash samus and samus_pd. Boot samus and run a bunch of ectool commands (with and without --dev=1, so it tests both EC and PD). System boots and all commands return expected results. Change-Id: I4537d61a75cf087647e24281288392eb85f22eba Signed-off-by: Randall Spangler <rspangler@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/387126 Reviewed-on: https://chromium-review.googlesource.com/522972 Reviewed-by: Aseda Aboagye <aaboagye@chromium.org> Commit-Queue: Aseda Aboagye <aaboagye@chromium.org> Tested-by: Aseda Aboagye <aaboagye@chromium.org> [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/chip/stm32/i2c-stm32f0.c [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/include/config.h [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/Makefile.toolchain [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/board/samus_pd/board.h [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/include/ec_commands.h [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/chip/lm4/config_chip.h [modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/chip/npcx/shi.c
,
Feb 23 2018
,
Jun 22 2018
Another bonus advantage of using CONFIG_HOSTCMD_ALIGNED would be to make clang happy (for host tests): https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/1107522/2/common/motion_sense.c#1091 |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by rspangler@chromium.org
, Sep 16 2016Building test code this way: arm-none-eabi-gcc -S -c foo.c If I specify this: struct foo { uint32_t x; uint16_t y; } __attribute__((packed)); int bar(uint8_t *buf) { struct foo *f = (struct foo *)buf; return f->x + f->y; } Then I get this assembly: bar: push {r7, lr} sub sp, sp, #16 add r7, sp, #0 str r0, [r7, #4] ldr r3, [r7, #4] str r3, [r7, #12] ldr r3, [r7, #12] ldrb r2, [r3] ldrb r1, [r3, #1] lsl r1, r1, #8 orr r2, r1 ldrb r1, [r3, #2] lsl r1, r1, #16 orr r2, r1 ldrb r3, [r3, #3] lsl r3, r3, #24 orr r3, r2 mov r1, r3 ldr r3, [r7, #12] ldrb r2, [r3, #4] ldrb r3, [r3, #5] lsl r3, r3, #8 orr r3, r2 lsl r3, r3, #16 lsr r3, r3, #16 add r3, r1, r3 mov r0, r3 mov sp, r7 add sp, sp, #16 @ sp needed pop {r7} pop {r1} bx r1 But if change that to: struct foo { uint32_t x; uint16_t y; } __attribute__((packed)) __attribute__((aligned (4))); Then I get this: bar: push {r7, lr} sub sp, sp, #16 add r7, sp, #0 str r0, [r7, #4] ldr r3, [r7, #4] str r3, [r7, #12] ldr r3, [r7, #12] ldr r3, [r3] ldr r2, [r7, #12] ldrh r2, [r2, #4] add r3, r3, r2 mov r0, r3 mov sp, r7 add sp, sp, #16 @ sp needed pop {r7} pop {r1} bx r1 So, 18 instructions turn into 3. That's a huge win.