New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 647727 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Unaligned host command params/response structs are inefficient

Project Member Reported by rspangler@chromium.org, Sep 16 2016

Issue description

Currently, our host command structs are all __packed.  This is important to ensure that they aren't padded on 64-bit host architectures, even though the vast majority of those structs have all their members on size-multiple boundaries.

Using __packed by itself causes the ARM compiler to generate inefficient code for accessing structure members, because it can't assume that the structs themselves are aligned on 32-bit boundaries.  This causes the compiler to emit very inefficient code to do unaligned reads/writes.

We control the host command buffers, so we can ensure that they're on 32-bit boundaries inside the EC.  (LM4 does this.  STM32 previously did this and might still.  NPCX doesn't right now but it's fixable; not sure about MEC1322).)  If we do that, we can __attribute__((aligned(4))) those structs and generate much more efficent code.

Over all of the host commands, this adds up to a *substantial* code size savings.  In my test on kevin, this saves >1600 bytes of binary size.

This needs to be done carefully to make sure we don't introduce unaligned 32-bit reads/writes, but the savings is enough that we should fix this.  Suggest adding CONFIG_HOST_COMMAND_BUFFERS_ALIGNED to do this on a chip-by-chip basis.  If that's not defined (as it won't be on the host), code will default to what it does now.

 
Building test code this way:

arm-none-eabi-gcc -S -c foo.c

If I specify this:

struct foo {
	uint32_t x;
	uint16_t y;
} __attribute__((packed));

int bar(uint8_t *buf) {
	struct foo *f = (struct foo *)buf;

	return f->x + f->y;
}

Then I get this assembly:

bar:
	push	{r7, lr}
	sub	sp, sp, #16
	add	r7, sp, #0
	str	r0, [r7, #4]
	ldr	r3, [r7, #4]
	str	r3, [r7, #12]
	ldr	r3, [r7, #12]
	ldrb	r2, [r3]
	ldrb	r1, [r3, #1]
	lsl	r1, r1, #8
	orr	r2, r1
	ldrb	r1, [r3, #2]
	lsl	r1, r1, #16
	orr	r2, r1
	ldrb	r3, [r3, #3]
	lsl	r3, r3, #24
	orr	r3, r2
	mov	r1, r3
	ldr	r3, [r7, #12]
	ldrb	r2, [r3, #4]
	ldrb	r3, [r3, #5]
	lsl	r3, r3, #8
	orr	r3, r2
	lsl	r3, r3, #16
	lsr	r3, r3, #16
	add	r3, r1, r3
	mov	r0, r3
	mov	sp, r7
	add	sp, sp, #16
	@ sp needed
	pop	{r7}
	pop	{r1}
	bx	r1

But if change that to:

struct foo {
	uint32_t x;
	uint16_t y;
} __attribute__((packed)) __attribute__((aligned (4)));

Then I get this:

bar:
	push	{r7, lr}
	sub	sp, sp, #16
	add	r7, sp, #0
	str	r0, [r7, #4]
	ldr	r3, [r7, #4]
	str	r3, [r7, #12]
	ldr	r3, [r7, #12]
	ldr	r3, [r3]
	ldr	r2, [r7, #12]
	ldrh	r2, [r2, #4]
	add	r3, r3, r2
	mov	r0, r3
	mov	sp, r7
	add	sp, sp, #16
	@ sp needed
	pop	{r7}
	pop	{r1}
	bx	r1

So, 18 instructions turn into 3.  That's a huge win.

Cc: aaboagye@chromium.org reinauer@chromium.org
So, this is getting messy.  Some of the structs are used both nested and unnested, and some aren't packed now even though they probably should be.

I'm going to keep the current behavior the default, and turn it on a struct at a time, a chip at a time as I'm able to verify all the uses of the struct in the EC codebase.

It's still worth doing - there's a huge code size win - but it's going to take more work.  I'll get the framework set up, and then we can convert commands as we have time and/or need code space.
Re#1: I should have compiled with -Os, since that's how the firmware is compiled.  that changes the output to:

bar:
        push    {r4, lr}
        ldrb    r3, [r0, #5]
        ldrb    r2, [r0, #1]
        lsl     r1, r3, #8
        ldrb    r3, [r0]
        lsl     r2, r2, #8
        orr     r2, r3
        ldrb    r3, [r0, #2]
        ldrb    r4, [r0, #4]
        ldrb    r0, [r0, #3]
        lsl     r3, r3, #16
        orr     r3, r2
        lsl     r0, r0, #24
        orr     r1, r4
        orr     r0, r3
        add     r0, r1, r0
        @ sp needed
        pop     {r4}
        pop     {r1}
        bx      r1

vs. with alignment:

bar2:
        ldrh    r2, [r0, #4]
        ldr     r3, [r0]
        @ sp needed
        add     r0, r2, r3
        bx      lr

So, 19 instructions vs. 4 total.

Looking at the struct packing - particularly for nested structs - I'm not sure how the code currently even works without crashing for some structs.

If you do this:

/* This is unpacked; size=12 */
struct foo6 {
	uint32_t a;
	uint32_t b;
	uint32_t c;
};

/* This includes foo6, and is packed; size=13 */
struct __packed foo7 {
	uint8_t cmd;
	struct foo6 f;
};

Then foo7.f is at offset 1 in the struct.  That will make its members not 32-bit-aligned.  But now look at these functions:

int barfoo6(struct foo6 *f)
{
	return f->a + f->b;
}

int barfoo7(uint8_t *buf)
{
	struct foo7 *f7 = (struct foo7 *)buf;
	return barfoo6(&f7->f);
}

barfoo7() generates the unaligned struct foo6 *, and passes it to barfoo6().  But barfoo6() assumes the struct is aligned, so will blow up unpleasantly if it's not aligned.

But we already do this!  Maybe we're just getting lucky?

Also, there doesn't appear to be a way to hint to the compiler that a pointer is actually aligned.  Something like this would be handy:

void *x;
struct __aligned(4) *foo7 = x;
return foo7.f.a + foo7.f.b;

I've made a first pass at annotating the command param/response structs here:
https://chromium-review.googlesource.com/387126

This reduces binary size by ~2%.  I think there's another 1% that will be a bit harder to get due to sub-optimal struct sizes and use of nested structs.

Project Member

Comment 7 by bugdroid1@chromium.org, Oct 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52

commit 5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52
Author: Randall Spangler <rspangler@chromium.org>
Date: Fri Sep 16 17:36:54 2016

Support alignment for EC host command structures

The host command parameter and response buffers should be explicitly
aligned by the LPC/SPI/I2C drivers.  But the host command handlers don't
know that, and the structs are all __packed, so the compiler generates
horribly inefficient ARM Cortex-M code to cope with unaligned accesses.

Add __ec_align{1,2,4} to force the param / response structs to be
aligned.  Use it in a few structs now which were straightforward to
test.  It should be added to more structs as space is needed, but that
would make this change unwieldy to review and test.

Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment.
Currently, this is enabled only for LM4 and samus_pd, so that EC code
can be tested without affecting other non-samus ToT development (none of
which uses LM4).

Fix the two handlers that weren't actually aligned (despite one of
them having comments to the contrary).

Also, add a CHROMIUM_EC define that can be used to determine if a file
is being compiled for an EC target.  We need that so that we only force
structure alignment for EC binaries.  On the AP side, buffers may not be
aligned, so we should not force alignment.

BUG=chromium:647727
BRANCH=none
TEST=Flash samus and samus_pd.  Boot samus and run a bunch of ectool
     commands (with and without --dev=1, so it tests both EC and PD).
     System boots and all commands return expected results.

Change-Id: I4537d61a75cf087647e24281288392eb85f22eba
Signed-off-by: Randall Spangler <rspangler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/387126

[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/chip/stm32/i2c-stm32f0.c
[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/include/config.h
[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/Makefile.toolchain
[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/board/samus_pd/board.h
[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/include/ec_commands.h
[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/chip/lm4/config_chip.h
[modify] https://crrev.com/5e1c63f6ebc9afb847d635c36e0bec31ecdf6a52/chip/npcx/shi.c

Project Member

Comment 8 by bugdroid1@chromium.org, Nov 15 2016

Labels: merge-merged-firmware-glados-7820.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/a3278005e3f741a153940b74d78ce49e3c19879e

commit a3278005e3f741a153940b74d78ce49e3c19879e
Author: Randall Spangler <rspangler@chromium.org>
Date: Fri Sep 16 17:36:54 2016

CHERRY-PICK: Support alignment for EC host command structures

The host command parameter and response buffers should be explicitly
aligned by the LPC/SPI/I2C drivers.  But the host command handlers don't
know that, and the structs are all __packed, so the compiler generates
horribly inefficient ARM Cortex-M code to cope with unaligned accesses.

Add __ec_align{1,2,4} to force the param / response structs to be
aligned.  Use it in a few structs now which were straightforward to
test.  It should be added to more structs as space is needed, but that
would make this change unwieldy to review and test.

Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment.
Currently, this is enabled only for LM4 and samus_pd, so that EC code
can be tested without affecting other non-samus ToT development (none of
which uses LM4).

Fix the two handlers that weren't actually aligned (despite one of
them having comments to the contrary).

Also, add a CHROMIUM_EC define that can be used to determine if a file
is being compiled for an EC target.  We need that so that we only force
structure alignment for EC binaries.  On the AP side, buffers may not be
aligned, so we should not force alignment.

BUG=chromium:647727
BRANCH=none
TEST=Flash samus and samus_pd.  Boot samus and run a bunch of ectool
     commands (with and without --dev=1, so it tests both EC and PD).
     System boots and all commands return expected results.

Change-Id: Ia0fcdea001e3a0350d67b5a42b0236023ce197b8
ORIGIN-Change-Id: I4537d61a75cf087647e24281288392eb85f22eba
Signed-off-by: Randall Spangler <rspangler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/387126
Reviewed-on: https://chromium-review.googlesource.com/411562
Reviewed-by: Shawn N <shawnn@chromium.org>
Commit-Queue: BoChao Jhan <james_chao@asus.com>
Tested-by: BoChao Jhan <james_chao@asus.com>

[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/chip/stm32/i2c-stm32f0.c
[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/include/config.h
[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/Makefile.toolchain
[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/board/samus_pd/board.h
[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/include/ec_commands.h
[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/chip/lm4/config_chip.h
[modify] https://crrev.com/a3278005e3f741a153940b74d78ce49e3c19879e/chip/npcx/shi.c

Project Member

Comment 9 by bugdroid1@chromium.org, Feb 22 2017

Labels: merge-merged-factory-reef-8811.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/3390d1470ab117f6154804c394cffe373a2ca418

commit 3390d1470ab117f6154804c394cffe373a2ca418
Author: Randall Spangler <rspangler@chromium.org>
Date: Wed Feb 22 03:05:38 2017

Support alignment for EC host command structures

The host command parameter and response buffers should be explicitly
aligned by the LPC/SPI/I2C drivers.  But the host command handlers don't
know that, and the structs are all __packed, so the compiler generates
horribly inefficient ARM Cortex-M code to cope with unaligned accesses.

Add __ec_align{1,2,4} to force the param / response structs to be
aligned.  Use it in a few structs now which were straightforward to
test.  It should be added to more structs as space is needed, but that
would make this change unwieldy to review and test.

Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment.
Currently, this is enabled only for LM4 and samus_pd, so that EC code
can be tested without affecting other non-samus ToT development (none of
which uses LM4).

Fix the two handlers that weren't actually aligned (despite one of
them having comments to the contrary).

Also, add a CHROMIUM_EC define that can be used to determine if a file
is being compiled for an EC target.  We need that so that we only force
structure alignment for EC binaries.  On the AP side, buffers may not be
aligned, so we should not force alignment.

BUG=chromium:647727
BRANCH=none
TEST=Flash samus and samus_pd.  Boot samus and run a bunch of ectool
     commands (with and without --dev=1, so it tests both EC and PD).
     System boots and all commands return expected results.

Change-Id: I4537d61a75cf087647e24281288392eb85f22eba
Signed-off-by: Randall Spangler <rspangler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/387126
Reviewed-on: https://chromium-review.googlesource.com/446066
Commit-Queue: Daisuke Nojiri <dnojiri@chromium.org>
Tested-by: Daisuke Nojiri <dnojiri@chromium.org>

[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/chip/stm32/i2c-stm32f0.c
[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/include/config.h
[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/Makefile.toolchain
[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/board/samus_pd/board.h
[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/include/ec_commands.h
[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/chip/lm4/config_chip.h
[modify] https://crrev.com/3390d1470ab117f6154804c394cffe373a2ca418/chip/npcx/shi.c

Project Member

Comment 10 by bugdroid1@chromium.org, Jun 3 2017

Labels: merge-merged-firmware-gru-8785.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3

commit 5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3
Author: Randall Spangler <rspangler@chromium.org>
Date: Sat Jun 03 00:09:48 2017

Support alignment for EC host command structures

The host command parameter and response buffers should be explicitly
aligned by the LPC/SPI/I2C drivers.  But the host command handlers don't
know that, and the structs are all __packed, so the compiler generates
horribly inefficient ARM Cortex-M code to cope with unaligned accesses.

Add __ec_align{1,2,4} to force the param / response structs to be
aligned.  Use it in a few structs now which were straightforward to
test.  It should be added to more structs as space is needed, but that
would make this change unwieldy to review and test.

Add CONFIG_HOSTCMD_ALIGNED to enable the additional alignment.
Currently, this is enabled only for LM4 and samus_pd, so that EC code
can be tested without affecting other non-samus ToT development (none of
which uses LM4).

Fix the two handlers that weren't actually aligned (despite one of
them having comments to the contrary).

Also, add a CHROMIUM_EC define that can be used to determine if a file
is being compiled for an EC target.  We need that so that we only force
structure alignment for EC binaries.  On the AP side, buffers may not be
aligned, so we should not force alignment.

BUG=chromium:647727
BRANCH=none
TEST=Flash samus and samus_pd.  Boot samus and run a bunch of ectool
     commands (with and without --dev=1, so it tests both EC and PD).
     System boots and all commands return expected results.

Change-Id: I4537d61a75cf087647e24281288392eb85f22eba
Signed-off-by: Randall Spangler <rspangler@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/387126
Reviewed-on: https://chromium-review.googlesource.com/522972
Reviewed-by: Aseda Aboagye <aaboagye@chromium.org>
Commit-Queue: Aseda Aboagye <aaboagye@chromium.org>
Tested-by: Aseda Aboagye <aaboagye@chromium.org>

[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/chip/stm32/i2c-stm32f0.c
[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/include/config.h
[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/Makefile.toolchain
[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/board/samus_pd/board.h
[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/include/ec_commands.h
[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/chip/lm4/config_chip.h
[modify] https://crrev.com/5ad1c75211ebfa01f4b47cc74f03f82aca2bedb3/chip/npcx/shi.c

Components: OS>Kernel
Cc: drinkcat@chromium.org gkihumba@chromium.org
Components: -OS>Kernel OS>Firmware>EC
Another bonus advantage of using CONFIG_HOSTCMD_ALIGNED would be to make clang happy (for host tests):
https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/1107522/2/common/motion_sense.c#1091

Sign in to add a comment