Add EC feature for generic retry mechanism
Issue description

We always end up with problems where EC comms are only 99.99% reliable, and then we spend tons of time trying to get that last 0.01%. I wonder if we should add some stuff to the EC protocol that would allow reliable retries in the case of failure. Errors that affect reliability would then become much less of an issue.

To implement this, you need:
* A sequence number that increments (and wraps) with each new command. It doesn't need to be very big; it could even be 1 bit.
* Enough RAM on the EC to store the biggest possible response.

The idea is that if the EC gets a command and sees that its sequence number matches the last sequence number it got, it immediately returns the last response rather than doing any additional work. This allows even commands with a "side effect" to be retried on communication failure.

---

As an example using a "1 bit" sequence number:

AP: get_next_event (seq #0)
EC: gets the event, caches the result, sends result back to AP
AP: get_next_event (seq #1)
EC: gets the event, caches the result, sends result back to AP
AP: get_next_event (seq #1)
EC: same seq number as last time ==> retry; send back cache
AP: get_next_event (seq #0)
EC: gets the event, caches the result, sends result back to AP
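The retry scheme above can be sketched in C as a single-process simulation. This is a minimal illustration under stated assumptions, not the eventual protocol design: `MAX_RESPONSE_SIZE`, `MAX_ATTEMPTS`, `struct retry_cache`, `ec_handle_request()`, `transport_xfer()`, `ap_get_next_event()` and the three-entry event queue are all hypothetical. The simulated link can drop a response after the EC has already consumed an event, which is the case where a plain resend would otherwise lose that event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the 1-bit sequence-number retry scheme; all names
 * and sizes are illustrative, not part of any real EC protocol. */
#define MAX_RESPONSE_SIZE 32 /* assumed size of the biggest possible response */
#define MAX_ATTEMPTS 3       /* assumed AP retry budget per command */

/* EC side: RAM to hold the last response, keyed by its sequence bit. */
struct retry_cache {
	bool valid;       /* false until the first command has been handled */
	uint8_t last_seq; /* sequence bit of the cached response */
	size_t len;       /* valid bytes in data[] */
	uint8_t data[MAX_RESPONSE_SIZE];
};

/* Toy command with a side effect: each call consumes one queued event. */
static const uint8_t event_queue[] = { 'A', 'B', 'C' };
static size_t next_event;

static size_t get_next_event(uint8_t *out)
{
	if (next_event >= sizeof(event_queue))
		return 0; /* no event pending: empty response */
	out[0] = event_queue[next_event++];
	return 1;
}

/* A new sequence bit runs the command and caches the result; a repeated
 * one replays the cached response without doing any additional work. */
static size_t ec_handle_request(struct retry_cache *cache, uint8_t seq,
				uint8_t *resp)
{
	seq &= 1;
	if (!cache->valid || seq != cache->last_seq) {
		cache->len = get_next_event(cache->data);
		cache->last_seq = seq;
		cache->valid = true;
	}
	memcpy(resp, cache->data, cache->len);
	return cache->len;
}

/* Simulated link: after the EC has handled a request, the response is lost
 * while drops_left > 0. */
static int drops_left;

static bool transport_xfer(struct retry_cache *ec, uint8_t seq,
			   uint8_t *resp, size_t *len)
{
	*len = ec_handle_request(ec, seq, resp);
	if (drops_left > 0) {
		drops_left--;
		return false;
	}
	return true;
}

/* AP side: resend with the same sequence bit on failure and flip the bit
 * only once a response arrives. Returns 1 with *event set, 0 if no event
 * was pending, or -1 if every attempt failed. On -1, *seq is unchanged, so
 * the next call retrieves the cached result instead of consuming another
 * event. */
static int ap_get_next_event(struct retry_cache *ec, uint8_t *seq,
			     uint8_t *event)
{
	uint8_t resp[MAX_RESPONSE_SIZE];
	size_t len;

	for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
		if (!transport_xfer(ec, *seq, resp, &len))
			continue;
		*seq ^= 1;
		if (len == 0)
			return 0;
		*event = resp[0];
		return 1;
	}
	return -1;
}
```

With one response dropped on the second command, the AP still receives A, B and C in order, and the EC consumes each event exactly once. The sketch ignores resynchronization: after an AP or EC reset, a stale cached sequence bit could make a new command look like a retry, so a real protocol would need some way to reset or validate the sequence state.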
Jan 5 2017
I'll propose a design for this.
Jan 5 2017
Jan 11 2017
Working on a design doc. We'll need to add host command protocol V4 to do this, because the existing protocol's request/response handlers strictly verify that the reserved bytes are 0, so existing firmware would reject a V3 packet that used those bytes for a sequence number. To keep the init sequence simple, we'll keep supporting command protocol V3, at least for EC_CMD_GET_PROTOCOL_INFO. Once an AP driver determines via that command that V4 is supported, it can start sending V4 packets. The EC changes are fairly straightforward and well-contained. The AP changes can be phased in over time, since it'll be a while before we can drop V3 support in the EC (just as it was a while before we could drop V2 support).
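The AP-side negotiation described above might look roughly like this. It is a hypothetical sketch: `struct protocol_info`, `struct ec_link`, `ec_link_init()` and `ec_link_can_retry_safely()` are illustrative names, and the assumption that the EC_CMD_GET_PROTOCOL_INFO response carries a bitmask in which bit n means "protocol version n is supported" is modeled on the existing V3 response rather than taken from the design doc.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_V3 3
#define PROTO_V4 4

/* Simplified, hypothetical stand-in for the EC_CMD_GET_PROTOCOL_INFO
 * response. Assumption: bit n of protocol_versions is set when the EC
 * supports host command protocol version n. */
struct protocol_info {
	uint32_t protocol_versions;
};

/* Hypothetical per-EC state kept by an AP driver. */
struct ec_link {
	int proto_version; /* packet format used for subsequent commands */
};

/* Called after EC_CMD_GET_PROTOCOL_INFO has been sent as a V3 packet, which
 * every EC in this scheme still accepts. Switch to V4 only if advertised. */
static void ec_link_init(struct ec_link *link, const struct protocol_info *info)
{
	if (info->protocol_versions & (UINT32_C(1) << PROTO_V4))
		link->proto_version = PROTO_V4;
	else
		link->proto_version = PROTO_V3;
}

/* Under V4 a failed command can be resent with the same sequence number and
 * the EC replays its cached response. Under V3 a resend may repeat a side
 * effect, so side-effecting commands should not be retried blindly. */
static bool ec_link_can_retry_safely(const struct ec_link *link)
{
	return link->proto_version >= PROTO_V4;
}
```

Because the decision is made per EC at init time, drivers can adopt V4 gradually while older ECs keep working over V3.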
,
Feb 25 2017
Design doc in progress here: https://docs.google.com/a/google.com/document/d/1AcH2DH9lzceRncpNKj32l0W1ueaN2EXfQZWfSRRbyfY/edit?usp=sharing Not ready for review yet, but want to make sure it's linked in the bug so we don't lose it.
Oct 10 2017
I'm curious whether this is still planned.
Oct 10 2017
I'd still like to do it, if we can find resources to do so.
Oct 10 2017
This came up recently in a case where retries would have solved the problem: <https://patchwork.kernel.org/patch/9996149/>. I'm sure it will come up again...
Jun 14 2018
Comment 1 by rspangler@chromium.org, Jan 5 2017