BattOr agent failing due to "too many start bytes" in data frame, but doesn't retry |
||
Issue descriptionI stumbled across this doing local testing on my Macbook Pro, and I'll upload the serial log in a comment following this one. The really strange thing to me is that the agent doesn't seem to be retrying the download, which I'd expect it to. I know that the flush/retry logic is in this version of the binary because if I search for "(flush)" in the serial log, something comes up. This appears to be strictly a client-side bug, and I wonder if it's some sort of bug with the state machine that initiates the retry.
,
Oct 17 2017
,
Oct 17 2017
After adding a lot more logging and reproducing this locally, I believe that the issue is that there are instances where the timeout isn't canceled after a read fails and we retry the command. The rough order of events is: 1) We receive a bad data frame (or some bad message) and we call RetryCommand() (http://bit.ly/2xNjxxA). RetryCommand in turn posts a delayed task to restart the command in two seconds (http://bit.ly/2xO8hkc). 2) We immediately return from OnMessageRead() (http://bit.ly/2xO8hkc), which causes us to not hit the line of code after the switch that cancels the timeout function (http://bit.ly/2kWNgOk) that was posted just after the last read was initiated. 3) If the timeout function is set to run before the retrying command is set to run, OnActionTimeout() runs (http://bit.ly/2xOxOKo), resulting in the overall command failing with BATTOR_ERROR_TIMEOUT. The solution here should just be to cancel the timeout_callback_ as soon as any message is read. After all, the purpose of the timout callback is to prevent the situation where no response ever comes, and in this case a response has come, it's just not a good one. I have this fixed locally and am running a stress test to verify that it looks alright. Assuming that it does, I'll send out the CL to fix it.
,
Oct 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/da3f2b24e62846c25c5c68883e288b014bff1085 commit da3f2b24e62846c25c5c68883e288b014bff1085 Author: Charlie Andrews <charliea@google.com> Date: Wed Oct 18 16:37:56 2017 Add higher-level information to the serial log via BattOrAgent Bug: 735434 , 775520 Change-Id: Ibab9cd698d0ad74593c008956d0a90fab08fe596 Reviewed-on: https://chromium-review.googlesource.com/723595 Commit-Queue: Charlie Andrews <charliea@chromium.org> Reviewed-by: rnephew <rnephew@chromium.org> Cr-Commit-Position: refs/heads/master@{#509791} [modify] https://crrev.com/da3f2b24e62846c25c5c68883e288b014bff1085/tools/battor_agent/battor_agent.cc [modify] https://crrev.com/da3f2b24e62846c25c5c68883e288b014bff1085/tools/battor_agent/battor_agent.h [modify] https://crrev.com/da3f2b24e62846c25c5c68883e288b014bff1085/tools/battor_agent/battor_agent_unittest.cc [modify] https://crrev.com/da3f2b24e62846c25c5c68883e288b014bff1085/tools/battor_agent/battor_connection.h [modify] https://crrev.com/da3f2b24e62846c25c5c68883e288b014bff1085/tools/battor_agent/battor_connection_impl.h
,
Oct 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/67b7c1a18515f034ba788e055f8218c3676a8cf2 commit 67b7c1a18515f034ba788e055f8218c3676a8cf2 Author: Charlie Andrews <charliea@google.com> Date: Wed Oct 18 20:47:26 2017 Cancel the BattOr agent action timeout as soon as a message is read Before this change, the timeout callback was only cancelled if the message was successfuly read. This caused problems when the message *wasn't* successfully read: the action would be scheduled to be retried in two seconds, but before that two seconds elapsed, the timeout would trigger and we'd say that the action failed due to a timeout, without ever having retried it. Cancelling the timeout unconditionally should be the right thing to do here: after all, the timeout is intended to prevent nothing from happening for an extended period of time and, in the case that a bad message was read, something *did* happen, it just wasn't good. Bug: 775520 Change-Id: I9ac7c4cfb9f12e50af07a1f8011a2495447a2f40 Reviewed-on: https://chromium-review.googlesource.com/726304 Reviewed-by: rnephew <rnephew@chromium.org> Commit-Queue: Charlie Andrews <charliea@chromium.org> Cr-Commit-Position: refs/heads/master@{#509863} [modify] https://crrev.com/67b7c1a18515f034ba788e055f8218c3676a8cf2/tools/battor_agent/battor_agent.cc
,
Oct 20 2017
|
||
►
Sign in to add a comment |
||
Comment 1 by charliea@google.com
, Oct 17 201724.2 MB
24.2 MB Download