BattOr agent (Mac) not completely flushing stream before starting initialization sequence
Issue description

I'm seeing this failure on mac-power-low-end on the FYI waterfall. Here's a link to the full serial log for the failure, which is manifesting as a "Broken pipe" error message in Telemetry.

Here's the problematic part of the serial log:

"""
Opening serial connection.
Serial connection open finished with success: 1.
Bytes sent: 0x00 0x03 0x09 0x02 0x00 0x02 0x00 0x02 0x00 0x02 0x00 0x01.
Read requested.
Before doing a serial read, checking to see if we already have a complete message in the 'already read' buffer.
No complete message found in the 'already read' buffer.
Starting read of up to 191 bytes.
191 more bytes read: 0x06 0xda 0x02 0x00 0x0b 0x06 0xe5 0x02 0x00 0x0b 0x06 0xf6 0x02 0x00 0x0e 0x06 0xef 0x02 0x00 0x0d 0x06 0xe5 0x02 0x00 0x0e 0x06 0xe3 0x02 0x00 0x0e 0x06 0xe0 0x02 0x00 0x0e 0x06 0xde 0x02 0x00 0x0e 0x06 0xd8 0x02 0x00 0x0c 0x06 0xd9 0x02 0x00 0x0c 0x06 0xdb 0x02 0x00 0x10 0x06 0xdb 0x02 0x00 0x0f 0x06 0xdd 0x02 0x00 0x0f 0x06 0xde 0x02 0x00 0x0e 0x06 0xdd 0x02 0x00 0x0f 0x06 0xdd 0x02 0x00 0x0d 0x06 0xdf 0x02 0x00 0x0e 0x06 0xe0 0x02 0x00 0x0e 0x06 0xe0 0x02 0x00 0x0d 0x06 0xe0 0x02 0x00 0x0f 0x06 0xe4 0x02 0x00 0x0f 0x06 0xde 0x02 0x00 0x11 0x06 0xe2 0x02 0x00 0x0f 0x06 0xdf 0x02 0x00 0x0c 0x06 0xed 0x02 0x00 0x0d 0x06 0xe1 0x02 0x00 0x10 0x06 0xdf 0x02 0x00 0x0f 0x06 0xdf 0x02 0x00 0x0b 0x06 0xed 0x02 0x00 0x08 0x06 0x19 0x02 0x01 0x08 0x06 0x28 0x02 0x01 0x07 0x06 0x28 0x02 0x01 0x0a 0x06 0x0a 0x02 0x01 0x10 0x06 0xf5 0x02 0x00 0x0d 0x06 0xe1 0x02 0x00 0x0e 0x06 0xdc 0x02 0x00 0x0d 0x06 0xd4 0x02 0x00 0x13 0x06 0xbc 0x02 0x00 0x12 0x06.
Read failed due to the message containing an irrecoverable error: 2.
Read finished with success: 0.
"""

Translated to English, what's happening is:

1) The BattOr agent sends the initialization message to the BattOr.
2) The BattOr agent tries to read the initialization message ack from the serial connection.
3) There are bytes available to be read on the serial connection.
However, instead of reading the initialization message ack, we get the 191 bytes listed above. These bytes seem to have a period of approximately 5 bytes, with each quintuplet looking something like "0x02 0x00 0x0b 0x06 0xf6". (The 0x02 0x00 starts every quintuplet.) Given that 0x02 is an escape byte indicating that the following 0x00 is a real data byte rather than a start byte for the message, the real message data can be interpreted as a four-byte group of something like 0x00 0x0b 0x06 0xf6.

This looks to me suspiciously like a power data frame (see: https://docs.google.com/document/d/1tmsSZNzioRf0cX8-T5czO-XDs4YLA7Mlw06c14pizKg/edit#), where the first two bytes are the voltage and the second two bytes are the current.

I think the most likely culprit for this failure is that the BattOr agent somehow died while samples were being streamed back, leaving lots of samples on the wire. After this happened, the agent is failing to fully flush the serial connection before starting the next power recording.
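To make the byte-period argument above concrete, here is a minimal Python sketch that removes the 0x02 escape bytes from the first 15 bytes of the captured read and splits the result into four-byte groups. It relies only on the escape rule described in this report. The helper names `unescape` and `group` are hypothetical and are not taken from the BattOr agent source. The alignment of the groups is also an assumption: the capture may begin partway through a frame, so the voltage and current fields are deliberately left undecoded.

```python
ESCAPE = 0x02  # per the report: the byte following 0x02 is literal data


def unescape(raw: bytes) -> bytes:
    """Drop each escape byte and keep the byte that follows it as data."""
    out = bytearray()
    it = iter(raw)
    for b in it:
        if b == ESCAPE:
            nxt = next(it, None)
            if nxt is None:
                break  # dangling escape at the end of the capture
            out.append(nxt)
        else:
            out.append(b)
    return bytes(out)


def group(data: bytes, size: int = 4) -> list:
    """Split data into complete groups of `size` bytes, dropping any remainder."""
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


# First 15 bytes of the "191 more bytes read" line from the serial log.
captured = bytes([0x06, 0xda, 0x02, 0x00, 0x0b,
                  0x06, 0xe5, 0x02, 0x00, 0x0b,
                  0x06, 0xf6, 0x02, 0x00, 0x0e])

payload = unescape(captured)
for frame in group(payload):
    print(frame.hex(" "))
```

Each 5-byte period on the wire collapses to 4 bytes of data once the escape byte is removed, which matches the size of a power sample described above.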
,
Jan 5 2017
Here's an example run where this happened: https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf/builds/133 I've also attached the serial log from this run. In this particular case, the problem manifested as a failure to retrieve the git hash because that happens before actually starting tracing.
,
Jan 5 2017
Assigning this to rnephew@, who's agreed to take the lead on this effort. We have a handoff meeting later today.
,
Jan 6 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4a24995c28476982924693f7ea299d30e0103379

commit 4a24995c28476982924693f7ea299d30e0103379
Author: rnephew <rnephew@chromium.org>
Date: Fri Jan 06 02:03:47 2017

[BattOr] Add flushing before requesting firmware git hash.

BUG= 677303
Review-Url: https://codereview.chromium.org/2612333003
Cr-Commit-Position: refs/heads/master@{#441817}

[modify] https://crrev.com/4a24995c28476982924693f7ea299d30e0103379/tools/battor_agent/battor_agent.cc
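For readers unfamiliar with the idea of flushing before a request, here is a rough Python sketch of draining a serial connection until it goes quiet. It is not the change made in battor_agent.cc. `FakeSerial` is a hypothetical in-memory stand-in for a port, `flush_input` is an illustrative name, and the `quiet_reads` and `max_reads` parameters are assumptions. A real port would need a non-blocking or timed read so that an empty read returns promptly.

```python
class FakeSerial:
    """Hypothetical in-memory stand-in for a serial port (not a real API)."""

    def __init__(self, pending: bytes):
        self._pending = bytearray(pending)

    def read(self, max_bytes: int) -> bytes:
        """Return up to max_bytes of pending data, or b"" if nothing is left."""
        chunk = bytes(self._pending[:max_bytes])
        del self._pending[:max_bytes]
        return chunk


def flush_input(port, chunk_size=191, quiet_reads=2, max_reads=1000):
    """Read and discard data until `quiet_reads` consecutive reads are empty.

    Returns the number of bytes discarded. `max_reads` bounds the loop so a
    device that never stops streaming cannot hang the caller.
    """
    discarded = 0
    quiet = 0
    for _ in range(max_reads):
        chunk = port.read(chunk_size)
        if chunk:
            discarded += len(chunk)
            quiet = 0
        else:
            quiet += 1
            if quiet >= quiet_reads:
                break
    return discarded


# Simulate 500 stale sample bytes left on the wire by a previous run.
port = FakeSerial(bytes(500))
stale = flush_input(port)
print(f"discarded {stale} stale bytes before sending the next request")
```

Requiring more than one empty read is a design choice in this sketch: it lowers the chance of declaring the line clean while more stale data is still arriving.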
,
Jan 6 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/27140585843585d65c219884ff31719fbb61dbaf

commit 27140585843585d65c219884ff31719fbb61dbaf
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Fri Jan 06 21:30:22 2017

Roll src/third_party/catapult/ 24315c519..f84aaa04d (4 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/24315c519ee9..f84aaa04d4d0

$ git log 24315c519..f84aaa04d --date=short --no-merges --format='%ad %ae %s'
2017-01-06 rnephew [BattOr] Update Win, Mac and Linux battor agent binaries in deps manager.
2017-01-06 simonhatch Dashboard - Fix some internal bisects not starting automatically.
2017-01-06 eakuefner [StyleGuide] Add specific delineation for JavaScript
2017-01-06 sullivan Do not create data stoppage alerts for ref builds.

BUG= 677303 , 678659

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org
Review-Url: https://codereview.chromium.org/2617223002
Cr-Commit-Position: refs/heads/master@{#442057}

[modify] https://crrev.com/27140585843585d65c219884ff31719fbb61dbaf/DEPS
,
Jan 9 2017
Comment 1 by charliea@chromium.org
, Jan 3 2017