New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 679375 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jan 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Bug



Sign in to add a comment

BattOr timing out while awaiting firmware version

Project Member Reported by charliea@chromium.org, Jan 9 2017

Issue description

OS: I've only seen this on the Mac Retina bot, but it may also be happening on Windows, too

Link to buildbot run: https://build.chromium.org/p/chromium.perf/builders/Mac%20Retina%20Perf/builds/173

Link to STDIO: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FMac_Retina_Perf%2F173%2F%2B%2Frecipes%2Fsteps%2Fbattor.power_cases_on_Intel_GPU_on_Mac_on_Mac-10.11%2F0%2Fstdout

Link to serial log: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/battor-serial-log-2017-01-08_11-37.txt-41332.txt

The serial log in its entirety:

"""
Opening serial connection.

Serial connection open finished with success: 1.

Bytes sent: 0x00 0x03 0x09 0x02 0x00 0x02 0x00 0x02 0x00 0x02 0x00 0x01.

Read requested.

Before doing a serial read, checking to see if we already have a complete message in the 'already read' buffer.

No complete message found in the 'already read' buffer.

Starting read of up to 191 bytes.
"""

The battor_agent failed in a timeout. This is weird: I've never seen the BattOr have a problem with receiving an initialization ack before. It's like we're sending the initialization message, but the BattOr just isn't responsive.

In the next test, battor.power_cases.reference (which ran immediately after this one), the BattOr was working again. This makes me wonder if we should be retrying when this happens.
 
Hmmm, now that I think about it more, we already *should* be retrying if the init fails:

void BattOrAgent::OnMessageRead(bool success,
                                BattOrMessageType type,
                                std::unique_ptr<vector<char>> bytes) {
  ...
  if (!success) {
    switch (last_action_) {
      ...
      case Action::READ_INIT_ACK:
        if (num_init_attempts_++ < kMaxInitAttempts) {
          PerformDelayedAction(Action::SEND_INIT,
              base::TimeDelta::FromMilliseconds(kInitRetryDelayMilliseconds));
        } else {
          CompleteCommand(BATTOR_ERROR_TOO_MANY_INIT_RETRIES);
        }

        return;

I wonder if I somehow broke this logic with my other changes to the retry logic?
Cc: rnep...@chromium.org
Owner: charliea@chromium.org
Status: Assigned (was: Untriaged)
Labels: OS-All
Summary: BattOr timing out while awaiting initialization ack (was: BattOr timing out while awaiting initialization ack (on Mac?))
Other supporting evidence that I screwed something up: any failure at this step should result in a TOO_MANY_INIT_ATTEMPTS failure, not a TIMEOUT failure.

Removing any mention of Mac, because I believe all OSes are susceptible to this failure.
Summary: BattOr timing out while awaiting firmware version (was: BattOr timing out while awaiting initialization ack)
I made a mistake early on in this bug: the problem isn't a timeout while awaiting the init ack but a timeout while awaiting the *firmware version ack*. This makes a lot more sense: we currently don't have any retry logic in place for this request. Presumably, requesting the firmware version should have all of the retry logic associated with requesting an initialization of the BattOr.
Owner: rnep...@chromium.org
Cc: charliea@google.com
Cc: -charliea@google.com
Project Member

Comment 8 by bugdroid1@chromium.org, Jan 12 2017

Project Member

Comment 9 by bugdroid1@chromium.org, Jan 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/17b4caf4626ef05da5b86ba8482711ee7cc1178e

commit 17b4caf4626ef05da5b86ba8482711ee7cc1178e
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Fri Jan 13 06:21:26 2017

Roll src/third_party/catapult/ fe8a3c8a5..4bd198f6a (2 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/fe8a3c8a5cdb..4bd198f6a85c

$ git log fe8a3c8a5..4bd198f6a --date=short --no-merges --format='%ad %ae %s'
2017-01-12 bpastene devil: Push devil onto the path when running reset_usb.py manually.
2017-01-12 rnephew [BattOr] Update BattOr Agent binary in cloud storage.

BUG= 679375 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2631643002
Cr-Commit-Position: refs/heads/master@{#443508}

[modify] https://crrev.com/17b4caf4626ef05da5b86ba8482711ee7cc1178e/DEPS

Status: Fixed (was: Assigned)

Sign in to add a comment