Issue 652384

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Blocked on:
issue 652379




BattOrs hang and reset as their SD cards fill up

Project Member Reported by charliea@google.com, Oct 3 2016

Issue description

BattOr firmware tracking bug: https://github.com/aschulm/battor/issues/53

In short: because the BattOr firmware currently writes to the SD card byte by byte instead of in blocks, the BattOr can experience intermittent 120ms hangs, and eventually resets, as the card fills up. This is starting to cause many benchmarks to fail.
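
To illustrate the difference between byte-wise and block-wise writes, here is a minimal Python sketch. It is not the BattOr firmware (which is not shown in this issue); `FakeCard`, `BlockWriter`, `compare`, and the 512-byte block size are all assumptions made up for illustration. It only counts how many write operations each strategy issues against a simulated card.

```python
BLOCK_SIZE = 512  # assumed block size; a common SD sector size, not taken from the firmware


class FakeCard:
    """Hypothetical stand-in for an SD card that counts write operations."""

    def __init__(self):
        self.data = bytearray()
        self.write_ops = 0

    def write(self, chunk: bytes) -> None:
        self.data += chunk
        self.write_ops += 1


class BlockWriter:
    """Buffers incoming bytes and writes to the card one full block at a time."""

    def __init__(self, card, block_size=BLOCK_SIZE):
        self.card = card
        self.block_size = block_size
        self.buffer = bytearray()

    def write(self, payload: bytes) -> None:
        self.buffer += payload
        # Emit every complete block that has accumulated so far.
        while len(self.buffer) >= self.block_size:
            self.card.write(bytes(self.buffer[:self.block_size]))
            del self.buffer[:self.block_size]

    def flush(self) -> None:
        # Zero-pad the final partial block so the card only ever sees whole blocks.
        if self.buffer:
            padded = self.buffer.ljust(self.block_size, b"\x00")
            self.card.write(bytes(padded))
            self.buffer.clear()


def compare(num_bytes=2048):
    """Return (byte-wise write count, block-wise write count) for the same data."""
    samples = bytes(i % 256 for i in range(num_bytes))

    bytewise = FakeCard()
    for b in samples:
        bytewise.write(bytes([b]))  # one card operation per byte

    blockwise = FakeCard()
    writer = BlockWriter(blockwise)
    for b in samples:
        writer.write(bytes([b]))  # buffered; card is touched once per block
    writer.flush()

    return bytewise.write_ops, blockwise.write_ops
```

Under these assumptions, `compare(2048)` issues 2048 card operations byte-wise versus 4 block-wise. The sketch does not model the hangs themselves; it only shows why buffering to whole blocks reduces how often the card is touched.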

Ultimately, we'll need to release a new version of the BattOr firmware to fix this. As a short-term workaround, we can wipe the BattOr SD cards for bots experiencing this problem or replace them with new SD cards.
 

Comment 1 by sheriffbot@chromium.org, Oct 4 2016

Labels: Hotlist-Google
Owner: charliea@chromium.org
Cc: alexandermont@chromium.org rnep...@chromium.org
Labels: Performance-Sheriff-BotHealth
(Adding Performance-Sheriff-BotHealth to this because it's affecting the stability of BattOrs on the perf waterfall.)

Just an update on this: Mellow has shipped us a version of the BattOr firmware that should fix this, but Randy and I found that the BattOr behaved strangely (e.g. couldn't stop tracing) with the new firmware installed. I've been working with Chris Testa from Mellow to figure out what's going on, and he suggested that the crappy, off-brand micro SD card I had in my BattOr may be the problem. I'm going to try to pick up a new one tonight or tomorrow to see if that's the case. Ideally, swapping the card will get things working again.

Once we have a new working version of the firmware, I'm going to flash the 5 Mac Retina bots in the lab with it and confirm that it fixes their problems. Once we have that confidence, Randy will head to the perf lab and manually flash the rest of the BattOrs on the waterfall, which should fix the remaining flakiness.
Cc: charliea@chromium.org
Issue 652306 has been merged into this issue.
Issue 649161 has been merged into this issue.
Should we disable these tests on the failing bots on the perf waterfall until this is in place?
Cc: vhang@chromium.org
A few SD cards were ordered the last time we replaced a malfunctioning one. We probably have some more around that we can use to replace the BattOrs' SD cards and temporarily get this working.

Comment 8 by vhang@chromium.org, Oct 7 2016

I ordered 10 32GB cards recently. If you don't have any on hand, we can help. Let us know.

Comment 9 by zh...@chromium.org, Oct 10 2016

battor.power_cases and system_health.common_desktop started to fail on Mac Retina Perf (1) over the weekend:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Mac%20Retina%20Perf%20%281%29?numbuilds=200

battor.trivial_pages started to fail on Mac Retina Perf (2) over the weekend:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Mac%20Retina%20Perf%20%282%29?numbuilds=200

Comment 10 by bugdroid1@chromium.org, Oct 11 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d

commit e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d
Author: charliea <charliea@chromium.org>
Date: Tue Oct 11 15:35:48 2016

[telemetry] Disable perf benchmarks that use BattOr tracing

Unfortunately, with the current iteration of the BattOr firmware,
BattOr benchmarks are failing unless the SD cards on the BattOrs are
replaced on a daily basis. Mellow, the company that manufactures the
BattOrs, is working hard on a new version of the firmware that's more
stable, but in the meanwhile we're forced to disable the benchmarks on
all waterfalls where there's an expectation that tests won't fail
repeatedly.

BUG= 652384 
NOTRY=true
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq

Review-Url: https://codereview.chromium.org/2404883003
Cr-Commit-Position: refs/heads/master@{#424433}

[modify] https://crrev.com/e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d/tools/perf/benchmarks/battor.py
[modify] https://crrev.com/e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d/tools/perf/benchmarks/system_health.py


Comment 11 by bugdroid1@chromium.org, Oct 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/27499ad9143b1ccee534b1605a31f7aab11ba243

commit 27499ad9143b1ccee534b1605a31f7aab11ba243
Author: charliea <charliea@chromium.org>
Date: Tue Oct 18 20:58:54 2016

Revert "[telemetry] Disable perf benchmarks that use BattOr tracing"

 http://crbug.com/652384  should be resolved now that we've flashed
all BattOrs in the lab with new firmware.

This reverts commit e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d.

BUG= 652384 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq

Review-Url: https://codereview.chromium.org/2427593003
Cr-Commit-Position: refs/heads/master@{#426042}

[modify] https://crrev.com/27499ad9143b1ccee534b1605a31f7aab11ba243/tools/perf/benchmarks/battor.py
[modify] https://crrev.com/27499ad9143b1ccee534b1605a31f7aab11ba243/tools/perf/benchmarks/system_health.py

Status: Fixed (was: Assigned)
