BattOrs hang and reset as their SD cards fill up
Issue description

BattOr firmware tracking bug: https://github.com/aschulm/battor/issues/53

The gist: because the BattOr firmware currently writes to the SD card in bytes instead of blocks, the BattOr can experience intermittent 120ms hangs and, eventually, resets as the card gets full. This is starting to cause many benchmarks to fail.

Ultimately, we'll need to release a new version of the BattOr firmware to fix this. As a short-term workaround, we can wipe the BattOr SD cards for bots experiencing this problem or replace them with new SD cards.
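To illustrate the byte-versus-block distinction described above, here is a minimal Python sketch of block-buffered writing. It is not the BattOr firmware and says nothing about how the eventual fix is implemented; the class name `BlockBufferedWriter`, the 512-byte block size, and the in-memory `io.BytesIO` stand-in for the SD card are all assumptions made for illustration.

```python
import io

# Assumed block size for illustration; 512 bytes is a common SD sector size.
BLOCK_SIZE = 512


class BlockBufferedWriter:
    """Hypothetical helper: collects small writes and hands them to the
    underlying device one full block at a time."""

    def __init__(self, device, block_size=BLOCK_SIZE):
        self._device = device
        self._block_size = block_size
        self._buffer = bytearray()
        self.device_writes = 0  # number of writes that reached the device

    def write(self, data):
        # Buffer the incoming bytes, then emit every complete block.
        self._buffer.extend(data)
        while len(self._buffer) >= self._block_size:
            self._device.write(bytes(self._buffer[:self._block_size]))
            del self._buffer[:self._block_size]
            self.device_writes += 1

    def flush(self):
        # Zero-pad the final partial block so the device only sees whole blocks.
        if self._buffer:
            padding = self._block_size - len(self._buffer)
            self._device.write(bytes(self._buffer) + b"\x00" * padding)
            self._buffer.clear()
            self.device_writes += 1


# Stand-in for the SD card: an in-memory byte stream.
device = io.BytesIO()
writer = BlockBufferedWriter(device)

# 1000 two-byte samples, written one sample at a time.
for sample in range(1000):
    writer.write(sample.to_bytes(2, "little"))
writer.flush()
```

In this sketch, the 1000 two-byte samples reach the device as four 512-byte writes rather than 1000 small ones, which is the kind of reduction block-oriented writing is meant to achieve.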
,
Oct 6 2016
,
Oct 6 2016
(Adding Performance-Sheriff-BotHealth to this because it's affecting the stability of BattOrs on the perf waterfall.)

Just an update on this: Mellow has shipped us a version of the BattOr firmware that should fix this, but Randy and I found that the BattOr behaved strangely (e.g. couldn't stop tracing) with this new firmware installed. I've been working with Chris Testa from Mellow to figure out what's going on, and he suggested that the crappy, off-brand micro SD card that I had in my BattOr may be the problem. I'm going to try to pick up a new one tonight or tomorrow to see if that's the case. In an ideal world, things begin working again as soon as I do that.

Once we have a working version of the new firmware, I'm going to flash the 5 Retina Mac bots in the lab with it and verify that it fixes their problems. Once we have that confidence, Randy will head to the perf lab and manually flash the rest of the BattOrs on the waterfall, which should fix all of the flakiness.
,
Oct 6 2016
,
Oct 6 2016
Issue 649161 has been merged into this issue.
,
Oct 7 2016
Should we disable these tests on the failing bots on the perf waterfall until this is in place?
,
Oct 7 2016
A few SD cards were ordered the last time we replaced a malfunctioning one. We probably have some more around that we can use to replace the BattOrs' SD cards and temporarily get this working.
,
Oct 7 2016
I ordered 10 32GB cards recently. If you don't have any on hand, we can help. Let us know.
,
Oct 10 2016
battor.power_cases and system_health.common_desktop started failing on Mac Retina Perf (1) over the weekend: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Mac%20Retina%20Perf%20%281%29?numbuilds=200

battor.trivial_pages started failing on Mac Retina Perf (2) over the weekend: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Mac%20Retina%20Perf%20%282%29?numbuilds=200
,
Oct 11 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d

commit e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d
Author: charliea <charliea@chromium.org>
Date: Tue Oct 11 15:35:48 2016

[telemetry] Disable perf benchmarks that use BattOr tracing

Unfortunately, with the current iteration of the BattOr firmware, BattOr benchmarks are failing unless the SD cards on the BattOrs are replaced on a daily basis. Mellow, the company that manufactures the BattOrs, is working hard on a new version of the firmware that's more stable, but in the meanwhile we're forced to disable the benchmarks on all waterfalls where there's an expectation that tests won't fail repeatedly.

BUG= 652384
NOTRY=true
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq

Review-Url: https://codereview.chromium.org/2404883003
Cr-Commit-Position: refs/heads/master@{#424433}

[modify] https://crrev.com/e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d/tools/perf/benchmarks/battor.py
[modify] https://crrev.com/e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d/tools/perf/benchmarks/system_health.py
,
Oct 18 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/27499ad9143b1ccee534b1605a31f7aab11ba243

commit 27499ad9143b1ccee534b1605a31f7aab11ba243
Author: charliea <charliea@chromium.org>
Date: Tue Oct 18 20:58:54 2016

Revert "[telemetry] Disable perf benchmarks that use BattOr tracing"

http://crbug.com/652384 should be resolved now that we've flashed all BattOrs in the lab with new firmware.

This reverts commit e8e1256d25b5f8c0bf706f2b21b9d86ac7e5356d.

BUG= 652384
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq

Review-Url: https://codereview.chromium.org/2427593003
Cr-Commit-Position: refs/heads/master@{#426042}

[modify] https://crrev.com/27499ad9143b1ccee534b1605a31f7aab11ba243/tools/perf/benchmarks/battor.py
[modify] https://crrev.com/27499ad9143b1ccee534b1605a31f7aab11ba243/tools/perf/benchmarks/system_health.py
,
Oct 18 2016
Comment 1 by sheriffbot@chromium.org, Oct 4 2016