sweetberry: failure on multiple successive runs |
|
Issue description
When I try to initiate a powerlog class, and then call start() twice on the same instance, the second run fails.
Namely, I get this
Spower.wr_command(write_list=[[3, 0, 160, 134, 1, 0]] (6), read_count=5)
RET: 6
BYTES: [array('B', [3])]
STATUS: 0x03
Command START: FAIL
it seems that calling stop on the instances works:
Spower.wr_command(write_list=[[1, 0]] (2), read_count=1)
RET: 2
BYTES: [array('B', [0])]
STATUS: 0x00
Command STOP: success
I'm using the firmware packaged on the care and feeding doc.
I will dig into this more later when I have time, but maybe Nick you know what's going on here?
,
Apr 18 2018
I just realized that I never updated fw on my sweetberry. Make sure that you are using both new powerlog and new fw. Or both old, actually - I have a fix for new powerlog which is still in review.
,
Apr 18 2018
would it be feasible / valuable to add version check between powerlog.py & sweetberry FW to expose these incompatibilities?
,
Apr 19 2018
I'm not sure if it worked before, but what's the background for disallowing restarting? I think it's pretty valuable
,
Apr 25 2018
I have been hitting an error that looks like this:
19:24:02 INFO | autoserv| MQG time: 2018-04-24 19:24:02.137011
19:24:03 INFO | autoserv| MQG time: 2018-04-24 19:24:03.191180
19:24:04 INFO | autoserv| MQG time: 2018-04-24 19:24:04.243686
19:24:04 INFO | autoserv| Exception in thread Sweetberry:
19:24:04 INFO | autoserv| Traceback (most recent call last):
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
19:24:04 INFO | autoserv| self.run()
19:24:04 INFO | autoserv| File "/build/nami/usr/local/build/autotest/server/cros/power/power_telemetry_logger.py", line 106, in run
19:24:04 INFO | autoserv| powerlog.main(self._argv)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 881, in main
19:24:04 INFO | autoserv| print_raw_data=print_raw_data,raw_data_dir=raw_data_dir)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 615, in __init__
19:24:04 INFO | autoserv| self._pwr[key].reset()
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 349, in reset
19:24:04 INFO | autoserv| raise Exception("Power", "Failed to reset")
19:24:04 INFO | autoserv| Exception: ('Power', 'Failed to reset')
This happens when I continue calling powerlog. @Ruben, is this what you see too?
,
Apr 25 2018
How to reproduce: how a script that keeps calling powerlog (power_MeasurementWrapper test), just run and wait
What happens:
after a while, there will be a read/write error fron powerlog
19:46:38 INFO | autoserv| Traceback (most recent call last):
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
19:46:38 INFO | autoserv| self.run()
19:46:38 INFO | autoserv| File "/build/nami/usr/local/build/autotest/server/cros/power/power_telemetry_logger.py", line 108, in run
19:46:38 INFO | autoserv| powerlog.main(self._argv)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 884, in main
19:46:38 INFO | autoserv| powerlogger.start(integration_us_request, seconds, sync_speed=sync_speed)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 768, in start
19:46:38 INFO | autoserv| self._pwr[key].stop()
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 354, in stop
19:46:38 INFO | autoserv| ret = self.wr_command(cmd)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 295, in wr_command
19:46:38 INFO | autoserv| ret = self._write_ep.write(cmd, wtimeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/core.py", line 287, in write
19:46:38 INFO | autoserv| return self.device.write(self.bEndpointAddress, data, self.interface, timeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/core.py", line 633, in write
19:46:38 INFO | autoserv| self.__get_timeout(timeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 771, in bulk_write
19:46:38 INFO | autoserv| timeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 872, in __write
19:46:38 INFO | autoserv| _check(retval)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 571, in _check
19:46:38 INFO | autoserv| raise USBError(_str_error[ret], ret, _libusb_errno[ret])
19:46:38 INFO | autoserv| USBError: [Errno 32] Pipe error
What happens next:
then sweetberry will get stuck in a "failed to reset" state, and cannot function no matter what
19:24:04 INFO | autoserv| Exception in thread Sweetberry:
19:24:04 INFO | autoserv| Traceback (most recent call last):
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
19:24:04 INFO | autoserv| self.run()
19:24:04 INFO | autoserv| File "/build/nami/usr/local/build/autotest/server/cros/power/power_telemetry_logger.py", line 106, in run
19:24:04 INFO | autoserv| powerlog.main(self._argv)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 881, in main
19:24:04 INFO | autoserv| print_raw_data=print_raw_data,raw_data_dir=raw_data_dir)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 615, in __init__
19:24:04 INFO | autoserv| self._pwr[key].reset()
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 349, in reset
19:24:04 INFO | autoserv| raise Exception("Power", "Failed to reset")
19:24:04 INFO | autoserv| Exception: ('Power', 'Failed to reset')
Work around:
unplug and then plug sweetberry
Frequency:
This has never happened before I updated the fw on sweetberry, and happens 90% of the time after fw is updated on sweetberry.
Nick, can you look into this?
,
Apr 25 2018
No, I think this is more of a problem then. In my case, I ran into this issue when I did multiple runs with the same powerlog instance. But in your situation, you're invoking new instances of powerlog. So maybe the chip firmware is getting into a bad state.
,
Apr 26 2018
Hi - After trying out another sweetberry, I realized that the bug described in #6 was specific to that one sweetberry, so it's likely a hardware issue. I have to note though that this is the 3rd sweetberry that I had which was not fully functional.
,
Apr 26 2018
The issue is TOT sweetberry sets this: https://cs.corp.google.com/chromeos_public/src/platform/ec/chip/stm32/usb_power.c?l=131 Yes, the reset command can't explicitly fail, so it seems like what's occurring is that sweetberry is getting into a bad state and not even accepting the command.
,
Apr 30 2018
I got this too.
Easy command to reproduce this
for i in {1..9999}; do powerlog -b atlas_rev0.board -c atlas_rev0.scenario -s 1 --mW --print_stats --no_print_raw_data; done
Traceback (most recent call last):
File "/usr/lib/python-exec/python2.7/powerlog", line 9, in <module>
load_entry_point('powerlog==1.0', 'console_scripts', 'powerlog')()
File "/usr/lib64/python2.7/site-packages/powerlog.py", line 881, in main
print_raw_data=print_raw_data,raw_data_dir=raw_data_dir)
File "/usr/lib64/python2.7/site-packages/powerlog.py", line 615, in __init__
self._pwr[key].reset()
File "/usr/lib64/python2.7/site-packages/powerlog.py", line 349, in reset
raise Exception("Power", "Failed to reset")
Exception: ('Power', 'Failed to reset')
,
May 2 2018
Found mitigation for this with https://crrev.com/c/1038788
,
May 4 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/ec/+/0a8afe10c2ba566f35a9360aeb3b302669126ad5 commit 0a8afe10c2ba566f35a9360aeb3b302669126ad5 Author: Puthikorn Voravootivat <puthik@chromium.org> Date: Fri May 04 04:27:54 2018 sweetberry: Make resetting USB interface more forgiving Currently, sweetberry ocassionally throws Exception when resetting the USB interface. This CL mitigates that by - Use linearly back off algorithm with 10ms delay increment before next reset attempt to avoid flooding the sweetberry hardware with reset requests. - Increase retry amount from 10 to 100 BUG=chromium:834252 TEST=No "Exception: ('Power', 'Failed to reset')" seen Change-Id: Iaf039cb82760205d1747fd630387852b7cfd8f83 Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1038788 [modify] https://crrev.com/0a8afe10c2ba566f35a9360aeb3b302669126ad5/extra/usb_power/powerlog.py |
|
►
Sign in to add a comment |
|
Comment 1 by nsanders@chromium.org
, Apr 18 2018