New issue
Advanced search Search tips

Issue 834252 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

sweetberry: failure on multiple successive runs

Project Member Reported by coconutruben@chromium.org, Apr 18 2018

Issue description

When I try to initiate a powerlog class, and then call start() twice on the same instance, the second run fails.
Namely, I get this

Spower.wr_command(write_list=[[3, 0, 160, 134, 1, 0]] (6), read_count=5)
RET: 6
BYTES: [array('B', [3])]
STATUS: 0x03
Command START: FAIL

it seems that calling stop on the instances works:
Spower.wr_command(write_list=[[1, 0]] (2), read_count=1)
RET: 2
BYTES: [array('B', [0])]
STATUS: 0x00
Command STOP: success

I'm using the firmware packaged on the care and feeding doc.
I will dig into this more later when I have time, but maybe Nick you know what's going on here?

 
I think the lastest sweetberry firmware explicitly disallows restarts. Did this work before?

Comment 2 by mqg@chromium.org, Apr 18 2018

I just realized that I never updated fw on my sweetberry. 

Make sure that you are using both new powerlog and new fw.

Or both old, actually - I have a fix for new powerlog which is still in review. 

Comment 3 by tbroch@chromium.org, Apr 18 2018

would it be feasible / valuable to add version check between powerlog.py & sweetberry FW to expose these incompatibilities?
I'm not sure if it worked before, but what's the background for disallowing restarting? I think it's pretty valuable

Comment 5 by mqg@chromium.org, Apr 25 2018

I have been hitting an error that looks like this:

19:24:02 INFO | autoserv| MQG time: 2018-04-24 19:24:02.137011                                                                                          
19:24:03 INFO | autoserv| MQG time: 2018-04-24 19:24:03.191180                                                                                     
19:24:04 INFO | autoserv| MQG time: 2018-04-24 19:24:04.243686                                                                                                           
19:24:04 INFO | autoserv| Exception in thread Sweetberry:                                                                                              
19:24:04 INFO | autoserv| Traceback (most recent call last):                                                                                             
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner                                                     
19:24:04 INFO | autoserv| self.run()                                                                                                                                     
19:24:04 INFO | autoserv| File "/build/nami/usr/local/build/autotest/server/cros/power/power_telemetry_logger.py", line 106, in run                    
19:24:04 INFO | autoserv| powerlog.main(self._argv)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 881, in main
19:24:04 INFO | autoserv| print_raw_data=print_raw_data,raw_data_dir=raw_data_dir)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 615, in __init__
19:24:04 INFO | autoserv| self._pwr[key].reset()
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 349, in reset
19:24:04 INFO | autoserv| raise Exception("Power", "Failed to reset")
19:24:04 INFO | autoserv| Exception: ('Power', 'Failed to reset')


This happens when I continue calling powerlog. @Ruben, is this what you see too?

Comment 6 by mqg@chromium.org, Apr 25 2018

How to reproduce: how a script that keeps calling powerlog (power_MeasurementWrapper test), just run and wait

What happens:
after a while, there will be a read/write error fron powerlog

19:46:38 INFO | autoserv| Traceback (most recent call last):
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
19:46:38 INFO | autoserv| self.run()
19:46:38 INFO | autoserv| File "/build/nami/usr/local/build/autotest/server/cros/power/power_telemetry_logger.py", line 108, in run
19:46:38 INFO | autoserv| powerlog.main(self._argv)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 884, in main
19:46:38 INFO | autoserv| powerlogger.start(integration_us_request, seconds, sync_speed=sync_speed)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 768, in start
19:46:38 INFO | autoserv| self._pwr[key].stop()
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 354, in stop
19:46:38 INFO | autoserv| ret = self.wr_command(cmd)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 295, in wr_command
19:46:38 INFO | autoserv| ret = self._write_ep.write(cmd, wtimeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/core.py", line 287, in write
19:46:38 INFO | autoserv| return self.device.write(self.bEndpointAddress, data, self.interface, timeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/core.py", line 633, in write
19:46:38 INFO | autoserv| self.__get_timeout(timeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 771, in bulk_write
19:46:38 INFO | autoserv| timeout)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 872, in __write
19:46:38 INFO | autoserv| _check(retval)
19:46:38 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/usb/backend/libusb1.py", line 571, in _check
19:46:38 INFO | autoserv| raise USBError(_str_error[ret], ret, _libusb_errno[ret])
19:46:38 INFO | autoserv| USBError: [Errno 32] Pipe error

What happens next:
then sweetberry will get stuck in a "failed to reset" state, and cannot function no matter what

19:24:04 INFO | autoserv| Exception in thread Sweetberry:                                                                                              
19:24:04 INFO | autoserv| Traceback (most recent call last):                                                                                             
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner                                                     
19:24:04 INFO | autoserv| self.run()                                                                                                                                     
19:24:04 INFO | autoserv| File "/build/nami/usr/local/build/autotest/server/cros/power/power_telemetry_logger.py", line 106, in run                    
19:24:04 INFO | autoserv| powerlog.main(self._argv)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 881, in main
19:24:04 INFO | autoserv| print_raw_data=print_raw_data,raw_data_dir=raw_data_dir)
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 615, in __init__
19:24:04 INFO | autoserv| self._pwr[key].reset()
19:24:04 INFO | autoserv| File "/usr/lib64/python2.7/site-packages/powerlog.py", line 349, in reset
19:24:04 INFO | autoserv| raise Exception("Power", "Failed to reset")
19:24:04 INFO | autoserv| Exception: ('Power', 'Failed to reset')

Work around:
unplug and then plug sweetberry

Frequency:
This has never happened before I updated the fw on sweetberry, and happens 90% of the time after fw is updated on sweetberry. 

Nick, can you look into this?
No, I think this is more of a problem then. In my case, I ran into this issue when I did multiple runs with the same powerlog instance. But in your situation, you're invoking new instances of powerlog. So maybe the chip firmware is getting into a bad state.

Comment 8 by mqg@chromium.org, Apr 26 2018

Hi - 

After trying out another sweetberry, I realized that the bug described in #6 was specific to that one sweetberry, so it's likely a hardware issue. I have to note though that this is the 3rd sweetberry that I had which was not fully functional. 
The issue is TOT sweetberry sets this:
https://cs.corp.google.com/chromeos_public/src/platform/ec/chip/stm32/usb_power.c?l=131

Yes, the reset command can't explicitly fail, so it seems like what's occurring is that sweetberry is getting into a bad state and not even accepting the command.
I got this too.

Easy command to reproduce this
for i in {1..9999}; do  powerlog -b atlas_rev0.board -c atlas_rev0.scenario -s 1 --mW --print_stats --no_print_raw_data; done

Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/powerlog", line 9, in <module>
    load_entry_point('powerlog==1.0', 'console_scripts', 'powerlog')()
  File "/usr/lib64/python2.7/site-packages/powerlog.py", line 881, in main
    print_raw_data=print_raw_data,raw_data_dir=raw_data_dir)
  File "/usr/lib64/python2.7/site-packages/powerlog.py", line 615, in __init__
    self._pwr[key].reset()
  File "/usr/lib64/python2.7/site-packages/powerlog.py", line 349, in reset
    raise Exception("Power", "Failed to reset")
Exception: ('Power', 'Failed to reset')

Found mitigation for this with https://crrev.com/c/1038788
Project Member

Comment 12 by bugdroid1@chromium.org, May 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/ec/+/0a8afe10c2ba566f35a9360aeb3b302669126ad5

commit 0a8afe10c2ba566f35a9360aeb3b302669126ad5
Author: Puthikorn Voravootivat <puthik@chromium.org>
Date: Fri May 04 04:27:54 2018

sweetberry: Make resetting USB interface more forgiving

Currently, sweetberry ocassionally throws Exception when
resetting the USB interface. This CL mitigates that by
- Use linearly back off algorithm with 10ms delay increment
  before next reset attempt to avoid flooding the sweetberry
  hardware with reset requests.
- Increase retry amount from 10 to 100

BUG=chromium:834252
TEST=No "Exception: ('Power', 'Failed to reset')" seen

Change-Id: Iaf039cb82760205d1747fd630387852b7cfd8f83
Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1038788

[modify] https://crrev.com/0a8afe10c2ba566f35a9360aeb3b302669126ad5/extra/usb_power/powerlog.py

Sign in to add a comment