[chameleon_audio] Auron_paine at DUT android1758-audiobox4-host1 is failing Basic USB audio tests with Unhandled Exception: RPC error: usb.plug. Needs restart to pass tests. Starts failing again.
Issue description

Dashboard view: https://wmatrix.googleplex.com/platform/unfiltered?suites=chameleon_audio_perbuild&tests=audio_AudioBasicUSB%2A&days_back=7&platforms=daisy
Screenshot: https://screenshot.googleplex.com/GZWRfWeMwcq

Green results are all from chromeos1-row1-rack4-host3.
The red results are all from chromeos1-row5-rack5-host2, and the failure is:

Unhandled Exception: RPC error: usb.plug
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 810, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled Exception: RPC error: usb.plug
Traceback (most recent call last):
  File "./multimedia_xmlrpc_server.py", line 69, in _dispatch
    return func(*params)
  File "/usr/local/autotest/cros/multimedia/usb_facade_native.py", line 86, in plug
    self._wait_for_nodes_changed()
  File "/usr/local/autotest/cros/multimedia/usb_facade_native.py", line 119, in _wait_for_nodes_changed
    timeout=self._TIMEOUT_CRAS_NODES_CHANGE_SECS)
  File "/usr/local/autotest/bin/site_utils.py", line 244, in poll_for_condition
    raise TimeoutError, desc
TimeoutError: Timed out waiting for condition: Find USB node

After I restart chameleon and rerun the tests, they pass, e.g.
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=62800183
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=62800187

This reminds me of issue 606573. I'll replace the chameleon board with a new one and see how the test outcome changes.
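For readers unfamiliar with the failure mode in the traceback above: the plug call waits for an audio node to show up and gives up after a timeout. The following is a minimal Python 3 sketch of that polling pattern, not the actual autotest code. The exception class, the default timeout values, and the `usb_node_present` helper are assumptions made for illustration only.

```python
import time


class PollTimeoutError(Exception):
    """Raised when the polled condition does not become true in time."""


def poll_for_condition(condition, timeout=10.0, sleep_interval=0.1, desc=None):
    """Call condition() repeatedly until it is truthy or the timeout expires.

    Returns the truthy value, or raises PollTimeoutError with a message
    shaped like the one seen in the traceback.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise PollTimeoutError(
                "Timed out waiting for condition: %s" % (desc or "unnamed"))
        time.sleep(sleep_interval)


def usb_node_present(node_types):
    """Hypothetical check: True when a USB node is in the given node list."""
    return "USB" in node_types
```

With a node list that never gains a USB entry, `poll_for_condition(lambda: usb_node_present(nodes), desc="Find USB node")` ends in the same kind of timeout the failing tests report, which is why the error alone cannot say whether the chameleon, the cable, or the DUT is at fault.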
,
May 12 2016
The "Unhandled Exception: RPC error: usb.plug" failure reason is observed on chell, minnie-cheets, auron_paine, and daisy.
,
May 13 2016
,
May 13 2016
Hi Kalin, I just ran the USB playback test on chromeos1-row5-rack5-host2 and it passed. Did you reboot chameleon after the latest failure? Thanks!
,
May 13 2016
Yes, I rebooted it because I needed to run tests against the M50 release candidate build. The tests passed. I guess chameleon was still "fresh" for your test run.
,
May 13 2016
I see. Maybe we can leave one chameleon in the bad state for debugging when it happens again. Thanks!
,
May 16 2016
I believe both failures below, currently showing on chromeos1-row5-rack5-host2, are caused by the chameleon USB connection:
- Unhandled Exception: RPC error: usb.plug
- client failed to resume from sleep after 60 seconds
https://wmatrix.googleplex.com/unfiltered?suites=chameleon_audio_perbuild,chameleon_hdmi_perbuild&platforms=daisy
I am not going to restart chameleon. Feel free to debug. Thanks.
,
May 18 2016
Thank you Kalin. I will check.
,
May 18 2016
I tested by calling Plug(8) and Unplug(8) on chameleon server.
DUT could not detect any USB event at all.
Then, I rebooted DUT.
It then returned to normal, and the USB playback and record tests passed.
This suggests that chameleon is not in a bad state.
Rather, the DUT is in a bad state.
This contradicts the previous result.
As for chell at chromeos1-row5-rack7-host1, rebooting the DUT does not help.
Its lsusb -t output:
localhost ~ # lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 3: Dev 2, If 0, Class=Wireless, Driver=btusb, 12M
|__ Port 3: Dev 2, If 1, Class=Wireless, Driver=btusb, 12M
|__ Port 5: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 3: Dev 5, If 0, Class=Vendor Specific Class, Driver=r8152, 480M
|__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/5p, 480M
|__ Port 1: Dev 7, If 0, Class=Vendor Specific Class, Driver=smsc95xx, 480M
|__ Port 7: Dev 4, If 0, Class=Video, Driver=uvcvideo, 480M
|__ Port 7: Dev 4, If 1, Class=Video, Driver=uvcvideo, 480M
There are two hubs. Are they both needed?
I tested locally on my chell with only one hub, and USB can be detected.
localhost ~ # lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
|__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 3: Dev 2, If 0, Class=Wireless, Driver=btusb, 12M
|__ Port 3: Dev 2, If 1, Class=Wireless, Driver=btusb, 12M
|__ Port 5: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 3: Dev 5, If 0, Class=Vendor Specific Class, Driver=asix, 480M
|__ Port 4: Dev 7, If 0, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 4: Dev 7, If 1, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 4: Dev 7, If 2, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 7: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
|__ Port 7: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
I also tried rebooting the chameleon at chromeos1-row5-rack7-host1.
After I rebooted the chameleon, I could no longer ping or ssh to it.
I guess there is some other serious issue with that chameleon.
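The comparison of the two lsusb -t outputs above comes down to whether any interface is bound to snd-usb-audio. A small Python sketch of that check follows. The regular expression and function names are illustrative assumptions, and the sample strings are a few lines copied from the outputs in this comment.

```python
import re

# Matches one device/interface line of `lsusb -t` output as pasted above.
LSUSB_LINE = re.compile(
    r"Port (?P<port>\d+): Dev (?P<dev>\d+), (?:If (?P<intf>\d+), )?"
    r"Class=(?P<cls>[^,]+), Driver=(?P<driver>[^,]*), (?P<speed>\S+)")


def parse_lsusb_tree(text):
    """Return one dict per recognizable line of `lsusb -t` output."""
    entries = []
    for line in text.splitlines():
        match = LSUSB_LINE.search(line)
        if match:
            entries.append(match.groupdict())
    return entries


def has_usb_audio(text):
    """True if any interface is an Audio class bound to snd-usb-audio."""
    return any(e["cls"] == "Audio" and e["driver"] == "snd-usb-audio"
               for e in parse_lsusb_tree(text))


# A few lines from the lab chell (no audio device enumerated).
LAB_CHELL = """\
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 5: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/5p, 480M
|__ Port 1: Dev 7, If 0, Class=Vendor Specific Class, Driver=smsc95xx, 480M
"""

# A few lines from the local chell (audio gadget enumerated).
LOCAL_CHELL = """\
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 5: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 4: Dev 7, If 0, Class=Audio, Driver=snd-usb-audio, 480M
"""
```

Running `has_usb_audio` on output captured right after a plug call gives a quick yes/no on whether the DUT saw the chameleon USB audio gadget at all, independent of CRAS.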
,
May 18 2016
Hi Kalin, after several minutes I still cannot access chromeos1-row5-rack7-host1-chameleon.cros. Could you please check what happened to that chameleon? Thanks!
,
May 18 2016
Hi Jimmy,
1) Regarding chromeos1-row5-rack7-host1 - I found out the chameleon USB cable was unplugged. It may have been forgotten when the DUT was being reset a week ago. Anyway, chameleon does not connect for me either. When I take it to my desk, it connects OK; when it is in the box with all cables connected, it does not. It might be the issue where u-boot does not proceed to boot - I have seen this before on some boards, but after a few restarts it gets OK... I'll continue to investigate...
2) Regarding daisy - the current failures show the issue as observed originally:
http://cautotest/afe/#tab_id=view_job&object_id=63724198 - Unhandled Exception: RPC error: usb.plug
http://cautotest/afe/#tab_id=view_job&object_id=63725736 - TEST CASE: PLUG > SUSPEND > PLUG > PLUG > RESUME - client failed to resume from sleep after 60 seconds
I think this daisy chameleon (or DUT) at chromeos1-row5-rack5-host2 is in the bad state we want to investigate.
,
May 19 2016
Hi Kalin, thanks for checking the chell device. As for daisy, the test now passes after rebooting daisy (without rebooting chameleon). So it is hard to tell whether this issue is on the chameleon or the DUT. The current plan I have in mind is:
1. Wait for it to happen again.
2. Try rebooting the DUT when it happens and see if the test passes.
3. If step 2 passes, add a reboot before and after the USB test as a workaround to bring daisy back to a good state.
Thanks!
,
May 19 2016
Thanks Jimmy, I'll follow these steps when it happens again. I'll change the test.
,
May 20 2016
Since we moved daisy to the new audio box in b1758 all seems good - https://screenshot.googleplex.com/fACBOrGdC5r Will wait for the issue to come back.
,
May 25 2016
Well, the issue continues. I filed issue 614390 to reflect this failure happening again at android 1758- location for daisy.
,
Jun 7 2016
,
Jun 9 2016
Facing this issue on Paine: android1758-audiobox4-host1
Job page: https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=66144911
Test failed after rebooting the device: https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=66149244
Test failed again after rebooting chameleon: https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=66151244
I'll re-plug the chameleon connection.
,
Jun 10 2016
I went to the lab and, with the device/chameleon setup present, re-tested numerous times. I unplugged and plugged the USB cable from chameleon, and from the DUT too. I restarted chameleon too. I tried connecting the three USB cables (from the Eth switch, servo, and chameleon) to four different USB hubs. (Only the right side USB port is available, as the left one is quite close to the audio jack port.)
The observations show that:
- At some times the USB node is present, somewhere between widgets binding and the USB plug action, but mostly not.
- At or after the USB plug action, the DUT loses its connection to ethernet for 4-5 seconds (once or twice in a row), and if the USB node was present, it disappears.
- I could get the USB node to appear again if I disconnect and connect the USB hub port with the on/off port switch (the other three hubs did not have such a switch) at some specific moment after the USB plug action, and the test PASSES. So, somehow the chameleon USB connection messes up the connectivity no matter what USB hub is used.
- If I plug the chameleon board into the left USB port, and leave the Eth and servo on the right one through the hub, I have better results - the Eth connection is not interrupted, and I could get the test to PASS.
I still have to do a few tests, but the left port is so close that we'll need some 90-degree USB extension that stays close to the DUT - something like the following URL, but shorter: https://www.amazon.com/QIBOX-Female-Adapter-Extension-Degree/dp/B00FQ7IJT6
I suspect that whatever is causing the Eth connection disruption is also interfering with other devices for suspend/resume/timeouts etc.
,
Jun 13 2016
Hi Kalin, the ethernet disconnection is expected, since we unbind/bind the driver for the USB controller to re-enumerate USB devices. This action is not needed if we use kernel 4.2. I would like to try kernel 4.2 on android1758-audiobox4-host1 to see if it can solve the detection problem. But I am not sure if USB is still connected. If kernel 4.2 cannot solve the detection problem, we will have to resort to the 90-degree extension. Thanks!
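For context, unbinding and rebinding a USB host controller driver on Linux is normally done through the driver's sysfs `unbind` and `bind` files. The Python sketch below shows that general mechanism only. It is not the code the test framework uses, and the function name, the settle delay, and the injectable `write`/`sleep` parameters are assumptions added so the logic can be exercised without root access. The PCI address in the usage note is the xhci_hcd address that appears in the kernel messages quoted in this thread.

```python
import os
import time

# Standard sysfs location for the xHCI PCI driver on Linux.
XHCI_DRIVER_DIR = "/sys/bus/pci/drivers/xhci_hcd"


def _write_sysfs(path, value):
    """Write a value to a sysfs attribute (requires root on a real system)."""
    with open(path, "w") as f:
        f.write(value)


def rebind_usb_controller(pci_address, driver_dir=XHCI_DRIVER_DIR,
                          settle_secs=1.0, write=_write_sysfs,
                          sleep=time.sleep):
    """Unbind, then rebind, the controller so its devices re-enumerate.

    Every device behind the controller disappears while it is unbound,
    including any USB ethernet adapter on the same controller.
    """
    write(os.path.join(driver_dir, "unbind"), pci_address)
    sleep(settle_secs)  # hypothetical delay to let the teardown finish
    write(os.path.join(driver_dir, "bind"), pci_address)
```

On a DUT this would be called as `rebind_usb_controller("0000:00:14.0")`. Because the USB ethernet dongle hangs off the same controller, a short network drop around the plug action is the expected side effect.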
,
Jun 13 2016
Yes, the USB is still connected as usual - on the DUT's right side USB port through USB hub for all three - Eth, servo, and chameleon.
,
Jun 13 2016
In addition to the observations from c#18, the 'RPC error: usb.plug' failure came back on the left USB port, so I really could not get to a stable passing state. So, I hope kernel 4.2 really does better and gets us on the right track here. Thank you for making this progress.
,
Jun 13 2016
Thank you Kalin. I installed kernel 4.2 on android1758-audiobox4-host1. Let's see how it works.
,
Jun 13 2016
I found that this issue happens 80% of the time on android1758-audiobox4-host1, even after I install kernel 4.2 on chameleon. However, I cannot reproduce it using an auron_yuna here locally. I am using a Transcend TS-HUB3K USB hub connected to the Cros USB port on the right. I found that, after the USB gadget driver is removed, auron_yuna detects the removal correctly. I used the latest R53-8447.0.0 and R52-8350.26.0 to test.
In contrast, R52-8350.26.0 on android1758-audiobox4-host1 tried to enumerate a low-speed USB device on usb 1-2.4 in an infinite loop after the USB gadget driver was removed on Chameleon, and failed to do so (error messages attached, starting from line 17723).
This is strange behavior. I am wondering if there is something wrong with the setup in android1758-audiobox4-host1 such that the Cros device still thinks there is a USB device to be enumerated even after the USB gadget driver is removed on the Chameleon side.
Hi Kalin, could you please try to reproduce the issue with the same setup I used locally, as in the attached picture? We can remove servo first to reduce the complexity. Thanks!
,
Jun 13 2016
Hi Bernie, could you please loop in the contact person for auron-soc to check the error messages in #23? I am not sure why the Cros device tried to enumerate a low-speed USB device in a loop like this:
2016-06-13T02:10:20.920648-07:00 INFO kernel: [ 257.576408] usb 1-2.4: USB disconnect, device number 14
2016-06-13T02:10:21.864301-07:00 INFO kernel: [ 258.521234] usb 1-2.4: new low-speed USB device number 15 using xhci_hcd
2016-06-13T02:10:21.938319-07:00 ERR kernel: [ 258.595156] usb 1-2.4: device descriptor read/64, error -32
2016-06-13T02:10:22.039298-07:00 ERR kernel: [ 258.696045] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 14.
2016-06-13T02:10:22.112277-07:00 INFO kernel: [ 258.769138] usb 1-2.4: new low-speed USB device number 16 using xhci_hcd
2016-06-13T02:10:22.376317-07:00 INFO kernel: [ 259.033047] usb 1-2.4: new low-speed USB device number 17 using xhci_hcd
2016-06-13T02:10:22.450317-07:00 ERR kernel: [ 259.107045] usb 1-2.4: device descriptor read/64, error -32
2016-06-13T02:10:22.551320-07:00 ERR kernel: [ 259.207811] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 16.
2016-06-13T02:10:25.244288-07:00 INFO kernel: [ 261.900011] usb 1-2.4: new low-speed USB device number 21 using xhci_hcd
2016-06-13T02:10:25.318318-07:00 ERR kernel: [ 261.974013] usb 1-2.4: device descriptor read/64, error -32
2016-06-13T02:10:25.419318-07:00 ERR kernel: [ 262.074862] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 20.
2016-06-13T02:10:25.492289-07:00 INFO kernel: [ 262.147850] usb 1-2.4: new low-speed USB device number 22 using xhci_hcd
2016-06-13T02:10:25.577317-07:00 ERR kernel: [ 262.232840] usb 1-2.4: device descriptor read/64, error -32
2016-06-13T02:10:25.678302-07:00 ERR kernel: [ 262.333842] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 21.
2016-06-13T02:10:26.728319-07:00 INFO kernel: [ 263.383483] usb 1-2.4: new low-speed USB device number 24 using xhci_hcd
2016-06-13T02:10:26.802315-07:00 ERR kernel: [ 263.457475] usb 1-2.4: device descriptor read/64, error -32
2016-06-13T02:10:26.903320-07:00 ERR kernel: [ 263.558248] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 23
Thanks!
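A loop like the one quoted above can be spotted mechanically by counting enumeration attempts and descriptor read errors per USB port. The Python sketch below does that. The regular expressions, function names, and the threshold of three attempts are illustrative assumptions, and the sample is a subset of the kernel messages from this comment.

```python
import re
from collections import Counter

NEW_DEVICE = re.compile(
    r"usb (?P<port>[\d.-]+): new (?P<speed>[\w-]+) USB device number (?P<num>\d+)")
DESCRIPTOR_ERROR = re.compile(
    r"usb (?P<port>[\d.-]+): device descriptor read/64, error (?P<err>-?\d+)")


def summarize_enumeration(log_text):
    """Count enumeration attempts and descriptor errors per USB port."""
    attempts = Counter()
    errors = Counter()
    for line in log_text.splitlines():
        match = NEW_DEVICE.search(line)
        if match:
            attempts[match.group("port")] += 1
            continue
        match = DESCRIPTOR_ERROR.search(line)
        if match:
            errors[match.group("port")] += 1
    return attempts, errors


def looping_ports(log_text, threshold=3):
    """Ports with at least `threshold` attempts and one descriptor error."""
    attempts, errors = summarize_enumeration(log_text)
    return sorted(port for port, count in attempts.items()
                  if count >= threshold and errors[port] > 0)


# Subset of the kernel messages quoted in this comment.
SAMPLE_LOG = """\
2016-06-13T02:10:20.920648-07:00 INFO kernel: [ 257.576408] usb 1-2.4: USB disconnect, device number 14
2016-06-13T02:10:21.864301-07:00 INFO kernel: [ 258.521234] usb 1-2.4: new low-speed USB device number 15 using xhci_hcd
2016-06-13T02:10:21.938319-07:00 ERR kernel: [ 258.595156] usb 1-2.4: device descriptor read/64, error -32
2016-06-13T02:10:22.039298-07:00 ERR kernel: [ 258.696045] xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 14.
2016-06-13T02:10:22.112277-07:00 INFO kernel: [ 258.769138] usb 1-2.4: new low-speed USB device number 16 using xhci_hcd
2016-06-13T02:10:22.376317-07:00 INFO kernel: [ 259.033047] usb 1-2.4: new low-speed USB device number 17 using xhci_hcd
2016-06-13T02:10:22.450317-07:00 ERR kernel: [ 259.107045] usb 1-2.4: device descriptor read/64, error -32
"""
```

Calling `looping_ports` on a full /var/log/messages dump would list each port that keeps re-enumerating, which makes it easier to compare a failing DUT against a passing one.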
,
Jun 13 2016
Thanks Jimmy, I'll remove servo at android1758-audiobox4-host1 and re-test with the same USB hub as you did with yuna, and update status.
,
Jun 13 2016
First I hit some connectivity problem, and reset the DUT by transitioning to Dev mode and then back to Normal mode. Then, without servo (even with the flex cable removed), I reran the USB playback and record tests several times. The issue still persists intermittently. I guess I should replace the DUT and try again. Another workaround would be to not test paine and use yuna as the main board-family representative, but I am not sure how the tests are going to run on it.
,
Jun 14 2016
Thank you Kalin. Replacing the paine DUT might be worth a try. Sorry, I still cannot find an auron_paine here. I found that the test is very stable on auron_yuna. I can pass the USB playback/record test 20 times in a row.
,
Jun 14 2016
I think I might be able to get an auron_paine from Chromestop. I will check and update. Thanks!
,
Jun 14 2016
Hi Jimmy, I swapped auron_paine with auron_yuna. So we'll now have more results (with audio tests) for yuna, in the audio box at android1758-audiobox4-host1. And paine at chromeos1-row1-rack3-host4 will run just display tests. We'll add an audio board so we get some audio results for paine too. Let's see how these boards run for the next day or two.
,
Jun 14 2016
Thank you Kalin! I have tested auron_paine locally too. The test result (with kernel 4.2) is very stable: 40/40 USB playback/record tests passed. I deployed the 4.2 kernel to chromeos1-row1-rack3-host4 as well. So, we will get test results for the 4.2 kernel with both paine and yuna. Thanks!
,
Jun 14 2016
Now the 'RPC error: usb.plug' failure happens for auron_yuna audio_AudioBasicUSBPlayback and audio_AudioBasicUSBRecord on build R53-8451.0.0. Some other failures are observed on audio tests. So, I am pretty much convinced that the chameleon (FPGA) is behind the issues observed. I'll replace the chameleon board at the audio box.
,
Jun 15 2016
I replaced the FPGA board only, and left auron_yuna to run six jobs through the previously failing USB audio tests. http://cautotest/afe/#tab_id=view_host&object_id=4928 Let's see if this fixes things.
,
Jun 15 2016
Sadly it did not. Two out of the four custom jobs I rescheduled for USB audio tests FAILED with "RPC error: usb.plug".
Screenshot: https://screenshot.googleplex.com/hQ88Bkiz3kV
Since then three jobs were scheduled and they all passed - https://screenshot.googleplex.com/A6Z8WYOiiJ8
Let's give it a few more days, though I am not positive this issue is gone.
,
Jun 15 2016
Hi Kalin, is it using the same setup as in #23 (same USB hub, no servo)? Or maybe it is a bad USB cable? I should probably find another chameleon board and do more testing locally, and ship one chameleon to you if it is stable. Thanks!
,
Jun 15 2016
Yes, it connects to the right USB port through a Transcend USB hub for the three connections - servo, Ethernet, and chameleon. Let me check the cable and do more testing.
,
Jun 16 2016
Sridhar replaced the USB cable between the DUT and chameleon yesterday, and so far three M53 builds are passing - https://screenshot.googleplex.com/12gZkHypFC6
This might be the real issue. I get the USB cables (type A to mini-USB) from the chameleon FPGA bundles, and I guess we got a bad cable in the pack.
,
Jun 17 2016
This is great! Let's monitor it for some days before closing this issue. Thank you Sridhar and Kalin!
,
Jun 17 2016
The last failures on all channels were on June 14th. Since then (after the cable replacement), no failures have been observed.
Comment 1 by ka...@chromium.org, May 10 2016
Labels: Chameleon