network_WiFi_TDLSPing - Unhandled AutoservRunError: command execution error |
|||||||||||||||||
Issue descriptionChrome Version: 58 Here is the cAuto test result https://wmatrix.googleplex.com/wifi_matfunc?hide_missing=True&tests=network_WiFi_TDLSPing&days_back=20&releases=58 Debug logs https://00e9e64bac3b5add5634ef044748e3405dc24c5562011b0616-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/104619566-chromeos-test%2Fchromeos9-row1-cart2-host4%2Fdebug%2Fautoserv.DEBUG?qk=AD5uMEufzEcPfCWPjtZP-cRMAb6isPCtsmNR6G-AMW1nVZrwurYfbY0TJNQNV8xgDl6wULovu4sglF863MS5_YiNrIeEbGyTnBh_o4_U8hHoqRk22LHR_VPMeF-CX8ejloAT5fP0Ubob2hy-YfLK7AqInujj8hMkESV7GI_gBshgSeJKOhuifaLHftbHemf3hDHyDs1oFsJ1LlMKfo_l6OgaCPYBOeBlwJFaCUgEM6VfhOtKsP06INM3BsuFNWVsVapDKoXDg5BtzqqFKkGGbvhtAUMzljSsRrmXsYcVxv2vfj3tGbfMJpKqhYQDJSfuE4437efCYeJViqVYpBuX9SzlTF8x5F-DOQzQ6-DtU4zLrhtAw3o-bpDq4oBnl9MD5v9fCAwXfgLnWLoIEJF1R5pD7b1YcvsV-uFaL0tYhpkXA94B_6Vy4mdnFtjnEPhbp-Xie8QrFAsslNqdj8_bgrZAkHNEio7ot1r9i7V8yX46FTSWbmKFRQHB-m8GPVWvURm9yf4a5yayzG9fQTj6oD4M-NidKJAYeFCsifP-DzK7o9ULmQlzohGPnFT6R_SrYzdLRupOeBEoRrXK-9MYEPmNpVjTO2X9e_82wZiBnFOwDo864BKaaZPyshmkjIX6JupcHipOncQzeS89bHT-OXo1rkG9j3ungUX9qR-TgUTVRQLo45ZtsHXcfebfcv_whrrJ-75AcGLwcfWrrV6E7ReBFh1wPmFL3ZZx2rieM7FH-uhVyYxH6QhjhsAEaFEsWF0Ar2qbK975s2YQaBtBU_3uigDTDJJntl8KqoIMzDs7aW-ghkUnNT5nvxxhRMW1uXdXzOGzf9AOY2-A8ORV44i_XsDPmFVBdsi4zlcTfRZdKXg9Omam0IZiyD6t2fm0O7WLMs9qb4ED Error 03/04 16:37:21.035 DEBUG| test:0220| after_iteration_hook ends running. 03/04 16:37:21.036 DEBUG| test:0395| after_iteration_hooks completed 03/04 16:37:21.039 WARNI| test:0615| The test failed with the following exception Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/test.py", line 609, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/client/common_lib/test.py", line 823, in _call_test_function raise error.UnhandledTestFail(e) UnhandledTestFail: Unhandled AutoservRunError: command execution error * Command: /usr/bin/ssh -a -x -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos9-row1-cart2-host4-router "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run(/usr/local/bin/ip link set managed1 up)\";fi; /usr/local/bin/ip link set managed1 up" Exit status: 2 Duration: 0.692073106766 stderr: RTNETLINK answers: Device or resource busy Traceback (most recent call last):
,
Mar 8 2017
,
Mar 9 2017
dshi@ / jrbarnette@ - Any suggestions? This is passing on some devices and failing on others. Debayan think its the command syntax error as mentioned in #1 above. Logs link: https://storage.cloud.google.com/chromeos-autotest-results/104649660-chromeos-test/chromeos9-row1-cart2-host2/debug/autoserv.DEBUG?_ga=1.43190245.1149929438.1479757627 Here is the complete trace... 03/04 16:53:10.002 INFO | server_job:0184| FAIL network_WiFi_TDLSPing network_WiFi_TDLSPing timestamp=1488675190 localtime=Mar 04 16:53:10 Unhandled AutoservRunError: command execution error * Command: /usr/bin/ssh -a -x -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos9-row1-cart2-host2-router "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run(/usr/local/bin/ip link set managed1 up)\";fi; /usr/local/bin/ip link set managed1 up" Exit status: 2 Duration: 0.689301013947 stderr: RTNETLINK answers: Device or resource busy Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/test.py", line 817, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/client/common_lib/test.py", line 470, in execute dargs) File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry postprocess_profiled_run, args, dargs) File "/usr/local/autotest/client/common_lib/test.py", line 380, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/server/site_tests/network_WiFi_TDLSPing/network_WiFi_TDLSPing.py", line 119, in run_once self.context.router.add_connected_peer() File "/usr/local/autotest/server/site_linux_router.py", line 1001, in add_connected_peer ssid, frequency, self.local_peer_ip_address(instance)) File "/usr/local/autotest/server/site_linux_router.py", line 1046, in configure_managed_station self.router.run('%s link set %s up' % (self.cmd_ip, interface)) File "/usr/local/autotest/server/hosts/ssh_host.py", line 295, in run ssh_failure_retry_ok) File "/usr/local/autotest/server/hosts/ssh_host.py", line 252, in _run raise error.AutoservRunError("command execution error", result) AutoservRunError: command execution error * Command: /usr/bin/ssh -a -x -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos9-row1-cart2-host2-router "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run(/usr/local/bin/ip link set managed1 up)\";fi; /usr/local/bin/ip link set managed1 up" Exit status: 2 Duration: 0.689301013947 stderr: RTNETLINK answers: Device or resource busy 03/04 16:53:10.003 INFO | server_job:0184| END FAIL network_WiFi_TDLSPing network_WiFi_TDLSPing timestamp=1488675190 localtime=Mar 04 16:53:10
,
Mar 9 2017
> RTNETLINK answers: Device or resource busy That message probably isn't caused by a syntax error.
,
Mar 9 2017
When i try running in chromeos9-row4-rack2-host1-router
root@localhost $ export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run(/usr/local/bin/ip link set managed1 up)\";fi;/usr/local/bin/ip link set managed1 up
-dash: Syntax error: "(" unexpected (expecting "fi")
After escape charector
root@localhost $ export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run\(/usr/local/bin/ip link set managed1 up\)\";fi;/usr/local/bin/ip link set managed1 up
Cannot find device "managed1"
root@localhost $
,
Mar 9 2017
> -dash: Syntax error: "(" unexpected (expecting "fi")
I believe the command as entered has one less level of
quoting than reported in the error message.
If the problem were the quoting in the string:
* The syntax error message would show up in the log.
* No call to Host.run() would work, because they all
use the same syntax.
As for this:
> Cannot find device "managed1"
Something in the context of the test is different from the
context when you ran it yourself. Whatever the test did
that you didn't do: That's the bug.
,
Mar 9 2017
the command is syntax error in some device which uses bin/dash root@localhost $ echo $SHELL /bin/dash root@localhost $ Ignore the managed1 error , it is expected as we have not configured the managed1 interface while tests running the command is there any reason for managed1 intf to be busy while configuring , such that kernel is reporting the busy message
,
Apr 1 2017
Issue 597731 has been merged into this issue.
,
Apr 1 2017
Yeah, this is definitely not a shell/syntax error. This test seems to be bringing up two virtual interfaces on the same wireless phy (phy0), with one of them as an AP and the other as a STA (client). That doesn't seem to be logically possible to me, but I could be wrong...
I've tweaked this test locally [1] to try to get it to set the 2nd interface up on a different phy (e.g., phy2 on my jetsream AP) . The test gets a little further along, but it fails with new errors (on my Kevin).
[1] This might be completely wrong...but tweaking the interface requirements seems to allow us to grab another PHY:
diff --git a/server/site_linux_router.py b/server/site_linux_router.py
index dd9c9f357f26..42b36229b9a1 100644
--- a/server/site_linux_router.py
+++ b/server/site_linux_router.py
@@ -1024,7 +1024,7 @@ class LinuxRouter(site_linux_system.LinuxSystem):
(e.g. '192.168.1.200').
"""
- interface = self.get_wlanif(frequency, 'managed')
+ interface = self.get_configured_interface('managed', frequency=frequency, spatial_streams=1)
# TODO(pstew): Configure other bits like PSK, 802.11n if tests
# require them...
,
Apr 3 2017
OK, so I realized my change in #10 actually should be enough to get the test going (since it helps get the test onto different radios for the 'client' and 'AP'), but the real problem is that the OnHub doesn't actually support TDLS. See the PHY info: Supported commands: * new_interface * set_interface * new_key * start_ap * new_station * set_bss * authenticate * associate * deauthenticate * disassociate * join_ibss * remain_on_channel * set_tx_bitrate_mask * frame * frame_wait_cancel * set_wiphy_netns * set_channel * set_wds_peer * probe_client * set_noack_map * register_beacons * start_p2p_device * set_mcast_rate * testmode * channel_switch * set_qos_map * connect * disconnect Note the missing 'tdls_mgmt' and 'tdls_oper', and if I check the 'capabilities' according to site_linux_system.py, I find that indeed the router does not support tlds. So this seems sufficient to get things "passing": https://chromium-review.googlesource.com/466091 That's not exactly ideal (it'd be better to ensure we run this test with HW that can support TDLS), but it's better than nothing IMO. Also, I think my original problem (that the 'AP' and 'client' interfaces are getting pushed onto the same PHY) is still a problem. Notably, the passing tests show results where we managed to allocate different PHYs on the router, due to circumstance. Maybe it's also good to apply a diff like in #10, so we have a better change of choosing different radios?
,
Apr 3 2017
,
Apr 3 2017
(Dropping Dan too, since this really isn't a "testing" bug; it's mostly Wifi) For #11: s/a better change/a better chance/
,
Apr 3 2017
,
Apr 3 2017
+ sduvvuri, grundler It seems a bit counter-intuitive that stumpy supports tdls while whirlwind doesn't. Is this WAI?
,
Apr 3 2017
,
Apr 3 2017
Srini, who should be looking at this? Seems like TDLS is supported by ath10k and I don't know enough about TDLS or ath10 to know why it's not advertised in v3.14 kernel: cd ~/trunk/src/third_party/kernel/v3.14/drivers/net/wireless-4.2/ath fgrep -Ri TDLS ... ath10k/mac.c: if (test_bit(WMI_SERVICE_TDLS, ar->wmi.svc_map)) ath10k/mac.c: ar->hw->wiphy->flags |= WIPHY_FLAG_SUPPORTS_TDLS; Since v3.18 kernel is using the same wireless stack (and ath10k driver) I expect it to be a similar problem.
,
Apr 3 2017
Perhaps this is a problem with the ancient "whirlwind testbed" images that are used for the test lab? IIUC, those actually use the wireless-3.18 stack still, and that doesn't have the same ath10k code as what Grant pointed out. I don't see WIPHY_FLAG_SUPPORTS_TDLS anywhere in wireless-3.18/ath/ath10k/.
,
Apr 3 2017
See bug 702007 for wireless-{3.18,4.2} transition.
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7814fa2a4c62dcdf77832121883c4fa4a99c8c6e commit 7814fa2a4c62dcdf77832121883c4fa4a99c8c6e Author: Brian Norris <briannorris@chromium.org> Date: Mon Apr 03 22:41:50 2017 network_WiFi_TDLSPing: check if router supports TDLS Jetstream APs (used for many Wifi autotests) don't support TLDS, but we're trying to use one of the AP's radios as a TDLS client for the DUT to ping. Obviously, this won't work on APs that don't support TDLS... Check for this, so we don't waste time trying to debug the test when the AP doesn't support it. BUG= chromium:699735 TEST=`test_that ... network_WiFi_TDLSPing` on a DUT paired with a jetstream AP Change-Id: I73f8eb42d64154c36594a8ecff35c882a9c8de2e Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/466091 Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org> [modify] https://crrev.com/7814fa2a4c62dcdf77832121883c4fa4a99c8c6e/server/site_tests/network_WiFi_TDLSPing/network_WiFi_TDLSPing.py
,
Apr 4 2017
Brian, you are exactly right: testbed-ap profile uses stock 3.18 wireless stack and that doesn't have TDLS support. Kirtika, is anyone still working on fixing issues that led to "testbed-ap" profile? Seems like a conversation that Sameer and Zhifeng should have again. (and I'm reassigning back to brian for now)
,
Apr 4 2017
Hmm, I managed to get a 3.14 kernel with wireless-4.2 running on a whirlwind test AP here, but it still doesn't show TDLS support: Supported commands: * new_interface * set_interface * new_key * start_ap * new_station * new_mpath * set_mesh_config * set_bss * authenticate * associate * deauthenticate * disassociate * join_ibss * join_mesh * remain_on_channel * set_tx_bitrate_mask * frame * frame_wait_cancel * set_wiphy_netns * set_channel * set_wds_peer * probe_client * set_noack_map * register_beacons * start_p2p_device * set_mcast_rate * testmode * channel_switch * set_qos_map * connect * disconnect
,
Apr 4 2017
Reassigning to Srini. I see multiple problems: (a) as noted in bug 702007 , I'm seeing really bad behavior (crashiness, etc.) when I run with 3.14 + wireless42 (b) TDLS still isn't listed in the supported commands, even with wireless42 Is this something expected? Anything we can do to debug these issues? It's possible I'm still doing something stupid.
,
Apr 4 2017
My guess is that the ath10k FW does not support TDLS. shamanth, Can you check if ath10K on WW(peregrine chipset) supports TDLS ?
,
Apr 5 2017
Isn't the point of TDLS to bypass the router? https://en.wikipedia.org/wiki/TDLS Does Onhub need to mediate the tunnel setup or something like that? (a) (comment 23) is a different problem and should be tracked in a different bug report. My $0.02: the kernel shouldn't be crashing and I'm seeing low levels of _kernel_ crash reports from the field. So tracking this would be good.
,
Apr 5 2017
Grant, Here the jetstream AP is being used as a station and on top of that TDLS is being enabled. jetstream AP as a product will use station mode only during recovery and not in normal mode of operation and TDLS is not enabled in recovery . My question is why is Jetstream AP is being used in STA mode and why do we need to turn on TDLS ?
,
Apr 11 2017
Currently all the belo wdevices are testN/A the test asuka auron_yuna banjo caroline cave celes chell clapper cyan daisy daisy_skate daisy_spring edgar elm gnawty guado hana kefka https://wmatrix.googleplex.com/matrix/wifi_matfunc?hide_missing=True&tests=network_WiFi_TDLSPing&days_back=30 Is that expected ?
,
Apr 11 2017
@27: Of the few I picked out (banjo, samus, veyron_mickey, veyron_minnie, daisy_skate, gnawty), they're all reporting "DUT is incapable of TDLS". I don't think that part of the test changed recently. Do we have older results too? @25/26: It's all an artifact of however the test was initially written. Maybe this is stupid and (if we care about TDLS at all) should be rewritten to work some other way. Unfortunately, "some other way" would probably require more hardware (so we can have another true station/client on the network). So the answer to: "why is Jetstream AP is being used in STA mode and why do we need to turn on TDLS ?" might simply be: because it's a convenient way to have a 2nd client on the network. Also, given how badly TDLS seems to be supported (according to #27) even on our DUTs, we probably don't effectively care about this feature much at all... One correction: I wasn't reporting *kernel* crashes in #23; just firmware crashes, which caused ath10k to reset periodically and provide poor service. I'll file a separate bug for those when I get the time (sorry, I'm distracted; I mentioned this to ranix@ elsewhere I think). As far as I'm concerned, we can drop the Jetstream folks from any concern over TDLS (i.e., this bug) until we get our act together on our side. I'll file separate bug(s) if I have any further concerns over ath10k driver/FW hiccups, in the context of kernel upgrades (e.g., bug 702007 ).
,
Feb 16 2018
,
Dec 1
I'm going to claim this is Fixed. I don't think we expect TDLS support to materialize on Whirlwind any time soon, and the changes on this bug already prevent us from trying to run (and fail) TDLS tests on these routers. If we ever want to re-enable (and test) TDLS on Chrome OS, we need to figure out a different strategy. |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by debayanb@chromium.org
, Mar 8 2017After debugging the issue RCA Syntax error in in export command MIssing ' \( ' server/site_linux_router.py:1046" Extra escape characters needed Correct command /usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_w39NSMssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos9-row4-rack2-host1-router "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run\(/usr/local/bin/ip link set managed1 up\)\";fi;/usr/local/bin/ip link set managed1 up" Plcae for Code change: /src/third_party/autotest/files/server/site_linux_router.py # Connect the station. self.router.run('%s link set %s up' % (self.cmd_ip, interface) the ssh command is formed as concat from other commands and result is causing bash script syntax error.