New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 699735 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
User never visited
Closed: Dec 1
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

network_WiFi_TDLSPing - Unhandled AutoservRunError: command execution error

Project Member Reported by debayanb@chromium.org, Mar 8 2017

Issue description

Chrome Version: 58

Here is the cAuto test result 
https://wmatrix.googleplex.com/wifi_matfunc?hide_missing=True&tests=network_WiFi_TDLSPing&days_back=20&releases=58

Debug logs
https://00e9e64bac3b5add5634ef044748e3405dc24c5562011b0616-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/104619566-chromeos-test%2Fchromeos9-row1-cart2-host4%2Fdebug%2Fautoserv.DEBUG?qk=AD5uMEufzEcPfCWPjtZP-cRMAb6isPCtsmNR6G-AMW1nVZrwurYfbY0TJNQNV8xgDl6wULovu4sglF863MS5_YiNrIeEbGyTnBh_o4_U8hHoqRk22LHR_VPMeF-CX8ejloAT5fP0Ubob2hy-YfLK7AqInujj8hMkESV7GI_gBshgSeJKOhuifaLHftbHemf3hDHyDs1oFsJ1LlMKfo_l6OgaCPYBOeBlwJFaCUgEM6VfhOtKsP06INM3BsuFNWVsVapDKoXDg5BtzqqFKkGGbvhtAUMzljSsRrmXsYcVxv2vfj3tGbfMJpKqhYQDJSfuE4437efCYeJViqVYpBuX9SzlTF8x5F-DOQzQ6-DtU4zLrhtAw3o-bpDq4oBnl9MD5v9fCAwXfgLnWLoIEJF1R5pD7b1YcvsV-uFaL0tYhpkXA94B_6Vy4mdnFtjnEPhbp-Xie8QrFAsslNqdj8_bgrZAkHNEio7ot1r9i7V8yX46FTSWbmKFRQHB-m8GPVWvURm9yf4a5yayzG9fQTj6oD4M-NidKJAYeFCsifP-DzK7o9ULmQlzohGPnFT6R_SrYzdLRupOeBEoRrXK-9MYEPmNpVjTO2X9e_82wZiBnFOwDo864BKaaZPyshmkjIX6JupcHipOncQzeS89bHT-OXo1rkG9j3ungUX9qR-TgUTVRQLo45ZtsHXcfebfcv_whrrJ-75AcGLwcfWrrV6E7ReBFh1wPmFL3ZZx2rieM7FH-uhVyYxH6QhjhsAEaFEsWF0Ar2qbK975s2YQaBtBU_3uigDTDJJntl8KqoIMzDs7aW-ghkUnNT5nvxxhRMW1uXdXzOGzf9AOY2-A8ORV44i_XsDPmFVBdsi4zlcTfRZdKXg9Omam0IZiyD6t2fm0O7WLMs9qb4ED


Error
03/04 16:37:21.035 DEBUG|              test:0220| after_iteration_hook ends running.
03/04 16:37:21.036 DEBUG|              test:0395| after_iteration_hooks completed
03/04 16:37:21.039 WARNI|              test:0615| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 609, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 823, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled AutoservRunError: command execution error
* Command: 
    /usr/bin/ssh -a -x     -o StrictHostKeyChecking=no -o
    UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
    ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
    -o Protocol=2 -l root -p 22 chromeos9-row1-cart2-host4-router "export
    LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger
    -tag \"autotest\"
    \"server[stack::run_once|add_connected_peer|configure_managed_station] ->
    ssh_run(/usr/local/bin/ip link set managed1 up)\";fi; /usr/local/bin/ip
    link set managed1 up"
Exit status: 2
Duration: 0.692073106766

stderr:
RTNETLINK answers: Device or resource busy
Traceback (most recent call last):
 
After debugging the issue 

RCA 
Syntax error in  in export command 
MIssing ' \( '
server/site_linux_router.py:1046" Extra escape characters needed 

Correct command
 /usr/bin/ssh -a -x    -o ControlPath=/tmp/_autotmp_w39NSMssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos9-row4-rack2-host1-router "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run\(/usr/local/bin/ip link set managed1 up\)\";fi;/usr/local/bin/ip link set managed1 up"

Plcae for Code change:
/src/third_party/autotest/files/server/site_linux_router.py

   # Connect the station.
        self.router.run('%s link set %s up' % (self.cmd_ip, interface)

the ssh command is formed as concat from other commands and result is causing bash script syntax error. 

Cc: aashuto...@chromium.org
Cc: dshi@chromium.org jrbarnette@chromium.org
dshi@ / jrbarnette@ - Any suggestions? This is passing on some devices and failing on others. Debayan think its the command syntax error as mentioned in #1 above.


Logs link: https://storage.cloud.google.com/chromeos-autotest-results/104649660-chromeos-test/chromeos9-row1-cart2-host2/debug/autoserv.DEBUG?_ga=1.43190245.1149929438.1479757627

Here is the complete trace...

03/04 16:53:10.002 INFO |        server_job:0184| 	FAIL	network_WiFi_TDLSPing	network_WiFi_TDLSPing	timestamp=1488675190	localtime=Mar 04 16:53:10	Unhandled AutoservRunError: command execution error
  * Command: 
      /usr/bin/ssh -a -x     -o StrictHostKeyChecking=no -o
      UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
      ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
      -o Protocol=2 -l root -p 22 chromeos9-row1-cart2-host2-router "export
      LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger
      -tag \"autotest\"
      \"server[stack::run_once|add_connected_peer|configure_managed_station] ->
      ssh_run(/usr/local/bin/ip link set managed1 up)\";fi; /usr/local/bin/ip
      link set managed1 up"
  Exit status: 2
  Duration: 0.689301013947
  
  stderr:
  RTNETLINK answers: Device or resource busy
  Traceback (most recent call last):
    File "/usr/local/autotest/client/common_lib/test.py", line 817, in _call_test_function
      return func(*args, **dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 470, in execute
      dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
      postprocess_profiled_run, args, dargs)
    File "/usr/local/autotest/client/common_lib/test.py", line 380, in _call_run_once
      self.run_once(*args, **dargs)
    File "/usr/local/autotest/server/site_tests/network_WiFi_TDLSPing/network_WiFi_TDLSPing.py", line 119, in run_once
      self.context.router.add_connected_peer()
    File "/usr/local/autotest/server/site_linux_router.py", line 1001, in add_connected_peer
      ssid, frequency, self.local_peer_ip_address(instance))
    File "/usr/local/autotest/server/site_linux_router.py", line 1046, in configure_managed_station
      self.router.run('%s link set %s up' % (self.cmd_ip, interface))
    File "/usr/local/autotest/server/hosts/ssh_host.py", line 295, in run
      ssh_failure_retry_ok)
    File "/usr/local/autotest/server/hosts/ssh_host.py", line 252, in _run
      raise error.AutoservRunError("command execution error", result)
  AutoservRunError: command execution error
  * Command: 
      /usr/bin/ssh -a -x     -o StrictHostKeyChecking=no -o
      UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
      ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
      -o Protocol=2 -l root -p 22 chromeos9-row1-cart2-host2-router "export
      LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger
      -tag \"autotest\"
      \"server[stack::run_once|add_connected_peer|configure_managed_station] ->
      ssh_run(/usr/local/bin/ip link set managed1 up)\";fi; /usr/local/bin/ip
      link set managed1 up"
  Exit status: 2
  Duration: 0.689301013947
  
  stderr:
  RTNETLINK answers: Device or resource busy
03/04 16:53:10.003 INFO |        server_job:0184| END FAIL	network_WiFi_TDLSPing	network_WiFi_TDLSPing	timestamp=1488675190	localtime=Mar 04 16:53:10
>   RTNETLINK answers: Device or resource busy

That message probably isn't caused by a syntax error.

When i try running in  chromeos9-row4-rack2-host1-router

root@localhost $ export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run(/usr/local/bin/ip link set managed1 up)\";fi;/usr/local/bin/ip link set managed1 up
-dash: Syntax error: "(" unexpected (expecting "fi")

After escape charector 

root@localhost $ export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::run_once|add_connected_peer|configure_managed_station] -> ssh_run\(/usr/local/bin/ip link set managed1 up\)\";fi;/usr/local/bin/ip link set managed1 up
Cannot find device "managed1"
root@localhost $ 
 
> -dash: Syntax error: "(" unexpected (expecting "fi")

I believe the command as entered has one less level of
quoting than reported in the error message.

If the problem were the quoting in the string:
  * The syntax error message would show up in the log.
  * No call to Host.run() would work, because they all
    use the same syntax.

As for this:
> Cannot find device "managed1"

Something in the context of the test is different from the
context when you ran it yourself.  Whatever the test did
that you didn't do:  That's the bug.

the command is syntax error in some device which uses bin/dash 

root@localhost $ echo $SHELL
/bin/dash
root@localhost $ 

Ignore the managed1 error , it is expected as we have not configured the managed1 interface while tests running the command 

is there any reason for managed1 intf to be busy while configuring , such that kernel is reporting the busy message 
Cc: bmahadev@chromium.org tienchang@chromium.org snanda@chromium.org krisr@chromium.org jleong@chromium.org vwang@chromium.org
 Issue 597731  has been merged into this issue.
Yeah, this is definitely not a shell/syntax error. This test seems to be bringing up two virtual interfaces on the same wireless phy (phy0), with one of them as an AP and the other as a STA (client). That doesn't seem to be logically possible to me, but I could be wrong...

I've tweaked this test locally [1] to try to get it to set the 2nd interface up on a different phy (e.g., phy2 on my jetsream AP) . The test gets a little further along, but it fails with new errors (on my Kevin).


[1] This might be completely wrong...but tweaking the interface requirements seems to allow us to grab another PHY:

diff --git a/server/site_linux_router.py b/server/site_linux_router.py
index dd9c9f357f26..42b36229b9a1 100644
--- a/server/site_linux_router.py
+++ b/server/site_linux_router.py
@@ -1024,7 +1024,7 @@ class LinuxRouter(site_linux_system.LinuxSystem):
                         (e.g. '192.168.1.200').
 
         """
-        interface = self.get_wlanif(frequency, 'managed')
+        interface = self.get_configured_interface('managed', frequency=frequency, spatial_streams=1)
 
         # TODO(pstew): Configure other bits like PSK, 802.11n if tests
         # require them...
Owner: briannorris@chromium.org
Status: Started (was: Untriaged)
OK, so I realized my change in #10 actually should be enough to get the test going (since it helps get the test onto different radios for the 'client' and 'AP'), but the real problem is that the OnHub doesn't actually support TDLS. See the PHY info:

	Supported commands:
		 * new_interface
		 * set_interface
		 * new_key
		 * start_ap
		 * new_station
		 * set_bss
		 * authenticate
		 * associate
		 * deauthenticate
		 * disassociate
		 * join_ibss
		 * remain_on_channel
		 * set_tx_bitrate_mask
		 * frame
		 * frame_wait_cancel
		 * set_wiphy_netns
		 * set_channel
		 * set_wds_peer
		 * probe_client
		 * set_noack_map
		 * register_beacons
		 * start_p2p_device
		 * set_mcast_rate
		 * testmode
		 * channel_switch
		 * set_qos_map
		 * connect
		 * disconnect

Note the missing 'tdls_mgmt' and 'tdls_oper', and if I check the 'capabilities' according to site_linux_system.py, I find that indeed the router does not support tlds.

So this seems sufficient to get things "passing":

https://chromium-review.googlesource.com/466091

That's not exactly ideal (it'd be better to ensure we run this test with HW that can support TDLS), but it's better than nothing IMO.

Also, I think my original problem (that the 'AP' and 'client' interfaces are getting pushed onto the same PHY) is still a problem. Notably, the passing tests show results where we managed to allocate different PHYs on the router, due to circumstance. Maybe it's also good to apply a diff like in #10, so we have a better change of choosing different radios?
Cc: -jrbarnette@chromium.org
Cc: -dshi@chromium.org -krisr@chromium.org kr...@chroium.org
(Dropping Dan too, since this really isn't a "testing" bug; it's mostly Wifi)

For #11:

s/a better change/a better chance/
Cc: -kr...@chroium.org krisr@chromium.org
Cc: sduvvuri@chromium.org grundler@chromium.org
+ sduvvuri, grundler
It seems a bit counter-intuitive that stumpy supports tdls while whirlwind doesn't. Is this WAI?


Cc: -jleong@chromium.org
Owner: sduvvuri@chromium.org
Srini, who should be looking at this?

Seems like TDLS is supported by ath10k and I don't know enough about TDLS or ath10 to know why it's not advertised in v3.14 kernel:

cd ~/trunk/src/third_party/kernel/v3.14/drivers/net/wireless-4.2/ath 
fgrep -Ri TDLS
...
ath10k/mac.c:   if (test_bit(WMI_SERVICE_TDLS, ar->wmi.svc_map))
ath10k/mac.c:           ar->hw->wiphy->flags |= WIPHY_FLAG_SUPPORTS_TDLS;

Since v3.18 kernel is using the same wireless stack (and ath10k driver) I expect it to be a similar problem.
Perhaps this is a problem with the ancient "whirlwind testbed" images that are used for the test lab? IIUC, those actually use the wireless-3.18 stack still, and that doesn't have the same ath10k code as what Grant pointed out. I don't see WIPHY_FLAG_SUPPORTS_TDLS anywhere in wireless-3.18/ath/ath10k/.
See  bug 702007  for wireless-{3.18,4.2} transition.
Project Member

Comment 20 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7814fa2a4c62dcdf77832121883c4fa4a99c8c6e

commit 7814fa2a4c62dcdf77832121883c4fa4a99c8c6e
Author: Brian Norris <briannorris@chromium.org>
Date: Mon Apr 03 22:41:50 2017

network_WiFi_TDLSPing: check if router supports TDLS

Jetstream APs (used for many Wifi autotests) don't support TLDS, but
we're trying to use one of the AP's radios as a TDLS client for the DUT
to ping. Obviously, this won't work on APs that don't support TDLS...

Check for this, so we don't waste time trying to debug the test when the
AP doesn't support it.

BUG= chromium:699735 
TEST=`test_that ... network_WiFi_TDLSPing` on a DUT paired with a
     jetstream AP

Change-Id: I73f8eb42d64154c36594a8ecff35c882a9c8de2e
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/466091
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/7814fa2a4c62dcdf77832121883c4fa4a99c8c6e/server/site_tests/network_WiFi_TDLSPing/network_WiFi_TDLSPing.py

Owner: briannorris@chromium.org
Brian, you are exactly right: testbed-ap profile uses stock 3.18 wireless stack and that doesn't have TDLS support.

Kirtika,  is anyone still working on fixing issues that led to "testbed-ap" profile?

Seems like a conversation that Sameer and Zhifeng should have again.

(and I'm reassigning back to brian for now)
Hmm, I managed to get a 3.14 kernel with wireless-4.2 running on a whirlwind test AP here, but it still doesn't show TDLS support:

	Supported commands:
		 * new_interface
		 * set_interface
		 * new_key
		 * start_ap
		 * new_station
		 * new_mpath
		 * set_mesh_config
		 * set_bss
		 * authenticate
		 * associate
		 * deauthenticate
		 * disassociate
		 * join_ibss
		 * join_mesh
		 * remain_on_channel
		 * set_tx_bitrate_mask
		 * frame
		 * frame_wait_cancel
		 * set_wiphy_netns
		 * set_channel
		 * set_wds_peer
		 * probe_client
		 * set_noack_map
		 * register_beacons
		 * start_p2p_device
		 * set_mcast_rate
		 * testmode
		 * channel_switch
		 * set_qos_map
		 * connect
		 * disconnect
Owner: sduvvuri@chromium.org
Status: Assigned (was: Started)
Reassigning to Srini. I see multiple problems:

(a) as noted in  bug 702007 , I'm seeing really bad behavior (crashiness, etc.) when I run with 3.14 + wireless42
(b) TDLS still isn't listed in the supported commands, even with wireless42

Is this something expected? Anything we can do to debug these issues? It's possible I'm still doing something stupid.
Owner: shamanthakumar@chromium.org
My guess is that the ath10k FW does not support TDLS.
shamanth,
  Can you check if ath10K on WW(peregrine chipset) supports TDLS ?
Isn't the point of TDLS to bypass the router?
https://en.wikipedia.org/wiki/TDLS

Does Onhub need to mediate the tunnel setup or something like that?

(a) (comment 23) is a different problem and should be tracked in a different bug report.  My $0.02: the kernel shouldn't be crashing and I'm seeing low levels of _kernel_ crash reports from the field. So tracking this would be good.
Grant,
  Here the jetstream AP is being  used as a station and on top of that TDLS is
  being enabled. jetstream AP as a product will use station mode only during recovery    
  and not in normal mode of operation and TDLS is not enabled in recovery .
  
My question is 
 why is Jetstream  AP is being used in STA mode and why do we need to turn on TDLS ?
  
Cc: -bmahadev@chromium.org briannorris@chromium.org
Currently all the belo wdevices are testN/A the test 
asuka	auron_yuna	banjo	caroline	cave	celes	chell	clapper	cyan	daisy	daisy_skate	daisy_spring	edgar	elm	gnawty	guado	hana	kefka

https://wmatrix.googleplex.com/matrix/wifi_matfunc?hide_missing=True&tests=network_WiFi_TDLSPing&days_back=30

Is that expected ?
@27:
Of the few I picked out (banjo, samus, veyron_mickey, veyron_minnie, daisy_skate, gnawty), they're all reporting "DUT is incapable of TDLS". I don't think that part of the test changed recently. Do we have older results too?

@25/26:
It's all an artifact of however the test was initially written. Maybe this is stupid and (if we care about TDLS at all) should be rewritten to work some other way. Unfortunately, "some other way" would probably require more hardware (so we can have another true station/client on the network). So the answer to:

"why is Jetstream  AP is being used in STA mode and why do we need to turn on TDLS ?"

might simply be: because it's a convenient way to have a 2nd client on the network.


Also, given how badly TDLS seems to be supported (according to #27) even on our DUTs, we probably don't effectively care about this feature much at all...

One correction: I wasn't reporting *kernel* crashes in #23; just firmware crashes, which caused ath10k to reset periodically and provide poor service.

I'll file a separate bug for those when I get the time (sorry, I'm distracted; I mentioned this to ranix@ elsewhere I think).

As far as I'm concerned, we can drop the Jetstream folks from any concern over TDLS (i.e., this bug) until we get our act together on our side. I'll file separate bug(s) if I have any further concerns over ath10k driver/FW hiccups, in the context of kernel upgrades (e.g.,  bug 702007 ).
Labels: wifi-test-failures
Cc: benchan@chromium.org
Status: Fixed (was: Assigned)
I'm going to claim this is Fixed. I don't think we expect TDLS support to materialize on Whirlwind any time soon, and the changes on this bug already prevent us from trying to run (and fail) TDLS tests on these routers.

If we ever want to re-enable (and test) TDLS on Chrome OS, we need to figure out a different strategy.

Sign in to add a comment