allspark-E: failed to ping during network_EthernetStressPlug test |
||||
Issue description
CHROMEOS_RELEASE_DESCRIPTION=8560.0.2016_07_07_1111 (Test Build - grundler) developer-build gale
What steps will reproduce the problem?
test_that 172.27.217.124 network_EthernetStressPlug --args num_iterations=1000
Allspark-E is USB type C power adapter with single port USB2.0 RTL8152 ("Fast Ethernet", aka 100Mbps).
What is the expected output?
Test passes.
What do you see instead?
"Autotest client terminated unexpectedly: DUT is no longer pingable, it may have rebooted or hung."
DUT neither rebooted nor hung. The DUT sat overnight since I kicked the test off before leaving around 6:45pm.
/var/log/messages is a mess. Looks like two instances of "dhclient wan0" were running and one "lost the race" and was spewing errors the whole time. I started one instant manually because I didn't see one running at boot time (wan0 is equivalent to a "built-in" device even though it's USB.)
[<rant> And the date/timestamps in /var/log/messages are completely off. It's not getting updated at boot time automatically. And ntpdate isn't installed. Why does jetstream have to solve this problem differently than the rest of the world? :( </rant>]
In order to ssh into the DUT, I rmmod/modprobed r8152 driver:
dmesg
[ 936.979846] init: cryptohomed main process (759) killed by TERM signal
[ 937.079173] init: chapsd main process (747) killed by TERM signal
[ 942.180820] init: tcsd main process (742) killed by KILL signal
[ 1857.971628] IPv6: ADDRCONF(NETDEV_UP): wan0: link is not ready
[ 1860.338179] IPv6: ADDRCONF(NETDEV_CHANGE): wan0: link becomes ready
[52537.669364] usbcore: deregistering interface driver r8152
[52542.064824] usb 1-1: reset high-speed USB device number 3 using xhci-hcd
[52542.261845] r8152 1-1:1.0 eth0: v1.08.3
[52542.263469] usbcore: registered new interface driver r8152
[52542.334841] r8152 1-1:1.0 wan0: renamed from eth0
[52564.462074] IPv6: ADDRCONF(NETDEV_UP): wan0: link is not ready
[52566.834206] IPv6: ADDRCONF(NETDEV_CHANGE): wan0: link becomes ready
<end>
,
Jul 8 2016
remote: New Changes: remote: https://chromium-review.googlesource.com/359441 autotest: EthernetStressPlug accepts interface parameter The change is not fully tested yet. But default behavior seems to work even if RTL8152 + r8152 driver doesn't survive the first cycle of this. :( (link goes down by never comes back)
,
Jul 9 2016
Two issues:
1) "Unreasonable amount of memory"
diff --git a/client/bin/site_utils.py b/client/bin/site_utils.py
index 99b44b7..1e1c298 100644
--- a/client/bin/site_utils.py
+++ b/client/bin/site_utils.py
@@ -844,7 +844,7 @@ def get_mem_total():
"""
mem_total = _get_float_from_file(_MEMINFO, 'MemTotal:', 'MemTotal:', ' kB')
# Sanity check, all Chromebooks have at least 1GB of memory.
- assert mem_total > 1024 * 1024, 'Unreasonable amount of memory.'
+ assert mem_total > 256 * 1024, 'Unreasonable amount of memory.'
return mem_total / 1024
I thought this was fixed already by ranix@.
2) Something is removing wan0 interface and bringing it back unconfigured. My suspicion is this test is not playing well with udev (which renames new devices when they reappear). :(
19:03:57 INFO | autoserv| FAIL network_EthernetStressPlug network_EthernetStressPlug timestamp=1467084143localtime=Jun 27 22:22:23 ERROR: TIMEOUT : wan0 IP is None after setting power on (last_wait = 0.60 seconds)
network_EthernetStressPlug.DEBUG:
...
06/28 03:21:33.801 DEBUG|network_EthernetSt:0458| Iteration: 0 start
06/28 03:21:33.968 DEBUG| base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:34.013 DEBUG| base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:34.727 DEBUG| base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:34.773 DEBUG| base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:35.866 DEBUG| base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:35.911 DEBUG| base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:37.003 DEBUG| base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:37.047 DEBUG| base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:38.084 DEBUG| base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:38.114 DEBUG| base_utils:0278| [stdout] wan0: flags=4098<BROADCAST,MULTICAST> mtu 1500
06/28 03:21:38.118 DEBUG| base_utils:0278| [stdout] ether f4:f5:d8:10:12:63 txqueuelen 1000 (Ethernet)
06/28 03:21:38.120 DEBUG| base_utils:0278| [stdout] RX packets 0 bytes 0 (0.0 B)
06/28 03:21:38.122 DEBUG| base_utils:0278| [stdout] RX errors 0 dropped 0 overruns 0 frame 0
06/28 03:21:38.124 DEBUG| base_utils:0278| [stdout] TX packets 0 bytes 0 (0.0 B)
06/28 03:21:38.127 DEBUG| base_utils:0278| [stdout] TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
06/28 03:21:38.129 DEBUG| base_utils:0278| [stdout]
...
,
Jul 11 2016
Updated the CL to add "interface=wan0" support: https://chromium-review.googlesource.com/#/c/359441/ Now to dig into why IP address isn't preserved across "down/up" sequence. It seems to be working fine when I do this "by hand": gale tmp # cat /sys/class/net/wan0/flags 0x1003 gale tmp # echo 0x1002 > /sys/class/net/wan0/flags gale tmp # ifconfig wan0 | fgrep -e inet -e flags wan0: flags=4098<BROADCAST,MULTICAST> mtu 1500 inet 172.27.217.218 netmask 255.255.255.0 broadcast 172.27.217.255 gale tmp # echo 0x1003 > /sys/class/net/wan0/flags gale tmp # ifconfig wan0 | fgrep -e inet -e flags wan0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet 172.27.217.218 netmask 255.255.255.0 broadcast 172.27.217.255 which suggests something else is interfering (udev rules most likely) with "up" transition.
,
Jul 12 2016
My current theory is the use of /sys "authorized" field is not playing well with udev. autotest "stdout" from last run had: ... 00:34:05 DEBUG| Iteration: 0 start 00:34:05 DEBUG| Running 'ifconfig wan0' 00:34:05 DEBUG| [stderr] wan0: error fetching interface information: Device not found 00:34:07 DEBUG| Running 'ifconfig wan0' 00:34:07 DEBUG| [stderr] wan0: error fetching interface information: Device not found 00:34:08 DEBUG| Running 'ifconfig wan0' 00:34:08 DEBUG| [stdout] wan0: flags=4098<BROADCAST,MULTICAST> mtu 1500 00:34:08 DEBUG| [stdout] ether f4:f5:d8:10:12:63 txqueuelen 1000 (Ethernet) 00:34:08 DEBUG| [stdout] RX packets 0 bytes 0 (0.0 B) 00:34:08 DEBUG| [stdout] RX errors 0 dropped 0 overruns 0 frame 0 00:34:08 DEBUG| [stdout] TX packets 0 bytes 0 (0.0 B) 00:34:08 DEBUG| [stdout] TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ... And the corresponding /var/log/messages: 2016-07-01T00:34:07.507890+00:00 INFO kernel: [330988.244257] usb 1-1: authorized to connect 2016-07-01T00:34:07.657243+00:00 INFO kernel: [330988.402977] usb 1-1: reset high-speed USB device number 7 using xhci-hcd 2016-07-01T00:34:07.867271+00:00 INFO kernel: [330988.604221] r8152 1-1:1.0 eth0: v1.08.3 2016-07-01T00:34:07.892766+00:00 NOTICE configure_gale_test_interface.sh[2697]: Ignoring eth0, unknown topology. 2016-07-01T00:34:07.937204+00:00 INFO kernel: [330988.676750] r8152 1-1:1.0 wan0: renamed from eth0 2016-07-01T00:34:09.158868+00:00 ERR dhclient[2705]: Bind socket to interface: No such device ... 2016-07-01T00:34:09.159330+00:00 ERR dhclient[2705]: exiting. ... 2016-07-01T00:34:53.712005+00:00 NOTICE root[2995]: autotest finished iteration /usr/local/autotest/results/default/network_EthernetStressPlug/sysinfo/iteration.1 2016-07-01T00:34:53.737422+00:00 NOTICE autotest[2998]: 00:34:53.715 WARNI| test:0606| Autotest caught exception when running test:#012Traceback (most recent call last):#012 File "/usr/local/autotest/common_lib/test.py", line 600, in _exec#012 _call_test_function(self.execute, *p_args, **p_dargs)#012 File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function#012 return func(*args, **dargs)#012 File "/usr/local/autotest/common_lib/test.py", line 461, in execute#012 dargs)#012 File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry#012 postprocess_profiled_run, args, dargs)#012 File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once#012 self.run_once(*args, **dargs)#012 File "/usr/local/autotest/tests/network_EthernetStressPlug/network_EthernetStressPlug.py", line 475, in run_once#012 linkup_time = self.TestPowerEthernet(power=1)#012 File "/usr/local/autotest/tests/network_EthernetStressPlug/network_EthernetStressPlug.py", line 303, in TestPowerEthernet#012 power_str[power], self.test_status['last_wai And the I ran this sequence by hand: echo 0 > /sys/devices/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1/authorized <2 seconds> echo 1 > /sys/devices/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1/authorized And here is the corresponding dmesg output: [340861.732150] usb 1-1: authorized to connect [340861.875032] usb 1-1: reset high-speed USB device number 7 using xhci-hcd [340861.994486] xhci-hcd xhci-hcd.0.auto: ERROR: unexpected setup address command completion code 0x11. [340862.204382] xhci-hcd xhci-hcd.0.auto: ERROR: unexpected setup address command completion code 0x11. [340862.414353] usb 1-1: device not accepting address 7, error -22 ... [340864.154624] r8152 1-1:1.0 (unnamed net_device) (uninitialized): Unknown version 0x58b0 [340864.154693] r8152 1-1:1.0 (unnamed net_device) (uninitialized): Unknown Device [340864.155471] usb 1-1: USB disconnect, device number 7 [340864.274440] usb 1-1: new high-speed USB device number 8 using xhci-hcd [340864.415301] usb 1-1: New USB device found, idVendor=0bda, idProduct=8152 [340864.415375] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber= 3 [340864.415432] usb 1-1: Product: USB 10/100 LAN [340864.415472] usb 1-1: Manufacturer: Tenpao [340864.415509] usb 1-1: SerialNumber: F4F5D8101263 [340864.579033] usb 1-1: reset high-speed USB device number 8 using xhci-hcd [340864.750441] r8152 1-1:1.0 eth0: v1.08.3 [340864.848842] r8152 1-1:1.0 wan0: renamed from eth0 So it looks like I need to modify TestPowerEthernet() to handle this USB host controller behavior.
,
Jul 12 2016
christopher, Garrett,
when echo 0/1 to authorized field as in comment 5, can either of you explain why shill isn't reconfiguring the "new" wan0 device when it becomes "known"?
Since ap-controller and ap-group-manager are constantly respawning in my test builds and spamming my dmesg output, I always have to run:
stop ap-controller
stop ap-group-manager
Do they monitor network interfaces coming up?
,
Jul 13 2016
Not sure how shill handles this... do you see activity in net.log? What command line arguments were passed to the shill that is currently running?
,
Jul 13 2016
Shill is not currently running: gale ~ # ps -ef | grep shill root 4403 2136 0 04:48 ttyMSM0 00:00:00 grep --colour=auto shill net.log didn't have anything regarding shill: gale log # less net.log 1970-01-01T00:00:06.962497+00:00 NOTICE wpa_supplicant[572]: Successfully initia lized wpa_supplicant 1970-01-01T00:00:05.283481+00:00 NOTICE wpa_supplicant[518]: Successfully initia lized wpa_supplicant 1970-01-01T00:00:06.005845+00:00 NOTICE wpa_supplicant[590]: Successfully initia lized wpa_supplicant 1970-01-01T00:00:06.063532+00:00 NOTICE wpa_supplicant[587]: Successfully initia lized wpa_supplicant net.log (END)
,
Jul 13 2016
I tried to manually restart shill and got the following in /var/log/messages: 2016-06-27T06:14:51.797820+00:00 WARNING kernel: [ 5832.638613] init: shill main process (20641) terminated with status 127 2016-06-27T06:14:51.797886+00:00 WARNING kernel: [ 5832.638879] init: shill resp awning too fast, stopped I can manually run "dhclient wan0" but that isn't the right answer here. :/
,
Jul 13 2016
Probably should resolve the ap-controller crash before all of these other ones.
,
Sep 22 2016
I'm going to close this as "WONTFIX" since Allspark-E is now "deprecated" (obsolete). All evt2 + allspark-E are expected to be replaced RSN by evt3 units (which have built-in RJ45 ports). I'll note that ap-* daemons are no longer crashing repeatedly. |
||||
►
Sign in to add a comment |
||||
Comment 1 by grundler@chromium.org
, Jul 8 2016