New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 626698 link

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Sep 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

allspark-E: failed to ping during network_EthernetStressPlug test

Project Member Reported by grundler@chromium.org, Jul 8 2016

Issue description

CHROMEOS_RELEASE_DESCRIPTION=8560.0.2016_07_07_1111 (Test Build - grundler) developer-build gale

What steps will reproduce the problem?
  test_that 172.27.217.124 network_EthernetStressPlug --args num_iterations=1000

Allspark-E is USB type C power adapter with single port USB2.0 RTL8152 ("Fast Ethernet", aka 100Mbps).

What is the expected output?
Test passes.

What do you see instead?
"Autotest client terminated unexpectedly: DUT is no longer pingable, it may have rebooted or hung."

DUT neither rebooted nor hung. The DUT sat overnight since I kicked the test off before leaving around 6:45pm.

/var/log/messages is a mess. Looks like two instances of "dhclient wan0" were running and one "lost the race" and was spewing errors the whole time.  I started one instant manually because I didn't see one running at boot time (wan0 is equivalent to a "built-in" device even though it's USB.)

[<rant> And the date/timestamps in /var/log/messages are completely off. It's not getting updated at boot time automatically. And ntpdate isn't installed. Why does jetstream have to solve this problem differently than the rest of the world? :( </rant>]

In order to ssh into the DUT, I rmmod/modprobed r8152 driver:
dmesg
[  936.979846] init: cryptohomed main process (759) killed by TERM signal
[  937.079173] init: chapsd main process (747) killed by TERM signal
[  942.180820] init: tcsd main process (742) killed by KILL signal
[ 1857.971628] IPv6: ADDRCONF(NETDEV_UP): wan0: link is not ready
[ 1860.338179] IPv6: ADDRCONF(NETDEV_CHANGE): wan0: link becomes ready
[52537.669364] usbcore: deregistering interface driver r8152
[52542.064824] usb 1-1: reset high-speed USB device number 3 using xhci-hcd
[52542.261845] r8152 1-1:1.0 eth0: v1.08.3
[52542.263469] usbcore: registered new interface driver r8152
[52542.334841] r8152 1-1:1.0 wan0: renamed from eth0
[52564.462074] IPv6: ADDRCONF(NETDEV_UP): wan0: link is not ready
[52566.834206] IPv6: ADDRCONF(NETDEV_CHANGE): wan0: link becomes ready
<end>


 
I just realized I forgot one bit: in order to run the test, I had to manually rename "wan0" back to "eth0". Let me figure out how to invoke the test so it "naturally" messes with wan0 instead.
remote: New Changes:
remote:   https://chromium-review.googlesource.com/359441 autotest: EthernetStressPlug accepts interface parameter

The change is not fully tested yet. But default behavior seems to work even if RTL8152 + r8152 driver doesn't survive the first cycle of this. :( (link goes down by never comes back)
Cc: ranix@chromium.org
Two issues:
1) "Unreasonable amount of memory"
diff --git a/client/bin/site_utils.py b/client/bin/site_utils.py
index 99b44b7..1e1c298 100644
--- a/client/bin/site_utils.py
+++ b/client/bin/site_utils.py
@@ -844,7 +844,7 @@ def get_mem_total():
     """
     mem_total = _get_float_from_file(_MEMINFO, 'MemTotal:', 'MemTotal:', ' kB')
     # Sanity check, all Chromebooks have at least 1GB of memory.
-    assert mem_total > 1024 * 1024, 'Unreasonable amount of memory.'
+    assert mem_total > 256 * 1024, 'Unreasonable amount of memory.'
     return mem_total / 1024

I thought this was fixed already by ranix@.


2) Something is removing wan0 interface and bringing it back unconfigured. My suspicion is this test is not playing well with udev (which renames new devices when they reappear). :(

19:03:57 INFO | autoserv| FAIL  network_EthernetStressPlug      network_EthernetStressPlug      timestamp=1467084143localtime=Jun 27 22:22:23        ERROR: TIMEOUT : wan0 IP is None after setting power on (last_wait = 0.60 seconds)

network_EthernetStressPlug.DEBUG:
...

06/28 03:21:33.801 DEBUG|network_EthernetSt:0458| Iteration: 0 start
06/28 03:21:33.968 DEBUG|        base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:34.013 DEBUG|        base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:34.727 DEBUG|        base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:34.773 DEBUG|        base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:35.866 DEBUG|        base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:35.911 DEBUG|        base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:37.003 DEBUG|        base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:37.047 DEBUG|        base_utils:0278| [stderr] wan0: error fetching interface information: Device not found
06/28 03:21:38.084 DEBUG|        base_utils:0185| Running 'ifconfig wan0'
06/28 03:21:38.114 DEBUG|        base_utils:0278| [stdout] wan0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
06/28 03:21:38.118 DEBUG|        base_utils:0278| [stdout]         ether f4:f5:d8:10:12:63  txqueuelen 1000  (Ethernet)
06/28 03:21:38.120 DEBUG|        base_utils:0278| [stdout]         RX packets 0  bytes 0 (0.0 B)
06/28 03:21:38.122 DEBUG|        base_utils:0278| [stdout]         RX errors 0  dropped 0  overruns 0  frame 0
06/28 03:21:38.124 DEBUG|        base_utils:0278| [stdout]         TX packets 0  bytes 0 (0.0 B)
06/28 03:21:38.127 DEBUG|        base_utils:0278| [stdout]         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
06/28 03:21:38.129 DEBUG|        base_utils:0278| [stdout] 
...
Updated the CL to add "interface=wan0" support:
   https://chromium-review.googlesource.com/#/c/359441/

Now to dig into why IP address isn't preserved across "down/up" sequence. It seems to be working fine when I do this "by hand":
gale tmp # cat /sys/class/net/wan0/flags                                        
0x1003
gale tmp # echo 0x1002 > /sys/class/net/wan0/flags
gale tmp # ifconfig wan0 | fgrep -e inet -e flags                               
wan0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        inet 172.27.217.218  netmask 255.255.255.0  broadcast 172.27.217.255
gale tmp # echo 0x1003 > /sys/class/net/wan0/flags
gale tmp # ifconfig wan0 | fgrep -e inet -e flags
wan0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.27.217.218  netmask 255.255.255.0  broadcast 172.27.217.255

which suggests something else is interfering (udev rules most likely) with "up" transition.

My current theory is the use of /sys "authorized" field is not playing well with udev.

autotest "stdout" from last run had:
...
00:34:05 DEBUG| Iteration: 0 start
00:34:05 DEBUG| Running 'ifconfig wan0'
00:34:05 DEBUG| [stderr] wan0: error fetching interface information: Device not found
00:34:07 DEBUG| Running 'ifconfig wan0'
00:34:07 DEBUG| [stderr] wan0: error fetching interface information: Device not found
00:34:08 DEBUG| Running 'ifconfig wan0'
00:34:08 DEBUG| [stdout] wan0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
00:34:08 DEBUG| [stdout]         ether f4:f5:d8:10:12:63  txqueuelen 1000  (Ethernet)
00:34:08 DEBUG| [stdout]         RX packets 0  bytes 0 (0.0 B)
00:34:08 DEBUG| [stdout]         RX errors 0  dropped 0  overruns 0  frame 0
00:34:08 DEBUG| [stdout]         TX packets 0  bytes 0 (0.0 B)
00:34:08 DEBUG| [stdout]         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
...

And the corresponding /var/log/messages:


2016-07-01T00:34:07.507890+00:00 INFO kernel: [330988.244257] usb 1-1: authorized to connect
2016-07-01T00:34:07.657243+00:00 INFO kernel: [330988.402977] usb 1-1: reset high-speed USB device number 7 using xhci-hcd
2016-07-01T00:34:07.867271+00:00 INFO kernel: [330988.604221] r8152 1-1:1.0 eth0: v1.08.3
2016-07-01T00:34:07.892766+00:00 NOTICE configure_gale_test_interface.sh[2697]: Ignoring eth0, unknown topology.
2016-07-01T00:34:07.937204+00:00 INFO kernel: [330988.676750] r8152 1-1:1.0 wan0: renamed from eth0
2016-07-01T00:34:09.158868+00:00 ERR dhclient[2705]: Bind socket to interface: No such device
...
2016-07-01T00:34:09.159330+00:00 ERR dhclient[2705]: exiting.
...
2016-07-01T00:34:53.712005+00:00 NOTICE root[2995]: autotest finished iteration /usr/local/autotest/results/default/network_EthernetStressPlug/sysinfo/iteration.1
2016-07-01T00:34:53.737422+00:00 NOTICE autotest[2998]: 00:34:53.715 WARNI|              test:0606| Autotest caught exception when running test:#012Traceback (most recent call last):#012  File "/usr/local/autotest/common_lib/test.py", line 600, in _exec#012    _call_test_function(self.execute, *p_args, **p_dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function#012    return func(*args, **dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 461, in execute#012    dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry#012    postprocess_profiled_run, args, dargs)#012  File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once#012    self.run_once(*args, **dargs)#012  File "/usr/local/autotest/tests/network_EthernetStressPlug/network_EthernetStressPlug.py", line 475, in run_once#012    linkup_time = self.TestPowerEthernet(power=1)#012  File "/usr/local/autotest/tests/network_EthernetStressPlug/network_EthernetStressPlug.py", line 303, in TestPowerEthernet#012    power_str[power], self.test_status['last_wai

And the I ran this sequence by hand:
echo 0 > /sys/devices/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1/authorized
<2 seconds>                                                              
echo 1 > /sys/devices/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1/authorized

And here is the corresponding dmesg output:
[340861.732150] usb 1-1: authorized to connect
[340861.875032] usb 1-1: reset high-speed USB device number 7 using xhci-hcd
[340861.994486] xhci-hcd xhci-hcd.0.auto: ERROR: unexpected setup address command completion code 0x11.
[340862.204382] xhci-hcd xhci-hcd.0.auto: ERROR: unexpected setup address command completion code 0x11.
[340862.414353] usb 1-1: device not accepting address 7, error -22
...
[340864.154624] r8152 1-1:1.0 (unnamed net_device) (uninitialized): Unknown version 0x58b0
[340864.154693] r8152 1-1:1.0 (unnamed net_device) (uninitialized): Unknown Device
[340864.155471] usb 1-1: USB disconnect, device number 7
[340864.274440] usb 1-1: new high-speed USB device number 8 using xhci-hcd
[340864.415301] usb 1-1: New USB device found, idVendor=0bda, idProduct=8152
[340864.415375] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=
3
[340864.415432] usb 1-1: Product: USB 10/100 LAN
[340864.415472] usb 1-1: Manufacturer: Tenpao
[340864.415509] usb 1-1: SerialNumber: F4F5D8101263
[340864.579033] usb 1-1: reset high-speed USB device number 8 using xhci-hcd
[340864.750441] r8152 1-1:1.0 eth0: v1.08.3
[340864.848842] r8152 1-1:1.0 wan0: renamed from eth0

So it looks like I need to modify TestPowerEthernet() to handle this USB host controller behavior.
Cc: cbook@chromium.org gdk@chromium.org
christopher, Garrett,
when echo 0/1 to authorized field as in comment 5, can either of you explain why shill isn't reconfiguring the "new" wan0 device when it becomes "known"?

Since ap-controller and ap-group-manager are constantly respawning in my test builds and spamming my dmesg output, I always have to run:
    stop ap-controller
    stop ap-group-manager

Do they monitor network interfaces coming up?

Comment 7 by cbook@chromium.org, Jul 13 2016

Not sure how shill handles this... do you see activity in net.log?

What command line arguments were passed to the shill that is currently running?

Comment 8 by grundler@google.com, Jul 13 2016

Shill is not currently running:
gale ~ # ps -ef | grep shill
root      4403  2136  0 04:48 ttyMSM0  00:00:00 grep --colour=auto shill

net.log didn't have anything regarding shill:

gale log # less net.log 
1970-01-01T00:00:06.962497+00:00 NOTICE wpa_supplicant[572]: Successfully initia
lized wpa_supplicant
1970-01-01T00:00:05.283481+00:00 NOTICE wpa_supplicant[518]: Successfully initia
lized wpa_supplicant
1970-01-01T00:00:06.005845+00:00 NOTICE wpa_supplicant[590]: Successfully initia
lized wpa_supplicant
1970-01-01T00:00:06.063532+00:00 NOTICE wpa_supplicant[587]: Successfully initia
lized wpa_supplicant
net.log (END)

I tried to manually restart shill and got the following in /var/log/messages:

2016-06-27T06:14:51.797820+00:00 WARNING kernel: [ 5832.638613] init: shill main
 process (20641) terminated with status 127
2016-06-27T06:14:51.797886+00:00 WARNING kernel: [ 5832.638879] init: shill resp
awning too fast, stopped

I can manually run "dhclient wan0" but that isn't the right answer here. :/

Comment 10 by cbook@chromium.org, Jul 13 2016

Probably should resolve the ap-controller crash before all of these other ones.
Status: WontFix (was: Assigned)
I'm going to close this as "WONTFIX" since Allspark-E is now "deprecated" (obsolete). All evt2 + allspark-E are expected to be replaced RSN by evt3 units (which have built-in RJ45 ports).

I'll note that ap-* daemons are no longer crashing repeatedly.

Sign in to add a comment