
Issue 662413

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




WIFI chromebox lost network connectivity

Reported by tuan...@scala.com, Nov 4 2016

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36
Platform: Asus Chromebox

Example URL:

Steps to reproduce the problem:
I have seen this issue 3 times on three different devices. I am not sure how easy it is to reproduce. The devices run in kiosk mode.

What is the expected behavior?
WiFi keeps working so the device can talk to Content Manager and the Google Admin Console.

What went wrong?
The device lost WiFi connectivity, stopped reporting to the Content Manager server, and stopped reporting back to the Google Admin Console.

===========================
2016-08-31T09:04:25.712265-04:00 INFO avahi-daemon[861]: Joining mDNS multicast group on interface wlan0.IPv4 with address <IPv4: 1>32.
2016-08-31T09:04:25.712364-04:00 INFO avahi-daemon[861]: New relevant interface wlan0.IPv4 for mDNS.
2016-08-31T09:04:25.712385-04:00 INFO avahi-daemon[861]: Registering new address record for <IPv4: 1>32 on wlan0.IPv4.
2016-08-31T09:04:31.778272-04:00 ERR shill[792]: [ERROR:http_request.cc(188)] Could not resolve hostname www.gstatic.com: The network connection was timed out
It seems the failure starts with the WiFi issue, and the device never recovers:
2016-08-31T09:03:42.801793-04:00 ERR shill[792]: [ERROR:active_link_monitor.cc(222)] Link monitor has reached the failure threshold with 3 broadcast failures and 2 unicast failures.
2016-08-31T09:03:42.835011-04:00 INFO shill[792]: [INFO:wifi.cc(2162)] WiFi Device wlan0: StartReconnectTimer
2016-08-31T09:03:42.835033-04:00 INFO shill[792]: [INFO:wifi.cc(1734)] In OnLinkMonitorFailure(): Called Reattach().
2016-08-31T09:03:42.899559-04:00 NOTICE wpa_supplicant[342]: wlan0: SME: Trying to authenticate with f4:6d:04:00:00:01 (SSID='1' freq=2417 MHz)
========================

Raj Duraisamy has been aware of this issue 
========================
Tuan Vu <tuan.vu@scala.com>
Nov 3 (1 day ago)

to Raj, Darren, Phil, Jeffrey, Chester 
Hi Raj,

Let me give you background on the failures and the devices.

These devices are not in this state now. I have only seen 3 incidents, on 3 different devices.

On May 27th, the 1st device was an Asus Chromebox (wired connection) on the Canary channel with Chrome Version: 52.0.2743.0.
On Aug 24th, the 2nd device was an Asus Chromebit (WiFi) on the Beta channel with Chrome Version: 52.0.2743.116.
On Aug 31st, the 3rd device was an Asus Chromebox (WiFi) on the Beta channel with Chrome Version: 53.0.2785.81.

They have all been configured with static IP addresses, and the DNS server is the router's IP address.

With that out of the way, let me try to answer your questions:

Q: Do you have any device in that state now? 
A: No

Q: Are you able to ping 8.8.8.8? default gateway?
A: I can't do that, as these devices were running in kiosk mode. Pinging the devices at the time of the failure got no response. The default gateway is the local gateway, 192.168.60.1.

Q: If you can ping either of the above with normal sized packets, do larger packets (e.g. `ping -s 1472`) fail?
A: The devices did not respond to ping at all when they failed.

Q: If you manually disconnect + reconnect WiFi, does that fix it?
A: Restarting the WiFi router does not fix the issue. Rebooting the device does fix it.

Q: If you manually disable + re-enable WiFi, does that fix it?
A: In kiosk mode, I can't get onto the device to disable or enable WiFi.

Q: If you leave it alone for 24 hours, does the DHCP renewal fail?  (Please feel free to use a shorter DHCP lease to test this)
A: These devices use static IP addresses, so no DHCP renewal will happen.

Hope that helps narrow down the problem.

==============================================

Did this work before? N/A 

Chrome version: 53.0.2785.81   Channel: beta
OS Version: 53.0.2785.81
Flash Version: Shockwave Flash 23.0 r0

Raj Duraisamy is aware of this issue and has forwarded it on.
 
8-31-2016 10-38-42 AM CBX2 Crashed.jpg (30.0 KB)
logs_20160901-1210.zip (576 KB)
Admin OnlineStatus Red.PNG (24.3 KB)
CBX2 20160831 Offline but systemlogs uploaded earlier today.PNG (44.3 KB)

Comment 1 by roy...@google.com, Nov 4 2016

Cc: royans@chromium.org
Labels: Hotlist-Enterprise
Status: Untriaged (was: Unconfirmed)
1) Can this be reproduced on the current stable channel?
2) Would you be able to get us Wireshark/tcpdump captures showing what was happening on the network during the incident?
3) If you can, please also get us device logs along with the time of the incident, so that we can follow the logs and the pcap dumps side by side to see what happened.

It's not clear from the attached logs when this issue happened. The logs suggest that the device believes the network is connected, but network health checks are failing.

2016-08-31T15:57:11.286852-04:00 INFO shill[792]: [INFO:portal_detector.cc(130)] Portal detection completed attempt 2 with phase==DNS, status==Timeout, failures in content==0
2016-08-31T15:57:16.289118-04:00 ERR shill[792]: [ERROR:http_request.cc(188)] Could not resolve hostname www.gstatic.com: The network connection was timed out
2016-08-31T15:57:16.289248-04:00 INFO shill[792]: [INFO:portal_detector.cc(130)] Portal detection completed attempt 3 with phase==DNS, status==Timeout, failures in content==0
2016-08-31T15:57:20.290041-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(276)] Connection diagnostics events:
2016-08-31T15:57:20.290078-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #0: Event: Portal detection          Phase: End (DNS)        Result: Timeout   
2016-08-31T15:57:20.290083-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #1: Event: Ping DNS servers          Phase: Start            Result: Success   
2016-08-31T15:57:20.290087-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #2: Event: Ping DNS servers          Phase: End              Result: Failure   Msg: No DNS servers responded to pings. Pinging first DNS server at <IPv4: 1>
2016-08-31T15:57:20.290091-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #3: Event: Find route                Phase: Start            Result: Success   Msg: Requesting route to <IPv4: 1>
2016-08-31T15:57:20.290096-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #4: Event: Find route                Phase: End              Result: Success   Msg: Found route to <IPv4: 1> (remote)
2016-08-31T15:57:20.290100-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #5: Event: ARP table lookup          Phase: Start            Result: Success   Msg: Finding ARP table entry for <IPv4: 1>
2016-08-31T15:57:20.290104-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #6: Event: ARP table lookup          Phase: End              Result: Success   Msg: Found ARP table entry for <IPv4: 1>
2016-08-31T15:57:20.290107-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(281)] Connection diagnostics completed. Connection issue: This web server appears to be on the local network, but is not responding to pings.
2016-08-31T15:57:50.413661-04:00 INFO shill[792]: [INFO:wifi_service.cc(787)] Representative endpoint updated for service 0. [SSID=1], bssid: f4:6d:04:00:00:01, signal: -15, security: rsn, frequency: 2417
2016-08-31T15:57:51.298474-04:00 ERR shill[792]: [ERROR:http_request.cc(188)] Could not resolve hostname www.gstatic.com: The network connection was timed out

In most cases, such issues are triggered by infrastructure, but in some cases it could be a bug in Chrome OS. We would need additional logs and pcap dumps to understand the root cause.
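To make the "Connection diagnostics events" block above easier to scan, here is a minimal sketch that extracts the event fields and lists the steps that did not succeed. The regular expression is an assumption inferred only from the lines quoted in this comment, not from shill's source, and the class and function names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Four of the diagnostics lines quoted above, copied as they appear.
SAMPLE = """\
2016-08-31T15:57:20.290078-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #0: Event: Portal detection          Phase: End (DNS)        Result: Timeout   
2016-08-31T15:57:20.290083-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #1: Event: Ping DNS servers          Phase: Start            Result: Success   
2016-08-31T15:57:20.290087-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #2: Event: Ping DNS servers          Phase: End              Result: Failure   Msg: No DNS servers responded to pings. Pinging first DNS server at <IPv4: 1>
2016-08-31T15:57:20.290104-04:00 INFO shill[792]: [INFO:connection_diagnostics.cc(278)]   #6: Event: ARP table lookup          Phase: End              Result: Success   Msg: Found ARP table entry for <IPv4: 1>
"""

# Layout assumed from the quoted lines: "#N: Event: ... Phase: ... Result: ... [Msg: ...]"
EVENT_RE = re.compile(
    r"#(\d+): Event: (.+?)\s+Phase: (.+?)\s+Result: (\w+)(?:\s+Msg: (.*?))?\s*$"
)


@dataclass
class DiagnosticEvent:
    number: int
    event: str
    phase: str
    result: str
    message: Optional[str]


def parse_events(text):
    """Return one DiagnosticEvent per line that matches the assumed layout."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            number, event, phase, result, message = match.groups()
            events.append(DiagnosticEvent(int(number), event, phase, result, message))
    return events


def non_success(events):
    """Keep only the events whose result is something other than Success."""
    return [e for e in events if e.result != "Success"]


if __name__ == "__main__":
    for e in non_success(parse_events(SAMPLE)):
        print(e.number, e.event, e.phase, e.result, e.message or "")
```

On the sample lines this reports event #0 (portal detection timed out in the DNS phase) and event #2 (no DNS server answered pings), while the route and ARP lookups succeeded.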

Comment 2 by roy...@google.com, Nov 4 2016

Owner: royans@chromium.org
Cc: sduraisamy@chromium.org

Comment 4 by tuan...@scala.com, Nov 4 2016

1. I have not seen this issue on the current Stable channel. I don't know how to reproduce it yet. I hope the log gives you some clue as to how the device gets into this state, so we can try to reproduce it.
2. Wireshark on the Content Manager server normally shows the traffic to and from the device. However, Wireshark showed no traffic from this device when the failure occurred. I tried to ping the device but got no response either.
3. The system log provided covers the time of the failure, 15:57 on 08/31/2016. With no traffic to or from the device, Wireshark will show nothing.


Comment 5 by roy...@google.com, Nov 4 2016

Thanks.

1) It looks like the Chrome OS device still makes a DHCP call for some reason. It's not clear from the logs why.
2) It also looks like the connection issues started right after the device did a DHCP check.
3) We may need some more logs to understand whether the DHCP request impacted your connectivity. Can you plan on running the "network_diag" command from the crosh shell the next time you see this issue? This will create a file in the download folder. If possible, upload it here or send it to Raj for me.
4) We should also increase the level of network logging when this issue happens. You can do so by going to chrome://net-internals#chromeos. Click on "wifi" to enable the debugging flags.
5) Finally, after you do all your tests, send me the complete device log from the same page. Click on "Store debug logs", which will create another file in the download folder.

Components: -Internals>Network OS>Systems>Network

Comment 7 by tuan...@scala.com, Nov 4 2016

I don't know why it makes a DHCP call when it is configured with a static IP address.
That may be the key to the problem.

Unfortunately, the device runs our app in kiosk mode and there is no way to get to chrome://net-internals#chromeos without rebooting it. Once the device has rebooted, it works again just fine and has the same IP address.

You may ask how I know that it is the same IP address: I converted all my devices to static IP addresses on July 7th, 2016 (to be exact). That is the only way for me to capture our application debug log when a device runs into issues. The debug log has been continuous since July 7th.

>>3) We may need some more logs to understand whether the DHCP request impacted your connectivity. Can you plan on running the "network_diag" command from the crosh shell the next time you see this issue? This will create a file in the download folder. If possible, upload it here or send it to Raj for me.

T: I will definitely need help to get this information for you.

>>4) We should also increase the level of network logging when this issue happens. You can do so by going to chrome://net-internals#chromeos. Click on "wifi" to enable the debugging flags.

T: Will this debug setting persist across reboots? My devices reboot every night. I would have to turn it on for all of the devices and hope that one of them hits the failure.

Comment 8 by roy...@google.com, Nov 4 2016

Got it.

- Another way to get device logs would be to run the kiosk in developer mode
- When you notice the issue, you can use a keyboard to press CTRL+ALT+F2 to get into a shell
- There you can run network_diag and create a tarball of the /var/log directory, which has all the logs.
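
The tarball step above could be sketched as follows. This is an illustration only: it assumes a Python interpreter is available in that shell, which this thread does not confirm (plain `tar` may be the more practical tool), and the function name, output directory, and archive naming are arbitrary choices. Reading everything under /var/log would typically require root.

```python
import tarfile
import time
from pathlib import Path


def archive_logs(log_dir="/var/log", out_dir="/tmp"):
    """Write a gzip-compressed tarball of log_dir into out_dir; return its path."""
    source = Path(log_dir)
    # Timestamped name so repeated captures do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(out_dir) / f"logs_{stamp}.tar.gz"
    with tarfile.open(str(target), "w:gz") as archive:
        # arcname stores entries relative to the directory name ("log/...").
        archive.add(str(source), arcname=source.name)
    return target
```

Calling `archive_logs()` with the defaults would write something like `/tmp/logs_<timestamp>.tar.gz`. The output directory should be outside the directory being archived so that the tarball does not try to include itself.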

I understand this is hard to repro... but when it happens please file a support ticket using this link: https://support.google.com/work/answer/142244?visit_id=undefined&hl=en&rd=1
We can work with you at that point to get the logs. Mention this bug when you file the support case.

You should be able to go to your admin console and click on support panel to get related information on how to reach us.

Comment 9 by tuan...@scala.com, Nov 4 2016

Thanks for the tip Roy!

However, I have had 12 devices running here for the last 8-9 months and have seen only 3 incidents. It is unlikely to happen again soon, and I don't know when it will next show up.

I will keep that in mind when I see it next time.

Comment 10 by roy...@google.com, Jan 13 2017

Status: Archived (was: Untriaged)
