New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 911234 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 649417



Sign in to add a comment

Passive link monitoring on wifi networks broken when shill runs in sandbox

Project Member Reported by jmuppala@google.com, Dec 3

Issue description

Chrome Version: <From about:version: Google Chrome 72.0.3625.0>
Chrome OS Version: <From about:version: Platform 11316.0.0>
Chrome OS Platform: <Nocturne>

Please specify Cr-* of the system to which this bug/feature applies (add
the label below).

Steps To Reproduce:
(1)Add and connect to Openvpn connection.
(2)disconnect/connect the openvpn connection from uber tray couple of times until you see openvpn status mismatch.
(3)status mismatch on chrome://settings/internet -> openvpn ->detailed view.
(4)try to reconnect from uber tray.
(5)openvpn stuck in "reconnecting" state.
(6)openvpn "configure" button from chrome://settings/internet -> openvpn->detailed view : does not have any info configured while the connection was created.sometimes, the previously entered info pops up on its own, and clicking connect successfully connects to openvpn.
(7)if the info is re-entered, and click connect, then the connection is successful sometimes.

NOTE: sometimes, leaving the device idle after a successful openvpn connection and then trying to disconnect it results in status mismatch.


 
Owner: mortonm@chromium.org
I'm going to dig deeper into the communications from chrome->shill and shill->openvpn when doing these transitions.
Cc: briannorris@chromium.org
Summary: Passive link monitoring on wifi networks broken when shill runs in sandbox (was: M72 - UI inconsistent after OpenVpn fails to connect)
When shill runs as shill:shill, processing ARP request packets (as part of passive link monitoring) is broken somehow. This results in shill being unresponsive to Chrome's D-Bus requests (e.g. toggling VPN/WiFi connection buttons) for seconds to minutes while shill is trying to process ARP packets. The error logs look like the following when this happens:

2018-12-04T16:39:30.786737-08:00 ERR shill[1350]: [ERROR:arp_client.cc(118)] Socket recvfrom failed: Resource temporarily unavailable
2018-12-04T16:39:54.049477-08:00 ERR shill[1350]: message repeated 2895805 times: [ [ERROR:arp_client.cc(118)] Socket recvfrom failed: Resource temporarily unavailable]

I'm not sure why running in a sandbox messes with processing ARP packets, but commenting out this line: https://cs.corp.google.com/chromeos_public/src/platform2/shill/link_monitor.cc?rcl=41abfab3354d550e8fe2da94a5a820d67f4196d0&l=137 _fixes_ the root cause of this bug (namely that shill hangs for long periods of time as seen by Chrome).

So we have a hacky way of fixing the bug, but still need to figure out why this bug is happening in the first place so we can come to a better solution. Should be able to get to the bottom of the issue before M72 beta now that we know what is going on.
$ ff_debug --level -4
$ ff_debug +link

Does running the above on a DUT and checking /var/log/net.log give you more visibility? 

This is what I see when I do that (note the last 2 lines are repeated millions of times). Not sure if theres anything interesting here but thanks for the tip

2018-12-05T11:20:50.269347-08:00 DEBUG shill[1366]: [VERBOSE3:traffic_monitor.cc(197)] wlan0 SampleTraffic
2018-12-05T11:20:50.269399-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(105)] wlan0 IsCongestedTxQueues
2018-12-05T11:20:50.272613-08:00 DEBUG shill[1366]: [VERBOSE3:traffic_monitor.cc(77)] wlan0 BuildIPPortToTxQueueLength
2018-12-05T11:20:50.272646-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=10, TimerState=0
2018-12-05T11:20:50.272661-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272678-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=0.0.0.0, TX=0, State=10, TimerState=0
2018-12-05T11:20:50.272691-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272708-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=6, TimerState=3
2018-12-05T11:20:50.272726-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272754-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=1, TimerState=2
2018-12-05T11:20:50.272778-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272809-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=1, TimerState=0
2018-12-05T11:20:50.272834-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272871-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=1, TimerState=2
2018-12-05T11:20:50.272903-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272933-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=10.8.0.6, TX=1, State=9, TimerState=4
2018-12-05T11:20:50.272959-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.272988-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=1, TimerState=2
2018-12-05T11:20:50.273030-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.273059-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=1, TimerState=0
2018-12-05T11:20:50.273084-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.273115-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=10.8.0.6, TX=1, State=9, TimerState=4
2018-12-05T11:20:50.273145-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.273177-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=6, TimerState=3
2018-12-05T11:20:50.273201-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.273230-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=::, TX=0, State=10, TimerState=0
2018-12-05T11:20:50.273260-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.273290-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=::, TX=0, State=10, TimerState=0
2018-12-05T11:20:50.273312-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered.
2018-12-05T11:20:50.273337-08:00 DEBUG shill[1366]: [VERBOSE3:traffic_monitor.cc(117)] wlan0 IsCongestedTxQueues: No interesting socket info
2018-12-05T11:20:50.273368-08:00 DEBUG shill[1366]: [VERBOSE2:traffic_monitor.cc(70)] wlan0 ResetCongestedTxQueuesStatsWithLogging: Tx-queues decongested
2018-12-05T11:20:50.273401-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(156)] wlan0 IsDnsFailing
2018-12-05T11:20:50.275187-08:00 DEBUG shill[1366]: [VERBOSE2:traffic_monitor.cc(151)] wlan0 ResetDnsFailingStatsWithLogging: DNS queries restored
2018-12-05T11:20:54.061746-08:00 DEBUG shill[1366]: [VERBOSE2:passive_link_monitor.cc(105)] wlan0 In ReceiveRequest.
2018-12-05T11:20:54.061810-08:00 DEBUG shill[1366]: [VERBOSE2:passive_link_monitor.cc(105)] wlan0 In ReceiveRequest.
2018-12-05T11:20:54.061852-08:00 ERR shill[1366]: [ERROR:arp_client.cc(118)] Socket recvfrom failed: Resource temporarily unavailable
This bug is oddly similar, if not identical to  crbug.com/531655 

See C#19 in particular
Must have regressed back to that behavior from 2015 when switching to run shill in minijail. Will have to come up with a CL similar to this one that fixes it: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/319561/
Cc: benchan@chromium.org
Status: Started (was: Untriaged)
Blocking: 649417
Wait, if that CL is the real fix here, isn't this a problem not just with running *shill* as non-root, but with running any of its *subprocesses* in minijail? e.g., this CL?

https://chromium-review.googlesource.com/c/aosp/platform/system/connectivity/shill/+/1087359/

So, that goes back all the way to build 10790.0.0, which means M-69 could be affected. And it could also possibly explain some of our other VPN state issues, in which we never had satisfying answers? I'm still not sure the exact situations in which shill's sub-processes will screw up shill's state, so I don't have a 100% answer myself. But it looks to me like the regressed code is live even with shill running as root.
Oh, never mind: the openvpn and l2tp launchers also look for special shill cmdline args before deciding to jail themselves, and we don't use those in shill-as-root modes.

Also, I noticed we already were launching dhcpcd in this way (without cleaning our FDs), but apparently it didn't cause problems? Because that's been there for years.

So indeed, this may only be a problem when we have shill-as-non-root.
Issue 887730 has been merged into this issue.
Labels: ReleaseBlock-Stable
So per bug 887730, we definitely need to give VPN testing (especially OpenVPN) another run on M-72, as shill+openvpn has issues we're fixing here (CL to land shortly).

Marking RBS, so this doesn't slip through the cracks.
Project Member

Comment 12 by bugdroid1@chromium.org, Dec 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/54b9512433186d6992d4b77c0d24f5dc97559b43

commit 54b9512433186d6992d4b77c0d24f5dc97559b43
Author: Micah Morton <mortonm@chromium.org>
Date: Mon Dec 10 01:39:27 2018

shill: close open fds when spawning new processes

 crbug.com/531655  resurfaced when trying to spawn shill in a minijail. So
we need the same solution as before: close open file descriptors when
spawning processes. Was already done here before: CL:319561.

CQ-DEPEND=CL:1365089
BUG= chromium:911234 
TEST=manual testing VPN connection UI

Change-Id: I61a39ca3e24974b9b0c6ffd3f735e527ab6a2851
Reviewed-on: https://chromium-review.googlesource.com/1365088
Commit-Ready: Micah Morton <mortonm@chromium.org>
Tested-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/54b9512433186d6992d4b77c0d24f5dc97559b43/shill/process_manager_test.cc
[modify] https://crrev.com/54b9512433186d6992d4b77c0d24f5dc97559b43/shill/process_manager.cc

Project Member

Comment 13 by bugdroid1@chromium.org, Dec 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/8ad82549bbe251b68d39833483d5812bc8bd57bd

commit 8ad82549bbe251b68d39833483d5812bc8bd57bd
Author: Micah Morton <mortonm@chromium.org>
Date: Mon Dec 10 01:39:27 2018

libbrillo: add CloseOpenFds function to brillo::Minijail

Calls minijail_close_open_fds().

BUG= chromium:911234 
TEST=manual testing for chromium:911234

Change-Id: Ib557a6a3efeb79cb72f8640ad5303d77925654fb
Reviewed-on: https://chromium-review.googlesource.com/1365089
Commit-Ready: Micah Morton <mortonm@chromium.org>
Tested-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Micah Morton <mortonm@chromium.org>

[modify] https://crrev.com/8ad82549bbe251b68d39833483d5812bc8bd57bd/libbrillo/brillo/minijail/mock_minijail.h
[modify] https://crrev.com/8ad82549bbe251b68d39833483d5812bc8bd57bd/libbrillo/brillo/minijail/minijail.cc
[modify] https://crrev.com/8ad82549bbe251b68d39833483d5812bc8bd57bd/libbrillo/brillo/minijail/minijail.h

Labels: Merge-Request-72
Status: Fixed (was: Started)
Still needs to be merged to M72 and then verified
Labels: -Merge-Request-72 Merge-Approved-72
Project Member

Comment 16 by bugdroid1@chromium.org, Dec 11

Labels: merge-merged-release-R72-11316.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/7c726bbd9770018c6b940fd30bb00679cd1d8b74

commit 7c726bbd9770018c6b940fd30bb00679cd1d8b74
Author: Micah Morton <mortonm@chromium.org>
Date: Tue Dec 11 17:49:24 2018

libbrillo: add CloseOpenFds function to brillo::Minijail

Calls minijail_close_open_fds().

BUG= chromium:911234 
TEST=manual testing for chromium:911234

Change-Id: Ib557a6a3efeb79cb72f8640ad5303d77925654fb
Reviewed-on: https://chromium-review.googlesource.com/1365089
Commit-Ready: Micah Morton <mortonm@chromium.org>
Tested-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Micah Morton <mortonm@chromium.org>
(cherry picked from commit 8ad82549bbe251b68d39833483d5812bc8bd57bd)
Reviewed-on: https://chromium-review.googlesource.com/c/1370853
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>
Commit-Queue: Eric Caruso <ejcaruso@chromium.org>
Tested-by: Eric Caruso <ejcaruso@chromium.org>

[modify] https://crrev.com/7c726bbd9770018c6b940fd30bb00679cd1d8b74/libbrillo/brillo/minijail/mock_minijail.h
[modify] https://crrev.com/7c726bbd9770018c6b940fd30bb00679cd1d8b74/libbrillo/brillo/minijail/minijail.cc
[modify] https://crrev.com/7c726bbd9770018c6b940fd30bb00679cd1d8b74/libbrillo/brillo/minijail/minijail.h

Labels: -Merge-Approved-72 Merge-merged-72
Project Member

Comment 18 by bugdroid1@chromium.org, Dec 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/a778987f70f8a28146e72f38e0c9716890fa73fb

commit a778987f70f8a28146e72f38e0c9716890fa73fb
Author: Micah Morton <mortonm@chromium.org>
Date: Tue Dec 11 18:03:26 2018

shill: close open fds when spawning new processes

 crbug.com/531655  resurfaced when trying to spawn shill in a minijail. So
we need the same solution as before: close open file descriptors when
spawning processes. Was already done here before: CL:319561.

CQ-DEPEND=CL:1370853
BUG= chromium:911234 
TEST=manual testing VPN connection UI

Change-Id: I61a39ca3e24974b9b0c6ffd3f735e527ab6a2851
Reviewed-on: https://chromium-review.googlesource.com/1365088
Commit-Ready: Micah Morton <mortonm@chromium.org>
Tested-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
(cherry picked from commit 54b9512433186d6992d4b77c0d24f5dc97559b43)
Reviewed-on: https://chromium-review.googlesource.com/c/1370796
Commit-Queue: Micah Morton <mortonm@chromium.org>

[modify] https://crrev.com/a778987f70f8a28146e72f38e0c9716890fa73fb/shill/process_manager_test.cc
[modify] https://crrev.com/a778987f70f8a28146e72f38e0c9716890fa73fb/shill/process_manager.cc

Sign in to add a comment