Passive link monitoring on wifi networks broken when shill runs in sandbox |
|||||||||
Issue descriptionChrome Version: <From about:version: Google Chrome 72.0.3625.0> Chrome OS Version: <From about:version: Platform 11316.0.0> Chrome OS Platform: <Nocturne> Please specify Cr-* of the system to which this bug/feature applies (add the label below). Steps To Reproduce: (1)Add and connect to Openvpn connection. (2)disconnect/connect the openvpn connection from uber tray couple of times until you see openvpn status mismatch. (3)status mismatch on chrome://settings/internet -> openvpn ->detailed view. (4)try to reconnect from uber tray. (5)openvpn stuck in "reconnecting" state. (6)openvpn "configure" button from chrome://settings/internet -> openvpn->detailed view : does not have any info configured while the connection was created.sometimes, the previously entered info pops up on its own, and clicking connect successfully connects to openvpn. (7)if the info is re-entered, and click connect, then the connection is successful sometimes. NOTE: sometimes, leaving the device idle after a successful openvpn connection and then trying to disconnect it results in status mismatch.
,
Dec 5
When shill runs as shill:shill, processing ARP request packets (as part of passive link monitoring) is broken somehow. This results in shill being unresponsive to Chrome's D-Bus requests (e.g. toggling VPN/WiFi connection buttons) for seconds to minutes while shill is trying to process ARP packets. The error logs look like the following when this happens: 2018-12-04T16:39:30.786737-08:00 ERR shill[1350]: [ERROR:arp_client.cc(118)] Socket recvfrom failed: Resource temporarily unavailable 2018-12-04T16:39:54.049477-08:00 ERR shill[1350]: message repeated 2895805 times: [ [ERROR:arp_client.cc(118)] Socket recvfrom failed: Resource temporarily unavailable] I'm not sure why running in a sandbox messes with processing ARP packets, but commenting out this line: https://cs.corp.google.com/chromeos_public/src/platform2/shill/link_monitor.cc?rcl=41abfab3354d550e8fe2da94a5a820d67f4196d0&l=137 _fixes_ the root cause of this bug (namely that shill hangs for long periods of time as seen by Chrome). So we have a hacky way of fixing the bug, but still need to figure out why this bug is happening in the first place so we can come to a better solution. Should be able to get to the bottom of the issue before M72 beta now that we know what is going on.
,
Dec 5
$ ff_debug --level -4 $ ff_debug +link Does running the above on a DUT and checking /var/log/net.log give you more visibility?
,
Dec 5
This is what I see when I do that (note the last 2 lines are repeated millions of times). Not sure if theres anything interesting here but thanks for the tip 2018-12-05T11:20:50.269347-08:00 DEBUG shill[1366]: [VERBOSE3:traffic_monitor.cc(197)] wlan0 SampleTraffic 2018-12-05T11:20:50.269399-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(105)] wlan0 IsCongestedTxQueues 2018-12-05T11:20:50.272613-08:00 DEBUG shill[1366]: [VERBOSE3:traffic_monitor.cc(77)] wlan0 BuildIPPortToTxQueueLength 2018-12-05T11:20:50.272646-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=10, TimerState=0 2018-12-05T11:20:50.272661-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272678-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=0.0.0.0, TX=0, State=10, TimerState=0 2018-12-05T11:20:50.272691-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272708-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=6, TimerState=3 2018-12-05T11:20:50.272726-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272754-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=1, TimerState=2 2018-12-05T11:20:50.272778-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272809-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=1, TimerState=0 2018-12-05T11:20:50.272834-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272871-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=1, TimerState=2 2018-12-05T11:20:50.272903-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272933-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=10.8.0.6, TX=1, State=9, TimerState=4 2018-12-05T11:20:50.272959-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.272988-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=172.16.10.41, TX=0, State=1, TimerState=2 2018-12-05T11:20:50.273030-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.273059-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=1, TimerState=0 2018-12-05T11:20:50.273084-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.273115-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=10.8.0.6, TX=1, State=9, TimerState=4 2018-12-05T11:20:50.273145-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.273177-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=127.0.0.1, TX=0, State=6, TimerState=3 2018-12-05T11:20:50.273201-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.273230-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=::, TX=0, State=10, TimerState=0 2018-12-05T11:20:50.273260-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.273290-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(80)] wlan0 SocketInfo(IP=::, TX=0, State=10, TimerState=0 2018-12-05T11:20:50.273312-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(91)] wlan0 Connection Filtered. 2018-12-05T11:20:50.273337-08:00 DEBUG shill[1366]: [VERBOSE3:traffic_monitor.cc(117)] wlan0 IsCongestedTxQueues: No interesting socket info 2018-12-05T11:20:50.273368-08:00 DEBUG shill[1366]: [VERBOSE2:traffic_monitor.cc(70)] wlan0 ResetCongestedTxQueuesStatsWithLogging: Tx-queues decongested 2018-12-05T11:20:50.273401-08:00 DEBUG shill[1366]: [VERBOSE4:traffic_monitor.cc(156)] wlan0 IsDnsFailing 2018-12-05T11:20:50.275187-08:00 DEBUG shill[1366]: [VERBOSE2:traffic_monitor.cc(151)] wlan0 ResetDnsFailingStatsWithLogging: DNS queries restored 2018-12-05T11:20:54.061746-08:00 DEBUG shill[1366]: [VERBOSE2:passive_link_monitor.cc(105)] wlan0 In ReceiveRequest. 2018-12-05T11:20:54.061810-08:00 DEBUG shill[1366]: [VERBOSE2:passive_link_monitor.cc(105)] wlan0 In ReceiveRequest. 2018-12-05T11:20:54.061852-08:00 ERR shill[1366]: [ERROR:arp_client.cc(118)] Socket recvfrom failed: Resource temporarily unavailable
,
Dec 5
This bug is oddly similar, if not identical to crbug.com/531655 See C#19 in particular
,
Dec 5
Must have regressed back to that behavior from 2015 when switching to run shill in minijail. Will have to come up with a CL similar to this one that fixes it: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/319561/
,
Dec 6
,
Dec 6
Wait, if that CL is the real fix here, isn't this a problem not just with running *shill* as non-root, but with running any of its *subprocesses* in minijail? e.g., this CL? https://chromium-review.googlesource.com/c/aosp/platform/system/connectivity/shill/+/1087359/ So, that goes back all the way to build 10790.0.0, which means M-69 could be affected. And it could also possibly explain some of our other VPN state issues, in which we never had satisfying answers? I'm still not sure the exact situations in which shill's sub-processes will screw up shill's state, so I don't have a 100% answer myself. But it looks to me like the regressed code is live even with shill running as root.
,
Dec 6
Oh, never mind: the openvpn and l2tp launchers also look for special shill cmdline args before deciding to jail themselves, and we don't use those in shill-as-root modes. Also, I noticed we already were launching dhcpcd in this way (without cleaning our FDs), but apparently it didn't cause problems? Because that's been there for years. So indeed, this may only be a problem when we have shill-as-non-root.
,
Dec 8
Issue 887730 has been merged into this issue.
,
Dec 8
So per bug 887730, we definitely need to give VPN testing (especially OpenVPN) another run on M-72, as shill+openvpn has issues we're fixing here (CL to land shortly). Marking RBS, so this doesn't slip through the cracks.
,
Dec 10
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/54b9512433186d6992d4b77c0d24f5dc97559b43 commit 54b9512433186d6992d4b77c0d24f5dc97559b43 Author: Micah Morton <mortonm@chromium.org> Date: Mon Dec 10 01:39:27 2018 shill: close open fds when spawning new processes crbug.com/531655 resurfaced when trying to spawn shill in a minijail. So we need the same solution as before: close open file descriptors when spawning processes. Was already done here before: CL:319561. CQ-DEPEND=CL:1365089 BUG= chromium:911234 TEST=manual testing VPN connection UI Change-Id: I61a39ca3e24974b9b0c6ffd3f735e527ab6a2851 Reviewed-on: https://chromium-review.googlesource.com/1365088 Commit-Ready: Micah Morton <mortonm@chromium.org> Tested-by: Micah Morton <mortonm@chromium.org> Reviewed-by: Micah Morton <mortonm@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> [modify] https://crrev.com/54b9512433186d6992d4b77c0d24f5dc97559b43/shill/process_manager_test.cc [modify] https://crrev.com/54b9512433186d6992d4b77c0d24f5dc97559b43/shill/process_manager.cc
,
Dec 10
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/8ad82549bbe251b68d39833483d5812bc8bd57bd commit 8ad82549bbe251b68d39833483d5812bc8bd57bd Author: Micah Morton <mortonm@chromium.org> Date: Mon Dec 10 01:39:27 2018 libbrillo: add CloseOpenFds function to brillo::Minijail Calls minijail_close_open_fds(). BUG= chromium:911234 TEST=manual testing for chromium:911234 Change-Id: Ib557a6a3efeb79cb72f8640ad5303d77925654fb Reviewed-on: https://chromium-review.googlesource.com/1365089 Commit-Ready: Micah Morton <mortonm@chromium.org> Tested-by: Micah Morton <mortonm@chromium.org> Reviewed-by: Micah Morton <mortonm@chromium.org> [modify] https://crrev.com/8ad82549bbe251b68d39833483d5812bc8bd57bd/libbrillo/brillo/minijail/mock_minijail.h [modify] https://crrev.com/8ad82549bbe251b68d39833483d5812bc8bd57bd/libbrillo/brillo/minijail/minijail.cc [modify] https://crrev.com/8ad82549bbe251b68d39833483d5812bc8bd57bd/libbrillo/brillo/minijail/minijail.h
,
Dec 10
Still needs to be merged to M72 and then verified
,
Dec 10
,
Dec 11
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/7c726bbd9770018c6b940fd30bb00679cd1d8b74 commit 7c726bbd9770018c6b940fd30bb00679cd1d8b74 Author: Micah Morton <mortonm@chromium.org> Date: Tue Dec 11 17:49:24 2018 libbrillo: add CloseOpenFds function to brillo::Minijail Calls minijail_close_open_fds(). BUG= chromium:911234 TEST=manual testing for chromium:911234 Change-Id: Ib557a6a3efeb79cb72f8640ad5303d77925654fb Reviewed-on: https://chromium-review.googlesource.com/1365089 Commit-Ready: Micah Morton <mortonm@chromium.org> Tested-by: Micah Morton <mortonm@chromium.org> Reviewed-by: Micah Morton <mortonm@chromium.org> (cherry picked from commit 8ad82549bbe251b68d39833483d5812bc8bd57bd) Reviewed-on: https://chromium-review.googlesource.com/c/1370853 Reviewed-by: Eric Caruso <ejcaruso@chromium.org> Commit-Queue: Eric Caruso <ejcaruso@chromium.org> Tested-by: Eric Caruso <ejcaruso@chromium.org> [modify] https://crrev.com/7c726bbd9770018c6b940fd30bb00679cd1d8b74/libbrillo/brillo/minijail/mock_minijail.h [modify] https://crrev.com/7c726bbd9770018c6b940fd30bb00679cd1d8b74/libbrillo/brillo/minijail/minijail.cc [modify] https://crrev.com/7c726bbd9770018c6b940fd30bb00679cd1d8b74/libbrillo/brillo/minijail/minijail.h
,
Dec 11
,
Dec 11
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/a778987f70f8a28146e72f38e0c9716890fa73fb commit a778987f70f8a28146e72f38e0c9716890fa73fb Author: Micah Morton <mortonm@chromium.org> Date: Tue Dec 11 18:03:26 2018 shill: close open fds when spawning new processes crbug.com/531655 resurfaced when trying to spawn shill in a minijail. So we need the same solution as before: close open file descriptors when spawning processes. Was already done here before: CL:319561. CQ-DEPEND=CL:1370853 BUG= chromium:911234 TEST=manual testing VPN connection UI Change-Id: I61a39ca3e24974b9b0c6ffd3f735e527ab6a2851 Reviewed-on: https://chromium-review.googlesource.com/1365088 Commit-Ready: Micah Morton <mortonm@chromium.org> Tested-by: Micah Morton <mortonm@chromium.org> Reviewed-by: Micah Morton <mortonm@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> (cherry picked from commit 54b9512433186d6992d4b77c0d24f5dc97559b43) Reviewed-on: https://chromium-review.googlesource.com/c/1370796 Commit-Queue: Micah Morton <mortonm@chromium.org> [modify] https://crrev.com/a778987f70f8a28146e72f38e0c9716890fa73fb/shill/process_manager_test.cc [modify] https://crrev.com/a778987f70f8a28146e72f38e0c9716890fa73fb/shill/process_manager.cc |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by mortonm@chromium.org
, Dec 4