crosh: network_diag and other commands broken in 69
Issue description
Chrome OS Version: M69 dev/canary
Network info: vanilla WiFi
Steps To Reproduce:
(1) Use Chrome OS 68 beta channel. Observe, "network_diag --hosts" and other crosh commands work as intended.
(2) Upgrade to Chrome OS 69 dev or canary.
(3) network_diag --hosts always fails with DNS resolution errors.
(4) network_diag alone also fails. tracepath fails.
Expected Result: crosh network tools get network connectivity
Actual Result: Tools seem to have no network connectivity despite working WiFi
How frequently does this problem reproduce? (Always, sometimes, hard to reproduce?) 10/10
What is the impact to the user, and is there a workaround? If so, what is it? Loss of connectivity and debug tools
,
Jul 5
agreed, I'll look into it
,
Jul 9
the line that fails is here: https://cs.corp.google.com/chromeos_public/src/platform2/crosh/network_diag?rcl=359f23e5ab10d2cde1f4322f3f4bccd391f93184&l=111

The awk command fails to open /etc/resolv.conf, which is a symlink to /run/shill/resolv.conf. Strangely, if I do an ls -l on /run at that spot in the network_diag script, I see this:

total 0
drwxr-xr-x. 3 root root 60 Jul 9 09:51 arc
drwxr-xr-x. 2 root root 40 Jul 9 09:52 containers
drwxr-xr-x. 2 root root 60 Jul 9 09:51 cups
drwxr-xr-x. 2 messagebus messagebus 60 Jul 9 09:51 dbus

instead of what I would expect to see, something like this:

total 20
-rw-------. 1 root root 0 Jul 9 09:51 agetty.reload
drwxr-xr-x. 2 root root 80 Jul 9 09:51 anomaly-collector
drwxr-xr-x. 11 root root 300 Jul 9 09:52 arc
drwxr-xr-x. 2 avahi avahi 80 Jul 9 09:51 avahi-daemon
drwxrwx---. 4 avfs avfs 80 Jul 9 09:51 avfsroot
drwxr-x---. 2 bluetooth bluetooth 40 Jul 9 09:51 bluetooth
drwxr-x---. 2 arc-camera arc-camera 60 Jul 9 09:51 camera
drwxr-xr-x. 2 chronos chronos 80 Jul 9 09:52 chrome
drwxr-xr-x. 2 root root 40 Jul 9 09:52 containers
drwxrwx--T. 2 cras cras 60 Jul 9 09:51 cras
drwxr-xr-x. 2 root root 40 Jul 9 09:51 crash_reporter
drwx------. 2 root root 80 Jul 9 09:51 cros-machine-id-regen
drwx------. 4 root root 80 Jul 9 09:52 cryptohome
drwxr-xr-x. 2 root root 60 Jul 9 09:51 cups
drwxr-xr-x. 2 messagebus messagebus 60 Jul 9 09:51 dbus
-rw-r--r--. 1 root root 4 Jul 9 09:51 dbus.pid
drwxr-xr-x. 2 root root 0 Jul 9 09:51 debugfs_gpu
drwxr-xr-x. 2 dhcp dhcp 60 Jul 9 09:51 dhcpcd
drwx------. 2 root root 120 Jul 9 09:51 frecon
drwxr-xr-x. 2 cros-disks cros-disks 40 Jul 9 09:51 fuse
drwxr-xr-x. 2 root root 40 Jul 9 09:51 imageloader
drwxrwxr-x. 2 root ippusb 60 Jul 9 09:51 ippusb
drwxrwx---. 2 shill shill 40 Jul 9 09:51 ipsec
drwxrwx---. 2 shill shill 40 Jul 9 09:51 l2tpipsec_vpn
drwxrwxrwt. 3 root root 120 Jul 9 09:51 lock
drwx--x--x. 2 root root 40 Jul 9 09:51 lockbox
drwxr-xr-x. 2 root root 60 Jul 9 09:51 metrics
-rw-r--r--. 1 root root 0 Jul 9 09:51 metrics.mount-encrypted
drwxr-xr-x. 2 root root 60 Jul 9 09:51 mount
-rw-r--r--. 1 root root 5 Jul 9 09:51 patchoat.pid
drwxr-xr-x. 4 root root 80 Jul 9 09:51 power_manager
drwx------. 2 root root 100 Jul 9 09:52 session_manager
drwxr-xr-x. 3 root root 120 Jul 9 09:52 shill
-rw-r--r--. 1 root root 5 Jul 9 09:51 sshd.pid
drwx--x---. 2 root root 60 Jul 9 09:52 state
srw-rw----. 1 tss tss 0 Jul 9 09:51 tcsd.socket
-rw-r--r--. 1 root root 53 Jul 9 09:51 tpm_firmware_update_location
drwxr-xr-x. 6 root root 140 Jul 9 10:23 udev
-rw-r--r--. 1 root root 5 Jul 9 09:51 upstart-socket-bridge.pid
drwx--x---. 2 root chronos 40 Jul 9 09:51 user_policy
drwxr-x---. 2 wpa wpa 60 Jul 9 09:51 wpa_supplicant

Maybe there's some kind of mount/pivot root/container/minijail thing going on that I don't understand, but for now it makes sense that network_diag can't read /run/shill/resolv.conf if it can't even see /run/shill. But I don't know how any recent changes could have caused this, so still digging.
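For anyone wanting to check this kind of failure quickly, here is a minimal Python sketch (not part of network_diag, which is a shell script) that reports where a symlink points and whether its target is visible from the calling process. The helper name `describe_symlink` is hypothetical. Run from inside the restricted environment, a result with `target_visible` set to False would be consistent with the listing above, where /run/shill is missing.

```python
import os


def describe_symlink(path):
    """Report where a symlink points and whether its target is visible here."""
    is_link = os.path.islink(path)
    # realpath() follows the link chain even when the final target is missing.
    target = os.path.realpath(path)
    return {
        "path": path,
        "is_symlink": is_link,
        "target": target,
        # exists() follows symlinks, so a dangling link reports False.
        "target_visible": os.path.exists(path),
    }


if __name__ == "__main__":
    # /etc/resolv.conf is the path discussed in this bug.
    print(describe_symlink("/etc/resolv.conf"))
```

The check only says whether the target resolves in the current mount namespace; it does not explain why a target is missing.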
,
Jul 9
we run debugd in a reduced mount environment in which /run is limited (by design). try this CL i sent you: https://chromium-review.googlesource.com/1129330
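One way to see what a reduced mount environment actually exposes is to read /proc/self/mountinfo from inside it and list the mount points below /run. The Python sketch below assumes the standard Linux mountinfo layout, where the fifth whitespace-separated field is the mount point; the function name `mount_points_under` is hypothetical. Whether /run/shill appears as its own entry depends on how debugd sets up its namespace, which this thread does not spell out.

```python
def mount_points_under(mountinfo_text, prefix):
    """Return mount points at or below `prefix` from mountinfo-formatted text."""
    prefix = prefix.rstrip("/") or "/"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mount_point = fields[4]  # fifth field is the mount point
        if (
            prefix == "/"
            or mount_point == prefix
            or mount_point.startswith(prefix + "/")
        ):
            found.append(mount_point)
    return found


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        for mount_point in mount_points_under(f.read(), "/run"):
            print(mount_point)
```

The prefix match is done on whole path components, so a mount at /runner is not reported when asking about /run.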
,
Jul 9
thanks I'll try it
,
Jul 9
Yep, that CL fixes the issue from what I see. Definitely fixes 'network_diag --hosts' command, and looks like it fixes 'network_diag' command in general. I'm at a loss for why these failures just came up, since shill had been symlinking /etc/resolv.conf to /run/shill/resolv.conf since long before I started working on this code. I recently changed the ownership of /run/shill from root:root to shill:shill in https://chromium-review.googlesource.com/c/aosp/platform/system/connectivity/shill/+/1087527 , but not sure how this could have triggered the behavior we've seen here.
,
Jul 9
i agree that it seems like this should have been broken for a while, ever since this landed: https://chromium-review.googlesource.com/1053426
maybe it was only noticed recently due to the acceleration of branching R69. i'll take this bug on the assumption the CL i just posted fixes things.
,
Jul 9
fwiw I did see the issue maybe a week or two before filing this on canary channel but "well it's canary" and fighting other fires distracted me at that time.
,
Jul 9
ah ok makes sense. maybe it was broken for a month or so then.
,
Jul 10
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/65884e62c0d239a7d934a7b666441b8184b306e7

commit 65884e62c0d239a7d934a7b666441b8184b306e7
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Jul 10 00:59:25 2018

debugd: bind mount /run/shill for /etc/resolv.conf

BUG= chromium:859867
TEST=precq passes

Change-Id: Ied761db4facdee7dba2a1ad91093db241e370af9
Reviewed-on: https://chromium-review.googlesource.com/1129330
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Micah Morton <mortonm@chromium.org>

[modify] https://crrev.com/65884e62c0d239a7d934a7b666441b8184b306e7/debugd/src/main.cc
,
Jul 10
Comment 1 by vapier@google.com, Jul 4
Components: -Platform>Apps>Default>Hterm OS>Systems
Labels: allpublic
Owner: mortonm@chromium.org