
Issue 859867


Issue metadata

Status: Fixed
Owner:
Closed: Jul 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




crosh: network_diag and other commands broken in 69

Project Member Reported by jayhlee@google.com, Jul 3

Issue description

Chrome OS Version: M69 dev/canary
Network info: vanilla WiFi

Steps To Reproduce:
(1) Use Chrome OS 68 beta channel. Observe, "network_diag --hosts" and other crosh commands work as intended.
(2) Upgrade to Chrome OS 69 dev or canary.
(3) "network_diag --hosts" always fails with DNS resolution errors.
(4) "network_diag" alone also fails, and tracepath fails too.

Expected Result:
crosh network tools get network connectivity

Actual Result:
Tools seem to have no network connectivity despite working WiFi

How frequently does this problem reproduce? (Always, sometimes, hard to
reproduce?)
10/10

What is the impact to the user, and is there a workaround? If so, what is
it?
Loss of connectivity and debug tools
 
Attachments:
network_diagnostics_2018-07-03.08-38-03.txt (2.2 KB)
network_diagnostics_2018-07-03.08-36-45.txt (16.1 KB)
Cc: vapier@chromium.org
Components: -Platform>Apps>Default>Hterm OS>Systems
Labels: allpublic
Owner: mortonm@chromium.org
probably due to the recent shill depriv work
agreed, I'll look into it
the line that fails is here: https://cs.corp.google.com/chromeos_public/src/platform2/crosh/network_diag?rcl=359f23e5ab10d2cde1f4322f3f4bccd391f93184&l=111

The awk command fails to open /etc/resolv.conf, which is a symlink to /run/shill/resolv.conf. Strangely, if I do an ls -l on /run at that spot in the network_diag script, I see this:

total 0
drwxr-xr-x. 3 root       root       60 Jul  9 09:51 arc
drwxr-xr-x. 2 root       root       40 Jul  9 09:52 containers
drwxr-xr-x. 2 root       root       60 Jul  9 09:51 cups
drwxr-xr-x. 2 messagebus messagebus 60 Jul  9 09:51 dbus

instead of what I would expect to see, something like this:

total 20
-rw-------.  1 root       root         0 Jul  9 09:51 agetty.reload
drwxr-xr-x.  2 root       root        80 Jul  9 09:51 anomaly-collector
drwxr-xr-x. 11 root       root       300 Jul  9 09:52 arc
drwxr-xr-x.  2 avahi      avahi       80 Jul  9 09:51 avahi-daemon
drwxrwx---.  4 avfs       avfs        80 Jul  9 09:51 avfsroot
drwxr-x---.  2 bluetooth  bluetooth   40 Jul  9 09:51 bluetooth
drwxr-x---.  2 arc-camera arc-camera  60 Jul  9 09:51 camera
drwxr-xr-x.  2 chronos    chronos     80 Jul  9 09:52 chrome
drwxr-xr-x.  2 root       root        40 Jul  9 09:52 containers
drwxrwx--T.  2 cras       cras        60 Jul  9 09:51 cras
drwxr-xr-x.  2 root       root        40 Jul  9 09:51 crash_reporter
drwx------.  2 root       root        80 Jul  9 09:51 cros-machine-id-regen
drwx------.  4 root       root        80 Jul  9 09:52 cryptohome
drwxr-xr-x.  2 root       root        60 Jul  9 09:51 cups
drwxr-xr-x.  2 messagebus messagebus  60 Jul  9 09:51 dbus
-rw-r--r--.  1 root       root         4 Jul  9 09:51 dbus.pid
drwxr-xr-x.  2 root       root         0 Jul  9 09:51 debugfs_gpu
drwxr-xr-x.  2 dhcp       dhcp        60 Jul  9 09:51 dhcpcd
drwx------.  2 root       root       120 Jul  9 09:51 frecon
drwxr-xr-x.  2 cros-disks cros-disks  40 Jul  9 09:51 fuse
drwxr-xr-x.  2 root       root        40 Jul  9 09:51 imageloader
drwxrwxr-x.  2 root       ippusb      60 Jul  9 09:51 ippusb
drwxrwx---.  2 shill      shill       40 Jul  9 09:51 ipsec
drwxrwx---.  2 shill      shill       40 Jul  9 09:51 l2tpipsec_vpn
drwxrwxrwt.  3 root       root       120 Jul  9 09:51 lock
drwx--x--x.  2 root       root        40 Jul  9 09:51 lockbox
drwxr-xr-x.  2 root       root        60 Jul  9 09:51 metrics
-rw-r--r--.  1 root       root         0 Jul  9 09:51 metrics.mount-encrypted
drwxr-xr-x.  2 root       root        60 Jul  9 09:51 mount
-rw-r--r--.  1 root       root         5 Jul  9 09:51 patchoat.pid
drwxr-xr-x.  4 root       root        80 Jul  9 09:51 power_manager
drwx------.  2 root       root       100 Jul  9 09:52 session_manager
drwxr-xr-x.  3 root       root       120 Jul  9 09:52 shill
-rw-r--r--.  1 root       root         5 Jul  9 09:51 sshd.pid
drwx--x---.  2 root       root        60 Jul  9 09:52 state
srw-rw----.  1 tss        tss          0 Jul  9 09:51 tcsd.socket
-rw-r--r--.  1 root       root        53 Jul  9 09:51 tpm_firmware_update_location
drwxr-xr-x.  6 root       root       140 Jul  9 10:23 udev
-rw-r--r--.  1 root       root         5 Jul  9 09:51 upstart-socket-bridge.pid
drwx--x---.  2 root       chronos     40 Jul  9 09:51 user_policy
drwxr-x---.  2 wpa        wpa         60 Jul  9 09:51 wpa_supplicant

Maybe there's some kind of mount/pivot root/container/minijail thing going on that I don't understand, but for now it makes sense that network_diag can't read /run/shill/resolv.conf if it can't even see /run/shill. I don't know how any recent changes could have caused this, so I'm still digging.
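
For anyone reproducing this, here is a minimal Python sketch of the check described above: is a path a symlink, and is its target visible from the current process? It is not part of network_diag, and the helper name describe_symlink is hypothetical. Run from inside the restricted environment, /etc/resolv.conf would be expected to show up as dangling if /run/shill is not mounted there.

```python
import os


def describe_symlink(path):
    """Report whether path is a symlink and whether its target is visible."""
    if not os.path.islink(path):
        return "not a symlink"
    target = os.readlink(path)
    # A relative target is resolved against the symlink's own directory;
    # an absolute target replaces the directory part entirely.
    resolved = os.path.join(os.path.dirname(path), target)
    # os.path.exists is False both when the target is missing and when it
    # cannot be stat'ed from this process (e.g. its parent is not mounted).
    if os.path.exists(resolved):
        return "ok -> " + target
    return "dangling -> " + target


if __name__ == "__main__":
    print(describe_symlink("/etc/resolv.conf"))
```

The result depends on the mount namespace of the process that runs it, so the same path can be fine in a normal shell and dangling when checked from a debugd helper.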

we run debugd in a reduced mount environment in which /run is limited (by design).  try this CL i sent you:
  https://chromium-review.googlesource.com/1129330
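
One way to see which parts of /run a process actually has is to read its mountinfo. The sketch below is only an illustration under that assumption and is not taken from the CL; the helper name mount_points_under is hypothetical. It relies on the documented /proc/<pid>/mountinfo layout, where the fifth whitespace-separated field is the mount point.

```python
def mount_points_under(mountinfo_text, prefix):
    """Return mount points at or below prefix found in mountinfo text."""
    prefix = prefix.rstrip("/") or "/"
    # Match on a path boundary so "/run" does not also match "/running".
    below = prefix.rstrip("/") + "/"
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mount_point = fields[4]  # fifth field of /proc/<pid>/mountinfo
        if mount_point == prefix or mount_point.startswith(below):
            found.append(mount_point)
    return found


if __name__ == "__main__":
    # Reads the mount table of the current process's own namespace.
    with open("/proc/self/mountinfo") as f:
        print(mount_points_under(f.read(), "/run"))
```

Comparing the output from a normal shell with the output from inside the debugd environment should show whether /run/shill is present in each.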
thanks I'll try it
Yep, that CL fixes the issue from what I see. It definitely fixes the 'network_diag --hosts' command, and it looks like it fixes the 'network_diag' command in general. I'm at a loss for why these failures just came up, since shill had been symlinking /etc/resolv.conf to /run/shill/resolv.conf since long before I started working on this code. I recently changed the ownership of /run/shill from root:root to shill:shill in https://chromium-review.googlesource.com/c/aosp/platform/system/connectivity/shill/+/1087527 , but I'm not sure how this could have triggered the behavior we've seen here.
Cc: lhchavez@chromium.org mortonm@chromium.org
Owner: vapier@chromium.org
Status: Started (was: Unconfirmed)
i agree that it seems like this should have been broken for a while once this landed:
  https://chromium-review.googlesource.com/1053426

maybe it was only noticed recently due to the acceleration of branching R69.  i'll take this bug on the assumption the CL i just posted fixes things.
fwiw I did see the issue maybe a week or two before filing this on canary channel but "well it's canary" and fighting other fires distracted me at that time.
ah ok makes sense. maybe it was broken for a month or so then.
Project Member

Comment 10 by bugdroid1@chromium.org, Jul 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/65884e62c0d239a7d934a7b666441b8184b306e7

commit 65884e62c0d239a7d934a7b666441b8184b306e7
Author: Mike Frysinger <vapier@chromium.org>
Date: Tue Jul 10 00:59:25 2018

debugd: bind mount /run/shill for /etc/resolv.conf

BUG= chromium:859867 
TEST=precq passes

Change-Id: Ied761db4facdee7dba2a1ad91093db241e370af9
Reviewed-on: https://chromium-review.googlesource.com/1129330
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Micah Morton <mortonm@chromium.org>

[modify] https://crrev.com/65884e62c0d239a7d934a7b666441b8184b306e7/debugd/src/main.cc

Status: Fixed (was: Started)
