
Issue 878034

Starred by 22 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Feature


Hotlists containing this issue:
LXD



minikube: enable kernel features it needs

Project Member Reported by richardrose@google.com, Aug 27

Issue description

UserAgent: Mozilla/5.0 (X11; CrOS x86_64 10895.10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.21 Safari/537.36
Platform: 10895.10.0 (Official Build) dev-channel eve

Steps to reproduce the problem:
1. Install docker
2.  Install Minikube
3. 

What is the expected behavior?
Docker - storage graph driver is set to aufs or overlay
Minikube - kubelet proxy available

What went wrong?
The Docker storage graph driver implementation supports neither aufs nor overlay; these would be useful to add for backwards compatibility. At present, only btrfs is enabled by default in the current build.

Minikube requires additional Linux kernel modules to be available to successfully establish the proxy network. Based on a limited investigation, I believe the following items need to be added: ip_tables, nf_nat, overlay and aufs, netlink_diag.
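
A quick way to see which of the modules named above are currently listed as loaded is to compare them against /proc/modules. The following is a minimal Python sketch, not part of minikube; the function names are made up for illustration. Note the assumption: features compiled directly into the kernel do not appear in /proc/modules, so a name reported here as "not listed" is not necessarily unavailable.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def missing_modules(required, proc_modules_text):
    """Return the required names that are not listed as loaded, in order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


# Module names taken from the report above.
REQUIRED = ["ip_tables", "nf_nat", "overlay", "aufs", "netlink_diag"]

if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        # /proc/modules may be absent or unreadable in a restricted container.
        text = ""
    print("not listed in /proc/modules:", missing_modules(REQUIRED, text))
```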

Looking at the Minikube error log, the following is evident (I have attached the steps taken and the error log output to this ticket):

Aug 26 20:33:32 minikube kubelet[8632]: W0826 20:33:32.635150    8632 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d

Aug 26 20:33:32 minikube kubelet[8632]: W0826 20:33:32.671714    8632 fs.go:216] stat failed on /dev/vdb with error: no such file or directory

Aug 26 20:33:32 minikube kubelet[8632]: F0826 20:33:32.675456    8632 server.go:233] failed to run Kubelet: open /proc/swaps: no such file or directory

Did this work before? N/A 

Chrome version: 69.0.3497.21  Channel: dev
OS Version: 10895.10.0
Flash Version: 30.0.0.142 /opt/google/chrome/pepper/libpepflashplayer.so

Happy to provide more details if required. I have attached a breakdown of the approach taken inclusive of the error log generated.
 
Crostini - Minikube.pdf
137 KB Download
Cc: smbar...@chromium.org
Components: -Platform>DevTools OS>Systems>Containers
Labels: Proj-Containers
I don't know that we'll be able to use aufs or overlay, given we run our LXD containers without any privileges.

We should definitely get minikube up and running though.
Summary: Linux module extensions for containers hardened kernel (was: Linux module extensions for crosh hardened kernel)
we need to focus this bug otherwise it's going to be unmanageable.  what do we want to focus on here ?  minikube ?
Labels: allpublic
Primary focus would definitely be minikube - getting this working would be a major win. 

The Docker request was more for compatibility - but I take your point regarding privileges, which is something I didn't consider as part of my request.
Summary: minikube: enable kernel features it needs (was: Linux module extensions for containers hardened kernel)
if minikube is failing due to /proc/swaps not existing, that's a bug in minikube imo.  there's no reason it should need that.

i don't know what /dev/vdb is supposed to contain.  some virtual disks ?
Labels: -Type-Bug -Pri-2 Pri-3 Type-Feature
Owner: vapier@chromium.org
Status: Assigned (was: Unconfirmed)
<triage>@vapier, feel free to reassign if needed</triage>
Owner: ----
Status: Available (was: Assigned)
sorry, i don't have the cycles or experience for this.  i've never run kubernetes or docker stuff before.  i'm just here for the general distro angle ;).
If it helps, I included a guide doc on installation of minikube as part of the original ticket.


From what I can tell, it looks like the issues may relate to br_netfilter module not being enabled? 

When the minikube Linux container is launched, there needs to be a /proc/sys/net/bridge directory created. This sub directory should contain a r/w file named bridge-nf-call-iptables to indicate whether the proxy state has been set. As this directory does not exist, there is no way to indicate to kubeadm init/minikube that they own proxy management. Outside of lxc, this can be performed using "modprobe br_netfilter" and then rebooting. I am not clear how to effect the same result within lxc, but it seems to be worth investigating further?
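
To confirm the state described above from inside the container, something like the following Python sketch could be used. It is not taken from minikube or kubeadm; the function name is hypothetical, and it only reports whether the sysctl file exists and what value it holds.

```python
BRIDGE_SYSCTL = "/proc/sys/net/bridge/bridge-nf-call-iptables"


def read_bridge_nf_call_iptables(path=BRIDGE_SYSCTL):
    """Return the sysctl value as an int, or None if the file does not exist."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # The directory is missing when br_netfilter is not available
        # in this network namespace.
        return None


if __name__ == "__main__":
    value = read_bridge_nf_call_iptables()
    if value is None:
        print(BRIDGE_SYSCTL, "is missing")
    else:
        print(BRIDGE_SYSCTL, "=", value)
```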

I believe this is managed differently under Docker, which uses sysctl to set this indicator. I am unsure if it actually references the same value or if it just uses a separate config file.

Again, hope this helps, if not please ignore 🙄

Cc: tbuck...@chromium.org
Owner: smbar...@chromium.org
Do you know who would be a good owner for this, Stephen, and what the timeline should be?
Status: Assigned (was: Available)
I'm probably a good owner since I'm the most familiar with this environment.

Optimistically M-71 if it's just a few kernel configs, but I could see this taking until M-72. I suspect we'll need to fix the Docker keyctl issue before this will work as well.
Hey - just checking in, any update on whether this is feasible/potential issues?
I don't think this is feasible right now without some help from Kubernetes to support running in unprivileged LXD containers.

We have most of the kernel configs that are needed already, aside from NETLINK_DIAG. I think the only way to run this right now is as a privileged container, which we don't want to do with crostini: https://blog.ubuntu.com/2017/02/20/running-kubernetes-inside-lxd

The restrictions that I'm aware of causing issues for kubernetes:
1) Only btrfs and dir storage drivers for docker are available. overlayfs would only work with privileged containers.
2) Swap cannot be controlled from the container. This is not namespaced and requires privileges.
3) sysctls like net.bridge.* are also not namespaced (only valid in the initial netns). The configs are already built into the VM kernel but won't be available to unprivileged containers.
Thanks for detailed analysis :-)

I had used the following configuration (as it doesn't use Conjure), which produced similar results.
https://gist.github.com/bat9r/76610a778f53f4dfbb5bc887bc2f3cce

Re: The restrictions that I'm aware of causing issues for kubernetes:
1) Only btrfs and dir storage drivers for docker are available. overlayfs would only work with privileged containers.

OK, if I understand this correctly, this should not be a showstopper, as the latest Minikube version has been updated to support btrfs. Any change here would, I think, be software-related, i.e. changes in Minikube/K8s to achieve compatibility?

2) Swap cannot be controlled from the container. This is not namespaced and requires privileges.

My assumption here is that /proc/swaps is part of the virtual filesystem, so making this entry available and read-only might be possible? In the case of Kubernetes, as far as I am aware, swap needs to be turned off, i.e. "swapoff -a", but what you are saying is that this will not do anything, as this setting cannot be changed from within the container - right?

3) sysctls like net.bridge.* are also not namespaced (only valid in the initial netns). The configs are already built into the VM kernel but won't be available to unprivileged containers.

OK, I had assumed this works the same way in Docker and K8s, but this is not the case. This to me seems like the main sticking point, as the other issues can most likely have software workarounds. However, without the ability to bridge/NAT traffic, it is unlikely to be possible to run Kubernetes?

cheers

Rich 



presumably if the code is looking for /proc/swaps, it doesn't want to just merely read the file, it wants to manage/mess around with adding/removing swap devices.  we don't have swap support enabled in the kernel and currently have no plans to enable it.  so i don't see adding a stub/zero byte /proc/swaps file being useful if the code is just going to die because it can't manage swaps.

on the other hand, if the code really just wants to read the file, then it dying because of it missing is def a bug in that code that should be fixed rather than making downstream users hack around it.

wrt bridging, we document this FAQ already:
https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md#can-i-access-layer-2-networking

if that's a hard requirement, then i don't see it working anytime soon (if ever).
Yeah, I agree on the swaps - it should be possible to stop this from occurring within the codebase. Given that Kubernetes demands swap is turned off before the application can run, I assume this is a verification step to ensure swap space has not been assigned. I see a reference to a flag denoting just this behaviour and how to bypass it (i.e. KUBELET_FLAGS=${KUBELET_FLAGS:-"--fail-swap-on=false).

Re: wrt bridging - that basically means it is not possible to run Kubernetes/minikube, as inter-container communication is meant to be managed by the master node? Happy to be corrected if wrong - but this is the way I understood it to work.

In which case, I would suggest closing the ticket as without the ability to manage networking - this request cannot be fulfilled.

cheers

Rich 
as noted in the FAQ, we're going to say "no" to layer 2 access outside of the VM, but we're open to doing layer 2 inside of a single VM (between containers in there).  smbarber@ has probably thought the most about what that would take.
Thanks that would be a good solution having a single VM managing traffic between containers.

I have added the ports used by Kubernetes to the ticket as well.

Master
TCP     6443* /443      Kubernetes API Server
TCP     2379-2380   etcd server client API
TCP     10250       Kubelet API
TCP     10251       kube-scheduler
TCP     10252       kube-controller-manager
TCP     10255       Read-Only Kubelet API

Worker(s)
TCP     10250       Kubelet API
TCP     10255       Read-Only Kubelet API
TCP     30000-32767 NodePort Services
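
For reference, the port list above can be turned into a simple reachability check. This is a minimal Python sketch under stated assumptions: the names MASTER_PORTS, WORKER_PORTS, expand_ports and port_is_open are made up for illustration, and only 6443 is used for the API server row (the "/443" alternative is left out).

```python
import socket

# Port specs copied from the table above; ranges are written "first-last".
MASTER_PORTS = ["6443", "2379-2380", "10250", "10251", "10252", "10255"]
WORKER_PORTS = ["10250", "10255", "30000-32767"]


def expand_ports(spec):
    """Expand a single port ("10250") or an inclusive range ("2379-2380")."""
    if "-" in spec:
        first, last = spec.split("-", 1)
        return list(range(int(first), int(last) + 1))
    return [int(spec)]


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `[p for s in MASTER_PORTS for p in expand_ports(s) if not port_is_open("127.0.0.1", p)]` would list the master ports that nothing is listening on locally.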
Crostini - Minikube.png
58.0 KB View Download
If minikube can run under that environment then I think we should already be okay. I've tested docker (requires un-blacklisting keyctl, but that will be fixed upstream soon) and lxc, and both are functional in the normal crostini environment. crostini containers can already set up network bridges, so nested containers could use those.

I think the immediate to-dos are:
1) set the appropriate net.bridge.* sysctls in maitred. minikube won't be able to access them, but I don't see an issue with turning those sysctls on.
2) turn on the additional *_DIAG kernel configs
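
To see which *_DIAG options a given kernel config has on or off, a small Python sketch like the one below can scan the config text. It assumes the standard kernel .config line format ("CONFIG_X=y" and "# CONFIG_X is not set"), and the /proc/config.gz path in the usage part is only present when the kernel exposes its config there. The function name is hypothetical.

```python
import gzip
import re

# Matches e.g. "CONFIG_INET_DIAG=y" and "# CONFIG_NETLINK_DIAG is not set".
_SET = re.compile(r"^(CONFIG_\w+_DIAG)=(\S+)$")
_UNSET = re.compile(r"^# (CONFIG_\w+_DIAG) is not set$")


def diag_options(config_text):
    """Map each CONFIG_*_DIAG option in the text to its value ("n" if unset)."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


if __name__ == "__main__":
    try:
        # Assumption: the running kernel exposes its config at this path.
        with gzip.open("/proc/config.gz", "rt") as f:
            print(diag_options(f.read()))
    except OSError:
        print("kernel config not available at /proc/config.gz")
```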
both sound fine to try out in tot now that we're branching the runtime
looks like chromeos-4.19 rebase work has started, so i would naively guess that it will be available by end of Q4 2018.  but no promises :p.
This sounds very promising :-) No pun intended.

I will keep my fingers crossed.
FYI - New networking detail for Kubernetes has been released by the K8S team, this provides further information that may be of interest to the ticket, specifically the pods section.

https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview

Hey, just following up - is there any progress that can be shared?
Sorry, I haven't yet worked on the items in #20. :( I might be able to squeeze it in for 73, but that depends on the progress we make on our uid shifting bugs.

We've finished our upgrade to 4.19, which should be available in 72.

Comment 27 Deleted

About the br_netfilter stuff for Kubernetes: I have a patch for this which upstream likes. I just haven't had time to rework it:

[PATCH net-next 0/2] br_netfilter: enable in non-initial netns
https://lkml.org/lkml/2018/11/7/681
Is the /proc/swaps thing still a problem? I'm a Kubernetes contributor and can make a PR to prevent the check from failing if the file is missing entirely. Assuming I can convince the code owners that it's not a bad idea, it would ship in 1.14 at the earliest. Anything to make this happen and get rid of my Mac...
By the way, this is exactly what kubelet is trying to do:

https://github.com/kubernetes/kubernetes/blob/05183bffe5cf690b418718aa107f5655e4ac0618/pkg/kubelet/cm/container_manager_linux.go#L205

It wants to make sure that the file has no lines except the headers, i.e. that no swap is enabled, since supporting such systems is complicated and Kubernetes just punts on that for the time being.
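
The check described above can be sketched in a few lines of Python. This is an illustration only, not kubelet's code (which is Go), and the function names are made up. Treating a missing /proc/swaps as "no swap" is the behaviour proposed in this thread, not what kubelet does at the linked revision.

```python
def swap_entries(proc_swaps_text):
    """Return the non-empty lines after the header row of /proc/swaps text."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    # The first non-empty line is the column header; the rest are swap devices.
    return lines[1:]


def swap_enabled(path="/proc/swaps"):
    """Return True if any swap device is listed; a missing file counts as none."""
    try:
        with open(path) as f:
            text = f.read()
    except FileNotFoundError:
        # Assumption: no /proc/swaps means the kernel has no swap support.
        return False
    return len(swap_entries(text)) > 0


if __name__ == "__main__":
    print("swap enabled:", swap_enabled())
```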

Comment 31 by richardrose@google.com, Jan 16 (6 days ago)

Re: /proc/swaps. Given swap has been disabled on the target device, I was hoping this check would not be an issue; however, the code explicitly reads the /proc/swaps file...

container_manager_linux.go: line 205
	if failSwapOn {
		// Check whether swap is enabled. The Kubelet does not support running with swap enabled.
		swapData, err := ioutil.ReadFile("/proc/swaps")

I haven't tried this recently, so I will give it another go to see what the latest changes look like from a ChromeOS and Minikube perspective. To my mind the main blocker would be the br_netfilter kernel module - the rest would appear to be minor software changes.

Comment 32 by vjvale...@gmail.com, Jan 16 (6 days ago)

Nice. Yes, the lack of br_netfilter is probably the larger issue. Here's hoping that this all gets resolved soon!
