minikube: enable kernel features it needs
Issue description

UserAgent: Mozilla/5.0 (X11; CrOS x86_64 10895.10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.21 Safari/537.36
Platform: 10895.10.0 (Official Build) dev-channel eve

Steps to reproduce the problem:
1. Install Docker
2. Install Minikube

What is the expected behavior?
Docker - storage graph driver is set to aufs or overlay
Minikube - kubelet proxy available

What went wrong?
The Docker storage graph driver implementation does not support either aufs or overlay; these would be useful to add for backwards compatibility. At present only btrfs is enabled by default in the current build. Minikube requires additional Linux kernel modules to be available to successfully establish the proxy network. Based on a limited investigation, I believe the following items need to be added: ip_tables, nf_nat, overlay and aufs, netlink_diag.

Looking at the error log for Minikube, the following is evident (I have attached the steps taken for this task and the error log output as part of this ticket):

Aug 26 20:33:32 minikube kubelet[8632]: W0826 20:33:32.635150 8632 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Aug 26 20:33:32 minikube kubelet[8632]: W0826 20:33:32.671714 8632 fs.go:216] stat failed on /dev/vdb with error: no such file or directory
Aug 26 20:33:32 minikube kubelet[8632]: F0826 20:33:32.675456 8632 server.go:233] failed to run Kubelet: open /proc/swaps: no such file or directory

Did this work before? N/A

Chrome version: 69.0.3497.21  Channel: dev
OS Version: 10895.10.0
Flash Version: 30.0.0.142 /opt/google/chrome/pepper/libpepflashplayer.so

Happy to provide more details if required. I have attached a breakdown of the approach taken inclusive of the error log generated.
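For anyone reproducing this, a quick way to see which of the pieces listed above are present in the container is sketched here (a minimal sketch only; it assumes the module tools are installed and the kernel exposes /proc/config.gz, and the CONFIG_* names are illustrative rather than an exhaustive list):

    # Check whether the modules mentioned in this report are loaded or even available.
    for m in ip_tables nf_nat overlay aufs netlink_diag br_netfilter; do
        if lsmod | grep -q "^${m}\b"; then
            echo "$m: loaded"
        elif modprobe -n "$m" 2>/dev/null; then
            echo "$m: available but not loaded"
        else
            echo "$m: not available"
        fi
    done

    # If the kernel publishes its config, the corresponding build options can be inspected too.
    zgrep -E 'CONFIG_(IP_NF_IPTABLES|NF_NAT|OVERLAY_FS|NETLINK_DIAG)=' /proc/config.gz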
Aug 27
we need to focus this bug otherwise it's going to be unmanageable. what do we want to focus on here? minikube?
Aug 27
Primary focus would definitely be minikube - getting this working would be a major win. The Docker request was more for compatibility - but I take your point regarding privileges, which is something I didn't consider as part of my request.
Aug 27
if minikube is failing due to /proc/swaps not existing, that's a bug in minikube imo. there's no reason it should need that. i don't know what /dev/vdb is supposed to contain. some virtual disks ?
Aug 27
Triage: @vapier, feel free to reassign if needed
Aug 28
sorry, i don't have the cycles or experience for this. i've never run kubernetes or docker stuff before. i'm just here for the general distro angle ;).
Aug 28
If it helps, I included a guide doc on installation of minikube as part of the original ticket.
Aug 29
From what I can tell, it looks like the issues may relate to the br_netfilter module not being enabled. When the minikube Linux container is launched, a /proc/sys/net/bridge directory needs to exist. This subdirectory should contain an r/w file named bridge-nf-call-iptables to indicate whether the proxy state has been set. As this directory does not exist, there is no way to indicate to kubeadm init/minikube that they own proxy management.

Outside of lxc, this can be done with "modprobe br_netfilter" and then rebooting. I am not clear how to achieve the same result within lxc, but it seems to be worth investigating further. I believe this is managed differently under Docker, which uses sysctl to set this indicator; I am unsure if it actually references the same value or if it just uses a separate config file. Again, hope this helps, if not please ignore 🙄
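To make the br_netfilter idea concrete, this is roughly what it looks like on an ordinary Linux host (a sketch only; whether any of it can be done from inside the unprivileged crostini container is exactly the open question):

    # Load the module that creates /proc/sys/net/bridge (requires privileges).
    sudo modprobe br_netfilter

    # The directory should now exist and contain the flag kubeadm/minikube look for.
    ls /proc/sys/net/bridge/
    cat /proc/sys/net/bridge/bridge-nf-call-iptables

    # Tell the kernel to pass bridged traffic through iptables, which is what
    # kube-proxy relies on.
    sudo sysctl -w net.bridge.bridge-nf-call-iptables=1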
Sep 4
Do you know who would be a good owner for this, Stephen, and what the timeline should be?
Sep 4
I'm probably a good owner since I'm the most familiar with this environment. Optimistically M-71 if it's just a few kernel configs, but I could see this taking until M-72. I suspect we'll need to fix the Docker keyctl issue before this will work as well.
Sep 19
Hey - just checking in, is there any update on whether this is feasible, or on potential issues?
Sep 19
I don't think this is feasible right now without some help from Kubernetes to support running in unprivileged LXD containers. We have most of the kernel configs that are needed already, aside from NETLINK_DIAG. I think the only way to run this right now is as a privileged container, which we don't want to do with crostini: https://blog.ubuntu.com/2017/02/20/running-kubernetes-inside-lxd

The restrictions that I'm aware of causing issues for kubernetes:
1) Only btrfs and dir storage drivers for docker are available. overlayfs would only work with privileged containers.
2) Swap cannot be controlled from the container. This is not namespaced and requires privileges.
3) sysctls like net.bridge.* are also not namespaced (only valid in the initial netns). The configs are already built into the VM kernel but won't be available to unprivileged containers.
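For reference, each of these restrictions can be confirmed from inside the container with standard tools (illustrative commands; exact output varies by release):

    # 1) Storage driver actually in use by Docker (expect btrfs here, not overlay2/aufs).
    docker info --format '{{.Driver}}'

    # 2) Swap: /proc/swaps is missing entirely because the VM kernel has no swap support,
    #    which is what kubelet trips over.
    cat /proc/swaps

    # 3) Bridge sysctls: not namespaced, so they are absent inside the container even
    #    though the VM kernel was built with the support.
    sysctl net.bridge.bridge-nf-call-iptables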
Sep 20
Thanks for the detailed analysis :-) I had used the following configuration (as it doesn't use Conjure), which produced similar results: https://gist.github.com/bat9r/76610a778f53f4dfbb5bc887bc2f3cce

Re: the restrictions causing issues for Kubernetes:

1) Only btrfs and dir storage drivers for docker are available. overlayfs would only work with privileged containers.
If I understand this correctly, it should not be a showstopper, as the latest Minikube version has been updated to support btrfs. Any change for this would, I think, be software related, i.e. in Minikube/K8s, to achieve compatibility.

2) Swap cannot be controlled from the container. This is not namespaced and requires privileges.
My assumption here is that /proc/swaps is part of the virtual filesystem, so making this entry available and read-only might be possible? In the case of Kubernetes, as far as I am aware swap needs to be turned off, i.e. "swapoff -a", but what you are saying is that this will not do anything, since that setting cannot be changed from within the container - right?

3) sysctls like net.bridge.* are also not namespaced (only valid in the initial netns). The configs are already built into the VM kernel but won't be available to unprivileged containers.
Ok, I had assumed this works the same way between Docker and K8s, but that is not the case. This seems like the main sticking point - the other issues can most likely be worked around in software. However, without the ability to bridge/NAT traffic, it is unlikely to be possible to run Kubernetes?

cheers
Rich
Sep 20
presumably if the code is looking for /proc/swaps, it doesn't just want to read the file, it wants to manage/mess around with adding/removing swap devices. we don't have swap support enabled in the kernel and currently have no plans to enable it. so i don't see adding a stub/zero-byte /proc/swaps file being useful if the code is just going to die because it can't manage swaps. on the other hand, if the code really just wants to read the file, then it dying because the file is missing is def a bug in that code that should be fixed rather than making downstream users hack around it.

wrt bridging, we document this FAQ already: https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md#can-i-access-layer-2-networking
if that's a hard requirement, then i don't see it working anytime soon (if ever).
Sep 20
Yeah, I agree on the swaps - it should be possible to stop this from occurring within the codebase. Given that Kubernetes demands swap be turned off before the application can run, I assume this is a verification step to ensure swap space has not been assigned. I see a reference to a flag denoting just this behaviour and how to bypass it (i.e. KUBELET_FLAGS=${KUBELET_FLAGS:-"--fail-swap-on=false"}).
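For example, something along these lines (untested here; the flag spellings follow the upstream minikube/kubelet docs and may differ between versions):

    # Passed straight to the kubelet that minikube launches:
    minikube start --vm-driver=none --extra-config=kubelet.fail-swap-on=false

    # Or, where the kubelet is started from a wrapper script that honours KUBELET_FLAGS
    # (as in the gist referenced above):
    KUBELET_FLAGS=${KUBELET_FLAGS:-"--fail-swap-on=false"}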
Re: bridging - that basically means it is not possible to run Kubernetes/minikube, as inter-container communication is meant to be managed by the master node? Happy to be corrected if wrong - but this is the way I understood it to work.
In which case, I would suggest closing the ticket, as without the ability to manage networking this request cannot be fulfilled.
cheers
Rich
Sep 21
as noted in the FAQ, we're going to say "no" to layer 2 access outside of the VM, but we're open to doing layer 2 inside of a single VM (between containers in there). smbarber@ has probably thought the most about what that would take.
Sep 21
Thanks, that would be a good solution: having a single VM manage traffic between containers. I have added the ports used by Kubernetes to the ticket as well.

Master:
TCP 6443*/443     Kubernetes API server
TCP 2379-2380     etcd server client API
TCP 10250         Kubelet API
TCP 10251         kube-scheduler
TCP 10252         kube-controller-manager
TCP 10255         Read-only Kubelet API

Worker(s):
TCP 10250         Kubelet API
TCP 10255         Read-only Kubelet API
TCP 30000-32767   NodePort services
Sep 22
If minikube can run under that environment then I think we should already be okay. I've tested docker (requires un-blacklisting keyctl, but that will be fixed upstream soon) and lxc, and both are functional in the normal crostini environment. crostini containers can already set up network bridges, so nested containers could use those.

I think the immediate to-dos are:
1) set the appropriate net.bridge.* sysctls in maitred. minikube won't be able to access them, but I don't see an issue with turning those sysctls on.
2) turn on the additional *_DIAG kernel configs
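Concretely, the two to-dos amount to something like the following (a sketch only; how maitred actually applies sysctls and which *_DIAG options beyond NETLINK_DIAG are wanted are assumptions here):

    # 1) Bridge sysctls to apply at VM boot (e.g. from maitred or an equivalent init step):
    sysctl -w net.bridge.bridge-nf-call-iptables=1
    sysctl -w net.bridge.bridge-nf-call-ip6tables=1

    # 2) Socket-diagnostics kernel configs; NETLINK_DIAG is the one called out earlier in
    #    this bug, the others are listed for completeness:
    #    CONFIG_NETLINK_DIAG=y
    #    CONFIG_PACKET_DIAG=y
    #    CONFIG_UNIX_DIAG=y
    #    CONFIG_INET_DIAG=y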
Sep 22
both sound fine to try out in tot now that we're branching the runtime
Sep 24
looks like chromeos-4.19 rebase work has started, so i would naively guess that it will be available by end of Q4 2018. but no promises :p.
Sep 24
This sounds very promising :-) No pun intended. I will keep my fingers crossed.
Oct 5
FYI - new networking detail for Kubernetes has been released by the K8s team; this provides further information that may be of interest to the ticket, specifically the pods section. https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview
Dec 12
Hey, just following up - is there any progress that can be shared?
Dec 13
Sorry, I haven't yet worked on the items in #20. :( I might be able to squeeze it in for 73, but that depends on the progress we make on our uid shifting bugs. We've finished our upgrade to 4.19, which should be available in 72.
Dec 13
About the br_netfilter stuff for Kubernetes: I have a patch for this which upstream likes. I just haven't had time to rework it: [PATCH net-next 0/2] br_netfilter: enable in non-initial netns https://lkml.org/lkml/2018/11/7/681
Jan 16
Is the /proc/swaps thing still a problem? I'm a Kubernetes contributor and can make a PR to prevent the check from failing if the file is missing entirely. Assuming I can convince the code owners that it's not a bad idea, it would ship in 1.14 at the earliest. Anything to make this happen and get rid of my Mac...
Jan 16
By the way, this is exactly what kubelet is trying to do: https://github.com/kubernetes/kubernetes/blob/05183bffe5cf690b418718aa107f5655e4ac0618/pkg/kubelet/cm/container_manager_linux.go#L205 It wants to make sure that the file has no lines except the headers, i.e. that no swap is enabled, since supporting such systems is complicated and Kubernetes just punts on that for the time being.
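In other words, on a machine where the check passes, /proc/swaps exists but is header-only, while in the crostini container it is missing entirely, so the read fails before the line count is ever examined (illustrative output):

    # Host with swap compiled in but disabled ("swapoff -a"): only the header line remains,
    # so kubelet's check is satisfied.
    $ cat /proc/swaps
    Filename                                Type            Size    Used    Priority

    # Crostini container, where the VM kernel has no swap support at all: the file does
    # not exist, so ioutil.ReadFile("/proc/swaps") fails with "no such file or directory".
    $ cat /proc/swaps
    cat: /proc/swaps: No such file or directory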
Jan 16
Re: /proc/swaps. Given that swap has been disabled on the target device, I was hoping this check would not be an issue; however, the code performs an explicit read of /proc/swaps...
container_manager_linux.go, line 205:

    if failSwapOn {
        // Check whether swap is enabled. The Kubelet does not support running with swap enabled.
        swapData, err := ioutil.ReadFile("/proc/swaps")
        // ... (the subsequent lines split swapData and fail if anything beyond the header line is present)
I haven't tried this recently, so I will give it another go to see what the latest changes look like from a Chrome OS and Minikube perspective. To my mind the main blocker would be the br_netfilter kernel module - the rest would appear to be minor software changes.
Jan 16
Nice. Yes, the lack of br_netfilter is probably the larger issue. Here's hoping that this all gets resolved soon!
Comment 1 by smbar...@chromium.org, Aug 27
Components: -Platform>DevTools OS>Systems>Containers
Labels: Proj-Containers