
[rancher desktop bug] kind create cluster hangs on Windows at β€’ Writing configuration πŸ“œ ... #3003

Open
tglaeser opened this issue Nov 15, 2022 · 14 comments
Labels
kind/external upstream bugs

Comments

@tglaeser

I'm following the instructions outlined here.

I have Rancher Desktop installed:

$ wsl --list
Windows Subsystem for Linux Distributions:
rancher-desktop (Default)
rancher-desktop-data

I use the configuration from the link above:

$ cat cluster-config.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
    protocol: TCP

When trying to create a new cluster, command execution never finishes and no error is logged either:

$ kind create cluster --config=cluster-config.yml
Creating cluster "kind" ...
 β€’ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό  ...
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 β€’ Preparing nodes πŸ“¦   ...
 βœ“ Preparing nodes πŸ“¦
 β€’ Writing configuration πŸ“œ  ...

Any suggestions how to further debug the issue would be highly appreciated.

@tglaeser tglaeser added the kind/support Categorizes issue or PR as a support question. label Nov 15, 2022
@stmcginnis
Contributor

Can you add some of the missing information from the issue template:

Environment:

  • kind version: (use kind version):
  • Kubernetes version: (use kubectl version):
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):

I'm not sure if it will show us much, but you can also try creating the cluster with the -v argument. As a test, try running:

kind create cluster -v 9

That may provide additional output that could help pinpoint where things are getting hung.

@tglaeser
Author

The OS is Windows, but all commands were executed in mintty, which comes with MSYS2. The kind and kubectl programs were installed using curl; docker came with Rancher Desktop. Here is the detailed environment output:

$ kind version
kind v0.17.0 go1.19.2 windows/amd64
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:44:59Z", GoVersion:"go1.19", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+k3s1", GitCommit:"f2585c1671b31b4b34bddbb3bf4e7d69662b0821", GitTreeState:"clean", BuildDate:"2022-10-25T19:59:38Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.12.0)
  dev: Docker Dev Environments (Docker Inc., v0.0.3)
  extension: Manages Docker extensions (Docker Inc., v0.2.13)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.21.0)

Server:
 Containers: 22
  Running: 20
  Paused: 0
  Stopped: 2
 Images: 68
 Server Version: 20.10.18
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version: 5fd4c4d144137e991c4acebb2146ab1483a97925
 init version:
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.102.1-microsoft-standard-WSL2
 Operating System: Rancher Desktop WSL Distribution
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 12.33GiB
 Name: SAG-9WP8L33
 ID: T77M:4FS2:GDGP:46ND:5LNF:DFXS:SKKY:ONI4:RLYA:3OIC:RCHW:H46F
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

Yes, I did run at verbosity level 9, but again, no error was revealed; here is that output anyway:

$ kind create cluster --config=cluster-config.yml -v 9
Creating cluster "kind" ...
 β€’ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό  ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1 present locally
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 β€’ Preparing nodes πŸ“¦   ...
 βœ“ Preparing nodes πŸ“¦
 β€’ Writing configuration πŸ“œ  ...
DEBUG: config/config.go:96] Using the following kubeadm config for node kind-control-plane:
apiServer:
  certSANs:
  - localhost
  - 127.0.0.1
  extraArgs:
    runtime-config: ""
apiVersion: kubeadm.k8s.io/v1beta3
clusterName: kind
controlPlaneEndpoint: kind-control-plane:6443
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.25.3
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/16
scheduler:
  extraArgs: null
---
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.18.0.2
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: 172.18.0.2
    node-labels: ""
    provider-id: kind://docker/kind/kind-control-plane
---
apiVersion: kubeadm.k8s.io/v1beta3
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 172.18.0.2
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: kind-control-plane:6443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
kind: JoinConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: 172.18.0.2
    node-labels: ""
    provider-id: kind://docker/kind/kind-control-plane
---
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
cgroupRoot: /kubelet
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
failSwapOn: false
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
conntrack:
  maxPerCore: 0
iptables:
  minSyncPeriod: 1s
kind: KubeProxyConfiguration
mode: iptables

@BenTheElder
Member

This probably means docker exec is hanging, since we write the config to the node by piping it to docker exec.

That's going to be a bit difficult to debug via GitHub comments; I don't know why it would be hanging.

It's possible something is incompatible with the Rancher Desktop environment. I don't use WSL or Rancher Desktop myself; our community-maintained WSL2 guide suggests some other approaches.
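One way to isolate this is to reproduce the pipe-to-exec step by hand. The commands below are a sketch, assuming a default cluster whose node container is named kind-control-plane; the last lines demonstrate the same `cp /dev/stdin` pattern locally, without docker:

```shell
# Probe whether piping stdin through `docker exec -i` works in this
# environment (this mirrors how kind writes the node config). With a
# kind-control-plane container running, try:
#
#   echo probe | docker exec --privileged -i kind-control-plane cp /dev/stdin /tmp/probe
#   docker exec kind-control-plane cat /tmp/probe
#
# The same `cp /dev/stdin` pattern, demonstrated locally without docker:
dest=$(mktemp)
printf 'probe' | cp /dev/stdin "$dest"
cat "$dest"
```

If the docker variant hangs while the local variant succeeds, the problem is in stdin handling between the host shell and docker exec, not in kind itself.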

@tglaeser
Author

This probably means docker exec is hanging, since we write the config to the node by piping it to docker exec.

How can I test this; what command can I execute via docker exec? I assume this would be something like docker exec kind-control-plane ..., right?

@tglaeser
Author

tglaeser commented Nov 17, 2022

Actually, writing the configuration seems to have been successful:

$ winpty docker exec -it kind-control-plane cat //kind//kubeadm.conf
apiServer:
  certSANs:
  - localhost
  - 127.0.0.1
  extraArgs:
    runtime-config: ""
apiVersion: kubeadm.k8s.io/v1beta3
clusterName: kind
controlPlaneEndpoint: kind-control-plane:6443
controllerManager:
  extraArgs:
    enable-hostpath-provisioner: "true"
kind: ClusterConfiguration
kubernetesVersion: v1.25.3
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/16
scheduler:
  extraArgs: null
---
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- token: abcdef.0123456789abcdef
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.18.0.2
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: 172.18.0.2
    node-labels: ""
    provider-id: kind://docker/kind/kind-control-plane
---
apiVersion: kubeadm.k8s.io/v1beta3
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 172.18.0.2
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: kind-control-plane:6443
    token: abcdef.0123456789abcdef
    unsafeSkipCAVerification: true
kind: JoinConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: 172.18.0.2
    node-labels: ""
    provider-id: kind://docker/kind/kind-control-plane
---
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
cgroupRoot: /kubelet
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
failSwapOn: false
imageGCHighThresholdPercent: 100
kind: KubeletConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
conntrack:
  maxPerCore: 0
iptables:
  minSyncPeriod: 1s
kind: KubeProxyConfiguration
mode: iptables

@BenTheElder
Member

Does Delve work on Windows? Ordinarily I'd try to reproduce a Go hang under a debugger and see where it halted.

@tglaeser
Author

tglaeser commented Nov 18, 2022

Well sure, that could be a last resort; but I would prefer that the command log some meaningful information with the --verbosity flag set instead of simply becoming unresponsive.

@BenTheElder
Member

Well sure, that could be a last resort; but I would prefer that the command log some meaningful information with the --verbosity flag set instead of simply becoming unresponsive.

There's no further logging because something hung very early in the cluster bootstrap process, which we've never seen before in years of developing this tool. We're at the last resort, unfortunately; something is very wrong in this environment for the command to hang indefinitely. Docker commands should not do that (there's a built-in timeout), nor should any of the logic in kind.
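When probing by hand, it helps to add an explicit deadline so a hung command fails fast instead of blocking the terminal. A sketch using coreutils `timeout`, with `sleep` standing in for a hung `docker exec`:

```shell
# `timeout` kills the command once the deadline passes and reports exit
# status 124; wrap any manual docker exec probe the same way, e.g.
#   timeout 30 sh -c 'echo probe | docker exec -i kind-control-plane cp /dev/stdin /tmp/probe'
# Demonstrated locally (sleep 5 stands in for the hung exec):
timeout 1 sleep 5 || echo "exit status: $?"   # prints: exit status: 124
```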

Maybe we're stuck trying to write the log output in mintty? Otherwise maybe docker in this environment has a very long timeout.

After writing out this file to the container, it checks for success, and then it would either fail or move to checking if we need to deal with containerd config. With your given config we don't need to touch containerd config, so it would mark the step done (which is itself just logging that) and move on to executing kubeadm init.

Due to nobody helping with #1529 we have no Windows CI, and none of the core contributors use Windows locally, so Windows support is best-effort.

@tglaeser
Author

tglaeser commented Nov 18, 2022

I can understand that this is not heavily used on Windows; I had no issues running it on Linux.

I was wondering if containerd might be the problem here, but at least it is running fine:

$ winpty docker exec -it kind-control-plane pgrep --exact containerd
218

If the next action is executing kubeadm init, I can run this manually too:

$ winpty docker exec -it kind-control-plane kubeadm init
[init] Using Kubernetes version: v1.25.4
[preflight] Running pre-flight checks
        [WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kind-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.18.0.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.18.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.18.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 7.507360 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node kind-control-plane as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node kind-control-plane as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: y79hdp.alukghgfhshsweu3
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.18.0.2:6443 --token y79hdp.alukghgfhshsweu3 \
        --discovery-token-ca-cert-hash sha256:cc5679675cd00236ff26c8844fb5f14e5278dc8a3fa712b033f0efa3946f8258

But note the following: in order to execute any command in the container, I need to prefix all commands with winpty or leave out the -it flags; otherwise I get the following output:

the input device is not a TTY.  If you are using mintty, try prefixing the command with 'winpty'

So I could imagine that the issue here might simply be an I/O redirection problem.
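The -t flag needs a real TTY on stdin, which mintty does not present to native Windows programs. Docker's check can be approximated with the shell's own `-t` test (a local illustration, not docker's actual code):

```shell
# `[ -t 0 ]` is true only when stdin is a terminal; docker's -t flag makes a
# similar check. With stdin coming from a pipe, the test fails:
echo | { [ -t 0 ] && echo "stdin is a TTY" || echo "stdin is not a TTY"; }
# prints: stdin is not a TTY
```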

On the topic of debugging, I don't know when I will find the time to set up a Go development environment including the mentioned debugger.

@BenTheElder
Member

But note the following: in order to execute any command in the container, I need to prefix all commands with winpty or leave out the -it flags; otherwise I get the following output:

That might be a problem, and one we haven't seen on other platforms before.

However, while kind adds -i when we are passing an input, we shouldn't be using -t.

@tglaeser
Author

tglaeser commented Nov 18, 2022

Maybe one more pointer:

When I call kind delete cluster from a different terminal the hanging process finishes with:

$ kind create cluster --config=cluster-config.yml
Creating cluster "kind" ...
 β€’ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό  ...
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 β€’ Preparing nodes πŸ“¦   ...
 βœ“ Preparing nodes πŸ“¦
 β€’ Writing configuration πŸ“œ  ...
 βœ— Writing configuration πŸ“œ
ERROR: failed to create cluster: failed to copy kubeadm config to node: command "docker exec --privileged -i kind-control-plane cp /dev/stdin /kind/kubeadm.conf" failed with error: exit status 137

Command Output:

Which is somewhat unexpected, given that the file /kind/kubeadm.conf had been written successfully.

@BenTheElder
Member

That exit status is coming from docker; it seems there's something wrong with docker exec in this environment.
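For reference, exit status 137 is 128 + 9, i.e. the process died from SIGKILL, which is consistent with the hung exec being torn down when the cluster was deleted. A local illustration:

```shell
# A command killed by signal N is reported by the shell as exit status 128+N;
# SIGKILL is signal 9, so a SIGKILLed command yields 137:
sh -c 'kill -9 $$' || echo "exit status: $?"
# prints: exit status: 137
```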

@sergioprates

I think that the problem is explained in #3065

@BenTheElder BenTheElder added kind/external upstream bugs and removed kind/support Categorizes issue or PR as a support question. labels Feb 1, 2023
@BenTheElder
Member

Yes, this seems to be rancher-sandbox/rancher-desktop#3239 πŸ‘€

@BenTheElder BenTheElder changed the title kind create cluster hangs on Windows at β€’ Writing configuration πŸ“œ ... [rancher desktop bug] kind create cluster hangs on Windows at β€’ Writing configuration πŸ“œ ... May 30, 2024