This repository has been archived by the owner on Jan 18, 2023. It is now read-only.

Init cluster failed due to discovery container issue #250

Open
oglok opened this issue Aug 7, 2019 · 13 comments

@oglok

oglok commented Aug 7, 2019

Having this cluster-init pod definition:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: cmk-cluster-init-pod
  name: cmk-cluster-init-pod
  namespace: intel-cmk
spec:
  serviceAccountName: cmk-serviceaccount
  containers:
  - args:
      # Change this value to pass different options to cluster-init.
      - "/cmk/cmk.py cluster-init --host-list=test1-worker-0,test1-worker-1 --saname=cmk-serviceaccount --namespace=intel-cmk --cmk-img=quay.io/oglok/ocp4-cmk:latest"
    command:
    - "/bin/bash"
    - "-c"
    image: quay.io/oglok/ocp4-cmk:latest
    name: cmk-cluster-init-pod
  restartPolicy: Never

The image was stored in my own Quay registry, in order to keep it somewhere easily accessible.

[root@booger CPU-Manager-for-Kubernetes]# oc get pods -n intel-cmk
NAME                                           READY   STATUS   RESTARTS   AGE
cmk-cluster-init-pod                           0/1     Error    0          24h
cmk-init-install-discover-pod-test1-worker-0   0/2     Error    0          24h
cmk-init-install-discover-pod-test1-worker-1   0/2     Error    0          24h

The install container copies the cmk binary onto the workers in /opt/bin. However, I'm getting the following trace in the discover container:

oc logs pod/cmk-init-install-discover-pod-test1-worker-0 -c discover -n intel-cmk
INFO:root:Patching node status test1-worker-0:
[
  {
    "op": "add",
    "path": "/status/capacity/cmk.intel.com~1exclusive-cores",
    "value": 4
  }
]
Traceback (most recent call last):
  File "/cmk/cmk.py", line 158, in <module>
    main()
  File "/cmk/cmk.py", line 115, in main
    discover.discover(args["--conf-dir"])
  File "/cmk/intel/discover.py", line 41, in discover
    add_node_er(conf_dir)
  File "/cmk/intel/discover.py", line 96, in add_node_er
    patch_k8s_node_status(patch_body)
  File "/cmk/intel/discover.py", line 202, in patch_k8s_node_status
    k8sapi.patch_node_status(node_name, patch_body)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/apis/core_v1_api.py", line 17100, in patch_node_status
    (data) = self.patch_node_status_with_http_info(name, body, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/apis/core_v1_api.py", line 17194, in patch_node_status_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 334, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    return_data = self.deserialize(response_data, response_type)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 249, in deserialize
    return self.__deserialize(data, response_type)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 289, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 633, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 289, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 633, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 267, in __deserialize
    for sub_data in data]
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 267, in <listcomp>
    for sub_data in data]
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 289, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/api_client.py", line 635, in __deserialize_model
    instance = klass(**kwargs)
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/models/v1_container_image.py", line 52, in __init__
    self.names = names
  File "/usr/local/lib/python3.4/site-packages/kubernetes/client/models/v1_container_image.py", line 77, in names
    raise ValueError("Invalid value for `names`, must not be `None`")
ValueError: Invalid value for `names`, must not be `None`
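This ValueError appears to be a deserialization quirk in the Python Kubernetes client rather than in cmk itself: the API server can return a node status whose containerImages entries carry no names, but the client's V1ContainerImage model rejects names=None. The stand-in class below is hypothetical (the real one lives in kubernetes.client.models.v1_container_image); it reproduces the failing setter and sketches the commonly suggested monkey-patch workaround of swapping in a setter that tolerates None:

```python
class V1ContainerImage:
    """Stand-in for kubernetes.client.models.v1_container_image.V1ContainerImage."""

    def __init__(self, names=None):
        self._names = None
        self.names = names  # routed through the property setter below

    @property
    def names(self):
        return self._names

    @names.setter
    def names(self, names):
        # This is the check that raises in the client when the API server
        # returns a containerImages entry without names.
        if names is None:
            raise ValueError("Invalid value for `names`, must not be `None`")
        self._names = names


def _tolerant_names(self, names):
    """Replacement setter that accepts None."""
    self._names = names


# Apply the workaround: rebuild the property with the tolerant setter.
V1ContainerImage.names = V1ContainerImage.names.setter(_tolerant_names)

img = V1ContainerImage(names=None)  # no longer raises
```

Against the real client one would patch the imported class the same way before the first API call that deserializes node status.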

I'm not sure what command is being run, but I can do something like this on the worker nodes:

[root@test1-worker-1 bin]# ./cmk discover --conf-dir=/etc/cmk                                                                                                                                                   
Traceback (most recent call last):
  File "cmk.py", line 158, in <module>
  File "cmk.py", line 115, in main
  File "intel/discover.py", line 32, in discover
  File "intel/k8s.py", line 299, in get_kubelet_version
  File "intel/k8s.py", line 135, in version_api_client_from_config
  File "site-packages/kubernetes/config/incluster_config.py", line 96, in load_incluster_config
  File "site-packages/kubernetes/config/incluster_config.py", line 47, in load_and_set
  File "site-packages/kubernetes/config/incluster_config.py", line 53, in _load_config
kubernetes.config.config_exception.ConfigException: Service host/port is not set.
[89004] Failed to execute script cmk
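This second trace looks like a different failure mode: load_incluster_config() only works inside a pod, where the kubelet injects the Kubernetes service env vars and mounts the service-account token, so running ./cmk directly on the worker host fails before reaching the discover logic. A small sketch of the precondition involved (assumed behavior illustrating the client's check, not cmk code; the helper name is made up):

```python
import os

# kubernetes.config.load_incluster_config() requires these env vars, which
# the kubelet injects only into containers running inside the cluster.
REQUIRED = ("KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT")


def missing_incluster_env(environ=os.environ):
    """Return the in-cluster env vars that are absent or empty."""
    return [v for v in REQUIRED if not environ.get(v)]


# On a bare worker-node shell both are unset, which is why the client raises
# "Service host/port is not set." there.
missing_incluster_env({})  # both names reported missing
```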
@mJace

mJace commented Aug 7, 2019

What's your k8s version?

@lmdaly
Contributor

lmdaly commented Aug 7, 2019

We recently added support for K8s 1.14+, which was showing the same error. See #249.

Are you using the latest master branch?

@mJace

mJace commented Aug 7, 2019

BTW @lmdaly, does #249 support k8s versions lower than 1.14, like 1.10?

@oglok
Author

oglok commented Aug 7, 2019

Hello guys! Thanks for your quick replies :-)

[root@booger CPU-Manager-for-Kubernetes]# oc version
Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.4-201906271212+6b97d85-dirty", GitCommit:"6b97d85", GitTreeState:"dirty", BuildDate:"2019-06-27T18:11:21Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+c62ce01", GitCommit:"c62ce01", GitTreeState:"clean", BuildDate:"2019-06-27T18:14:14Z", GoVersion:"go1.11.6", Compiler:"gc", Platform:"linux/amd64"}

So 1.13 at the moment.

@oglok
Author

oglok commented Aug 7, 2019

Yes, I was using master branch. Should I use remotes/origin/cmk-release-v1.3.1 ?

@oglok
Author

oglok commented Aug 7, 2019

What is the exact command that the "discover" container is executing? Is it provided by this pod definition?

https://github.com/intel/CPU-Manager-for-Kubernetes/blob/master/resources/pods/cmk-discover-pod.yaml#L25

@lmdaly
Contributor

lmdaly commented Aug 7, 2019

I tried out the latest master branch, with K8s 1.13 and didn't run into this issue. Does your master branch include the latest commit for K8s 1.14 support?

Yes, that's the command run by the discover pod.

@mJace I haven't tested with 1.10 but I imagine so.

@oglok
Author

oglok commented Aug 7, 2019

Yes, it has this commit: cc50f8f

@mJace

mJace commented Aug 8, 2019

@oglok Does the environment variable NODE_NAME exist in your pod?
node_name is read from the container environment variable NODE_NAME, which is populated from spec.nodeName.

def patch_k8s_node_status(patch_body):
    k8sconfig.load_incluster_config()
    k8sapi = k8sclient.CoreV1Api()
    node_name = os.getenv("NODE_NAME")

    logging.info("Patching node status {}:\n{}".format(
        node_name,
        json.dumps(patch_body, indent=2, sort_keys=True)))

    # Patch the node with the specified number of opaque integer resources.
    k8sapi.patch_node_status(node_name, patch_body)
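
For reference, the discover pod manifest is expected to wire NODE_NAME in via the downward API; a minimal sketch of the relevant env entry (an assumption based on the standard Kubernetes pattern, check the actual cmk-discover-pod.yaml):

```yaml
env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName   # kubelet fills in the node the pod landed on
```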

I also tested your repo's cmk image in my k8s v1.14 cluster, and it works.
Here's the init yaml from my cluster. I only changed the host list to --all-hosts and ran the pod in the default namespace.

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: cmk-cluster-init-pod
  name: cmk-cluster-init-pod
spec:
  serviceAccountName: cmk-serviceaccount
  containers:
  - args:
      # Change this value to pass different options to cluster-init.
      - "/cmk/cmk.py cluster-init --all-hosts --saname=cmk-serviceaccount --cmk-img=quay.io/oglok/ocp4-cmk:latest"
    command:
    - "/bin/bash"
    - "-c"
    image: quay.io/oglok/ocp4-cmk:latest
    name: cmk-cluster-init-pod
  restartPolicy: Never

@oglok
Author

oglok commented Aug 8, 2019

Hey! I couldn't catch the "discover" container, but the "install" one has that env var:

[root@booger CPU-Manager-for-Kubernetes]# oc exec -ti cmk-init-install-discover-pod-test1-worker-0 -c install -n intel-cmk  -- /bin/bash
root@cmk-init-install-discover-pod-test1-worker-0:/cmk# printenv
HOSTNAME=cmk-init-install-discover-pod-test1-worker-0
GPG_KEY=97FC712E4C024BBEA48A61ED3A5CA953F73C700D
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
TERM=xterm
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_HOST=172.30.0.1
CMK_PROC_FS=/host/proc
PYTHON_VERSION=3.4.6
PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/cmk
LANG=C.UTF-8
PYTHON_PIP_VERSION=9.0.1
SHLVL=1
HOME=/root
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
NODE_NAME=test1-worker-0
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443

@oglok
Author

oglok commented Aug 8, 2019

I've run the cmk-discover pod manifest manually and exec'd into it. It has the NODE_NAME var, and then it fails with the same trace as before:

[root@booger CPU-Manager-for-Kubernetes]# oc exec -ti cmk-discover-pod -n intel-cmk -- /bin/bash
root@cmk-discover-pod:/cmk# printenv
HOSTNAME=cmk-discover-pod
GPG_KEY=97FC712E4C024BBEA48A61ED3A5CA953F73C700D
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
TERM=xterm
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_HOST=172.30.0.1
PYTHON_VERSION=3.4.6
PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/cmk
LANG=C.UTF-8
PYTHON_PIP_VERSION=9.0.1
SHLVL=1
HOME=/root
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
NODE_NAME=test1-worker-1
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
_=/usr/bin/printenv

@oglok
Author

oglok commented Aug 12, 2019

Any clue guys?

@bhavishyasharma

@oglok Did you manage to resolve the issue with discover pod? I am facing the same issue.
CMK Version: v1.5.2
K8s Version: Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
