Error rebuilding the cluster with a different number of nodes #2637
Comments
If the nodes are cleaned properly, it seems you are hitting the same as #2632. Can you share logging with |
Repeated all the cleaning steps above
rke --debug up: |
The issue title says there is an error rebuilding the cluster. With what version, with how many and which nodes, and when was the last time you ran |
This is a test cluster of 4 nodes. One node was taken away, so I had to rebuild the cluster on 3 nodes. Before the rebuild, rke v1.2.9 + v1.20.8-rancher1-1 was running there. One subtlety: I did not immediately notice that the image version changed in rke v1.2.11, so I tried several times to start rke v1.2.11 + v1.20.8-rancher1-1. However, a complete cleanup of the nodes should have removed those changes. |
When was the last time it worked with v1.2.9? Does it work with v1.2.9 now, or do you also get an error? I assume the 3 remaining nodes have stayed the same in between (specification-wise: CPU/memory/disk). |
rke v1.2.9 + v1.20.8-rancher1-1 doesn't work either. Repeated all the cleaning steps above
rke --debug up: |
Not really sure, but if it's not version-dependent, you might still be running into the issue linked above. Can you test storage performance using https://www.ibm.com/cloud/blog/using-fio-to-tell-whether-your-storage-is-fast-enough-for-etcd? |
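For reference, the check in the linked article boils down to running fio with an fdatasync after every write and looking at the 99th percentile of the fdatasync latency, which the article suggests should stay under 10ms. Below is a minimal Python sketch of that check. It assumes fio is installed and that its JSON report exposes the value under jobs[0].sync.lat_ns.percentile (an assumption based on recent fio 3.x releases, not something confirmed in this thread). The helper names fio_command, fsync_p99_ns and fast_enough are made up for illustration.

```python
import json

# 10 ms in nanoseconds: the fdatasync p99 limit suggested in the linked article.
ETCD_FSYNC_P99_LIMIT_NS = 10_000_000


def fio_command(directory="test-data"):
    """Build the fio invocation from the article, plus JSON output.

    The target directory must already exist on the disk that holds etcd data.
    """
    return [
        "fio",
        "--rw=write",
        "--ioengine=sync",
        "--fdatasync=1",
        f"--directory={directory}",
        "--size=22m",
        "--bs=2300",
        "--name=mytest",
        "--output-format=json",
    ]


def fsync_p99_ns(fio_json_text):
    """Extract the 99th percentile fdatasync latency (ns) from a fio JSON report.

    Assumption: the report layout is jobs[0].sync.lat_ns.percentile["99.000000"].
    """
    report = json.loads(fio_json_text)
    return report["jobs"][0]["sync"]["lat_ns"]["percentile"]["99.000000"]


def fast_enough(fio_json_text):
    """Return True when the fdatasync p99 is below the 10 ms limit."""
    return fsync_p99_ns(fio_json_text) < ETCD_FSYNC_P99_LIMIT_NS
```

To use it, run the command returned by fio_command() on each node (for example via subprocess.run with capture_output=True and text=True), pass the captured stdout to fast_enough(), and repeat a few times to see whether the result is consistent.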
This looks good, did you follow the article and check the values? Did you run it a couple of times to make sure it's consistent? |
Sorry, I checked it once. node-1: https://paste.4040.io/rikelejaro.md |
I don't really have a lead at the moment. Those errors could be related to bad disk performance, but that's not the case here. What changed on these nodes since the last successful |
The problem turned out to be incorrect DNS.
|
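Since the root cause was DNS, a quick name-resolution check on every node before running rke up may save some time. The thread does not say what exactly was wrong with DNS, so the following is only a minimal Python sketch under assumptions: the node names are taken from the steps in this issue, and check_dns is a hypothetical helper, not part of RKE.

```python
import socket


def check_dns(hostnames, resolver=socket.gethostbyname_ex):
    """Resolve each hostname and return {hostname: sorted IPv4 list or None}.

    None means the lookup failed. The resolver argument exists only so the
    lookup can be replaced in tests; by default the system resolver is used.
    """
    results = {}
    for name in hostnames:
        try:
            # gethostbyname_ex returns (canonical_name, aliases, ip_addresses)
            _, _, addresses = resolver(name)
            results[name] = sorted(addresses)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            results[name] = None
    return results


if __name__ == "__main__":
    # Node names as used in this issue; replace with the names or
    # addresses listed in your cluster.yml.
    nodes = ["node-1", "node-2", "node-3"]
    for name, addresses in check_dns(nodes).items():
        print(name, "->", addresses if addresses else "LOOKUP FAILED")
```

Running this on each node and comparing the output shows whether all nodes resolve each other to the same, expected addresses.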
RKE version: v1.2.11
Docker version: (docker version, docker info preferred)
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
Type/provider of hosts: Bare-metal
cluster.yml file:
Steps to Reproduce:
cluster, node-1, node-2, node-3
Control node: cluster
Command: rke up
Command: rke remove
cluster node in cluster.yml file.
node-1, node-2, node-3
Control node: node-2
Command: rke up
Results:
An error will appear on a random node: