[SURE-7342] Re-installation loop of managed services #1703
Comments
Further details:
We are continuing the investigation, focusing on the manager cluster. If you can suggest any process or resource to monitor, it would be welcome.
We have upgraded to Rancher v2.7.6, which we installed on a new manager. As soon as we attached the ks-shared cluster, the bundles on the ks-shared-staging cluster were immediately redeployed, as if the cluster had been installed for the first time (not just upgraded: `helm list` shows revision 1 and a fresh version). This happens every 15 minutes. Any feedback, before we abandon the project of moving to Rancher altogether?
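For reference, one way to see this pattern from a downstream cluster is to watch the release revisions: on a genuine re-install (as opposed to an upgrade) the revision resets to 1 and the "updated" timestamp refreshes on every cycle. A minimal sketch; the interval and the `jq` filter are illustrative:

```shell
# Watch Helm releases across all namespaces once a minute; a re-install shows
# up as revision 1 with a freshly bumped "updated" timestamp.
watch -n 60 'helm list -A -o json | jq -r ".[] | [.name, .revision, .updated] | @tsv"'
```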
Hi @dvarrazzo, this looks like another instance of #1245, with redeployments happening every 15 minutes on clusters named with a common prefix. Any chance you could update your Fleet install to rule this out?
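For anyone checking for the same condition: Fleet registers downstream clusters as `clusters.fleet.cattle.io` resources, so listing them shows whether several cluster names share a common prefix (the situation described in #1245). A sketch, assuming the default `fleet-default` workspace:

```shell
# List the registered downstream clusters; look for names that are prefixes
# of one another, e.g. "ks-shared" and "ks-shared-staging".
kubectl -n fleet-default get clusters.fleet.cattle.io
```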
I looked for similar issues but didn't find any; thank you for the reference. What can I say... wow. This huge bug has been open since January, across several releases. And no, not confirmed: Rancher 2.7.6, installed on 2023-10-02, came with Fleet 0.7.1 and is affected by the bug. I will discuss it with my team, but at the moment we are on course to abandon our attempt to use Fleet altogether. Our level of trust in the project is pretty low at this moment, as you may understand.
We know there is currently some unpredictability in which Fleet version gets installed with a given Rancher version, and we are taking steps to fix that in the next Rancher release.
We hit the same issue, and Fleet version 0.8.0 does not appear to solve it. In our case the upgrade to Rancher 2.7.6 took place on 2023-10-30. The upgrade automatically brought in Fleet fleet-102.2.0+up0.8.0 (I understand it is not pinned in any way to the Rancher 2.7 release so far). As a result, the fleet-agent keeps redeploying every minute on the downstream RKE2 clusters.
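To make the one-minute cycle visible on an affected downstream cluster, the agent's rollout history and recent events are the quickest check. A sketch, assuming the agent runs as a Deployment in `cattle-fleet-system` (older Fleet versions used `fleet-system`):

```shell
# Each redeploy of the agent produces a new rollout revision.
kubectl -n cattle-fleet-system rollout history deployment/fleet-agent
# Recent events should show the repeated creation of agent pods.
kubectl -n cattle-fleet-system get events --sort-by=.lastTimestamp | tail -n 20
```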
@nepomucen, @dvarrazzo - we cannot reproduce this with Rancher 2.8.0 (Fleet 0.9.0). If you can, please help us with a reproducer. |
Closing for no response. We believe this is fixed in Rancher 2.8.
Is there an existing issue for this?
Current Behavior
We are experimenting with a Rancher installation, with 6 GitRepo resources currently controlling 6 clusters (not 1:1). The GitRepos watch 3 repositories, one of which on several branches.
Since we added the cluster on shared-prod, the three clusters on shared-staging and shared-test have entered a loop where all the managed Helm charts are re-installed. As a consequence, services get restarted and the system is unstable.
I can't find any event in the system showing why the re-installation happens. The GitRepo resources don't change. Because the restarts happen on the 3 clusters at the same time, I tend to think something on the Fleet manager triggers the event.
How can we diagnose what is causing the problem and stop it?
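For reference, a minimal diagnostic sketch from the manager side, assuming the default `fleet-default` workspace and a standard Rancher-managed Fleet install:

```shell
# Check whether the GitRepo/Bundle statuses change when the loop fires.
kubectl -n fleet-default get gitrepos,bundles
# The fleet-controller logs on the manager should record what triggered
# a bundle to be redeployed.
kubectl -n cattle-fleet-system logs deployment/fleet-controller --since=1h
# Cluster-wide events around the time of a restart can also help.
kubectl get events -A --sort-by=.lastTimestamp | grep -i bundle
```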
Expected Behavior
No response
Steps To Reproduce
No response
Environment
Logs
No response
Anything else?
No response