If any clusters are offline, bundle status can get stuck #594
Comments
Looks like we are running into this error two years later with Rancher 2.7.1. Having one downed cluster block the whole process is exactly what we were trying to circumvent with Fleet. Any idea or timeline?
This might be related to default values in the rollout strategy. The defaults are documented in the fleet.yaml reference.
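For reference, a minimal fleet.yaml sketch of where those defaults live; the field names and default values below are taken from the fleet.yaml reference and should be double-checked against the Fleet version in use:

```yaml
# fleet.yaml (excerpt) - rollout strategy knobs; the values shown are the
# defaults as documented in the fleet.yaml reference.
rolloutStrategy:
  # Max number or percentage of clusters that may be unavailable (i.e. not
  # yet Ready) before the rollout to further clusters is paused.
  maxUnavailable: 100%
  # Max number or percentage of partitions that may be unavailable at once.
  maxUnavailablePartitions: 0
  # If no partitions are defined, clusters are grouped automatically into
  # partitions of this size.
  autoPartitionSize: 25%
```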
Let's test if this still happens on 2.9.1.
When working on this we should
This should help reproduce fleet#594 [1]. [1]: rancher/fleet#594
This is reproduction step 6 of fleet#594 [1]. [1]: rancher/fleet#594
This is reproduction step 10 of fleet#594 [1]. [1]: rancher/fleet#594
This reverts commit 1ffa225. This is reproduction step 12 of fleet#594 [1]. [1]: rancher/fleet#594
Rollout strategy does not seem to be the culprit here; setting it explicitly does not change the behaviour.
The issue here is that a bundle deployment's status is updated by the agent living in the bundle deployment's target cluster, so a bundle deployment targeting a downstream cluster will not have its status updated once that cluster is offline. Since status data is propagated from bundle deployments upwards to bundles and GitRepos, the stale status remains visible there as well. A solution for this could consist in watching a Fleet cluster's agent status and updating the statuses of affected bundle deployments once the cluster is detected as offline.
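As a rough illustration of how the stale data surfaces: a Bundle targeting two clusters, one of them offline, could end up with an aggregated summary like the sketch below (field names modelled on Fleet's bundle summary; the cluster name, counts and message are hypothetical):

```yaml
# Rough sketch of a Bundle's aggregated status. The errApplied entry for
# cluster-a is whatever its agent reported last before going offline, and
# nothing else ever refreshes it.
status:
  summary:
    desiredReady: 2
    ready: 1
    errApplied: 1
    nonReadyResources:
      - name: cluster-a
        bundleState: ErrApplied
        message: 'error validating "": ... unknown field "preStart" ...'
```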
Not planned right now, as it needs more UX work.
If any clusters are offline/unavailable, the status of Bundles that get deployed to those clusters can get stuck with misleading/confusing error messages.
Steps to reproduce:

With downstream cluster A offline, push a commit to the monitored repository containing an invalid manifest (an unknown preStart field in a container lifecycle; see the sketch below). The BundleDeployment for cluster A goes to ErrApplied, with an error message similar to:

error validating "": error validating data: ValidationError(Deployment.spec.template.spec.containers[0].lifecycle): unknown field "preStart" in io.k8s.api.core.v1.Lifecycle

After a later commit removes the invalid field, the BundleDeployment for cluster A still reports:

Error validating "": error validating data: ValidationError(Deployment.spec.template.spec.containers[0].lifecycle): unknown field "preStart" in io.k8s.api.core.v1.Lifecycle

even though the actual repository no longer contains any reference to a preStart field.

It is worth noting that, if step 10 is skipped - so that the commit in step 12 (which fixes the error) is the first commit to the repo after cluster A goes offline - then in step 12 the BundleDeployment for A will go to a "Wait Applied" state rather than being stuck in the error state.
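The invalid manifest referred to above could look roughly like this; everything here is illustrative except the preStart field, which is what the Kubernetes API rejects, since io.k8s.api.core.v1.Lifecycle only defines postStart and preStop hooks:

```yaml
# Illustrative Deployment that triggers the validation error above.
# All names and the image are made up; only the "preStart" key matters.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repro-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: repro-app
  template:
    metadata:
      labels:
        app: repro-app
    spec:
      containers:
        - name: app
          image: nginx:1.25   # any image works for the validation error
          lifecycle:
            preStart:         # invalid field; postStart would be accepted
              exec:
                command: ["true"]
```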