OCPBUGS-46380: StaticPodOperatorStatus validation should reject downgrades and concurrent node rollouts #2123

Merged
@@ -14,3 +14,66 @@ tests:
      spec:
        logLevel: Normal
        operatorLogLevel: Normal
  onUpdate:
  - name: Should reject multiple nodes with nonzero target revisions
    initial: |
      apiVersion: operator.openshift.io/v1
      kind: KubeAPIServer
      spec: {} # No spec is required for a KubeAPIServer
      status:
        nodeStatuses:
        - nodeName: a
          targetRevision: 1
        - nodeName: b
          targetRevision: 0
        - nodeName: c
        - nodeName: d
    updated: |
      apiVersion: operator.openshift.io/v1
      kind: KubeAPIServer
      spec: {} # No spec is required for a KubeAPIServer
      status:
        nodeStatuses:
        - nodeName: a
          targetRevision: 1
        - nodeName: b
          targetRevision: 0
        - nodeName: c
          targetRevision: 2
        - nodeName: d
    expectedStatusError: "status.nodeStatuses: Invalid value: \"array\": no more than 1 node status may have a nonzero targetRevision"
  - name: Should reject decreasing currentRevision
    initial: |
      apiVersion: operator.openshift.io/v1
      kind: KubeAPIServer
      spec: {} # No spec is required for a KubeAPIServer
      status:
        nodeStatuses:
        - nodeName: a
          currentRevision: 3
    updated: |
      apiVersion: operator.openshift.io/v1
      kind: KubeAPIServer
      spec: {} # No spec is required for a KubeAPIServer
      status:
        nodeStatuses:
        - nodeName: a
          currentRevision: 2
    expectedStatusError: "status.nodeStatuses[0].currentRevision: Invalid value: \"integer\": must only increase"
  - name: Should reject clearing currentRevision
    initial: |
      apiVersion: operator.openshift.io/v1
      kind: KubeAPIServer
      spec: {} # No spec is required for a KubeAPIServer
      status:
        nodeStatuses:
        - nodeName: a
          currentRevision: 3
    updated: |
      apiVersion: operator.openshift.io/v1
      kind: KubeAPIServer
      spec: {} # No spec is required for a KubeAPIServer
      status:
        nodeStatuses:
        - nodeName: a
    expectedStatusError: "status.nodeStatuses[0].currentRevision: Invalid value: \"object\": cannot be unset once set"
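
The two currentRevision test cases above can be restated in plain Go to make the intended update semantics easier to follow. This is only a minimal sketch, not the CEL validator the API server runs: the `nodeStatus` type, the `checkCurrentRevision` function, and the `rev` helper are hypothetical names introduced here, and a nil pointer is assumed to model a manifest that omits `currentRevision`.

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for one status.nodeStatuses entry.
// A nil CurrentRevision models a manifest that omits the field.
type nodeStatus struct {
	NodeName        string
	CurrentRevision *int32
}

// checkCurrentRevision restates the two update rules in plain Go. It returns
// the validation message of the violated rule, or "" if the update passes.
func checkCurrentRevision(oldSt, newSt nodeStatus) string {
	// has(self.currentRevision) || !has(oldSelf.currentRevision)
	if newSt.CurrentRevision == nil && oldSt.CurrentRevision != nil {
		return "cannot be unset once set"
	}
	// self >= oldSelf, which only applies when both values are present.
	if newSt.CurrentRevision != nil && oldSt.CurrentRevision != nil &&
		*newSt.CurrentRevision < *oldSt.CurrentRevision {
		return "must only increase"
	}
	return ""
}

// rev is a small helper that returns a pointer to an int32 literal.
func rev(v int32) *int32 { return &v }

func main() {
	old := nodeStatus{NodeName: "a", CurrentRevision: rev(3)}

	// Mirrors "Should reject decreasing currentRevision".
	fmt.Printf("%q\n", checkCurrentRevision(old, nodeStatus{NodeName: "a", CurrentRevision: rev(2)})) // "must only increase"

	// Mirrors "Should reject clearing currentRevision".
	fmt.Printf("%q\n", checkCurrentRevision(old, nodeStatus{NodeName: "a"})) // "cannot be unset once set"

	// An increasing revision passes both checks.
	fmt.Printf("%q\n", checkCurrentRevision(old, nodeStatus{NodeName: "a", CurrentRevision: rev(4)})) // ""
}
```

Note that under this reading an unchanged `currentRevision` is accepted, since the rule is `>=` rather than `>`.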
3 changes: 3 additions & 0 deletions operator/v1/types.go
@@ -252,16 +252,19 @@ type StaticPodOperatorStatus struct {
	// +listType=map
	// +listMapKey=nodeName
	// +optional
	// +kubebuilder:validation:XValidation:rule="size(self.filter(status, status.?targetRevision.orValue(0) != 0)) <= 1",message="no more than 1 node status may have a nonzero targetRevision"
deads2k marked this conversation as resolved.

Contributor: missing integration test

	NodeStatuses []NodeStatus `json:"nodeStatuses,omitempty"`
}
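
The list-level rule on `nodeStatuses` counts entries whose `targetRevision` is present and nonzero, treating an absent field as 0. As a rough illustration only, the same check can be written in plain Go. The `nodeStatus` type and the `countNonzeroTargets` and `atMostOneRollout` functions are hypothetical names for this sketch, and a nil pointer is assumed to stand in for an omitted `targetRevision`.

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for one status.nodeStatuses entry.
// A nil TargetRevision models an entry that omits the field.
type nodeStatus struct {
	NodeName       string
	TargetRevision *int32
}

// countNonzeroTargets mirrors
// size(self.filter(status, status.?targetRevision.orValue(0) != 0)):
// an absent targetRevision counts as 0.
func countNonzeroTargets(statuses []nodeStatus) int {
	n := 0
	for _, s := range statuses {
		if s.TargetRevision != nil && *s.TargetRevision != 0 {
			n++
		}
	}
	return n
}

// atMostOneRollout reports whether the list satisfies the "<= 1" bound.
func atMostOneRollout(statuses []nodeStatus) bool {
	return countNonzeroTargets(statuses) <= 1
}

// rev is a small helper that returns a pointer to an int32 literal.
func rev(v int32) *int32 { return &v }

func main() {
	// Same data as the "initial" and "updated" objects in the onUpdate test.
	initial := []nodeStatus{
		{NodeName: "a", TargetRevision: rev(1)},
		{NodeName: "b", TargetRevision: rev(0)},
		{NodeName: "c"},
		{NodeName: "d"},
	}
	updated := []nodeStatus{
		{NodeName: "a", TargetRevision: rev(1)},
		{NodeName: "b", TargetRevision: rev(0)},
		{NodeName: "c", TargetRevision: rev(2)},
		{NodeName: "d"},
	}

	fmt.Println(atMostOneRollout(initial)) // true
	fmt.Println(atMostOneRollout(updated)) // false
}
```
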

// NodeStatus provides information about the current state of a particular node managed by this operator.
// +kubebuilder:validation:XValidation:rule="has(self.currentRevision) || !has(oldSelf.currentRevision)",message="cannot be unset once set",fieldPath=".currentRevision"
type NodeStatus struct {
	// nodeName is the name of the node
	// +required
	NodeName string `json:"nodeName"`

	// currentRevision is the generation of the most recently successful deployment
	// +kubebuilder:validation:XValidation:rule="self >= oldSelf",message="must only increase"

Contributor: Oh, this is not true. There is fallback logic in SNO that might revert the CurrentRevision if the new revision fails to install. I think we should revert it; otherwise, it might break an SNO cluster.

Yes, this reminds me of an earlier bug https://bugzilla.redhat.com/show_bug.cgi?id=1985997.

Contributor Author: I'm not really here, but would it be compatible to instead validate that self.currentRevision == oldSelf.targetRevision?

@p0lyn0mial The fallback logic is described clearly in https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/startup-monitor.md. CurrentRevision won't decrease: when it detects problems with the new revision, the startup-monitor copies the pod manifest from the /etc/kubernetes/static-pods/last-known-good link (or from the previous revision if the link does not exist, or does nothing if there is no previous revision, as in bootstrapping) into /etc/kubernetes.
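
For comparison, the alternative floated earlier in this thread (validating `currentRevision` against the previous `targetRevision`) could look roughly like the following in plain Go. This is one possible reading, not something the PR implements: the `nodeStatus` type and `currentFollowsTarget` function are hypothetical, and allowing an unchanged `currentRevision` is an assumption added here so that status updates that do not touch the revision still pass.

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for one status.nodeStatuses entry.
type nodeStatus struct {
	CurrentRevision int32
	TargetRevision  int32
}

// currentFollowsTarget accepts an update when currentRevision is unchanged
// (an assumption of this sketch) or equals the previous targetRevision
// (the condition suggested in the review thread).
func currentFollowsTarget(oldSt, newSt nodeStatus) bool {
	return newSt.CurrentRevision == oldSt.CurrentRevision ||
		newSt.CurrentRevision == oldSt.TargetRevision
}

func main() {
	rollingForward := nodeStatus{CurrentRevision: 3, TargetRevision: 4}

	// Reaching the revision that was being targeted is accepted.
	fmt.Println(currentFollowsTarget(rollingForward, nodeStatus{CurrentRevision: 4})) // true

	// Jumping to a revision that was never the target is rejected.
	fmt.Println(currentFollowsTarget(rollingForward, nodeStatus{CurrentRevision: 2})) // false

	// Unlike "self >= oldSelf", a decrease is accepted when the lower
	// revision was the previous target.
	rollingBack := nodeStatus{CurrentRevision: 3, TargetRevision: 2}
	fmt.Println(currentFollowsTarget(rollingBack, nodeStatus{CurrentRevision: 2})) // true
}
```

Whether such a rule would be compatible with the fallback behavior discussed above depends on what the operator actually writes to `targetRevision` during a fallback, which this sketch does not model.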

	CurrentRevision int32 `json:"currentRevision"`
	// targetRevision is the generation of the deployment we're trying to apply
	TargetRevision int32 `json:"targetRevision,omitempty"`

Contributor: In the future we'll make this only increase too

@@ -256,6 +256,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -296,10 +299,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -243,6 +243,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -283,10 +286,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -256,6 +256,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -296,10 +299,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -256,6 +256,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -296,10 +299,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -225,6 +225,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -265,10 +268,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -234,6 +234,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -274,10 +277,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -225,6 +225,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -265,10 +268,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -230,6 +230,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -270,10 +273,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -243,6 +243,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -283,10 +286,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with
@@ -243,6 +243,9 @@ spec:
successful deployment
format: int32
type: integer
x-kubernetes-validations:
- message: must only increase
rule: self >= oldSelf
lastFailedCount:
description: lastFailedCount is how often the installer pod
of the last failed revision failed.
@@ -283,10 +286,18 @@ spec:
required:
- nodeName
type: object
x-kubernetes-validations:
- fieldPath: .currentRevision
message: cannot be unset once set
rule: has(self.currentRevision) || !has(oldSelf.currentRevision)
type: array
x-kubernetes-list-map-keys:
- nodeName
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: no more than 1 node status may have a nonzero targetRevision
rule: size(self.filter(status, status.?targetRevision.orValue(0)
!= 0)) <= 1
observedGeneration:
description: observedGeneration is the last generation change you've
dealt with