20 Upgrade Controller
See the Upgrade Controller documentation.
A Kubernetes controller capable of performing infrastructure platform upgrades consisting of:
Operating System (SL Micro)
Kubernetes (K3s & RKE2)
Additional components (Rancher, Elemental, NeuVector, etc.)
20.1 How does SUSE Edge use Upgrade Controller?
The Upgrade Controller is essential in automating the (formerly manual) Day 2 operations required to upgrade management clusters from one SUSE Edge release version to the next.
To achieve this automation, the Upgrade Controller uses tools such as the System Upgrade Controller (Chapter 19, System Upgrade Controller) and the Helm Controller.
For further details on how the Upgrade Controller works, see Section 20.3, “How does the Upgrade Controller work?”.
For the known limitations of the Upgrade Controller, see Section 20.6, “Known Limitations”.
20.2 Installing the Upgrade Controller
20.2.1 Prerequisites
System Upgrade Controller (Section 19.2, “Installing the System Upgrade Controller”)
A Kubernetes cluster; either K3s or RKE2
20.2.2 Steps
1. Install the Upgrade Controller Helm chart on your management cluster:
    helm install upgrade-controller oci://registry.suse.com/edge/3.1/upgrade-controller-chart --version 0.1.0 --create-namespace --namespace upgrade-controller-system
2. Validate the Upgrade Controller deployment:
    kubectl get deployment -n upgrade-controller-system
3. Validate the Upgrade Controller pod:
    kubectl get pods -n upgrade-controller-system
4. Validate the Upgrade Controller pod logs:
    kubectl logs <pod_name> -n upgrade-controller-system
20.3 How does the Upgrade Controller work?
To perform an Edge release upgrade, the Upgrade Controller introduces two new Kubernetes custom resources:
- UpgradePlan (Section 20.4.1, “UpgradePlan”): created by the user; holds configurations regarding an Edge release upgrade.
- ReleaseManifest (Section 20.4.2, “ReleaseManifest”): created by the Upgrade Controller; holds component versions specific to a particular Edge release version. Must not be edited by users.
Once the user creates an UpgradePlan, the Upgrade Controller creates a ReleaseManifest resource that holds the component data for the Edge release version specified under the releaseVersion property of the UpgradePlan resource.
Using the component data from the ReleaseManifest, the Upgrade Controller upgrades the Edge release components in the following order:
1. Operating System (OS) (Section 20.3.1, “Operating System upgrade”).
2. Kubernetes (Section 20.3.2, “Kubernetes upgrade”).
3. Additional components (Section 20.3.3, “Additional components upgrades”).
During the upgrade process, the Upgrade Controller continuously writes upgrade information to the created UpgradePlan. For more information on how to track the upgrade process, see Tracking the upgrade process (Section 20.5, “Tracking the upgrade process”).
20.3.1 Operating System upgrade
To upgrade the OS component, the Upgrade Controller creates SUC (Chapter 19, System Upgrade Controller) Plans that use the following naming templates:
- For SUC Plans related to control-plane node OS upgrades: control-plane-<os-name>-<os-version>-<suffix>
- For SUC Plans related to worker node OS upgrades: workers-<os-name>-<os-version>-<suffix>
Based on these plans, SUC creates workloads on each node of the cluster that perform the actual OS upgrade.
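To make the naming templates above concrete, the following shell sketch assembles the plan names they describe. The function name suc_os_plan_name and the sample values (sl-micro, 6.0, abc123) are illustrative assumptions and are not taken from a real cluster or release manifest. Whether the controller normalizes characters such as dots in the version is not covered here, so treat the result as a search hint rather than an exact name.

```shell
# Build the SUC Plan name described by the OS upgrade naming template.
# Usage: suc_os_plan_name <control-plane|workers> <os-name> <os-version> <suffix>
# NOTE: hypothetical helper; only the template itself comes from the docs.
suc_os_plan_name() {
    node_type="$1"
    case "$node_type" in
        control-plane|workers) ;;
        *) echo "unknown node type: $node_type" >&2; return 1 ;;
    esac
    printf '%s-%s-%s-%s\n' "$node_type" "$2" "$3" "$4"
}

# Placeholder values, for illustration only.
suc_os_plan_name control-plane sl-micro 6.0 abc123   # control-plane-sl-micro-6.0-abc123
suc_os_plan_name workers sl-micro 6.0 abc123         # workers-sl-micro-6.0-abc123
```

The printed names can be compared against the Plans that SUC reports on the cluster.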
Depending on the ReleaseManifest, the OS upgrade may include:
- Package-only updates: for use cases where the OS version does not change between Edge releases.
- Full OS migration: for use cases where the OS version changes between Edge releases.
The upgrade is executed one node at a time, starting with the control-plane nodes. The worker nodes begin to be upgraded only after the control-plane node upgrade has finished.
The Upgrade Controller configures the OS SUC Plans to drain the cluster nodes if the cluster has more than one node of the specific type.
For example, in a cluster with more than one control-plane node and only one worker node, drain is performed only for the control-plane nodes, and vice versa.
For information on how to disable node drains altogether, see the UpgradePlan (Section 20.4.1, “UpgradePlan”) section.
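The drain rule described above can be summarized as a small decision function. This is only a sketch of the documented behavior, not the controller's actual code; the function name should_drain and its arguments are assumptions made for illustration. The second argument stands for the matching disableDrain setting of the UpgradePlan.

```shell
# should_drain <node-count> [drain-disabled: true|false]
# Prints "true" when nodes of this type would be drained, "false" otherwise.
# NOTE: hypothetical helper that mirrors the rule stated in the text.
should_drain() {
    node_count="$1"
    drain_disabled="${2:-false}"
    if [ "$drain_disabled" = "true" ]; then
        echo false            # drains were switched off by the user
    elif [ "$node_count" -gt 1 ]; then
        echo true             # more than one node of this type
    else
        echo false            # a single node is never drained
    fi
}

# Three control-plane nodes and one worker, drains not disabled:
echo "control-plane drain: $(should_drain 3 false)"   # control-plane drain: true
echo "worker drain: $(should_drain 1 false)"          # worker drain: false
```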
20.3.2 Kubernetes upgrade
To upgrade the Kubernetes distribution of a cluster, the Upgrade Controller creates SUC (Chapter 19, System Upgrade Controller) Plans that use the following naming templates:
- For SUC Plans related to control-plane node Kubernetes upgrades: control-plane-<k8s-version>-<suffix>
- For SUC Plans related to worker node Kubernetes upgrades: workers-<k8s-version>-<suffix>
Based on these plans, SUC creates workloads on each node of the cluster that perform the actual Kubernetes upgrade.
The Kubernetes upgrade happens one node at a time, starting with the control-plane nodes. The worker nodes begin to be upgraded only after the control-plane node upgrade has finished.
The Upgrade Controller configures the Kubernetes SUC Plans to drain the cluster nodes if the cluster has more than one node of the specific type.
For example, in a cluster with more than one control-plane node and only one worker node, drain is performed only for the control-plane nodes, and vice versa.
For information on how to disable node drains altogether, see the UpgradePlan (Section 20.4.1, “UpgradePlan”) section.
20.3.3 Additional components upgrades
Currently, all additional components are installed via Helm charts. For a full list of the components for a specific release, refer to the Release Notes (Section 36.1, “Abstract”).
For Helm charts deployed through EIB (Chapter 9, Edge Image Builder), the Upgrade Controller updates the existing HelmChart CR of each component.
For Helm charts deployed outside of EIB, the Upgrade Controller creates a HelmChart resource for each component.
After the creation or update of the HelmChart resource, the Upgrade Controller relies on the helm-controller to pick up this change and proceed with the actual component upgrade.
Charts are upgraded sequentially based on their order in the ReleaseManifest. Additional values can also be passed through the UpgradePlan. For more information, refer to the UpgradePlan (Section 20.4.1, “UpgradePlan”) section.
20.4 Kubernetes API extensions
Extensions to the Kubernetes API introduced by the Upgrade Controller.
20.4.1 UpgradePlan
The Upgrade Controller introduces a new Kubernetes custom resource called an UpgradePlan.
The UpgradePlan serves as an instruction mechanism for the Upgrade Controller and supports the following configurations:
- releaseVersion: Edge release version to which the cluster should be upgraded. The release version must follow semantic versioning and should be retrieved from the Release Notes (Section 36.1, “Abstract”).
- disableDrain: Optional; instructs the Upgrade Controller on whether to disable node drains. Useful when you have workloads with Disruption Budgets.
  Example for control-plane node drain disablement:
    spec:
      disableDrain:
        controlPlane: true
  Example for control-plane and worker node drain disablement:
    spec:
      disableDrain:
        controlPlane: true
        worker: true
- helm: Optional; specifies additional values for components installed via Helm.
  Warning: It is only advised to use this field for values that are critical for upgrades. Standard chart value updates should be performed after the respective charts have been upgraded to the next version.
  Example:
    spec:
      helm:
      - chart: foo
        values:
          bar: baz
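As a minimal sketch of how these fields fit together, the following shell snippet writes a complete UpgradePlan manifest to a file. The apiVersion, resource name, and namespace are reused from the status examples in Section 20.5.1. The file name upgrade-plan.yaml is an arbitrary choice, and the foo chart with its bar: baz value is the same placeholder used in the example above, not a real component.

```shell
# Write an UpgradePlan manifest that combines the documented fields.
# NOTE: "foo" / "bar: baz" are placeholders; drop the helm block unless
# a chart genuinely needs an upgrade-critical value.
cat > upgrade-plan.yaml <<'EOF'
apiVersion: lifecycle.suse.com/v1alpha1
kind: UpgradePlan
metadata:
  name: upgrade-plan-mgmt-3-1-0
  namespace: upgrade-controller-system
spec:
  # Target Edge release, taken from the Release Notes.
  releaseVersion: 3.1.0
  # Optional: skip draining control-plane nodes.
  disableDrain:
    controlPlane: true
  # Optional: extra values for a component installed via Helm.
  helm:
  - chart: foo
    values:
      bar: baz
EOF
```

The resulting file could then be submitted to the management cluster with kubectl apply -f upgrade-plan.yaml.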
20.4.2 ReleaseManifest
The Upgrade Controller introduces a new Kubernetes custom resource called a ReleaseManifest.
The ReleaseManifest is created by the Upgrade Controller and holds component data for one specific Edge release version. This means that each Edge release version upgrade is represented by a different ReleaseManifest resource.
The ReleaseManifest should always be created by the Upgrade Controller.
It is not advisable to manually create or edit the ReleaseManifest. Users who decide to do so do this at their own risk.
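The component data that the ReleaseManifest ships includes, but is not limited to, the versions of the operating system, the Kubernetes distribution, and the additional components of the release.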
For an example of how a ReleaseManifest can look, refer to the upstream documentation. Please note that this is just an example and is not intended to be created as a valid ReleaseManifest resource.
20.5 Tracking the upgrade process
This section describes how to track and debug the upgrade process that the Upgrade Controller initiates once the user creates an UpgradePlan.
20.5.1 General
General information about the state of the upgrade process can be viewed in the UpgradePlan’s status conditions.
The UpgradePlan resource’s status can be viewed in the following way:
    kubectl get upgradeplan <upgradeplan_name> -n upgrade-controller-system -o yaml
Running UpgradePlan example:
    apiVersion: lifecycle.suse.com/v1alpha1
    kind: UpgradePlan
    metadata:
      name: upgrade-plan-mgmt-3-1-0
      namespace: upgrade-controller-system
    spec:
      releaseVersion: 3.1.0
    status:
      conditions:
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Control plane nodes are being upgraded
        reason: InProgress
        status: "False"
        type: OSUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Kubernetes upgrade is not yet started
        reason: Pending
        status: Unknown
        type: KubernetesUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Rancher upgrade is not yet started
        reason: Pending
        status: Unknown
        type: RancherUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Longhorn upgrade is not yet started
        reason: Pending
        status: Unknown
        type: LonghornUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: MetalLB upgrade is not yet started
        reason: Pending
        status: Unknown
        type: MetalLBUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: CDI upgrade is not yet started
        reason: Pending
        status: Unknown
        type: CDIUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: KubeVirt upgrade is not yet started
        reason: Pending
        status: Unknown
        type: KubeVirtUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: NeuVector upgrade is not yet started
        reason: Pending
        status: Unknown
        type: NeuVectorUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: EndpointCopierOperator upgrade is not yet started
        reason: Pending
        status: Unknown
        type: EndpointCopierOperatorUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Elemental upgrade is not yet started
        reason: Pending
        status: Unknown
        type: ElementalUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: SRIOV upgrade is not yet started
        reason: Pending
        status: Unknown
        type: SRIOVUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Akri upgrade is not yet started
        reason: Pending
        status: Unknown
        type: AkriUpgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: Metal3 upgrade is not yet started
        reason: Pending
        status: Unknown
        type: Metal3Upgraded
      - lastTransitionTime: "2024-10-01T06:26:27Z"
        message: RancherTurtles upgrade is not yet started
        reason: Pending
        status: Unknown
        type: RancherTurtlesUpgraded
      observedGeneration: 1
      sucNameSuffix: 90315a2b6d
Here you can view every component that the Upgrade Controller will try to schedule an upgrade for. Each condition follows the template below:
- lastTransitionTime: the last time that this component condition transitioned from one status to another.
- message: message that indicates the current upgrade state of the specific component condition.
- reason: the current upgrade state of the specific component condition. Possible reasons include:
  - Succeeded: upgrade of the specific component is successful.
  - Failed: upgrade of the specific component has failed.
  - InProgress: upgrade of the specific component is currently in progress.
  - Pending: upgrade of the specific component is not yet scheduled.
  - Skipped: specific component is not found on the cluster, or its specified version is already installed, so its upgrade will be skipped.
  - Error: specific component has encountered a transient error.
- status: status of the current condition type, one of True, False, Unknown.
- type: indicator for the currently upgraded component.
The Upgrade Controller creates SUC Plans for component conditions of type "OSUpgraded" and "KubernetesUpgraded". To further track the SUC Plans created for these components, refer to the Monitoring System Upgrade Controller Plans (Section 19.3, “Monitoring System Upgrade Controller Plans”) section.
All other component condition types can be further tracked by viewing the resources created for them by the helm-controller. For more information, see the Helm Controller (Section 20.5.2, “Helm Controller”) section.
An UpgradePlan scheduled by the Upgrade Controller can be marked as successful once:
1. There are no Pending or InProgress component conditions.
2. The lastSuccessfulReleaseVersion property points to the releaseVersion that is specified in the UpgradePlan’s configuration. This property is added to the UpgradePlan’s status by the Upgrade Controller once the upgrade process is successful.
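The two success criteria above can be checked mechanically. The following shell sketch is one possible way to do so and is not part of the Upgrade Controller: the function names plan_succeeded and check_upgradeplan are assumptions, and check_upgradeplan presumes kubectl access to the management cluster and the upgrade-controller-system namespace used in the examples of this section.

```shell
# plan_succeeded <releaseVersion> <lastSuccessfulReleaseVersion> [reason...]
# Returns 0 only when no condition reason is Pending or InProgress and the
# last successful release matches the requested release.
plan_succeeded() {
    wanted_version="$1"
    last_successful="$2"
    shift 2
    for reason in "$@"; do
        case "$reason" in
            Pending|InProgress) return 1 ;;
        esac
    done
    [ -n "$wanted_version" ] && [ "$wanted_version" = "$last_successful" ]
}

# Hypothetical wrapper that reads the inputs from a live cluster.
# It is only defined here, not executed.
check_upgradeplan() {
    plan_name="$1"
    plan_ns="upgrade-controller-system"
    spec_version=$(kubectl get upgradeplan "$plan_name" -n "$plan_ns" \
        -o jsonpath='{.spec.releaseVersion}')
    status_version=$(kubectl get upgradeplan "$plan_name" -n "$plan_ns" \
        -o jsonpath='{.status.lastSuccessfulReleaseVersion}')
    reasons=$(kubectl get upgradeplan "$plan_name" -n "$plan_ns" \
        -o jsonpath='{.status.conditions[*].reason}')
    # $reasons is intentionally unquoted so each reason becomes one argument.
    plan_succeeded "$spec_version" "$status_version" $reasons
}

# Sample values modeled on a plan whose OS upgrade is still in progress.
if plan_succeeded 3.1.0 "" InProgress Pending Pending; then
    echo "upgrade finished"
else
    echo "upgrade still running"
fi
```

Note that this check follows the criteria exactly as stated, so it does not inspect Failed or Error reasons on its own.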
Successful UpgradePlan example:
    apiVersion: lifecycle.suse.com/v1alpha1
    kind: UpgradePlan
    metadata:
      name: upgrade-plan-mgmt-3-1-0
      namespace: upgrade-controller-system
    spec:
      releaseVersion: 3.1.0
    status:
      conditions:
      - lastTransitionTime: "2024-10-01T06:26:48Z"
        message: All cluster nodes are upgraded
        reason: Succeeded
        status: "True"
        type: OSUpgraded
      - lastTransitionTime: "2024-10-01T06:26:59Z"
        message: All cluster nodes are upgraded
        reason: Succeeded
        status: "True"
        type: KubernetesUpgraded
      - lastTransitionTime: "2024-10-01T06:27:13Z"
        message: Chart rancher upgrade succeeded
        reason: Succeeded
        status: "True"
        type: RancherUpgraded
      - lastTransitionTime: "2024-10-01T06:27:13Z"
        message: Chart longhorn is not installed
        reason: Skipped
        status: "False"
        type: LonghornUpgraded
      - lastTransitionTime: "2024-10-01T06:27:13Z"
        message: Specified version of chart metallb is already installed
        reason: Skipped
        status: "False"
        type: MetalLBUpgraded
      - lastTransitionTime: "2024-10-01T06:27:13Z"
        message: Chart cdi is not installed
        reason: Skipped
        status: "False"
        type: CDIUpgraded
      - lastTransitionTime: "2024-10-01T06:27:13Z"
        message: Chart kubevirt is not installed
        reason: Skipped
        status: "False"
        type: KubeVirtUpgraded
      - lastTransitionTime: "2024-10-01T06:27:13Z"
        message: Chart neuvector-crd is not installed
        reason: Skipped
        status: "False"
        type: NeuVectorUpgraded
      - lastTransitionTime: "2024-10-01T06:27:14Z"
        message: Specified version of chart endpoint-copier-operator is already installed
        reason: Skipped
        status: "False"
        type: EndpointCopierOperatorUpgraded
      - lastTransitionTime: "2024-10-01T06:27:14Z"
        message: Chart elemental-operator upgrade succeeded
        reason: Succeeded
        status: "True"
        type: ElementalUpgraded
      - lastTransitionTime: "2024-10-01T06:27:15Z"
        message: Chart sriov-crd is not installed
        reason: Skipped
        status: "False"
        type: SRIOVUpgraded
      - lastTransitionTime: "2024-10-01T06:27:16Z"
        message: Chart akri is not installed
        reason: Skipped
        status: "False"
        type: AkriUpgraded
      - lastTransitionTime: "2024-10-01T06:27:19Z"
        message: Chart metal3 is not installed
        reason: Skipped
        status: "False"
        type: Metal3Upgraded
      - lastTransitionTime: "2024-10-01T06:27:27Z"
        message: Chart rancher-turtles is not installed
        reason: Skipped
        status: "False"
        type: RancherTurtlesUpgraded
      lastSuccessfulReleaseVersion: 3.1.0
      observedGeneration: 1
      sucNameSuffix: 90315a2b6d
20.5.2 Helm Controller
This section covers how to track resources created by the helm-controller.
The steps below assume that kubectl has been configured to connect to the cluster where the Upgrade Controller has been deployed.
1. Locate the HelmChart resource for the specific component:
    kubectl get helmcharts -n kube-system
2. Using the name of the HelmChart resource, locate the upgrade Pod that was created by the helm-controller:
    kubectl get pods -l helmcharts.helm.cattle.io/chart=<helmchart_name> -n kube-system

    # Example for Rancher
    kubectl get pods -l helmcharts.helm.cattle.io/chart=rancher -n kube-system
    NAME                         READY   STATUS      RESTARTS   AGE
    helm-install-rancher-tv9wn   0/1     Completed   0          16m
3. View the logs of the component-specific pod:
    kubectl logs <pod_name> -n kube-system
20.6 Known Limitations
- Downstream cluster upgrades are not yet managed by the Upgrade Controller. For information on how to upgrade downstream clusters, refer to the Downstream clusters (Chapter 28, Downstream clusters) section.
- The Upgrade Controller expects any additional SUSE Edge Helm charts that are deployed through EIB (Chapter 9, Edge Image Builder) to have their HelmChart CR deployed in the kube-system namespace. To do this, configure the installationNamespace property in your EIB definition file. For more information, see the upstream documentation.
- Currently, the Upgrade Controller has no way to determine the Edge release version that is running on the management cluster. Make sure to provide an Edge release version that is greater than the one currently running on the cluster.
- Currently, the Upgrade Controller supports upgrades in non-air-gapped environments only. Air-gapped upgrades are not yet possible.
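Because the Upgrade Controller cannot verify that the requested release is newer than the running one, a small pre-flight check before creating the UpgradePlan may help. The sketch below is an assumption-laden helper, not a documented tool: the function name version_gt is made up, the current version must be supplied by you, and the comparison relies on GNU sort -V, which is adequate for plain MAJOR.MINOR.PATCH versions but does not implement full semantic versioning precedence (for example, pre-release tags).

```shell
# version_gt <candidate> <current>
# Succeeds when <candidate> is strictly greater than <current>.
# NOTE: hypothetical helper; requires a sort that supports -V (GNU coreutils).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: the cluster is known (from your own records) to run 3.0.2.
if version_gt 3.1.0 3.0.2; then
    echo "3.1.0 is newer than 3.0.2"
else
    echo "refusing to create an UpgradePlan for an older or equal release" >&2
fi
```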