21 Upgrade Controller #
See the Upgrade Controller documentation.
A Kubernetes controller capable of performing infrastructure platform upgrades consisting of:
Operating System (SUSE Linux Micro)
Kubernetes (K3s & RKE2)
Additional components (Rancher, Elemental, SUSE Security, etc.)
21.1 How does SUSE Edge use Upgrade Controller? #
The Upgrade Controller is essential in automating the (formerly manual) "Day 2" operations required to upgrade management clusters from one SUSE Edge release version to the next.
To achieve this automation, the Upgrade Controller utilizes tools such as the System Upgrade Controller (Chapter 20, System Upgrade Controller) and the Helm Controller.
For further details on how the Upgrade Controller works, see Section 21.3, “How does the Upgrade Controller work?”.
For known limitations that the Upgrade Controller has, see Section 21.6, “Known Limitations”.
21.2 Installing the Upgrade Controller #
21.2.1 Prerequisites #
System Upgrade Controller (Section 20.2, “Installing the System Upgrade Controller”)
A Kubernetes cluster; either K3s or RKE2
21.2.2 Steps #
Install the Upgrade Controller Helm chart on your management cluster:
helm install upgrade-controller oci://registry.suse.com/edge/3.2/upgrade-controller-chart --version 302.0.0+up0.1.1 --create-namespace --namespace upgrade-controller-system
Validate the Upgrade Controller deployment:
kubectl get deployment -n upgrade-controller-system
Validate the Upgrade Controller pod:
kubectl get pods -n upgrade-controller-system
Validate the Upgrade Controller pod logs:
kubectl logs <pod_name> -n upgrade-controller-system
21.3 How does the Upgrade Controller work? #
In order to perform an Edge release upgrade, the Upgrade Controller introduces two new Kubernetes custom resources:
UpgradePlan (Section 21.4.1, “UpgradePlan”) - created by the user; holds configurations regarding an Edge release upgrade.
ReleaseManifest (Section 21.4.2, “ReleaseManifest”) - created by the Upgrade Controller; holds component versions specific to a particular Edge release version. This file must not be edited by users.
The Upgrade Controller proceeds to create a ReleaseManifest
resource that holds the component data for the Edge release version specified by the user under the releaseVersion
property in the UpgradePlan
resource.
Using the component data from the ReleaseManifest
, the Upgrade Controller proceeds to upgrade the Edge release components in the following order:
Operating System (OS) (Section 21.3.1, “Operating System upgrade”).
Kubernetes (Section 21.3.2, “Kubernetes upgrade”).
Additional components (Section 21.3.3, “Additional components upgrades”).
During the upgrade process, the Upgrade Controller continually outputs upgrade information to the created UpgradePlan
. For more information on how to track the upgrade process, see Tracking the upgrade process (Section 21.5, “Tracking the upgrade process”).
21.3.1 Operating System upgrade #
To upgrade the operating system, the Upgrade Controller creates SUC (Chapter 20, System Upgrade Controller) Plans that have the following naming template:
For SUC Plans related to control plane node OS upgrades -
control-plane-<os-name>-<os-version>-<suffix>
.For SUC Plans related to worker node OS upgrades -
workers-<os-name>-<os-version>-<suffix>
.
Based on these plans, SUC proceeds to create workloads on each node of the cluster that perform the actual OS upgrade.
Depending on the ReleaseManifest
, the OS upgrade may include:
Package only updates - for use-cases where the OS version does not change between Edge releases.
Full OS migration - for use-cases where the OS version changes between Edge releases.
The upgrade is executed one node at a time starting with the control plane nodes first. Only if the control-plane node upgrade finishes will the worker nodes begin to be upgraded.
The Upgrade Controller configures the OS SUC Plans to do perform a drain of the cluster nodes if the cluster has more than one node of the specified type.
For clusters where the control plane nodes are greater than one and there is only one worker node, a drain will be performed only for the control plane nodes and vice versa.
For information on how to disable node drains altogether, see the UpgradePlan (Section 21.4.1, “UpgradePlan”) section.
21.3.2 Kubernetes upgrade #
To upgrade the Kubernetes distribution of a cluster, the Upgrade Controller creates SUC (Chapter 20, System Upgrade Controller) Plans that have the following naming template:
For SUC Plans related to control plane node Kubernetes upgrades -
control-plane-<k8s-version>-<suffix>
.For SUC Plans related to worker node Kubernetes upgrades -
workers-<k8s-version>-<suffix>
.
Based on these plans, SUC proceeds to create workloads on each node of the cluster that perform the actual Kubernetes upgrade.
The Kubernetes upgrade will happen one node at a time starting with the control plane nodes first. Only if the control plane node upgrade finishes will the worker nodes begin to be upgraded.
The Upgrade Controller configures the Kubernetes SUC Plans to perform a drain of the cluster nodes if the cluster has more than one node of the specified type.
For clusters where the control plane nodes are greater than one and there is only one worker node, a drain will be performed only for the control plane nodes and vice versa.
For information on how to disable node drains altogether, see Section 21.4.1, “UpgradePlan”.
21.3.3 Additional components upgrades #
Currently, all additional components are installed via Helm charts. For a full list of the components for a specific release, refer to the Release Notes (Section 40.1, “Abstract”).
For Helm charts deployed through EIB (Chapter 10, Edge Image Builder), the Upgrade Controller updates the existing HelmChart CR of each component.
For Helm charts deployed outside of EIB, the Upgrade Controller creates a HelmChart
resource for each component.
After the creation/update of the HelmChart
resource, the Upgrade Controller relies on the helm-controller to pick up this change and proceed with the actual component upgrade.
Charts will be upgraded sequentially based on their order in the ReleaseManifest
. Additional values can also be passed through the UpgradePlan
. For more information about this, refer to Section 21.4.1, “UpgradePlan”.
21.4 Kubernetes API extensions #
Extensions to the Kubernetes API introduced by the Upgrade Controller.
21.4.1 UpgradePlan #
The Upgrade Controller introduces a new Kubernetes custom resource called an UpgradePlan
.
The UpgradePlan
serves as an instruction mechanism for the Upgrade Controller and it supports the following configurations:
releaseVersion
- Edge release version to which the cluster should be upgraded to. The release version must follow semantic versioning and should be retrieved from the Release Notes (Section 40.1, “Abstract”).disableDrain
- Optional; instructs the Upgrade Controller on whether to disable node drains. Useful for when you have workloads with Disruption Budgets.Example for control plane node drain disablement:
spec: disableDrain: controlPlane: true
Example for control plane and worker node drain disablement:
spec: disableDrain: controlPlane: true worker: true
helm
- Optional; specifies additional values for components installed via Helm.WarningIt is only advised to use this field for values that are critical for upgrades. Standard chart value updates should be performed after the respective charts have been upgraded to the next version.
Example:
spec: helm: - chart: foo values: bar: baz
21.4.2 ReleaseManifest #
The Upgrade Controller introduces a new Kubernetes custom resource called a ReleaseManifest
.
The ReleaseManifest
resource is created by the Upgrade Controller and holds component data for one specific Edge release version. This means that each Edge release version upgrade will be represented by a different ReleaseManifest
resource.
The Release Manifest should always be created by the Upgrade Controller.
It is not advisable to manually create or edit the ReleaseManifest
resources. Users that decide to do so should do this at their own risk.
Component data that the Release Manifest ships include, but is not limited to:
For an example of how a Release Manifest can look, refer to the upstream documentation. Please note that this is just an example and it is not intended to be created as a valid ReleaseManifest
resource.
21.5 Tracking the upgrade process #
This section serves as means to track and debug the upgrade process that the Upgrade Controller initiates once the user creates an UpgradePlan
resource.
21.5.1 General #
General information about the state of the upgrade process can be viewed in the Upgrade Plan’s status conditions.
The Upgrade Plan resource’s status can be viewed in the following way:
kubectl get upgradeplan <upgradeplan_name> -n upgrade-controller-system -o yaml
Running Upgrade Plan example:
apiVersion: lifecycle.suse.com/v1alpha1
kind: UpgradePlan
metadata:
name: upgrade-plan-mgmt-3-1-0
namespace: upgrade-controller-system
spec:
releaseVersion: 3.2.0
status:
conditions:
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Control plane nodes are being upgraded
reason: InProgress
status: "False"
type: OSUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Kubernetes upgrade is not yet started
reason: Pending
status: Unknown
type: KubernetesUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Rancher upgrade is not yet started
reason: Pending
status: Unknown
type: RancherUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Longhorn upgrade is not yet started
reason: Pending
status: Unknown
type: LonghornUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: MetalLB upgrade is not yet started
reason: Pending
status: Unknown
type: MetalLBUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: CDI upgrade is not yet started
reason: Pending
status: Unknown
type: CDIUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: KubeVirt upgrade is not yet started
reason: Pending
status: Unknown
type: KubeVirtUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: NeuVector upgrade is not yet started
reason: Pending
status: Unknown
type: NeuVectorUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: EndpointCopierOperator upgrade is not yet started
reason: Pending
status: Unknown
type: EndpointCopierOperatorUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Elemental upgrade is not yet started
reason: Pending
status: Unknown
type: ElementalUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: SRIOV upgrade is not yet started
reason: Pending
status: Unknown
type: SRIOVUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Akri upgrade is not yet started
reason: Pending
status: Unknown
type: AkriUpgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: Metal3 upgrade is not yet started
reason: Pending
status: Unknown
type: Metal3Upgraded
- lastTransitionTime: "2024-10-01T06:26:27Z"
message: RancherTurtles upgrade is not yet started
reason: Pending
status: Unknown
type: RancherTurtlesUpgraded
observedGeneration: 1
sucNameSuffix: 90315a2b6d
Here you can view every component that the Upgrade Controller will try to schedule an upgrade for. Each condition follows the below template:
lastTransitionTime
- the last time that this component condition has transitioned from one status to another.message
- message that indicates the current upgrade state of the specific component condition.reason
- the current upgrade state of the specific component condition. Possiblereasons
include:Succeeded
- upgrade of the specific component is successful.Failed
- upgrade of the specific component has failed.InProgress
- upgrade of the specific component is currently in progress.Pending
- upgrade of the specific component is not yet scheduled.Skipped
- specific component is not found on the cluster, so its upgrade will be skipped.Error
- specific component has encountered a transient error.
status
- status of the current conditiontype
, one ofTrue
,False
,Unknown
.type
- indicator for the currently upgraded component.
The Upgrade Controller creates SUC Plans for component conditions of type OSUpgraded
and KubernetesUpgraded
. To further track the SUC Plans created for these components, refer to Section 20.3, “Monitoring System Upgrade Controller Plans”.
All other component condition types can be further tracked by viewing the resources created for them by the helm-controller. For more information, see Section 21.5.2, “Helm Controller”.
An Upgrade Plan scheduled by the Upgrade Controller can be marked as successful
once:
There are no
Pending
orInProgress
component conditions.The
lastSuccessfulReleaseVersion
property points to thereleaseVersion
that is specified in the Upgrade Plan’s configuration. This property is added to the Upgrade Plan’s status by the Upgrade Controller once the upgrade process is successful.
Successful UpgradePlan
example:
apiVersion: lifecycle.suse.com/v1alpha1
kind: UpgradePlan
metadata:
name: upgrade-plan-mgmt-3-1-0
namespace: upgrade-controller-system
spec:
releaseVersion: 3.2.0
status:
conditions:
- lastTransitionTime: "2024-10-01T06:26:48Z"
message: All cluster nodes are upgraded
reason: Succeeded
status: "True"
type: OSUpgraded
- lastTransitionTime: "2024-10-01T06:26:59Z"
message: All cluster nodes are upgraded
reason: Succeeded
status: "True"
type: KubernetesUpgraded
- lastTransitionTime: "2024-10-01T06:27:13Z"
message: Chart rancher upgrade succeeded
reason: Succeeded
status: "True"
type: RancherUpgraded
- lastTransitionTime: "2024-10-01T06:27:13Z"
message: Chart longhorn is not installed
reason: Skipped
status: "False"
type: LonghornUpgraded
- lastTransitionTime: "2024-10-01T06:27:13Z"
message: Specified version of chart metallb is already installed
reason: Skipped
status: "False"
type: MetalLBUpgraded
- lastTransitionTime: "2024-10-01T06:27:13Z"
message: Chart cdi is not installed
reason: Skipped
status: "False"
type: CDIUpgraded
- lastTransitionTime: "2024-10-01T06:27:13Z"
message: Chart kubevirt is not installed
reason: Skipped
status: "False"
type: KubeVirtUpgraded
- lastTransitionTime: "2024-10-01T06:27:13Z"
message: Chart neuvector-crd is not installed
reason: Skipped
status: "False"
type: NeuVectorUpgraded
- lastTransitionTime: "2024-10-01T06:27:14Z"
message: Specified version of chart endpoint-copier-operator is already installed
reason: Skipped
status: "False"
type: EndpointCopierOperatorUpgraded
- lastTransitionTime: "2024-10-01T06:27:14Z"
message: Chart elemental-operator upgrade succeeded
reason: Succeeded
status: "True"
type: ElementalUpgraded
- lastTransitionTime: "2024-10-01T06:27:15Z"
message: Chart sriov-crd is not installed
reason: Skipped
status: "False"
type: SRIOVUpgraded
- lastTransitionTime: "2024-10-01T06:27:16Z"
message: Chart akri is not installed
reason: Skipped
status: "False"
type: AkriUpgraded
- lastTransitionTime: "2024-10-01T06:27:19Z"
message: Chart metal3 is not installed
reason: Skipped
status: "False"
type: Metal3Upgraded
- lastTransitionTime: "2024-10-01T06:27:27Z"
message: Chart rancher-turtles is not installed
reason: Skipped
status: "False"
type: RancherTurtlesUpgraded
lastSuccessfulReleaseVersion: 3.1.0
observedGeneration: 1
sucNameSuffix: 90315a2b6d
21.5.2 Helm Controller #
This section covers how to track resources created by the helm-controller.
The below steps assume that kubectl
has been configured to connect to the cluster where the Upgrade Controller has been deployed to.
Locate the
HelmChart
resource for the specific component:kubectl get helmcharts -n kube-system
Using the name of the
HelmChart
resource, locate the upgrade Pod that was created by thehelm-controller
:kubectl get pods -l helmcharts.helm.cattle.io/chart=<helmchart_name> -n kube-system # Example for Rancher kubectl get pods -l helmcharts.helm.cattle.io/chart=rancher -n kube-system NAME READY STATUS RESTARTS AGE helm-install-rancher-tv9wn 0/1 Completed 0 16m
View the logs of the component specific pod:
kubectl logs <pod_name> -n kube-system
21.6 Known Limitations #
Downstream cluster upgrades are not yet managed by the Upgrade Controller. For information on how to upgrade downstream clusters, refer to Chapter 32, Downstream clusters.
The Upgrade Controller expects any additional SUSE Edge Helm charts that are deployed through EIB (Chapter 10, Edge Image Builder) to have their HelmChart CR deployed in the
kube-system
namespace. To do this, configure theinstallationNamespace
property in your EIB definition file. For more information, see the upstream documentation.Currently the Upgrade Controller has no way to determine the current running Edge release version on the management cluster. Ensure to provide an Edge release version that is greater than the currently running Edge release version on the cluster.
Currently the Upgrade Controller supports non air-gapped environment upgrades only. Air-gapped upgrades are not yet possible.