59 Lifecycle actions
This section covers the lifecycle management actions for clusters deployed via SUSE Telco Cloud.
59.1 Load Balancer Exclusion
There are many lifecycle actions that require nodes to be drained. During the draining process, all pods will be moved to other nodes in the cluster. After the draining process is finished, the node does not host any services and therefore should not have any traffic routed to it. Load balancers, such as MetalLB, can be made aware of this by applying a label to the node:
node.kubernetes.io/exclude-from-external-load-balancers: "true"
For more details see: Kubernetes Documentation.
To see the labels on all your nodes in a cluster, you can run:
kubectl get nodes -o json | jq -r '.items[].metadata | .name, .labels'
In the case of upgrades of downstream clusters, this can be automated by annotating the RKE2ControlPlane on the management cluster:
rke2.controlplane.cluster.x-k8s.io/load-balancer-exclusion="true"
This immediately creates an annotation on all machine objects on the management cluster for that RKE2ControlPlane:
pre-drain.delete.hook.machine.cluster.x-k8s.io/rke2-lb-exclusion: ""
With this annotation on the machine objects, any node on the downstream cluster that is scheduled for draining will get the above node label attached prior to the start of the draining process. The label will be removed from the node once it is available and ready again.
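For drains that are not part of an automated downstream upgrade, the label can also be managed by hand. The following is a minimal sketch, assuming kubectl is already pointed at the cluster that owns the node. The function names are hypothetical helpers for illustration and are not part of SUSE Telco Cloud.

```shell
#!/bin/sh
# Label that tells external load balancers to skip a node (taken from this section).
LB_EXCLUDE_LABEL="node.kubernetes.io/exclude-from-external-load-balancers"

# Hypothetical helper: mark a node as excluded before draining it manually.
lb_exclude_node() {
    kubectl label node "$1" "${LB_EXCLUDE_LABEL}=true" --overwrite
}

# Hypothetical helper: remove the label once the node is ready again.
# A trailing "-" after the key is kubectl's syntax for deleting a label.
lb_include_node() {
    kubectl label node "$1" "${LB_EXCLUDE_LABEL}-"
}

# Hypothetical helper: list the nodes that currently carry the exclusion label.
lb_list_excluded() {
    kubectl get nodes -l "${LB_EXCLUDE_LABEL}=true"
}
```

A manual maintenance window could then call lb_exclude_node with the node name before the drain and lb_include_node with the same name after the node reports Ready.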
59.2 Management cluster upgrades
The upgrade of the management cluster is described in the Day 2 management cluster (Chapter 58, Management Cluster) documentation.
59.3 Downstream cluster upgrades
Upgrading downstream clusters involves updating several components. The following sections cover the upgrade process for each of the components.
Upgrading the operating system
For this process, check the following reference (Chapter 49, Prepare downstream cluster image for connected scenarios) to build the new image with a new operating system version.
With this new image generated by EIB, the next provisioning phase uses the new operating system version provided.
In the following step, the new image is used to upgrade the nodes.
Upgrading the RKE2 cluster
The changes required to upgrade the RKE2 cluster using the automated workflow are the following:
Change the block RKE2ControlPlane in the capi-provisioning-example.yaml shown in the following section (Chapter 51, Downstream cluster provisioning with Directed network provisioning (single-node)):
Specify the desired rolloutStrategy.
Change the version of the RKE2 cluster to the new version replacing ${RKE2_NEW_VERSION}.
Decide if an ingress controller is to be deployed in the downstream cluster:
[Option 0]: Do not deploy any ingress controller
[Option 1]: Deploy only Traefik
[Option 2]: Deploy both Ingress-NGINX and Traefik (to be used for complex ingress migration scenarios)
The Traefik ingress provider integrated into RKE2/K3s is the only ingress controller supported in the SUSE Telco Cloud 3.6 release. It is still possible to temporarily run Ingress-NGINX alongside Traefik to support complex ingress migration scenarios, but only after the SUSE Telco Cloud management and/or downstream clusters have been upgraded to version 3.6, and only for the time required to perform that migration. Since Traefik is not yet the default ingress controller in RKE2 (it will be from RKE2 v1.36 onwards), it must be explicitly requested in the RKE2 server configuration file.
The RKE2 Ingress NGINX to Traefik Migration guide provides details on the ingress migration paths available once the Traefik ingress controller replaces the discontinued Ingress-NGINX.
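In the example manifest, the three options differ mainly in what the rke2-ingress-deployment.service unit appends to /etc/rancher/rke2/config.yaml. As a rough sketch of that difference, the hypothetical helper below (its name is an assumption made for illustration) prints the fragment each option would add. Option 0 appends nothing and, per the comments in the manifest, additionally requires uncommenting disableComponents under serverConfig.

```shell
#!/bin/sh
# Hypothetical helper: print the lines that the selected ingress option
# appends to /etc/rancher/rke2/config.yaml.
#   0 = no ingress controller, 1 = Traefik only, 2 = Ingress-NGINX and Traefik
ingress_config_fragment() {
    case "$1" in
        0) ;;  # nothing is appended for Option 0
        1) printf 'ingress-controller: traefik\n' ;;
        2) printf 'ingress-controller:\n- ingress-nginx\n- traefik\n' ;;
        *) echo "unknown option: $1" >&2; return 1 ;;
    esac
}
```

Running ingress_config_fragment with 1 or 2 gives a quick way to compare the expected configuration against the ExecStart line left uncommented in the unit.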
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: RKE2ControlPlane
metadata:
  name: single-node-cluster
  namespace: default
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: Metal3MachineTemplate
    name: single-node-cluster-controlplane
  version: ${RKE2_NEW_VERSION}
  replicas: 1
  rolloutStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxSurge: 0
  serverConfig:
    cni: cilium
    #===========================================================================
    # Uncomment the following lines if selecting [Option 0]: Do not deploy
    # any ingress controller
    #===========================================================================
    #disableComponents:
    #  pluginComponents:
    #    - "rke2-ingress-nginx"
    #---------------------------------------------------------------------------
  registrationMethod: "control-plane-endpoint"
  agentConfig:
    format: ignition
    additionalUserData:
      config: |
        variant: fcos
        version: 1.4.0
        systemd:
          units:
            - name: rke2-preinstall.service
              enabled: true
              contents: |
                [Unit]
                Description=rke2-preinstall
                Wants=network-online.target
                Before=rke2-install.service
                ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
                [Service]
                Type=oneshot
                User=root
                ExecStartPre=/bin/sh -c "mount -L config-2 /mnt"
                ExecStart=/bin/sh -c "sed -i \"s/BAREMETALHOST_UUID/$(jq -r .uuid /mnt/openstack/latest/meta_data.json)/\" /etc/rancher/rke2/config.yaml"
                ExecStart=/bin/sh -c "echo \"node-name: $(jq -r .name /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
                ExecStart=/bin/sh -c "echo \"node-label:\" >> /etc/rancher/rke2/config.yaml"
                ExecStart=/bin/sh -c "echo \" - metal3.io/uuid=$(jq -r .uuid /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
                ExecStartPost=/bin/sh -c "umount /mnt"
                [Install]
                WantedBy=multi-user.target
            # rke2-ingress-deployment.service unit
            - name: rke2-ingress-deployment.service
              enabled: true
              contents: |
                [Unit]
                Description=rke2-ingress-deployment
                Wants=rke2-preinstall.service
                Before=rke2-install.service
                ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
                [Service]
                Type=oneshot
                User=root
                #===============================================================================================================================
                # Leave one (and only one) of the two following ExecStart lines uncommented, depending on the desired ingress-controller(s):
                # [Option 1]: Deploy only "Traefik"
                # [Option 2]: Deploy both "Ingress-NGINX" and "Traefik"
                #
                # Keep both commented ONLY in case of selecting [Option 0]: "Do not deploy any ingress controller"
                #===============================================================================================================================
                #ExecStart=/bin/sh -c "echo \"ingress-controller: traefik\" >> /etc/rancher/rke2/config.yaml" # [Option 1]
                ExecStart=/bin/sh -c "echo -e \"ingress-controller:\n- ingress-nginx\n- traefik\" >> /etc/rancher/rke2/config.yaml" # [Option 2]
                #-------------------------------------------------------------------------------------------------------------------------------
                [Install]
                WantedBy=multi-user.target
        storage:
          directories:
            - path: /var/lib/rancher/rke2/server/manifests
              overwrite: true
          files:
            #############################################################################
            # if [Option 2]: "Deploy both `Ingress-NGINX` and `Traefik`" is selected
            #############################################################################
            - path: /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChartConfig
                  metadata:
                    name: rke2-ingress-nginx
                    namespace: kube-system
                  spec:
                    valuesContent: |-
                      controller:
                        hostPort:
                          enabled: false # not needed when exposing through a type:LoadBalancer service
                        config:
                          use-forwarded-headers: "true"
                          enable-real-ip: "true"
                        publishService:
                          enabled: true
                        service:
                          enabled: true
                          type: LoadBalancer
                          externalTrafficPolicy: Local
              mode: 0644
              user:
                name: root
              group:
                name: root
            #############################################################################
            # if [Option 1]: "Deploy only `Traefik`" OR [Option 2]: "Deploy both
            # `Ingress-NGINX` and `Traefik`" is selected
            #############################################################################
            - path: /var/lib/rancher/rke2/server/manifests/rke2-traefik-config.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChartConfig
                  metadata:
                    name: rke2-traefik
                    namespace: kube-system
                  spec:
                    valuesContent: |-
                      ingressClass:
                        isDefaultClass: false # Assumes [Option 2]; set to true if [Option 1]: "only deploying `Traefik`"
                      ports:
                        web:
                          hostPort: null # disallow hostPort
                          exposedPort: 80
                        websecure:
                          hostPort: null # disallow hostPort
                          exposedPort: 443
                      service:
                        enabled: true
                        type: LoadBalancer
                        spec:
                          externalTrafficPolicy: Local
                          allocateLoadBalancerNodePorts: false # k8s GA from 1.24; supported by MetalLB
                      providers:
                        kubernetesIngressNginx: # this provider allows Traefik to "understand" most of the Ingress-NGINX annotations
                          enabled: true
                          ingressClass: "rke2-ingress-nginx-migration"
                          controllerClass: "rke2.cattle.io/ingress-nginx-migration"
              mode: 0644
              user:
                name: root
              group:
                name: root
    kubelet:
      extraArgs:
        - provider-id=metal3://BAREMETALHOST_UUID
    nodeName: "localhost.localdomain"
Change the block Metal3MachineTemplate in the capi-provisioning-example.yaml shown in the following section (Chapter 51, Downstream cluster provisioning with Directed network provisioning (single-node)):
Change the image name and checksum to the new version generated in the previous step.
Set the directive nodeReuse to true to avoid creating a new node.
Set the directive automatedCleaningMode to metadata to enable the automated cleaning for the node.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: single-node-cluster-controlplane
  namespace: default
spec:
  nodeReuse: true
  template:
    spec:
      automatedCleaningMode: metadata
      dataTemplate:
        name: single-node-cluster-controlplane-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
      image:
        checksum: http://imagecache.local:8080/${NEW_IMAGE_GENERATED}.sha256
        checksumType: sha256
        format: raw
        url: http://imagecache.local:8080/${NEW_IMAGE_GENERATED}.raw
Before applying the capi-provisioning-example.yaml file, it is good practice to inform external load balancers (e.g. MetalLB) about nodes being drained so that they do not route traffic to nodes in this state. As mentioned in Section 59.1, “Load Balancer Exclusion”, you can automate this by annotating the RKE2ControlPlane on the management cluster. In this example, an RKE2ControlPlane object called multinode-cluster is annotated:
kubectl annotate RKE2ControlPlane/multinode-cluster rke2.controlplane.cluster.x-k8s.io/load-balancer-exclusion="true"
Verify that the machine objects have been annotated with:
pre-drain.delete.hook.machine.cluster.x-k8s.io/rke2-lb-exclusion: ""
Fetch the annotations for all your machine objects:
kubectl get machines -o json | jq -r '.items[].metadata | .name, .annotations'
Without these annotations, users might experience longer response times for services, because the load balancers are unaware of drained nodes.
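The annotate-and-verify step can be scripted so that a missing hook is noticed before the upgrade starts. The following is a minimal sketch, assuming kubectl targets the management cluster and the machine objects live in the current namespace. Both function names are hypothetical, and the check simply searches each machine's printed annotations for the hook key.

```shell
#!/bin/sh
# Pre-drain hook annotation expected on every machine object (from this section).
LB_HOOK="pre-drain.delete.hook.machine.cluster.x-k8s.io/rke2-lb-exclusion"

# Hypothetical helper: annotate an RKE2ControlPlane so that its machines
# receive the pre-drain hook.
annotate_lb_exclusion() {
    kubectl annotate "RKE2ControlPlane/$1" \
        rke2.controlplane.cluster.x-k8s.io/load-balancer-exclusion="true"
}

# Hypothetical helper: print the names of machine objects that do NOT carry
# the hook. Each output line of kubectl is "<name><TAB><annotations>".
machines_missing_hook() {
    kubectl get machines \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations}{"\n"}{end}' \
        | grep -v -F "$LB_HOOK" | cut -f1
}
```

After calling annotate_lb_exclusion with the control plane name, an empty result from machines_missing_hook suggests that every machine has the hook. Any name it prints is worth inspecting before continuing.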
After making these changes, the capi-provisioning-example.yaml file can be applied to the cluster using the following command:
kubectl apply -f capi-provisioning-example.yaml
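If the ${RKE2_NEW_VERSION} and ${NEW_IMAGE_GENERATED} placeholders are kept in the file rather than edited by hand, they can be substituted at apply time. The following is a sketch under that assumption. The function names are hypothetical, only these two placeholders are replaced, and any other ${...} expressions in the file are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: read a manifest on stdin, replace the two placeholders,
# and write the result to stdout.
#   $1 = new RKE2 version, $2 = new image name (without extension)
# The values must not contain "|" or "&", which are special in this sed expression.
render_manifest() {
    sed -e "s|[\$]{RKE2_NEW_VERSION}|$1|g" \
        -e "s|[\$]{NEW_IMAGE_GENERATED}|$2|g"
}

# Hypothetical helper: render the file and ask the API server to validate it
# without persisting anything (server-side dry run).
preview_upgrade() {
    render_manifest "$1" "$2" < capi-provisioning-example.yaml \
        | kubectl apply --dry-run=server -f -
}
```

Once the dry run looks correct, the same pipeline without --dry-run=server applies the change, which is equivalent to the kubectl apply command above on a file edited by hand.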