52 Downstream cluster provisioning with directed network provisioning (multi-node)
This section describes the workflow used to automate the provisioning of a multi-node downstream cluster using directed network provisioning and MetalLB as a load-balancer strategy.
This is the simplest way to automate the provisioning of a downstream cluster. The following diagram shows the workflow used to automate the provisioning of a multi-node downstream cluster using directed network provisioning and MetalLB.
Requirements
The image generated using EIB, as described in the previous section (Chapter 49, Prepare downstream cluster image for connected scenarios), with the minimal configuration to set up the downstream cluster, must be located in the management cluster at exactly the path you configured in that section (Note).
The management server must be created and available for use in the following sections. For more information, refer to the Management Cluster section: Part V, “Setting up the management cluster”.
Workflow
The following diagram shows the workflow used to automate the provisioning of a multi-node downstream cluster using directed network provisioning:
Enroll the three bare-metal hosts to make them available for the provisioning process.
Provision the three bare-metal hosts to install and configure the operating system and the Kubernetes cluster using MetalLB.
Enroll the bare-metal hosts
The first step is to enroll the three bare-metal hosts in the management cluster to make them available to be provisioned.
To do that, create the following files (bmh-example-node1.yaml, bmh-example-node2.yaml and bmh-example-node3.yaml) in the management cluster. Each file specifies the BMC credentials to be used and the BareMetalHost object to be enrolled in the management cluster.
Only the values between ${…} have to be replaced with the real values.
We will walk you through the process for only one host; the same steps apply to the other two nodes.
apiVersion: v1
kind: Secret
metadata:
  name: node1-example-credentials
type: Opaque
data:
  username: ${BMC_NODE1_USERNAME}
  password: ${BMC_NODE1_PASSWORD}
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node1-example
  labels:
    cluster-role: control-plane
spec:
  architecture: x86_64
  online: true
  bootMACAddress: ${BMC_NODE1_MAC}
  bmc:
    address: ${BMC_NODE1_ADDRESS}
    disableCertificateVerification: true
    credentialsName: node1-example-credentials
Where:
${BMC_NODE1_USERNAME}: The username for the BMC of the first bare-metal host, base64-encoded (values under the data field of a Kubernetes Secret must be base64-encoded, for example with echo -n admin | base64).
${BMC_NODE1_PASSWORD}: The password for the BMC of the first bare-metal host, also base64-encoded.
${BMC_NODE1_MAC}: The MAC address of the first bare-metal host to be used.
${BMC_NODE1_ADDRESS}: The URL for the first bare-metal host BMC (for example, redfish-virtualmedia://192.168.200.75/redfish/v1/Systems/1/). The host part of the URL can be an IP address (v4 or v6) or a domain name, where the existing infrastructure allows. To learn more about the different options available depending on your hardware provider, check the following link.
If no network configuration for the host has been specified, either at image build time or through the BareMetalHost definition, an autoconfiguration mechanism (DHCP, DHCPv6, SLAAC) is used. For more details or complex configurations, see Chapter 53, Advanced Network Configuration.
Single-stack IPv6 clusters are in tech preview status and are not yet officially supported.
Architecture must be either x86_64 or aarch64, depending on the architecture of the bare-metal host to be enrolled.
All modern servers come with a dual-stack-capable BMC; however, IPv6 support (and possibly the option of using hostnames for the VirtualMedia capability) should be verified before use in production in a dual-stack environment.
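Writing the three files by hand is repetitive. The following is a minimal sketch that renders bmh-example-node1.yaml to bmh-example-node3.yaml from a small inventory and base64-encodes the BMC credentials for the Secret. The inventory lines, the admin/changeme credentials, and the MAC and BMC addresses are hypothetical placeholders; replace them with your own values (and set the role and architecture per host as needed).

```shell
#!/usr/bin/env bash
# Sketch: render bmh-example-node{1,2,3}.yaml (one Secret + one BareMetalHost per host).
# All inventory values and credentials below are hypothetical examples.
set -euo pipefail

# b64 VALUE: base64-encode without line wrapping (GNU base64 wraps long output).
b64() { printf '%s' "$1" | base64 | tr -d '\n'; }

# render_bmh NAME ROLE USER PASS MAC BMC_URL [ARCH]: print the manifest on stdout.
render_bmh() {
  local name=$1 role=$2 user=$3 pass=$4 mac=$5 url=$6 arch=${7:-x86_64}
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}-credentials
type: Opaque
data:
  username: $(b64 "$user")
  password: $(b64 "$pass")
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${name}
  labels:
    cluster-role: ${role}
spec:
  architecture: ${arch}
  online: true
  bootMACAddress: ${mac}
  bmc:
    address: ${url}
    disableCertificateVerification: true
    credentialsName: ${name}-credentials
EOF
}

# Hypothetical inventory: node index, boot MAC address, BMC URL.
while read -r idx mac bmc; do
  render_bmh "node${idx}-example" control-plane admin changeme "$mac" "$bmc" \
    > "bmh-example-node${idx}.yaml"
done <<'HOSTS'
1 aa:bb:cc:dd:ee:01 redfish-virtualmedia://192.168.200.75/redfish/v1/Systems/1/
2 aa:bb:cc:dd:ee:02 redfish-virtualmedia://192.168.200.76/redfish/v1/Systems/1/
3 aa:bb:cc:dd:ee:03 redfish-virtualmedia://192.168.200.77/redfish/v1/Systems/1/
HOSTS
```

Generating the Secret and the BareMetalHost from one template keeps credentialsName in step with the Secret name, which is easy to get wrong when copying files by hand.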
Once the files are created, run the following commands in the management cluster to start enrolling the bare-metal hosts:
$ kubectl apply -f bmh-example-node1.yaml
$ kubectl apply -f bmh-example-node2.yaml
$ kubectl apply -f bmh-example-node3.yaml
The new bare-metal host objects are enrolled, moving from the registering state through inspecting until they become available. The changes can be checked using the following command:
$ kubectl get bmh -o wide
The BareMetalHost object is in the registering state until the BMC credentials are validated. Once the credentials are validated, the BareMetalHost object changes its state to inspecting; this step can take some time depending on the hardware (up to 20 minutes). During the inspecting phase, the hardware information is retrieved and the Kubernetes object is updated. Check the information using the following command: kubectl get bmh -o yaml.
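Rather than re-running kubectl get bmh until all three hosts are ready, the wait can be scripted. A minimal sketch, assuming kubectl 1.23 or later (which supports jsonpath conditions in kubectl wait) and assuming the BareMetalHost state is exposed at .status.provisioning.state:

```shell
#!/usr/bin/env bash
# Sketch: block until each named BareMetalHost reports the "available" state.
# Assumes kubectl >= 1.23 and the status field .status.provisioning.state.
set -euo pipefail

wait_for_bmh_available() {
  local host
  for host in "$@"; do
    # Inspection alone can take up to ~20 minutes, so allow a generous timeout.
    kubectl wait "bmh/${host}" \
      --for=jsonpath='{.status.provisioning.state}'=available \
      --timeout=30m
  done
}
```

For the three hosts of this section, call it as wait_for_bmh_available node1-example node2-example node3-example; it returns non-zero if any host does not become available before the timeout.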
Provision step
Once the three bare-metal hosts are enrolled and available, the next step is to provision the bare-metal hosts to install and configure the operating system and the Kubernetes cluster, creating a load balancer to manage them.
To do that, the following file (capi-provisioning-example.yaml) must be created in the management cluster with the following information (capi-provisioning-example.yaml can be generated by joining the following blocks).
Only the values between ${…} must be replaced with the real values.
The VIP address is a reserved IP address that is not assigned to any node and is used to configure the load balancer. In a dual-stack cluster, both an IPv4 and an IPv6 address can be specified, but in the following examples priority is given to the IPv4 address.
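Because the file is assembled from several blocks, each object must remain a separate YAML document. The following is a minimal sketch that joins block files with the --- document separator; it assumes each block below has been saved to its own file beforehand (the file names in the usage line are hypothetical).

```shell
#!/usr/bin/env bash
# Sketch: assemble capi-provisioning-example.yaml from per-block files.
set -euo pipefail

# join_manifests OUTPUT INPUT... : write the inputs to OUTPUT, separated by
# "---" so that each Kubernetes object stays a separate YAML document.
join_manifests() {
  local out=$1; shift
  local f first=1
  : > "$out"                      # start from an empty output file
  for f in "$@"; do
    if [ "$first" -eq 0 ]; then
      echo '---' >> "$out"
    fi
    cat "$f" >> "$out"
    # Add a trailing newline if the block lacks one, so "---" starts its own line.
    if [ -n "$(tail -c 1 "$f")" ]; then
      echo >> "$out"
    fi
    first=0
  done
}
```

For example: join_manifests capi-provisioning-example.yaml cluster.yaml metal3cluster.yaml rke2controlplane.yaml cp-machinetemplate.yaml cp-datatemplate.yaml workers-machinedeployment.yaml workers-configtemplate.yaml workers-machinetemplate.yaml workers-datatemplate.yaml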
Below is the cluster definition, where the cluster network can be configured using the pods and services blocks. It also contains the references to the control plane and infrastructure (Metal3 provider) objects to be used.
apiVersion: cluster.x-k8s.io/v1beta2
kind: Cluster
metadata:
  name: multinode-cluster
  namespace: default
  labels:
    cluster-api.cattle.io/rancher-auto-import: "true"
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 192.168.0.0/18
        - fd00:1234:4321::/48
    services:
      cidrBlocks:
        - 10.96.0.0/12
        - fd00:5678:8765:4321::/112
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: RKE2ControlPlane
    name: multinode-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: Metal3Cluster
    name: multinode-cluster
Both single-stack and dual-stack deployments are possible. For an IPv4-only cluster, remove the IPv6 CIDRs and the IPv6 VIP addresses (in the subsequent sections).
Adding the label cluster-api.cattle.io/rancher-auto-import: "true" to the cluster.x-k8s.io objects imports the cluster into Rancher (by creating a corresponding clusters.management.cattle.io object). See the Cluster API documentation for more information.
The Metal3Cluster object specifies the control-plane endpoint, which uses the reserved VIP address (replace ${EDGE_VIP_ADDRESS_IPV4} with it), and sets noCloudProvider because the nodes are bare-metal hosts with no cloud provider.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3Cluster
metadata:
  name: multinode-cluster
  namespace: default
spec:
  controlPlaneEndpoint:
    host: ${EDGE_VIP_ADDRESS_IPV4}
    port: 6443
  noCloudProvider: true
The RKE2ControlPlane object specifies the control-plane configuration to be used, and the Metal3MachineTemplate object specifies the control-plane image to be used.
A load balancer exclusion annotation that informs external load balancers like MetalLB that a node is going to be drained during lifecycle operations like upgrades of downstream clusters. For details, see Section 59.1, “Load Balancer Exclusion”.
The number of replicas to be used (in this case, three).
The registration method used by the nodes (registrationMethod, set here to control-plane-endpoint), as well as the registration address to be used (replace ${EDGE_VIP_ADDRESS} with the VIP address).
The serverConfig with the CNI plug-in to be used (in this case, Cilium), and the additional VIP address(es) and name(s) to be listed under tlsSan.
The agentConfig block contains the Ignition format to be used and the additionalUserData used to configure the RKE2 node with information like:
The systemd service named rke2-preinstall.service, which automatically replaces BAREMETALHOST_UUID and sets node-name during the provisioning process using the Ironic information, and adds the metal3.io/uuid label (set to the BareMetalHost UUID) to Node objects.
The systemd service named rke2-traefik-deployment.service, which sets the RKE2 ingress-controller server option in the /etc/rancher/rke2/config.yaml file to traefik.
The storage block, which contains the Helm charts used to install MetalLB and the endpoint-copier-operator, plus HelmChartConfig overrides for rke2-cilium (cni.exclusive: false, for use with Multus) and rke2-traefik.
The metallb-cr.yaml custom resource file with the IPAddressPool and the L2Advertisement to be used (replace ${EDGE_VIP_ADDRESS_IPV4} and ${EDGE_VIP_ADDRESS_IPV6} with the VIP addresses).
The endpoint-svc.yaml file used to configure the kubernetes-vip service, which MetalLB uses to manage the VIP address.
The kubelet block, which sets the provider-id kubelet argument (its BAREMETALHOST_UUID placeholder is filled in by rke2-preinstall.service), and the nodeName.
The version field sets the RKE2 version: replace ${RKE2_VERSION} with the RKE2 version to be used (for example, v1.35.3+rke2r3).
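To make the effect of rke2-preinstall.service concrete, the following sketch replays its config.yaml edits on a scratch file. It is illustrative only: on a real node the UUID and name are read with jq from /mnt/openstack/latest/meta_data.json on the config-2 drive, whereas here they are hypothetical values, and the starting kubelet-arg excerpt is a hypothetical stand-in for what the bootstrap provider writes. Like the unit itself, it relies on GNU sed -i.

```shell
#!/usr/bin/env bash
# Sketch: replay the config.yaml edits of rke2-preinstall.service on a scratch file.
set -euo pipefail

UUID="3f1c2b9e-0d4a-4e6b-9c1a-7f5e2d8b6a10"   # hypothetical BareMetalHost UUID
NAME="multinode-cluster-abcde"                # hypothetical Machine name

config="$(mktemp -d)/config.yaml"

# Hypothetical starting excerpt: the provider ID still contains the placeholder.
cat > "$config" <<'EOF'
kubelet-arg:
- provider-id=metal3://BAREMETALHOST_UUID
EOF

# The same edits the unit's ExecStart= lines perform (values come from jq there):
sed -i "s/BAREMETALHOST_UUID/${UUID}/" "$config"   # real UUID in the provider ID
echo "node-name: ${NAME}" >> "$config"              # Node name = Machine name
echo "node-label:" >> "$config"                     # label the Node with its UUID
echo " - metal3.io/uuid=${UUID}" >> "$config"

cat "$config"
```

The node name equals the Machine name because the Metal3DataTemplate maps the name metadata key to the machine object.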
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: RKE2ControlPlane
metadata:
  name: multinode-cluster
  namespace: default
  annotations: {
    rke2.controlplane.cluster.x-k8s.io/load-balancer-exclusion: "true"
  }
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: Metal3MachineTemplate
    name: multinode-cluster-controlplane
  replicas: 3
  version: ${RKE2_VERSION}
  rolloutStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxSurge: 0
  registrationMethod: "control-plane-endpoint"
  registrationAddress: ${EDGE_VIP_ADDRESS}
  serverConfig:
    cni: cilium
    tlsSan:
      - ${EDGE_VIP_ADDRESS_IPV4}
      - ${EDGE_VIP_ADDRESS_IPV6}
      - https://${EDGE_VIP_ADDRESS_IPV4}.sslip.io
      - https://${EDGE_VIP_ADDRESS_IPV6}.sslip.io
  agentConfig:
    format: ignition
    additionalUserData:
      config: |
        variant: fcos
        version: 1.4.0
        systemd:
          units:
            - name: rke2-preinstall.service
              enabled: true
              contents: |
                [Unit]
                Description=rke2-preinstall
                Wants=network-online.target
                Before=rke2-install.service
                ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
                [Service]
                Type=oneshot
                User=root
                ExecStartPre=/bin/sh -c "mount -L config-2 /mnt"
                ExecStart=/bin/sh -c "sed -i \"s/BAREMETALHOST_UUID/$(jq -r .uuid /mnt/openstack/latest/meta_data.json)/\" /etc/rancher/rke2/config.yaml"
                ExecStart=/bin/sh -c "echo \"node-name: $(jq -r .name /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
                ExecStart=/bin/sh -c "echo \"node-label:\" >> /etc/rancher/rke2/config.yaml"
                ExecStart=/bin/sh -c "echo \" - metal3.io/uuid=$(jq -r .uuid /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
                ExecStartPost=/bin/sh -c "umount /mnt"
                [Install]
                WantedBy=multi-user.target
            # rke2-traefik-deployment.service unit can be removed once "traefik" is the default ingress controller (starting with RKE2 v1.36)
            - name: rke2-traefik-deployment.service
              enabled: true
              contents: |
                [Unit]
                Description=rke2-traefik-deployment
                Wants=rke2-preinstall.service
                Before=rke2-install.service
                ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
                [Service]
                Type=oneshot
                User=root
                ExecStart=/bin/sh -c "echo \"ingress-controller: traefik\" >> /etc/rancher/rke2/config.yaml"
                [Install]
                WantedBy=multi-user.target
        storage:
          directories:
            - path: /var/lib/rancher/rke2/server/manifests
              overwrite: true
          files:
            # https://docs.rke2.io/networking/multus_sriov#using-multus-with-cilium
            - path: /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChartConfig
                  metadata:
                    name: rke2-cilium
                    namespace: kube-system
                  spec:
                    valuesContent: |-
                      cni:
                        exclusive: false
              mode: 0644
              user:
                name: root
              group:
                name: root
            - path: /var/lib/rancher/rke2/server/manifests/endpoint-copier-operator.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChart
                  metadata:
                    name: endpoint-copier-operator
                    namespace: kube-system
                  spec:
                    chart: oci://registry.suse.com/edge/charts/endpoint-copier-operator
                    targetNamespace: endpoint-copier-operator
                    version: 306.0.1+up0.3.0
                    createNamespace: true
            - path: /var/lib/rancher/rke2/server/manifests/metallb.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChart
                  metadata:
                    name: metallb
                    namespace: kube-system
                  spec:
                    chart: oci://registry.suse.com/edge/charts/metallb
                    targetNamespace: metallb-system
                    version: 306.0.2+up0.15.3
                    createNamespace: true
            - path: /var/lib/rancher/rke2/server/manifests/metallb-cr.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: metallb.io/v1beta1
                  kind: IPAddressPool
                  metadata:
                    name: kubernetes-vip-ip-pool
                    namespace: metallb-system
                  spec:
                    addresses:
                      - ${EDGE_VIP_ADDRESS_IPV4}/32
                      - ${EDGE_VIP_ADDRESS_IPV6}/128
                    serviceAllocation:
                      priority: 100
                      namespaces:
                        - default
                      serviceSelectors:
                        - matchExpressions:
                            - {key: "serviceType", operator: In, values: [kubernetes-vip]}
                  ---
                  apiVersion: metallb.io/v1beta1
                  kind: L2Advertisement
                  metadata:
                    name: ip-pool-l2-adv
                    namespace: metallb-system
                  spec:
                    ipAddressPools:
                      - kubernetes-vip-ip-pool
            - path: /var/lib/rancher/rke2/server/manifests/endpoint-svc.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: v1
                  kind: Service
                  metadata:
                    name: kubernetes-vip
                    namespace: default
                    labels:
                      serviceType: kubernetes-vip
                  spec:
                    ipFamilyPolicy: PreferDualStack
                    ports:
                      - name: rke2-api
                        port: 9345
                        protocol: TCP
                        targetPort: 9345
                      - name: k8s-api
                        port: 6443
                        protocol: TCP
                        targetPort: 6443
                    type: LoadBalancer
            - path: /var/lib/rancher/rke2/server/manifests/rke2-traefik-config.yaml
              overwrite: true
              contents:
                inline: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChartConfig
                  metadata:
                    name: rke2-traefik
                    namespace: kube-system
                  spec:
                    valuesContent: |-
                      ingressClass:
                        isDefaultClass: true
                      ports:
                        web:
                          hostPort: null # disallow hostPort
                          exposedPort: 80
                        websecure:
                          hostPort: null # disallow hostPort
                          exposedPort: 443
                      service:
                        enabled: true
                        type: LoadBalancer
                        spec:
                          externalTrafficPolicy: Local
                          allocateLoadBalancerNodePorts: false # k8s GA from 1.24; supported by MetalLB
              mode: 0644
              user:
                name: root
              group:
                name: root
    kubelet:
      extraArgs:
        - provider-id=metal3://BAREMETALHOST_UUID
    nodeName: "Node-multinode-cluster"
The Metal3MachineTemplate object specifies the following information:
The dataTemplate to be used as a reference to the template.
The hostSelector to be used, matching the label created during the enrollment process.
The image to be used, as a reference to the image generated using EIB in the previous section (Chapter 49, Prepare downstream cluster image for connected scenarios), and the checksum and checksumType used to validate the image.
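The checksum URL must point to a file holding the image's SHA-256 digest. If that file does not exist yet, the following minimal sketch creates it with GNU coreutils sha256sum, assuming the image and its .sha256 file are served from the same local directory behind http://imagecache.local:8080 (the directory path is an assumption you supply). It writes the standard "<digest>  <file name>" format produced by sha256sum; confirm that your Metal3/Ironic version accepts this format.

```shell
#!/usr/bin/env bash
# Sketch: write IMAGE.sha256 next to IMAGE and verify it, using GNU sha256sum.
set -euo pipefail

# write_image_checksum DIR IMAGE
write_image_checksum() {
  local dir=$1 image=$2
  (
    cd "$dir"   # subshell, so the caller's working directory is unchanged
    # Output format: "<64 hex digits>  <file name>", matching checksumType: sha256.
    sha256sum "$image" > "${image}.sha256"
    # Re-read the checksum file and verify the image against it.
    sha256sum --check --quiet "${image}.sha256"
  )
}
```

For example, write_image_checksum /path/to/imagecache eibimage-output-telco.raw (hypothetical path), and the same for eibimage-slmicro-rt-telco.raw used by the worker nodes.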
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: multinode-cluster-controlplane
  namespace: default
spec:
  template:
    spec:
      dataTemplate:
        name: multinode-cluster-controlplane-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
      image:
        checksum: http://imagecache.local:8080/eibimage-output-telco.raw.sha256
        checksumType: sha256
        format: raw
        url: http://imagecache.local:8080/eibimage-output-telco.raw
The Metal3DataTemplate object specifies the metaData for the downstream cluster.
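apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3DataTemplate
metadata:
  name: multinode-cluster-controlplane-template
  namespace: default
spec:
  clusterName: multinode-cluster
  metaData:
    objectNames:
      - key: name
        object: machine
      - key: local-hostname
        object: machine
      - key: local_hostname
        object: machine
The following YAML files are an example configuration for the worker nodes. Worker hosts are selected through the cluster-role: worker label (see the hostSelector of the worker Metal3MachineTemplate), so BareMetalHost objects carrying that label must also be enrolled for the MachineDeployment replicas to be provisioned.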
A MachineDeployment:
apiVersion: cluster.x-k8s.io/v1beta2
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: multinode-cluster
    nodepool: nodepool-0
  name: multinode-cluster-workers
  namespace: default
spec:
  clusterName: multinode-cluster
  replicas: 3
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: multinode-cluster
      nodepool: nodepool-0
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: multinode-cluster
        nodepool: nodepool-0
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: RKE2ConfigTemplate
          name: multinode-cluster-workers
      clusterName: multinode-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: Metal3MachineTemplate
        name: multinode-cluster-workers
      deletion:
        nodeDrainTimeoutSeconds: 0
      version: ${RKE2_VERSION}
The RKE2ConfigTemplate object specifies the configuration template to be used for the multi-node cluster worker nodes.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: RKE2ConfigTemplate
metadata:
  name: multinode-cluster-workers
  namespace: default
spec:
  template:
    spec:
      agentConfig:
        format: ignition
        kubelet:
          extraArgs:
            - provider-id=metal3://BAREMETALHOST_UUID
        nodeName: "Node-multinode-cluster-worker"
        additionalUserData:
          config: |
            variant: fcos
            version: 1.4.0
            systemd:
              units:
                - name: rke2-preinstall.service
                  enabled: true
                  contents: |
                    [Unit]
                    Description=rke2-preinstall
                    Wants=network-online.target
                    Before=rke2-install.service
                    ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
                    [Service]
                    Type=oneshot
                    User=root
                    ExecStartPre=/bin/sh -c "mount -L config-2 /mnt"
                    ExecStart=/bin/sh -c "sed -i \"s/BAREMETALHOST_UUID/$(jq -r .uuid /mnt/openstack/latest/meta_data.json)/\" /etc/rancher/rke2/config.yaml"
                    ExecStart=/bin/sh -c "echo \"node-name: $(jq -r .name /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
                    ExecStart=/bin/sh -c "echo \"node-label:\" >> /etc/rancher/rke2/config.yaml"
                    ExecStart=/bin/sh -c "echo \" - metal3.io/uuid=$(jq -r .uuid /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
                    ExecStartPost=/bin/sh -c "umount /mnt"
                    [Install]
                    WantedBy=multi-user.target
The Metal3MachineTemplate object contains references to the dataTemplate, hostSelector, and image for the worker nodes:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: multinode-cluster-workers
  namespace: default
spec:
  template:
    spec:
      dataTemplate:
        name: multinode-cluster-workers-template
      hostSelector:
        matchLabels:
          cluster-role: worker
      image:
        checksum: http://imagecache.local:8080/eibimage-slmicro-rt-telco.raw.sha256
        checksumType: sha256
        format: raw
        url: http://imagecache.local:8080/eibimage-slmicro-rt-telco.raw
The Metal3DataTemplate object specifies the metaData of the downstream cluster for the worker nodes:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3DataTemplate
metadata:
  name: multinode-cluster-workers-template
  namespace: default
spec:
  clusterName: multinode-cluster
  metaData:
    objectNames:
      - key: name
        object: machine
      - key: local-hostname
        object: machine
      - key: local_hostname
        object: machine
Once the file is created by joining the previous blocks, run the following command in the management cluster to start provisioning the new bare-metal hosts:
$ kubectl apply -f capi-provisioning-example.yaml
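The ${…} placeholders must be replaced before running the apply command above. A minimal sketch, assuming GNU sed: it substitutes only the named placeholders and refuses to continue if any ${…} remains. sed is used instead of an unquoted heredoc or a bare envsubst because the manifest also contains $(jq …) command substitutions inside the systemd units, which must reach the nodes untouched.

```shell
#!/usr/bin/env bash
# Sketch: replace ${VAR} placeholders in a manifest with shell variable values.
set -euo pipefail

# render_placeholders FILE VAR... : replace each ${VAR} in FILE with the value of VAR.
render_placeholders() {
  local file=$1; shift
  local var value
  for var in "$@"; do
    value=${!var}   # indirect expansion: value of the variable named by $var
    # Escape characters that are special in a sed replacement (\, & and the | delimiter).
    value=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
    sed -i "s|\${${var}}|${value}|g" "$file"
  done
  # Fail if any ${...} placeholder is still unresolved (grep prints the offending lines).
  if grep -n '\${[A-Z0-9_]*}' "$file"; then
    echo "unresolved placeholders in $file" >&2
    return 1
  fi
}
```

For example, after setting RKE2_VERSION, EDGE_VIP_ADDRESS, EDGE_VIP_ADDRESS_IPV4 and EDGE_VIP_ADDRESS_IPV6 in the shell, run render_placeholders capi-provisioning-example.yaml RKE2_VERSION EDGE_VIP_ADDRESS EDGE_VIP_ADDRESS_IPV4 EDGE_VIP_ADDRESS_IPV6 before kubectl apply. For an IPv4-only cluster, remove the IPv6 entries first; otherwise the final check reports them as unresolved.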