26 Metal3 #
26.1 BareMetalHost selection and Cluster association #
Once a Metal3 cluster object and its associated objects are created, a selection process determines which BareMetalHost objects become part of
the cluster.
This process connects a BareMetalHost with a specific Metal3MachineTemplate using standard
Kubernetes labels and selectors.
As an example, each BareMetalHost is labeled to identify its properties and intended cluster
(e.g., its cluster-role, the cluster name, location, etc.):
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: mynode1
  labels:
    cluster-role: control-plane
    cluster: foobar
    location: madrid
    datacenter: xyz
<snip>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: mynode2
  labels:
    cluster-role: worker
    cluster: foobar
    location: madrid
    datacenter: xyz
<snip>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: mynode3
  labels:
    cluster-role: worker
    cluster: foobar2
    location: madrid
    datacenter: xyz
<snip>
...
Then, the Metal3MachineTemplate object uses the spec.hostSelector field to match the desired BareMetalHost.
Both matchLabels (for exact key-value matching) and matchExpressions (for more complex rules) can be used:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: foobar-cluster-controlplane
  namespace: mynamespace
spec:
  template:
    spec:
      hostSelector:
        matchLabels:
          cluster-role: control-plane
          cluster: foobar
<snip>
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: foobar-cluster-worker
  namespace: mynamespace
spec:
  template:
    spec:
      hostSelector:
        matchExpressions:
          - { key: cluster-role, operator: In, values: [worker] }
          - { key: cluster, operator: In, values: [foobar] }
<snip>
Kubernetes namespaces can also be used to better organize the different objects.
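To preview which hosts a matchLabels selector would pick, the same labels can be passed to kubectl as a label selector. The following is a minimal sketch, assuming the hosts were created in mynamespace and carry the labels from the examples above; the function name list_matching_hosts is hypothetical and not part of Metal3:

```shell
# Hypothetical helper: list the BareMetalHost objects in a namespace that
# carry all of the given labels (comma-separated key=value pairs).
list_matching_hosts() {
  local namespace="$1" selector="$2"
  kubectl get baremetalhosts -n "${namespace}" -l "${selector}" --show-labels
}

# Example (requires access to the management cluster):
#   list_matching_hosts mynamespace 'cluster-role=worker,cluster=foobar'
```

An equality-based selector like this mirrors matchLabels; it only shows which hosts carry the labels, not which host the controller will finally claim.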
26.2 Clean up old EFI boot entries #
Sometimes, the UEFI boot manager contains multiple entries for older operating systems that are probably no longer needed (especially on hosts that have been re-provisioned multiple times). You can clean up those old entries using any of the following procedures:
Delete them on the BIOS/EFI setup interface directly (the exact procedure depends on the hardware).
Run bcfg in the UEFI shell:
# List the entries
bcfg boot dump -b
# Delete entry number X
bcfg boot rm X
# X is the number associated with the entry to remove. For example, if the entry is "Boot0002 foobar", then X is 2.
Use efibootmgr on a Linux system:
# List the entries
efibootmgr -v
# Delete entry number X
efibootmgr -b X -B
This process may leave orphaned files on the EFI System Partition (ESP), usually under subdirectories named after the vendor (e.g., EFI/opensuse or EFI/Microsoft).
While these files are generally harmless, delete them if they consume excessive space, because a full ESP can prevent the installation of a new OS or a boot manager update.
Removal may require explicitly mounting the ESP, which is typically mounted at /boot/efi on Linux systems, so the vendor subdirectories appear under /boot/efi/EFI.
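When a host has many entries, finding the number that belongs to a given label can be scripted. The following is a minimal sketch, assuming the usual efibootmgr line layout of "BootXXXX* label"; the function name boot_numbers_for_label is hypothetical:

```shell
# Hypothetical helper: print the four-digit number of every boot entry whose
# label starts with the given text. Reads efibootmgr output on stdin and
# assumes its usual "BootXXXX* label" line layout.
boot_numbers_for_label() {
  awk -v label="$1" '
    /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ] / {
      # Characters 5-8 hold the entry number; the label starts at character 11.
      if (index(substr($0, 11), label) == 1) print substr($0, 5, 4)
    }'
}

# Example on a Linux host (review the output before deleting anything):
#   efibootmgr | boot_numbers_for_label foobar
#   efibootmgr -b 0002 -B
```

The helper only reads and filters text; the deletion itself stays a separate, manual step so that no entry is removed by accident.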
26.3 Custom network configuration using the two-secrets approach #
When Metal3 provisions a bare metal node, it goes through two distinct phases that may each require different network configuration:
The IPA phase, where the Ironic Python Agent (IPA) ramdisk runs during hardware inspection and provisioning
The target OS phase, where the deployed SLE Micro system runs after first boot
The two-secrets approach addresses this by allowing a separate network configuration secret for each phase, using the preprovisioningNetworkDataName field for the IPA phase and the networkData field for the target OS phase.
This is particularly useful when interface names differ between phases, which can happen because the IPA kernel and the SLE Micro kernel may discover the same hardware under different names.
26.3.1 Example of interface renaming for VLANs #
A common scenario is when hardware gets a long PCI-based interface name such as enp1s0np123.
Adding a VLAN on top of it may exceed the Linux kernel hard limit of 15 characters for interface names:
enp1s0np123.100 = 15 chars (barely fits, risky)
enp1s0np123.3669 = 16 chars (exceeds limit, fails)
eth0.3669 = 9 chars (works)
The IPA phase must reference enp1s0np123 (the kernel-discovered name), while the target OS should use a short name like eth0 so that eth0.3669 stays under the limit.
nmc (nm-configurator) bridges the two phases by matching interfaces by MAC address rather than by name: you declare name: eth0 alongside the hardware MAC address, and nmc creates the NetworkManager profile with the desired name regardless of what the kernel assigned.
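The length arithmetic can be checked before writing any configuration. The following is a minimal sketch; the function name check_vlan_ifname and the variable MAX_IFNAME_LEN are hypothetical, and the limit of 15 follows from the kernel's 16-byte name buffer, which includes the terminating NUL byte:

```shell
# Linux limits interface names to 15 characters (IFNAMSIZ is 16, including
# the terminating NUL byte).
MAX_IFNAME_LEN=15

# Hypothetical helper: print "<name> <length> ok|too-long" for the VLAN
# interface name built from a base interface and a VLAN ID.
check_vlan_ifname() {
  local name="$1.$2"
  local len="${#name}"
  if [ "${len}" -le "${MAX_IFNAME_LEN}" ]; then
    echo "${name} ${len} ok"
  else
    echo "${name} ${len} too-long"
    return 1
  fi
}

check_vlan_ifname enp1s0np123 100  || true
check_vlan_ifname enp1s0np123 3669 || true
check_vlan_ifname eth0 3669        || true
```

Running it reports 15 characters for enp1s0np123.100, 16 for enp1s0np123.3669 (too long), and 9 for eth0.3669.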
26.3.2 Prerequisites #
26.3.2.1 EIB image setup #
As described in the static network configuration guide, the EIB image must include a first-boot script that reads the network configuration from the config-2 partition that Metal3 writes during provisioning.
Create the following script at /opt/EIB/network/configure-network.sh:
#!/bin/bash
set -eux
# Source: https://documentation.suse.com/suse-edge/3.5/html/edge/quickstart-metal3.html#metal3-add-network-eib

# Locate the config drive that Metal3 writes during provisioning.
CONFIG_DRIVE=$(blkid --label config-2 || true)
if [ -z "${CONFIG_DRIVE}" ]; then
    echo "No config-2 device found, skipping network configuration"
    exit 0
fi

mount -o ro "${CONFIG_DRIVE}" /mnt

NETWORK_DATA_FILE="/mnt/openstack/latest/network_data.json"
if [ ! -f "${NETWORK_DATA_FILE}" ]; then
    umount /mnt
    echo "No network_data.json found, skipping network configuration"
    exit 0
fi

# Set the hostname from the metal3-name field of the metadata.
DESIRED_HOSTNAME=$(cat /mnt/openstack/latest/meta_data.json | tr ',{}' '\n' | grep '\"metal3-name\"' | sed 's/.*\"metal3-name\": \"\(.*\)\"/\1/')
echo "${DESIRED_HOSTNAME}" > /etc/hostname

# Hand the network data to nmc, then apply the generated NetworkManager profiles.
mkdir -p /tmp/nmc/{desired,generated}
cp "${NETWORK_DATA_FILE}" /tmp/nmc/desired/_all.yaml
umount /mnt

./nmc generate --config-dir /tmp/nmc/desired --output-dir /tmp/nmc/generated
./nmc apply --config-dir /tmp/nmc/generated
Then make it executable and build the EIB image as normal:
mkdir -p /opt/EIB/network
chmod +x /opt/EIB/network/configure-network.sh
EIB automatically picks up scripts from the network/ directory. Combustion runs them on first boot in initramfs, before the full OS starts.
The script also sets the node hostname from Metal3's metal3-name metadata field.
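The hostname extraction in the script is a plain text pipeline rather than a JSON parser, so it helps to see it on its own. The following is a minimal sketch that runs the same tr, grep and sed steps against a made-up sample document; the function name metal3_name_from_metadata and the sample values are hypothetical, and the pipeline assumes that the key and value are separated by a colon followed by one space:

```shell
# Hypothetical helper: read meta_data.json content on stdin and print the
# value of its "metal3-name" field, using the same text pipeline as the script.
metal3_name_from_metadata() {
  tr ',{}' '\n' | grep '"metal3-name"' | sed 's/.*"metal3-name": "\(.*\)"/\1/'
}

# Made-up sample document; real metadata contains more fields.
echo '{"uuid": "1234", "metal3-name": "mynode1", "metal3-namespace": "default"}' \
  | metal3_name_from_metadata
```

With the sample above, the function prints mynode1. Because the match is textual, metadata formatted without the space after the colon would not be rewritten by the sed step.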
Configuring the two secrets
The following examples use dummy values throughout: data NIC MAC aa:bb:cc:11:22:33, boot NIC MAC aa:bb:cc:44:55:66, node IP 10.0.0.10/24, gateway 10.0.0.1, DNS 10.0.0.53, VLAN ID 100, and BMC address 10.1.0.10.
Secret 1, IPA phase (static-networkdata-ipa.yaml): references the kernel-assigned interface name. DHCP is used here to keep it simple during hardware discovery:
apiVersion: v1
kind: Secret
metadata:
  name: static-networkdata-ipa
  namespace: default
type: Opaque
stringData:
  networkData: |
    interfaces:
    - name: enp1s0np123
      type: ethernet
      state: up
      mac-address: "aa:bb:cc:11:22:33"
      ipv4:
        enabled: true
        dhcp: true
    dns-resolver:
      config:
        server:
        - 10.0.0.53
Secret 2, target OS phase (static-networkdata-os.yaml): references the desired short name and declares the VLAN. The same MAC address is used so nmc can match the interface:
apiVersion: v1
kind: Secret
metadata:
  name: static-networkdata-os
  namespace: default
type: Opaque
stringData:
  networkData: |
    interfaces:
    - name: eth0
      type: ethernet
      state: up
      mac-address: "aa:bb:cc:11:22:33"
      mtu: 1500
      ipv4:
        enabled: false
        dhcp: false
    - name: eth0.100
      type: vlan
      state: up
      mtu: 1500
      vlan:
        base-iface: eth0
        id: 100
      ipv4:
        address:
        - ip: 10.0.0.10
          prefix-length: 24
        enabled: true
        dhcp: false
    dns-resolver:
      config:
        server:
        - 10.0.0.53
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 10.0.0.1
        next-hop-interface: eth0.100
The BareMetalHost object references both secrets:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: my-node
  namespace: default
spec:
  online: true
  bootMACAddress: "aa:bb:cc:44:55:66"
  rootDeviceHints:
    deviceName: /dev/nvme0n1
  bmc:
    address: redfish-virtualmedia://10.1.0.10/redfish/v1/Systems/1/
    disableCertificateVerification: true
    credentialsName: my-node-credentials
  preprovisioningNetworkDataName: static-networkdata-ipa
  networkData:
    name: static-networkdata-os
preprovisioningNetworkDataName is a plain string field, while networkData is a SecretReference object requiring a name: sub-key.
The syntax differs between the two and is a common source of errors.
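After the objects have been applied, one way to confirm that both references were stored in the expected form is to read them back from the BareMetalHost. The following is a minimal sketch using kubectl's jsonpath output; the function name show_network_refs is hypothetical, and the host name and namespace are the dummy values from this example:

```shell
# Hypothetical helper: print the two network data references of a
# BareMetalHost, one per line (IPA-phase secret first, target OS secret second).
show_network_refs() {
  local host="$1" namespace="$2"
  kubectl get baremetalhost "${host}" -n "${namespace}" \
    -o jsonpath='{.spec.preprovisioningNetworkDataName}{"\n"}{.spec.networkData.name}{"\n"}'
}

# Example (requires access to the management cluster):
#   show_network_refs my-node default
```

An empty line in the output suggests that the corresponding field was not set, which is worth checking against the string versus name: sub-key difference described above.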
Apply all objects:
kubectl apply -f bmc-credentials.yaml
kubectl apply -f static-networkdata-ipa.yaml
kubectl apply -f static-networkdata-os.yaml
kubectl apply -f baremetalhost.yaml
After provisioning, SSH to the node and verify:
# Interface names
ip link show
# Expected: eth0 and eth0.100@eth0
# IP on VLAN interface
ip addr show eth0.100
# NetworkManager profiles
nmcli connection show
# VLAN details
nmcli connection show eth0.100 | grep -E '(vlan.parent|vlan.id)'