26 Metal³ #

26.1 `BareMetalHost` selection and Cluster association #

Once a Metal³ cluster object and its associated objects are created, the BareMetalHosts that will become part of the cluster are selected. This selection links each BareMetalHost to a specific Metal3MachineTemplate using standard Kubernetes labels and selectors.

As an example, each BareMetalHost is labeled to identify its properties and intended cluster (e.g., its cluster-role, the cluster name, location, etc.):

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: mynode1
  labels:
    cluster-role: control-plane
    cluster: foobar
    location: madrid
    datacenter: xyz
<snip>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: mynode2
  labels:
    cluster-role: worker
    cluster: foobar
    location: madrid
    datacenter: xyz
<snip>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: mynode3
  labels:
    cluster-role: worker
    cluster: foobar2
    location: madrid
    datacenter: xyz
<snip>
...

Then, the Metal3MachineTemplate object uses the hostSelector field (spec.template.spec.hostSelector) to match the desired BareMetalHosts.

Both matchLabels (for exact key-value matching) and matchExpressions (for more complex rules) can be used:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: foobar-cluster-controlplane
  namespace: mynamespace
spec:
  template:
    spec:
      hostSelector:
        matchLabels:
          cluster-role: control-plane
          cluster: foobar
<snip>
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: Metal3MachineTemplate
metadata:
  name: foobar-cluster-worker
  namespace: mynamespace
spec:
  template:
    spec:
      hostSelector:
        matchExpressions:
          - { key: cluster-role, operator: In, values: [worker] }
          - { key: cluster, operator: In, values: [foobar] }
<snip>
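The selection rules can be illustrated without a cluster. The following Bash sketch applies the two selector styles above to the example labels, treating each host's labels as a space-separated list of key=value pairs. It only mimics the matching semantics (matchLabels requires every listed pair; operator In accepts any of the listed values). The functions match_labels and match_in are hypothetical and are not part of Cluster API Provider Metal3. On a live cluster, kubectl get baremetalhosts -l 'cluster-role in (worker),cluster in (foobar)' runs an equivalent set-based query.

```shell
#!/usr/bin/env bash
# Illustration only: mimic matchLabels and matchExpressions (operator: In).

# match_labels HOST_LABELS KEY=VALUE...
# Succeeds only if every required key=value pair is present (matchLabels).
match_labels() {
  local host_labels=" $1 " pair
  shift
  for pair in "$@"; do
    [[ "$host_labels" == *" $pair "* ]] || return 1
  done
}

# match_in HOST_LABELS KEY VALUE...
# Succeeds if the host has KEY set to one of the listed values (operator: In).
match_in() {
  local host_labels="$1" key="$2" value
  shift 2
  for value in "$@"; do
    match_labels "$host_labels" "$key=$value" && return 0
  done
  return 1
}

# Labels copied from the BareMetalHost examples above.
declare -A hosts=(
  [mynode1]="cluster-role=control-plane cluster=foobar location=madrid datacenter=xyz"
  [mynode2]="cluster-role=worker cluster=foobar location=madrid datacenter=xyz"
  [mynode3]="cluster-role=worker cluster=foobar2 location=madrid datacenter=xyz"
)

for name in mynode1 mynode2 mynode3; do
  labels="${hosts[$name]}"
  if match_labels "$labels" cluster-role=control-plane cluster=foobar; then
    echo "$name -> foobar-cluster-controlplane"
  elif match_in "$labels" cluster-role worker && match_in "$labels" cluster foobar; then
    echo "$name -> foobar-cluster-worker"
  else
    echo "$name -> no match"
  fi
done
```

Running it prints mynode1 -> foobar-cluster-controlplane, mynode2 -> foobar-cluster-worker and mynode3 -> no match, because mynode3 carries cluster=foobar2.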

Note

Kubernetes namespaces can also be used to better organize the different objects.

26.2 Clean up old EFI boot entries #

Sometimes the UEFI boot manager contains multiple entries for older operating systems that are probably no longer needed (especially on hosts that have been re-provisioned multiple times). You can clean up those old entries using any of the following methods:

Delete them on the BIOS/EFI setup interface directly (the exact procedure will depend on the hardware).

Use the bcfg command in the UEFI Shell:

# List the entries
bcfg boot dump -b
# Delete entry number X
bcfg boot rm X
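# X is the option number that "bcfg boot dump" reports for the entry to remove (for example, "Option: 02" gives X = 2).
# This is the entry's position in the boot order, not necessarily the number in its BootXXXX variable name.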

Use efibootmgr on a Linux system as:

# List the entries
efibootmgr -v
# Delete entry number X, where X is the hexadecimal number of BootXXXX
# (for example, 0002 for "Boot0002* foobar")
efibootmgr -b X -B
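When several stale entries belong to the same operating system, the numbers to pass to efibootmgr -b X -B can be picked out of the listing by label. The Bash function below is a hypothetical helper, not part of efibootmgr. It assumes the usual "BootXXXX* Label" line format printed by efibootmgr and only prints candidates, so review its output (and make sure the entry in BootCurrent is not among them) before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hex boot numbers (the XXXX in BootXXXX)
# of entries whose line contains PATTERN (case-insensitive).
# Reads efibootmgr output on stdin; it never deletes anything itself.
find_boot_entries() {
  local pattern="$1"
  grep -E '^Boot[0-9A-Fa-f]{4}\*? ' \
    | grep -iF -- "$pattern" \
    | sed -E 's/^Boot([0-9A-Fa-f]{4}).*/\1/'
}

# Usage on a real system:
#   efibootmgr | find_boot_entries opensuse
#   efibootmgr -b 0002 -B    # then delete the entries you confirmed, one by one
```

The first grep skips the BootCurrent and BootOrder summary lines, since only entry lines have four hex digits right after "Boot".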

The process may leave orphaned files on the EFI System Partition (ESP), usually in subdirectories named after the vendor (e.g., EFI/opensuse or EFI/Microsoft). These files are generally harmless, but they should be deleted if they consume enough space to prevent the installation of a new OS or a boot manager update. Removing them may require mounting the ESP explicitly; on Linux systems it is typically mounted at /boot/efi, so the vendor directories appear under /boot/efi/EFI.
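Before deleting anything, it helps to see how full the ESP is and which vendor directories take up the space. A minimal Bash sketch, assuming the ESP is mounted at /boot/efi (override it with the first argument); esp_report is a hypothetical helper name, and the device name in the mount example is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show free space on the ESP and the size of each
# vendor directory under EFI/, to spot leftovers from older installations.
esp_report() {
  local esp_root="${1:-/boot/efi}"
  if [ ! -d "$esp_root/EFI" ]; then
    echo "No EFI directory under $esp_root; is the ESP mounted?" >&2
    return 1
  fi
  df -h "$esp_root"
  du -sh "$esp_root"/EFI/*/
}

# Usage (as root): esp_report
# If the ESP is not mounted yet, mount it first, for example:
#   mount /dev/nvme0n1p1 /boot/efi   # the device name is an assumption
```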

26.3 Custom network configuration using the two-secrets approach #

When Metal³ provisions a bare metal node, it goes through two distinct phases that may each require different network configuration:

The IPA phase, where the Ironic Python Agent (IPA) ramdisk runs during hardware inspection and provisioning
The target OS phase, where the deployed SLE Micro system runs after first boot

The two-secrets approach addresses this by allowing a separate network configuration secret for each phase, using the preprovisioningNetworkDataName field for the IPA phase and the networkData field for the target OS phase. This is particularly useful when interface names differ between phases, which can happen because the IPA kernel and the SLE Micro kernel may discover the same hardware under different names.

26.3.1 Example of interface renaming for VLANs #

A common scenario is when hardware gets a long PCI-based interface name such as enp1s0np123. Adding a VLAN on top of it may exceed the Linux kernel hard limit of 15 characters for interface names:

enp1s0np123.100   = 15 chars  (fits, exactly at the limit)
enp1s0np123.3669  = 16 chars  (exceeds the limit, fails)
eth0.3669         =  9 chars  (works)
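The lengths above can be checked before choosing a name. This Bash sketch (check_ifname is a hypothetical helper) compares a candidate name with the 15-character limit, which comes from the kernel's IFNAMSIZ of 16 bytes including the terminating NUL:

```shell
#!/usr/bin/env bash
# Check a candidate interface name against the kernel limit:
# IFNAMSIZ is 16 bytes including the trailing NUL, leaving 15 usable characters.
check_ifname() {
  local name="$1" max=15
  if [ "${#name}" -le "$max" ]; then
    echo "$name: ${#name} chars (ok)"
  else
    echo "$name: ${#name} chars (exceeds $max)"
    return 1
  fi
}

for n in enp1s0np123.100 enp1s0np123.3669 eth0.3669; do
  check_ifname "$n" || true
done
```

Running the loop reports 15 chars (ok), 16 chars (exceeds 15) and 9 chars (ok) respectively.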

The IPA phase must reference enp1s0np123 (the kernel-discovered name), while the target OS should use a short name like eth0 so that eth0.3669 stays under the limit. nmc (nm-configurator) bridges the two phases by matching interfaces via MAC address rather than name — you declare name: eth0 alongside the hardware MAC address, and nmc creates the NetworkManager profile with the desired name regardless of what the kernel assigned.
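To find out which name the kernel assigned to a given NIC (the value the IPA-phase secret needs), the same MAC-based idea can be applied by hand on a system booted on that hardware. A minimal Bash sketch, assuming the standard sysfs layout; ifname_for_mac is a hypothetical helper, and this is not how nmc itself is implemented:

```shell
#!/usr/bin/env bash
# Print the kernel-assigned name of the interface whose MAC address is $1,
# by scanning /sys/class/net/*/address. The sysfs root can be overridden
# with $2, which is only useful for testing.
ifname_for_mac() {
  local mac="${1,,}" sysfs="${2:-/sys/class/net}" dev
  for dev in "$sysfs"/*; do
    [ -r "$dev/address" ] || continue
    if [ "$(cat "$dev/address")" = "$mac" ]; then
      basename "$dev"
      return 0
    fi
  done
  echo "no interface with MAC $mac" >&2
  return 1
}

# Usage: ifname_for_mac aa:bb:cc:11:22:33
```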

26.3.2 Prerequisites #

26.3.2.1 EIB image setup #

As described in the static network configuration guide, the EIB image must include a first-boot script that reads the network configuration from the config-2 partition that Metal³ writes during provisioning. Create the following script at /opt/EIB/network/configure-network.sh:

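#!/bin/bash
set -eux

# Source: https://documentation.suse.com/suse-edge/3.5/html/edge/quickstart-metal3.html#metal3-add-network-eib

# Metal³ attaches a config drive labeled config-2 during provisioning.
# If there is none, leave the network configuration untouched.
CONFIG_DRIVE=$(blkid --label config-2 || true)
if [ -z "${CONFIG_DRIVE}" ]; then
  echo "No config-2 device found, skipping network configuration"
  exit 0
fi

mount -o ro "${CONFIG_DRIVE}" /mnt

# Network configuration from the BareMetalHost networkData secret.
NETWORK_DATA_FILE="/mnt/openstack/latest/network_data.json"

if [ ! -f "${NETWORK_DATA_FILE}" ]; then
  umount /mnt
  echo "No network_data.json found, skipping network configuration"
  exit 0
fi

# Set the hostname from the metal3-name field of meta_data.json.
DESIRED_HOSTNAME=$(tr ',{}' '\n' < /mnt/openstack/latest/meta_data.json | grep '"metal3-name"' | sed 's/.*"metal3-name": "\(.*\)"/\1/')
echo "${DESIRED_HOSTNAME}" > /etc/hostname

# nmc reads desired-state files from a directory. JSON is valid YAML,
# so the file can be copied as-is.
mkdir -p /tmp/nmc/{desired,generated}
cp "${NETWORK_DATA_FILE}" /tmp/nmc/desired/_all.yaml
umount /mnt

# Generate NetworkManager profiles (interfaces are matched by MAC address)
# and apply them. ./nmc is resolved relative to the directory Combustion
# runs the script from.
./nmc generate --config-dir /tmp/nmc/desired --output-dir /tmp/nmc/generated
./nmc apply --config-dir /tmp/nmc/generated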

Ensure the network/ directory exists and the script is executable, then build the EIB image as usual:

mkdir -p /opt/EIB/network
chmod +x /opt/EIB/network/configure-network.sh
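To check the hostname extraction before building the image, the same tr/grep/sed pipeline used in configure-network.sh can be wrapped in a function and run against a local copy of meta_data.json. The function name metal3_hostname is hypothetical and, like the script, it assumes the key and value are separated by exactly ": ":

```shell
#!/usr/bin/env bash
# Extract the metal3-name value from a meta_data.json file ($1) using the
# same tr/grep/sed pipeline as configure-network.sh.
metal3_hostname() {
  tr ',{}' '\n' < "$1" | grep '"metal3-name"' | sed 's/.*"metal3-name": "\(.*\)"/\1/'
}

# Usage: metal3_hostname ./meta_data.json
```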

Note

EIB automatically picks up the configure-network.sh script from the network/ directory. Combustion runs it on first boot in the initramfs, before the full OS starts.

Note

The script also sets the node hostname from Metal³'s metal3-name metadata field.

Configuring the two secrets

The following examples use dummy values throughout: data NIC MAC aa:bb:cc:11:22:33, boot NIC MAC aa:bb:cc:44:55:66, node IP 10.0.0.10/24, gateway 10.0.0.1, DNS 10.0.0.53, VLAN ID 100, and BMC address 10.1.0.10.

Secret 1 — IPA phase (static-networkdata-ipa.yaml): references the kernel-assigned interface name. DHCP is used here to keep it simple during hardware discovery:

apiVersion: v1
kind: Secret
metadata:
  name: static-networkdata-ipa
  namespace: default
type: Opaque
stringData:
  networkData: |
    interfaces:
    - name: enp1s0np123
      type: ethernet
      state: up
      mac-address: "aa:bb:cc:11:22:33"
      ipv4:
        enabled: true
        dhcp: true
    dns-resolver:
      config:
        server:
        - 10.0.0.53

Secret 2 — target OS phase (static-networkdata-os.yaml): references the desired short name and declares the VLAN. The same MAC address is used so nmc can match the interface:

apiVersion: v1
kind: Secret
metadata:
  name: static-networkdata-os
  namespace: default
type: Opaque
stringData:
  networkData: |
    interfaces:
    - name: eth0
      type: ethernet
      state: up
      mac-address: "aa:bb:cc:11:22:33"
      mtu: 1500
      ipv4:
        enabled: false
        dhcp: false
    - name: eth0.100
      type: vlan
      state: up
      mtu: 1500
      vlan:
        base-iface: eth0
        id: 100
      ipv4:
        address:
        - ip: 10.0.0.10
          prefix-length: 24
        enabled: true
        dhcp: false
    dns-resolver:
      config:
        server:
        - 10.0.0.53
    routes:
      config:
      - destination: 0.0.0.0/0
        next-hop-address: 10.0.0.1
        next-hop-interface: eth0.100
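Because nmc matches interfaces by MAC address, a typo in either secret breaks the rename. A hypothetical pre-apply check in Bash (macs_in and same_macs are illustrative names) compares the MAC addresses declared in both manifests. It assumes both files list exactly the same NICs, as in this example, so relax it if the OS secret configures additional interfaces:

```shell
#!/usr/bin/env bash
# List the MAC addresses declared via "mac-address:" in a manifest,
# lower-cased and de-duplicated.
macs_in() {
  grep -oiE 'mac-address: *"?([0-9a-f]{2}:){5}[0-9a-f]{2}' "$1" \
    | grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' \
    | tr 'A-F' 'a-f' | sort -u
}

# Succeed if both manifests declare the same set of MAC addresses.
same_macs() {
  if [ "$(macs_in "$1")" = "$(macs_in "$2")" ]; then
    echo "MAC addresses match: $(macs_in "$1" | paste -s -d ' ' -)"
  else
    echo "MAC addresses differ between $1 and $2" >&2
    return 1
  fi
}

# Usage: same_macs static-networkdata-ipa.yaml static-networkdata-os.yaml
```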

The BareMetalHost object references both secrets:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: my-node
  namespace: default
spec:
  online: true
  bootMACAddress: "aa:bb:cc:44:55:66"
  rootDeviceHints:
    deviceName: /dev/nvme0n1
  bmc:
    address: redfish-virtualmedia://10.1.0.10/redfish/v1/Systems/1/
    disableCertificateVerification: true
    credentialsName: my-node-credentials
  preprovisioningNetworkDataName: static-networkdata-ipa
  networkData:
    name: static-networkdata-os

Warning

preprovisioningNetworkDataName is a plain string field, while networkData is a SecretReference object requiring a name: sub-key. The syntax differs between the two and is a common source of errors.

Apply all objects:

kubectl apply -f bmc-credentials.yaml
kubectl apply -f static-networkdata-ipa.yaml
kubectl apply -f static-networkdata-os.yaml
kubectl apply -f baremetalhost.yaml

After provisioning, SSH to the node and verify:

# Interface names
ip link show
# Expected: eth0 and eth0.100@eth0

# IP on VLAN interface
ip addr show eth0.100

# NetworkManager profiles
nmcli connection show

# VLAN details
nmcli connection show eth0.100 | grep -E '(vlan.parent|vlan.id)'
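The interface-name check can also be scripted, for example as part of a post-provisioning smoke test. A minimal sketch with a hypothetical expect_links helper that parses the one-line output of ip -o link show, which prints VLAN interfaces as eth0.100@eth0:

```shell
#!/usr/bin/env bash
# Read `ip -o link show` output on stdin and check that every expected
# interface name (as printed by ip, e.g. eth0.100@eth0) is present.
expect_links() {
  local links name
  links=$(awk -F': ' '{ print $2 }')
  for name in "$@"; do
    if ! grep -qxF -- "$name" <<<"$links"; then
      echo "missing interface: $name" >&2
      return 1
    fi
  done
  echo "all interfaces present: $*"
}

# Usage on the provisioned node:
#   ip -o link show | expect_links eth0 eth0.100@eth0
```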
