This is a draft document that was built and uploaded automatically. It may document beta software and be incomplete or even incorrect. Use this document at your own risk.


54 Telco features (DPDK, SR-IOV, CPU isolation, huge pages, NUMA, etc.)

The directed network provisioning workflow makes it possible to automate the configuration of the Telco features used in the downstream clusters, so that Telco workloads can run on top of those servers.

Requirements

Configuration

Use the following two sections as the base to enroll and provision the hosts:

The Telco features covered in this section are the following:

  • DPDK and VF creation

  • SR-IOV and VF allocation for use by the workloads

  • CPU isolation and performance tuning

  • Huge pages configuration

  • Kernel parameters tuning

Note

For more information about the Telco features, see Part VI, “Telco features configuration”.

The changes required to enable the Telco features listed above are all inside the RKE2ControlPlane block of the provisioning file capi-provisioning-example.yaml. The rest of that file is the same as described in the provisioning section (Chapter 51, Downstream cluster provisioning with Directed network provisioning (single-node)).

To make the process clear, the changes required in that block (RKE2ControlPlane) to enable the Telco features are the following:

  • The file /var/lib/rancher/rke2/server/manifests/configmap-sriov-custom-auto.yaml, written through Ignition, defines the interfaces, the drivers and the number of VFs to be created and exposed to the workloads.

    • The placeholders inside the config map sriov-custom-auto-config are the only values that must be replaced with real values. The example defines two entries; the placeholders ending in 2 follow the same pattern for the second PF interface.

      • ${RESOURCE_NAME1}: The resource name to be used for the first PF interface (for example, sriov-resource-du1). It is combined with the prefix rancher.io to form the label used by the workloads (for example, rancher.io/sriov-resource-du1).

      • ${SRIOV-NIC-NAME1}: The name of the first PF interface to be used (for example, eth0).

      • ${PF_NAME1}: The name of the first physical function (PF) to be used. More complex filters can also be expressed here (for example, eth0#2-5, which selects VFs 2 to 5 of eth0).

      • ${DRIVER_NAME1}: The driver name to be used for the first VF interface (for example, vfio-pci).

      • ${NUM_VFS1}: The number of VFs to be created for the first PF interface (for example, 8).

  • The script /opt/sriov/sriov-auto-filler.sh (the path called by sriov-custom-auto-vfs.service) translates the high-level config map sriov-custom-auto-config into the sriovnetworknodepolicy, which contains the low-level hardware information. The script spares the user from having to know the hardware details in advance. No changes are required in this file, but it must be present to enable SR-IOV and create VFs.

  • The kernel arguments used to enable the following features (the CPU lists shown are example values):

Parameter          | Value                              | Description
-------------------+------------------------------------+------------------------------------------------------------
isolcpus           | domain,nohz,managed_irq,1-30,33-62 | Isolates the cores 1-30 and 33-62.
skew_tick          | 1                                  | Allows the kernel to skew the timer interrupts across the isolated CPUs.
nohz               | on                                 | Enables dynamic ticks, so the periodic timer tick is stopped on idle CPUs.
nohz_full          | 1-30,33-62                         | The main kernel boot parameter for configuring full dynticks together with CPU isolation.
rcu_nocbs          | 1-30,33-62                         | Offloads RCU callback processing from the listed CPUs.
irqaffinity        | 0,31,32,63                         | Sets the default IRQ affinity so that interrupts are handled on the non-isolated cores.
idle               | poll                               | Minimizes the latency of exiting the idle state.
iommu              | pt                                 | Enables IOMMU pass-through mode, which allows vfio to be used for the DPDK interfaces.
intel_iommu        | on                                 | Enables the Intel IOMMU, which is required to use vfio for VFs.
hugepagesz         | 1G                                 | Sets the size of huge pages to 1 GiB.
hugepages          | 40                                 | Number of huge pages of the size defined before.
default_hugepagesz | 1G                                 | Sets the default huge page size.
nowatchdog         | (none)                             | Disables the watchdog.
nmi_watchdog       | 0                                  | Disables the NMI watchdog.

  • The following systemd services are used:

    • rke2-preinstall.service automatically replaces BAREMETALHOST_UUID and node-name during the provisioning process, using the Ironic information.

    • rke2-traefik-deployment.service sets traefik as the ingress controller in the RKE2 configuration. It can be removed once traefik is the default ingress controller (starting with RKE2 v1.36).

    • cpu-partitioning.service enables the isolation of CPU cores (for example, 1-30,33-62).

    • performance-settings.service enables the CPU performance tuning.

    • sriov-custom-auto-vfs.service waits until the nodes are ready and the custom resources from the sriov Helm charts are created, and then runs /opt/sriov/sriov-auto-filler.sh to replace the values in the config map sriov-custom-auto-config and create the sriovnetworknodepolicy to be used by the workloads.

  • ${RKE2_VERSION} is the RKE2 version to be used. Replace this placeholder with the actual value (for example, v1.35.3+rke2r3).
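
The isolated core list (used by isolcpus, nohz_full and rcu_nocbs) and the irqaffinity list should not share any core. The following is a minimal sketch for checking this before provisioning, using the example values from the table above. The helper names expand_cpu_list and cpu_lists_overlap are hypothetical and are not part of the provisioning file.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# expand_cpu_list "1-30,33-62" prints every CPU number in the list, one per line.
expand_cpu_list() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
    [ -n "$first" ] || continue
    # A single CPU such as "31" has no upper bound, so it expands to itself.
    seq "$first" "${last:-$first}"
  done
}

# cpu_lists_overlap LIST_A LIST_B prints the CPUs that appear in both lists.
cpu_lists_overlap() {
  { expand_cpu_list "$1" | sort -un; expand_cpu_list "$2" | sort -un; } | sort -n | uniq -d
}

isolated="1-30,33-62"      # example value for isolcpus, nohz_full and rcu_nocbs
non_isolated="0,31,32,63"  # example value for irqaffinity

overlap=$(cpu_lists_overlap "$isolated" "$non_isolated")
if [ -z "$overlap" ]; then
  echo "OK: no core is both isolated and used for interrupts"
else
  echo "Conflict on cores:" $overlap
fi
```

With the example values, the two lists together cover the 64 cores 0-63 of the example host: 60 isolated cores plus 4 cores left for interrupts and housekeeping.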

With all these changes applied, the RKE2ControlPlane block in capi-provisioning-example.yaml looks like the following:

apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: RKE2ControlPlane
metadata:
  name: single-node-cluster
  namespace: default
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: Metal3MachineTemplate
    name: single-node-cluster-controlplane
  replicas: 1
  version: ${RKE2_VERSION}
  rolloutStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxSurge: 0
  serverConfig:
    cni: calico
    cniMultusEnable: true
  agentConfig:
    format: ignition
    additionalUserData:
      config: |
        variant: fcos
        version: 1.4.0
        storage:
          files:
          - path: /var/lib/rancher/rke2/server/manifests/configmap-sriov-custom-auto.yaml
            overwrite: true
            contents:
              inline: |
                apiVersion: v1
                kind: ConfigMap
                metadata:
                  name: sriov-custom-auto-config
                  namespace: kube-system
                data:
                  config.json: |
                    [
                        {
                          "resourceName": "${RESOURCE_NAME1}",
                          "interface": "${SRIOV-NIC-NAME1}",
                          "pfname": "${PF_NAME1}",
                          "driver": "${DRIVER_NAME1}",
                          "numVFsToCreate": ${NUM_VFS1}
                        },
                        {
                          "resourceName": "${RESOURCE_NAME2}",
                          "interface": "${SRIOV-NIC-NAME2}",
                          "pfname": "${PF_NAME2}",
                          "driver": "${DRIVER_NAME2}",
                          "numVFsToCreate": ${NUM_VFS2}
                        }
                    ]
            mode: 0644
            user:
              name: root
            group:
              name: root
          - path: /var/lib/rancher/rke2/server/manifests/sriov-crd.yaml
            overwrite: true
            contents:
              inline: |
                apiVersion: helm.cattle.io/v1
                kind: HelmChart
                metadata:
                  name: sriov-crd
                  namespace: kube-system
                spec:
                  chart: oci://registry.suse.com/edge/charts/sriov-crd
                  targetNamespace: sriov-network-operator
                  version: 306.0.4+up1.6.0
                  createNamespace: true
          - path: /var/lib/rancher/rke2/server/manifests/sriov-network-operator.yaml
            overwrite: true
            contents:
              inline: |
                apiVersion: helm.cattle.io/v1
                kind: HelmChart
                metadata:
                  name: sriov-network-operator
                  namespace: kube-system
                spec:
                  chart: oci://registry.suse.com/edge/charts/sriov-network-operator
                  targetNamespace: sriov-network-operator
                  version: 306.0.4+up1.6.0
                  createNamespace: true
        kernel_arguments:
          should_exist:
            - intel_iommu=on
            - iommu=pt
            - idle=poll
            - mce=off
            - hugepagesz=1G hugepages=40
            - hugepagesz=2M hugepages=0
            - default_hugepagesz=1G
            - irqaffinity=${NON-ISOLATED_CPU_CORES}
            - isolcpus=domain,nohz,managed_irq,${ISOLATED_CPU_CORES}
            - nohz_full=${ISOLATED_CPU_CORES}
            - rcu_nocbs=${ISOLATED_CPU_CORES}
            - rcu_nocb_poll
            - nosoftlockup
            - nowatchdog
            - nohz=on
            - nmi_watchdog=0
            - skew_tick=1
            - quiet
        systemd:
          units:
          - name: rke2-preinstall.service
            enabled: true
            contents: |
              [Unit]
              Description=rke2-preinstall
              Wants=network-online.target
              Before=rke2-install.service
              ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
              [Service]
              Type=oneshot
              User=root
              ExecStartPre=/bin/sh -c "mount -L config-2 /mnt"
              ExecStart=/bin/sh -c "sed -i \"s/BAREMETALHOST_UUID/$(jq -r .uuid /mnt/openstack/latest/meta_data.json)/\" /etc/rancher/rke2/config.yaml"
              ExecStart=/bin/sh -c "echo \"node-name: $(jq -r .name /mnt/openstack/latest/meta_data.json)\" >> /etc/rancher/rke2/config.yaml"
              ExecStartPost=/bin/sh -c "umount /mnt"
              [Install]
              WantedBy=multi-user.target
          # The rke2-traefik-deployment.service unit can be removed once "traefik" is the default ingress controller (starting with RKE2 v1.36)
          - name: rke2-traefik-deployment.service
            enabled: true
            contents: |
              [Unit]
              Description=rke2-traefik-deployment
              Wants=rke2-preinstall.service
              Before=rke2-install.service
              ConditionPathExists=!/run/cluster-api/bootstrap-success.complete
              [Service]
              Type=oneshot
              User=root
              ExecStart=/bin/sh -c "echo \"ingress-controller: traefik\" >> /etc/rancher/rke2/config.yaml"
              [Install]
              WantedBy=multi-user.target
          - name: cpu-partitioning.service
            enabled: true
            contents: |
              [Unit]
              Description=cpu-partitioning
              Wants=network-online.target
              After=network.target network-online.target
              [Service]
              Type=oneshot
              User=root
              ExecStart=/bin/sh -c "echo isolated_cores=${ISOLATED_CPU_CORES} > /etc/tuned/cpu-partitioning-variables.conf"
              ExecStartPost=/bin/sh -c "tuned-adm profile cpu-partitioning"
              ExecStartPost=/bin/sh -c "systemctl enable tuned.service"
              [Install]
              WantedBy=multi-user.target
          - name: performance-settings.service
            enabled: true
            contents: |
              [Unit]
              Description=performance-settings
              Wants=network-online.target
              After=network.target network-online.target cpu-partitioning.service
              [Service]
              Type=oneshot
              User=root
              ExecStart=/bin/sh -c "/opt/performance-settings/performance-settings.sh"
              [Install]
              WantedBy=multi-user.target
          - name: sriov-custom-auto-vfs.service
            enabled: true
            contents: |
              [Unit]
              Description=SRIOV Custom Auto VF Creation
              Wants=network-online.target rke2-server.target
              After=network.target network-online.target rke2-server.target
              [Service]
              User=root
              Type=forking
              TimeoutStartSec=900
              ExecStart=/bin/sh -c "while ! /var/lib/rancher/rke2/bin/kubectl --kubeconfig=/etc/rancher/rke2/rke2.yaml wait --for condition=ready nodes --all ; do sleep 2 ; done"
              ExecStartPost=/bin/sh -c "while [ $(/var/lib/rancher/rke2/bin/kubectl --kubeconfig=/etc/rancher/rke2/rke2.yaml get sriovnetworknodestates.sriovnetwork.openshift.io --ignore-not-found --no-headers -A | wc -l) -eq 0 ]; do sleep 1; done"
              ExecStartPost=/bin/sh -c "/opt/sriov/sriov-auto-filler.sh"
              RemainAfterExit=yes
              KillMode=process
              [Install]
              WantedBy=multi-user.target
    kubelet:
      extraArgs:
      - provider-id=metal3://BAREMETALHOST_UUID
    nodeName: "localhost.localdomain"
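
Some placeholders in the block above, such as ${SRIOV-NIC-NAME1} and ${NON-ISOLATED_CPU_CORES}, contain hyphens, so they are not valid shell variable names and cannot be filled in through ordinary shell variable expansion. The following is a minimal sketch of one way to substitute them with sed. The function name render_placeholders and the output file name in the usage comment are hypothetical. The sketch assumes that the values do not contain the characters |, & or \.

```shell
#!/bin/sh
# render_placeholders NAME=VALUE...
# Reads a template on stdin and writes it to stdout with every literal
# ${NAME} replaced by VALUE. Hypothetical helper for illustration only.
render_placeholders() {
  script=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first '='
    value=${pair#*=}   # text after the first '='
    # [$][{] and [}] match the literal characters; '|' is the sed delimiter
    # so that values may contain '/', '#', '+' or ','.
    script="${script}s|[$][{]${name}[}]|${value}|g
"
  done
  sed -e "$script"
}

# Possible usage (the output file name is an assumption):
#   render_placeholders \
#     'RKE2_VERSION=v1.35.3+rke2r3' \
#     'ISOLATED_CPU_CORES=1-30,33-62' \
#     'NON-ISOLATED_CPU_CORES=0,31,32,63' \
#     'SRIOV-NIC-NAME1=eth0' \
#     < capi-provisioning-example.yaml > capi-provisioning-rendered.yaml
```

Any placeholder that is not passed as an argument is left untouched, so a final search for "${" in the rendered file shows which values still have to be provided.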

Once the file is created by joining the previous blocks, run the following command in the management cluster to start provisioning the new downstream cluster with the Telco features:

$ kubectl apply -f capi-provisioning-example.yaml
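
After the downstream node has booted, the kernel arguments it actually received can be compared with the expected ones. The following is a minimal sketch that reports which expected arguments are missing from a kernel command line. The function name missing_kernel_args is hypothetical. On a Linux node, the running command line can be read from /proc/cmdline.

```shell
#!/bin/sh
# missing_kernel_args CMDLINE ARG...
# Prints every ARG that does not appear as a whole word in CMDLINE.
# Hypothetical helper for illustration only.
missing_kernel_args() {
  cmdline=" $1 "   # pad with spaces so that only whole arguments match
  shift
  for arg in "$@"; do
    case "$cmdline" in
      *" $arg "*) ;;        # argument is present
      *) echo "$arg" ;;     # argument is missing
    esac
  done
}

# Demonstration with a sample command line (not taken from a real host):
sample="BOOT_IMAGE=/vmlinuz intel_iommu=on iommu=pt nohz=on skew_tick=1"
missing_kernel_args "$sample" intel_iommu=on iommu=pt idle=poll nohz=on

# Possible usage on the provisioned node:
#   missing_kernel_args "$(cat /proc/cmdline)" intel_iommu=on iommu=pt idle=poll \
#     default_hugepagesz=1G nohz=on nmi_watchdog=0 skew_tick=1
```

An empty result means that all listed arguments are present. Whole-word matching prevents iommu=pt from being satisfied by intel_iommu=on alone.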