This is a draft document that was built and uploaded automatically. It may document beta software and be incomplete or even incorrect. Use this document at your own risk.


39 SR-IOV

SR-IOV (Single Root I/O Virtualization) allows a single physical PCIe device, such as a network adapter, to split its resources across multiple PCIe hardware functions: a physical function (PF) and one or more virtual functions (VFs). Each VF can be assigned to an application, giving it direct access to the hardware.

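We provide two distinct methods for deploying SR-IOV in your cluster:

  • Option 1: the SR-IOV Network Device Plugin, deployed as a DaemonSet with a ConfigMap

  • Option 2 (recommended): the SR-IOV Network Operator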

In rare cases where you need both solutions (the Network Operator for network devices and the Device Plugin for vRAN accelerators), you must deploy them into separate Kubernetes namespaces. This separation is essential to prevent conflicts between the two deployments.

39.1 Option 1: SR-IOV Network Device Plugin Daemonset and configMap

The SR-IOV Network Device Plugin discovers and advertises SR-IOV resources on a Kubernetes host, such as PCI physical functions (PFs) and their virtual functions (VFs).

  • Prepare the config map for the device plugin

First, create a ConfigMap that defines the SR-IOV resource pools. To gather the information it needs, run the lspci command on the host:

$ lspci | grep -i acc
07:00.0 Processing accelerators: Intel Corporation Device 57c2
07:00.1 Processing accelerators: Intel Corporation Device 57c3
07:00.2 Processing accelerators: Intel Corporation Device 57c3
07:00.3 Processing accelerators: Intel Corporation Device 57c3
07:00.4 Processing accelerators: Intel Corporation Device 57c3
07:00.5 Processing accelerators: Intel Corporation Device 57c3
07:00.6 Processing accelerators: Intel Corporation Device 57c3
07:00.7 Processing accelerators: Intel Corporation Device 57c3
07:01.0 Processing accelerators: Intel Corporation Device 57c3
07:01.1 Processing accelerators: Intel Corporation Device 57c3
07:01.2 Processing accelerators: Intel Corporation Device 57c3
07:01.3 Processing accelerators: Intel Corporation Device 57c3
07:01.4 Processing accelerators: Intel Corporation Device 57c3
07:01.5 Processing accelerators: Intel Corporation Device 57c3
07:01.6 Processing accelerators: Intel Corporation Device 57c3
07:01.7 Processing accelerators: Intel Corporation Device 57c3
07:02.0 Processing accelerators: Intel Corporation Device 57c3
0a:00.0 Processing accelerators: Intel Corporation Device 57c2

$ lspci | grep -i net
19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
19:00.2 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
19:00.3 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
51:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
51:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
51:01.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:01.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:01.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:01.3 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.3 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
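To size the resource pools before writing the ConfigMap, it can help to count how many functions of each kind lspci reports. The following is a minimal Python sketch, assuming the plain lspci output format shown above (no -D domain prefix; lspci -nn output, which also prints numeric IDs such as [8086:1889], is parsed the same way). The function name count_functions is hypothetical. Run over the full output above, it would count 16 functions for Intel Corporation Device 57c3 (the FEC VFs) and 8 Ethernet Adaptive Virtual Functions.

```python
import re
from collections import Counter

# Matches one line of plain `lspci` output, for example:
#   "07:00.1 Processing accelerators: Intel Corporation Device 57c3"
# Assumption: no PCI domain prefix, i.e. lspci was run without -D.
LSPCI_LINE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<cls>[^:]+):\s+(?P<desc>.+)$"
)


def count_functions(lspci_output: str) -> Counter:
    """Hypothetical helper: count PCI functions per (device class, description)."""
    counts: Counter = Counter()
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:  # blank lines and shell prompts such as "$ lspci" are skipped
            counts[(match.group("cls"), match.group("desc"))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
$ lspci | grep -i acc
07:00.0 Processing accelerators: Intel Corporation Device 57c2
07:00.1 Processing accelerators: Intel Corporation Device 57c3
07:00.2 Processing accelerators: Intel Corporation Device 57c3
"""
    for (cls, desc), n in count_functions(sample).most_common():
        print(f"{n:3d}  {cls}: {desc}")
```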

The SR-IOV Device Plugin uses a configMap containing a JSON file to define which hardware resources Kubernetes should expose. This configuration is based on two core concepts: selectors (for hardware discovery) and resources (for Kubernetes exposure).

A resource is the named entity that pods consume (for example, rancher.io/intel_fec_5g). This chapter uses two resource types, set in the deviceType field of each resource:

  • accelerator: Used for vRAN accelerator cards (like ACC100/vRAN Boost).

  • netdevice: Used for standard network interfaces (NICs).

You define the target devices using selectors to filter the hardware on the node:

  • vendors: 8086 (Intel)

  • devices: 57c3 (FEC VF), 1889 (NIC VF)

  • drivers: vfio-pci

  • pfNames: p2p1 (physical interface name)

For network cards, you can also select a subset of Virtual Functions (VFs) from a Physical Function:

  • pfNames: ["eth1#1,2,3,4,5,6"] or ["eth1#1-6"]
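The two pfNames forms above select the same VFs. The following Python sketch expands a selector to make that visible. It is a hypothetical helper for illustration, not the device plugin's own parser, and assumes only the forms shown here: a PF name, then #, then comma-separated VF indexes or inclusive ranges.

```python
def expand_pf_selector(selector: str) -> tuple[str, list[int]]:
    """Hypothetical helper: split "eth1#1-6" or "eth1#1,2,3" into (PF, VF indexes).

    A selector without "#" selects the whole PF and yields an empty index list.
    """
    pf_name, sep, spec = selector.partition("#")
    if not sep:
        return pf_name, []
    indexes: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range such as "1-6"
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid VF range {part!r} in {selector!r}")
            indexes.update(range(start, end + 1))
        else:
            indexes.add(int(part))
    return pf_name, sorted(indexes)


if __name__ == "__main__":
    print(expand_pf_selector("eth1#1,2,3,4,5,6"))
    print(expand_pf_selector("eth1#1-6"))
```

Both calls print ('eth1', [1, 2, 3, 4, 5, 6]).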

To allow pods to request the devices, each resource needs a name, composed of a prefix and a resource name:

  • resourceName: pci_sriov_net_bh_dpdk

  • resourcePrefix: rancher.io

Pods then request the combined resource name, rancher.io/pci_sriov_net_bh_dpdk.
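A minimal Python sketch of how resourcePrefix and resourceName combine, and of the container resources stanza that a pod would carry to request the device. The helper names are hypothetical. The sketch relies on the general Kubernetes rule for extended resources: they are requested in whole units, and if both requests and limits are given they must be equal, so only limits are set here (Kubernetes then defaults the request to the limit).

```python
def qualified_name(resource_prefix: str, resource_name: str) -> str:
    """Hypothetical helper: join resourcePrefix and resourceName as pods see them."""
    if "/" in resource_prefix or "/" in resource_name:
        raise ValueError("prefix and name must not contain '/'")
    return f"{resource_prefix}/{resource_name}"


def container_resources(counts: dict[str, int]) -> dict:
    """Hypothetical helper: build a container `resources` stanza for devices.

    Extended resources are whole units and cannot be overcommitted, so only
    limits are set; Kubernetes defaults the request to the same value.
    """
    for name, count in counts.items():
        if count < 1:
            raise ValueError(f"{name}: count must be a positive integer")
    return {"limits": {name: str(count) for name, count in counts.items()}}


if __name__ == "__main__":
    dpdk = qualified_name("rancher.io", "pci_sriov_net_bh_dpdk")
    print(dpdk)  # rancher.io/pci_sriov_net_bh_dpdk
    print(container_resources({dpdk: 2}))
```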

Note

This document does not list all possible selectors. Different resource types use different sets of selectors. For comprehensive details, refer to the SR-IOV Network Device Plugin repository.

The ConfigMap below is an example that creates three resources: one for the vRAN Accelerator card (FEC) and two for two different NIC ports.

For the FEC card, you must first retrieve the device ID and the VFIO token. For the prerequisites, follow the instructions in Chapter 41, vRAN Acceleration (Intel ACC100/VRB1/VRB2).

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
        "resourceList": [
            {
                "resourcePrefix": "rancher.io",
                "resourceName": "intel_fec_5g",
                "deviceType": "accelerator",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["57c3"]
                },
                "additionalInfo": {
                    "*": {
                        "VFIO_TOKEN": "00112233-4455-6677-8899-aabbccddeeff"
                    }
                }
            },
            {
                "resourcePrefix": "rancher.io",
                "resourceName": "intel_sriov_odu",
                "deviceType": "netdevice",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["1889"],
                    "drivers": ["vfio-pci"],
                    "pfNames": ["p2p1"]
                }
            },
            {
                "resourcePrefix": "rancher.io",
                "resourceName": "intel_sriov_oru",
                "deviceType": "netdevice",
                "selectors": {
                    "vendors": ["8086"],
                    "devices": ["1889"],
                    "drivers": ["vfio-pci"],
                    "pfNames": ["p2p2"]
                }
            }
        ]
    }
  • Prepare the DaemonSet manifest to deploy the device plugin.

The device plugin supports several architectures (arm, amd, ppc64le). To cover more than one architecture, deploy one DaemonSet per architecture (for example, as several documents in the same file), each with its own name and kubernetes.io/arch node selector. The example below targets amd64.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sriov-device-plugin
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-sriov-device-plugin-amd64
  namespace: kube-system
  labels:
    tier: node
    app: sriovdp
spec:
  selector:
    matchLabels:
      name: sriov-device-plugin
  template:
    metadata:
      labels:
        name: sriov-device-plugin
        tier: node
        app: sriovdp
    spec:
      hostNetwork: true
      nodeSelector:
        kubernetes.io/arch: amd64
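      tolerations:
      # node-role.kubernetes.io/master is the legacy control plane taint;
      # current Kubernetes versions use node-role.kubernetes.io/control-plane.
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule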
      serviceAccountName: sriov-device-plugin
      containers:
      - name: kube-sriovdp
        image: registry.suse.com/rancher/hardened-sriov-network-device-plugin:v3.9.0-build20250425
        imagePullPolicy: IfNotPresent
        args:
        - --log-dir=sriovdp
        - --log-level=10
        securityContext:
          privileged: true
        resources:
          requests:
            cpu: "250m"
            memory: "40Mi"
          limits:
            cpu: 1
            memory: "200Mi"
        volumeMounts:
        - name: devicesock
          mountPath: /var/lib/kubelet/
          readOnly: false
        - name: log
          mountPath: /var/log
        - name: config-volume
          mountPath: /etc/pcidp
        - name: device-info
          mountPath: /var/run/k8s.cni.cncf.io/devinfo/dp
      volumes:
        - name: devicesock
          hostPath:
            path: /var/lib/kubelet/
        - name: log
          hostPath:
            path: /var/log
        - name: device-info
          hostPath:
            path: /var/run/k8s.cni.cncf.io/devinfo/dp
            type: DirectoryOrCreate
        - name: config-volume
          configMap:
            name: sriovdp-config
            items:
            - key: config.json
              path: config.json
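Because one DaemonSet is needed per architecture, the copies can be generated from the amd64 manifest above instead of being maintained by hand. The following is a minimal Python sketch, assuming the manifest has already been loaded into a dict (for example with a YAML parser, not shown) and that the container image is published for every target architecture. The function name and the architecture list are illustrative. Only the DaemonSet name and the kubernetes.io/arch node selector change between copies.

```python
import copy
import json


def per_arch_daemonsets(template: dict, arches: list[str]) -> list[dict]:
    """Hypothetical helper: return one DaemonSet per CPU architecture.

    Only metadata.name and the kubernetes.io/arch node selector change;
    image, volumes and tolerations are copied unchanged.
    Assumption: the template name ends with "-<arch>", e.g. "-amd64".
    """
    base_name = template["metadata"]["name"].rsplit("-", 1)[0]
    result = []
    for arch in arches:
        ds = copy.deepcopy(template)  # never mutate the shared template
        ds["metadata"]["name"] = f"{base_name}-{arch}"
        ds["spec"]["template"]["spec"]["nodeSelector"]["kubernetes.io/arch"] = arch
        result.append(ds)
    return result


if __name__ == "__main__":
    # Trimmed-down version of the amd64 DaemonSet above.
    template = {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "kube-sriov-device-plugin-amd64", "namespace": "kube-system"},
        "spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/arch": "amd64"}}}},
    }
    docs = per_arch_daemonsets(template, ["amd64", "arm64"])
    # kubectl also accepts JSON, so this List could be piped to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": docs}, indent=2))
```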
  • After you apply the ConfigMap and the DaemonSet (for example, with kubectl apply -f), the device plugin is deployed, discovers the devices, and makes them available to pods. Check that the device plugin pods are running:

    $ kubectl get pods -n kube-system | grep sriov
    kube-sriov-device-plugin-amd64-twjfl   1/1     Running   0          2m
  • Verify on all nodes that the devices were discovered and are available to pods as allocatable resources:

    $ kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, allocatable: .status.allocatable}'
    {
      "name": "node1.suse.edge.com",
      "allocatable": {
        "cpu": "64",
        "ephemeral-storage": "256196109726",
        "hugepages-1Gi": "40Gi",
        "hugepages-2Mi": "0",
        "rancher.io/intel_fec_5g": "16",
        "rancher.io/intel_sriov_odu": "4",
        "rancher.io/intel_sriov_oru": "4",
        "memory": "221396384Ki",
        "pods": "110"
      }
    }
  • The resource name for the FEC accelerator is rancher.io/intel_fec_5g, and 16 VFs are available for use.

  • The resource names for the NIC ports are rancher.io/intel_sriov_odu and rancher.io/intel_sriov_oru. Each resource provides 4 VFs.
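The same check can be scripted, for example in a CI job. The following is a minimal Python sketch, assuming the JSON produced by kubectl get nodes -o json has already been parsed (for example with json.loads). The function name is hypothetical. It reports, per node, the expected device resources that are missing or have zero allocatable units.

```python
def missing_device_resources(nodes_json: dict, expected: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: map node name to expected resources it lacks.

    `nodes_json` is the parsed output of `kubectl get nodes -o json`. A
    resource counts as missing if it is absent from status.allocatable or its
    value is "0"; device plugin resources are plain integer counts.
    """
    problems: dict[str, list[str]] = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        missing = [r for r in expected if int(allocatable.get(r, "0")) == 0]
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    # Trimmed version of the node shown above, as json.loads() would return it.
    nodes = {"items": [{
        "metadata": {"name": "node1.suse.edge.com"},
        "status": {"allocatable": {
            "rancher.io/intel_fec_5g": "16",
            "rancher.io/intel_sriov_odu": "4",
            "rancher.io/intel_sriov_oru": "4",
        }},
    }]}
    expected = ["rancher.io/intel_fec_5g", "rancher.io/intel_sriov_odu", "rancher.io/intel_sriov_oru"]
    print(missing_device_resources(nodes, expected) or "all device resources present")
```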

Important

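If no devices appear as allocatable resources on the Kubernetes nodes, pods cannot request them, so resolve this before you continue. A common cause is an ill-formed ConfigMap, so review the ConfigMap and its selectors first. The logs of the device plugin pods (kubectl logs -n kube-system <pod-name>) can also help identify the problem.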
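Because a malformed config.json is a frequent cause, a quick local check before applying the ConfigMap can save a deployment cycle. The following is a minimal Python sketch. The rules it checks are assumptions drawn from this chapter (valid JSON, a non-empty resourceList, unique resource names, the two deviceType values used here, and vendor and device IDs written as four-digit hexadecimal strings), not the device plugin's complete validation logic. The function name is hypothetical.

```python
import json
import re

HEX_ID = re.compile(r"^[0-9a-fA-F]{4}$")
# Assumption: only the two resource types used in this chapter are expected,
# compared case-insensitively.
KNOWN_DEVICE_TYPES = {"accelerator", "netdevice"}


def check_sriovdp_config(config_json: str) -> list[str]:
    """Hypothetical helper: return problems found in a device plugin config.json."""
    try:
        config = json.loads(config_json)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    resources = config.get("resourceList") if isinstance(config, dict) else None
    if not isinstance(resources, list) or not resources:
        return ["resourceList is missing or empty"]

    problems: list[str] = []
    seen: set[str] = set()
    for index, res in enumerate(resources):
        name = res.get("resourceName")
        if not name:
            problems.append(f"resource #{index}: resourceName is missing")
            continue
        prefix = res.get("resourcePrefix")
        full_name = f"{prefix}/{name}" if prefix else name
        if full_name in seen:
            problems.append(f"{full_name}: defined more than once")
        seen.add(full_name)
        device_type = str(res.get("deviceType", "netdevice")).lower()
        if device_type not in KNOWN_DEVICE_TYPES:
            problems.append(f"{full_name}: unexpected deviceType {device_type!r}")
        selectors = res.get("selectors") or {}
        if not selectors:
            problems.append(f"{full_name}: no selectors defined")
        for key in ("vendors", "devices"):
            for value in selectors.get(key, []):
                # IDs must be JSON strings such as "8086", not numbers such as 8086.
                if not isinstance(value, str) or not HEX_ID.match(value):
                    problems.append(f"{full_name}: {key} entry {value!r} is not a 4-digit hex string")
    return problems
```

To check a deployed ConfigMap, the function could be given the output of kubectl get configmap sriovdp-config -n kube-system -o jsonpath='{.data.config\.json}'.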

39.2 Option 2 (Recommended): SR-IOV Network Operator

  • Install Helm if it is not already present:

$ curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
  • Install the SR-IOV Network Operator in the sriov-network-operator namespace:

$ helm install sriov-crd oci://registry.suse.com/edge/charts/sriov-crd -n sriov-network-operator --create-namespace
$ helm install sriov-network-operator oci://registry.suse.com/edge/charts/sriov-network-operator -n sriov-network-operator
  • Check the deployed CRDs and pods:

$ kubectl get crd
$ kubectl -n sriov-network-operator get pods
  • Check that the SR-IOV label is applied to the nodes.

Once all resources are running, the label appears automatically on your nodes:

$ kubectl get nodes -oyaml | grep feature.node.kubernetes.io/network-sriov.capable

feature.node.kubernetes.io/network-sriov.capable: "true"
  • Check that the new sriov-network-config-daemon and sriov-rancher-nfd-worker DaemonSets are active and ready:

$ kubectl get daemonset -n sriov-network-operator
NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                           AGE
sriov-network-config-daemon   1         1         1       1            1           feature.node.kubernetes.io/network-sriov.capable=true   45m
sriov-rancher-nfd-worker      1         1         1       1            1           <none>                                                  45m

After a few minutes, the nodes are detected and fully configured with SR-IOV capabilities. This can take up to 10 minutes:

$ kubectl get sriovnetworknodestates -A
NAMESPACE                NAME     AGE
sriov-network-operator   xr11-2   83s
  • Check if the interfaces were detected.

The discovered interfaces are identified by the PCI addresses of the network devices (for example, 0000:51:00.0). Compare them with the lspci output on the host.

$ kubectl get sriovnetworknodestates -n sriov-network-operator -oyaml
apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2023-06-07T09:52:37Z"
    generation: 1
    name: xr11-2
    namespace: sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: 80b72499-e26b-4072-a75c-f9a6218ec357
    resourceVersion: "356603"
    uid: e1f1654b-92b3-44d9-9f87-2571792cc1ad
  spec:
    dpConfigVersion: "356507"
  status:
    interfaces:
    - deviceID: "1592"
      driver: ice
      eSwitchMode: legacy
      linkType: ETH
      mac: 40:a6:b7:9b:35:f0
      mtu: 1500
      name: p2p1
      pciAddress: "0000:51:00.0"
      totalvfs: 128
      vendor: "8086"
    - deviceID: "1592"
      driver: ice
      eSwitchMode: legacy
      linkType: ETH
      mac: 40:a6:b7:9b:35:f1
      mtu: 1500
      name: p2p2
      pciAddress: "0000:51:00.1"
      totalvfs: 128
      vendor: "8086"
    syncStatus: Succeeded
kind: List
metadata:
  resourceVersion: ""
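When writing the nicSelector of a SriovNetworkNodePolicy, the useful fields are each PF's name, PCI address, vendor and device ID, and totalvfs. The following is a minimal Python sketch that extracts them, assuming the node states were requested as JSON (kubectl get sriovnetworknodestates -n sriov-network-operator -o json) and parsed into a dict. The function name is hypothetical.

```python
def pf_summary(node_states: dict) -> list[tuple[str, str, str, str, int]]:
    """Hypothetical helper: (node, interface, pciAddress, vendor:deviceID, totalvfs).

    `node_states` is the parsed output of
    `kubectl get sriovnetworknodestates -n sriov-network-operator -o json`.
    """
    rows = []
    for state in node_states.get("items", []):
        node = state["metadata"]["name"]
        for iface in state.get("status", {}).get("interfaces", []):
            rows.append((
                node,
                iface.get("name", "?"),
                iface["pciAddress"],
                f"{iface['vendor']}:{iface['deviceID']}",
                int(iface.get("totalvfs", 0)),
            ))
    return rows


if __name__ == "__main__":
    # Trimmed version of the SriovNetworkNodeState shown above.
    states = {"items": [{
        "metadata": {"name": "xr11-2"},
        "status": {"interfaces": [
            {"name": "p2p1", "pciAddress": "0000:51:00.0", "vendor": "8086", "deviceID": "1592", "totalvfs": 128},
            {"name": "p2p2", "pciAddress": "0000:51:00.1", "vendor": "8086", "deviceID": "1592", "totalvfs": 128},
        ]},
    }]}
    for row in pf_summary(states):
        print("  ".join(str(col) for col in row))
```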
Note

If your interface is not detected here, make sure that it is listed in the supported-nic-ids ConfigMap:

$ kubectl get cm supported-nic-ids -oyaml -n sriov-network-operator

If your device is not listed, edit the ConfigMap and add an entry for it with the correct vendor and device IDs. Then restart the sriov-network-config-daemon pods on each node so that the change takes effect.
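Entries in supported-nic-ids are assumed here to be strings of three hexadecimal IDs: the vendor ID, the PF device ID and the VF device ID (for the E810 in this chapter, that would be 8086 1592 1889, matching the deviceID reported in the node state and the VF device ID used in Option 1). The following is a minimal Python sketch that looks up a PF in that data, assuming the ConfigMap data has been read as a dict of strings. The function names and the example key are hypothetical.

```python
from typing import Optional


def parse_nic_ids(data: dict[str, str]) -> dict[str, tuple[str, ...]]:
    """Hypothetical helper: parse entries into (vendor, pfDeviceID, vfDeviceID)."""
    parsed = {}
    for key, value in data.items():
        fields = value.split()
        if len(fields) != 3:
            raise ValueError(f"{key}: expected 3 IDs, got {value!r}")
        parsed[key] = tuple(f.lower() for f in fields)
    return parsed


def find_supported_nic(data: dict[str, str], vendor: str, pf_device: str) -> Optional[str]:
    """Return the key of the entry matching vendor and PF device ID, if any."""
    for key, (v, pf, _vf) in parse_nic_ids(data).items():
        if v == vendor.lower() and pf == pf_device.lower():
            return key
    return None


if __name__ == "__main__":
    # Illustrative ConfigMap data; the key name is hypothetical.
    nic_ids = {"Intel_ice_E810_example": "8086 1592 1889"}
    print(find_supported_nic(nic_ids, "8086", "1592"))  # Intel_ice_E810_example
```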

  • Create the SriovNetworkNodePolicy to configure the VFs

This policy creates the resource intelnicsDpdk for pod consumption. It creates 8 VFs on the physical function at PCI address 0000:51:00.0, binds them to the vfio-pci driver, and sets an MTU of 1500:

Note

The resourceName field must not contain any special characters and must be unique across the cluster. The example uses deviceType: vfio-pci because DPDK is used in combination with SR-IOV. If you do not use DPDK, set deviceType: netdevice (the default value).

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-dpdk
  namespace: sriov-network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  resourceName: intelnicsDpdk
  deviceType: vfio-pci
  numVfs: 8
  mtu: 1500
  nicSelector:
    deviceID: "1592"
    vendor: "8086"
    rootDevices:
    - "0000:51:00.0"
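Before applying a policy, two constraints from this chapter are easy to check locally: resourceName must not contain special characters, and numVfs cannot exceed the totalvfs that the selected PF reports in its SriovNetworkNodeState (128 for p2p1 above). The following is a minimal Python sketch. Reading "no special characters" as ASCII letters, digits and underscores is an assumption, and the function name is hypothetical.

```python
import re

# Assumption: "no special characters" is read as ASCII letters, digits and "_".
RESOURCE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def check_policy(policy: dict, totalvfs_by_pci: dict[str, int]) -> list[str]:
    """Hypothetical helper: return problems in a SriovNetworkNodePolicy.

    `totalvfs_by_pci` maps a PF PCI address (as reported in the
    SriovNetworkNodeState) to its totalvfs value.
    """
    spec = policy.get("spec", {})
    problems: list[str] = []
    name = spec.get("resourceName", "")
    if not RESOURCE_NAME.match(name):
        problems.append(f"resourceName {name!r} is empty or contains special characters")
    num_vfs = int(spec.get("numVfs", 0))
    for pci in spec.get("nicSelector", {}).get("rootDevices", []):
        limit = totalvfs_by_pci.get(str(pci))
        if limit is None:
            problems.append(f"{pci}: no PF with this PCI address in the node state")
        elif num_vfs > limit:
            problems.append(f"{pci}: numVfs {num_vfs} exceeds totalvfs {limit}")
    return problems


if __name__ == "__main__":
    policy = {"spec": {
        "resourceName": "intelnicsDpdk",
        "numVfs": 8,
        "nicSelector": {"deviceID": "1592", "vendor": "8086", "rootDevices": ["0000:51:00.0"]},
    }}
    print(check_policy(policy, {"0000:51:00.0": 128}) or "policy looks consistent")
```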
  • Validate configurations on all nodes:

With the predefined resourcePrefix rancher.io, a resource rancher.io/intelnicsDpdk with 8 VFs should be discovered.

$ kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, allocatable: .status.allocatable}'
{
  "name": "node1.suse.edge.com",
  "allocatable": {
    "cpu": "64",
    "ephemeral-storage": "256196109726",
    "hugepages-1Gi": "60Gi",
    "hugepages-2Mi": "0",
    "rancher.io/intel_fec_5g": "16",
    "memory": "200424836Ki",
    "pods": "110",
    "rancher.io/intelnicsDpdk": "8"
  }
}
  • (Optional) Create the SriovNetwork

This step is only required for custom network definitions. Set resourceName to the resource defined in the previously created node policy (intelnicsDpdk).

If the networkNamespace is set, the network is exposed to pods in that namespace. Otherwise, the network becomes available in the Network Operator’s installation namespace.

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: network-dpdk
  namespace: sriov-network-operator  # namespace where the SR-IOV Network Operator is installed
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "192.168.0.0/24",
      "rangeStart": "192.168.0.20",
      "rangeEnd": "192.168.0.60",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "192.168.0.1"
    }
  vlan: 500
  resourceName: intelnicsDpdk
  networkNamespace: default  # namespace where the workloads are deployed
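The host-local IPAM block hands out addresses only between rangeStart and rangeEnd, inclusive, so it bounds how many pod attachments this network can serve on a node. The following is a worked check with Python's standard ipaddress module using the values above: 192.168.0.20 to 192.168.0.60 gives 60 - 20 + 1 = 41 addresses, and the gateway 192.168.0.1 lies outside that range. The function name is hypothetical, and treating a gateway inside the range as unusable is an assumption about host-local.

```python
import ipaddress


def host_local_capacity(subnet: str, range_start: str, range_end: str, gateway: str) -> int:
    """Hypothetical helper: count allocatable addresses in a host-local range."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    gw = ipaddress.ip_address(gateway)
    if start not in net or end not in net or gw not in net:
        raise ValueError("rangeStart, rangeEnd and gateway must be inside the subnet")
    if start > end:
        raise ValueError("rangeStart is after rangeEnd")
    size = int(end) - int(start) + 1  # the range is inclusive at both ends
    if start <= gw <= end:
        size -= 1  # assumption: the gateway address is never handed to a pod
    return size


if __name__ == "__main__":
    print(host_local_capacity("192.168.0.0/24", "192.168.0.20", "192.168.0.60", "192.168.0.1"))  # 41
```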
  • If the SriovNetwork is created successfully, the operator creates a NetworkAttachmentDefinition (NAD) in the namespace set by networkNamespace (default in this example):

$ kubectl get net-attach-def -A -oyaml

apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: rancher.io/intelnicsDpdk
    creationTimestamp: "2023-06-08T11:22:27Z"
    generation: 1
    name: network-dpdk
    namespace: default
    resourceVersion: "13124"
    uid: df7c89f5-177c-4f30-ae72-7aef3294fb15
  spec:
    config: '{ "cniVersion":"0.4.0", "name":"network-dpdk","type":"sriov","vlan":500,"vlanQoS":0,"ipam":{"type":"host-local","subnet":"192.168.0.0/24","rangeStart":"192.168.0.20","rangeEnd":"192.168.0.60","routes":[{"dst":"0.0.0.0/0"}],"gateway":"192.168.0.1"}
      }'
kind: List
metadata:
  resourceVersion: ""

Workload pods can request the rancher.io/intelnicsDpdk resource to use the VFs of the network interface.
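The following is a minimal Python sketch of a pod that consumes these VFs, emitted as JSON (which kubectl apply also accepts). It assumes Multus is the meta-plugin attaching secondary networks, so the NetworkAttachmentDefinition is referenced through the k8s.v1.cni.cncf.io/networks annotation. The pod name, image and VF count are hypothetical, and a real DPDK workload typically also needs hugepages and extra settings that are not shown.

```python
import json


def sriov_pod(name: str, image: str, network: str, resource: str, vfs: int = 1) -> dict:
    """Hypothetical helper: a Pod attaching `network` and requesting `vfs` VFs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "namespace": "default",  # must match the SriovNetwork networkNamespace
            # Multus reads this annotation to attach the secondary network.
            "annotations": {"k8s.v1.cni.cncf.io/networks": network},
        },
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Extended resource: set limits only; the request defaults to it.
                "resources": {"limits": {resource: str(vfs)}},
            }],
        },
    }


if __name__ == "__main__":
    pod = sriov_pod("dpdk-app", "registry.example.com/dpdk-app:latest",
                    "network-dpdk", "rancher.io/intelnicsDpdk")
    print(json.dumps(pod, indent=2))  # could be piped to `kubectl apply -f -`
```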