39 SR-IOV
SR-IOV (Single Root I/O Virtualization) allows a single physical device, such as a network adapter, to partition its resources across multiple PCIe hardware functions. Workloads can then be given direct access to those functions.
We provide two distinct methods for deploying SR-IOV in your cluster:
Section 39.1, “Option 1: SR-IOV Network Device Plugin Daemonset and configMap”: This method supports both network devices and vRAN accelerators.
Section 39.2, “Option 2 (Recommended): SR-IOV Network Operator”: This automated method provides simpler deployment. This method is only for network devices.
In rare cases where you need both solutions (the Network Operator for network devices and the Device Plugin for vRAN accelerators), you must deploy them into separate Kubernetes namespaces. This separation is essential to prevent conflicts between the two deployments.
39.1 Option 1: SR-IOV Network Device Plugin Daemonset and configMap
SR-IOV Network Device Plugin discovers and advertises network resources, such as PCI physical functions (PFs), and their virtual functions (VFs), on a Kubernetes host.
Prepare the config map for the device plugin
Create a config map that defines the SR-IOV resource pools. Run the lspci command to retrieve the device information:
$ lspci | grep -i acc
07:00.0 Processing accelerators: Intel Corporation Device 57c2
07:00.1 Processing accelerators: Intel Corporation Device 57c3
07:00.2 Processing accelerators: Intel Corporation Device 57c3
07:00.3 Processing accelerators: Intel Corporation Device 57c3
07:00.4 Processing accelerators: Intel Corporation Device 57c3
07:00.5 Processing accelerators: Intel Corporation Device 57c3
07:00.6 Processing accelerators: Intel Corporation Device 57c3
07:00.7 Processing accelerators: Intel Corporation Device 57c3
07:01.0 Processing accelerators: Intel Corporation Device 57c3
07:01.1 Processing accelerators: Intel Corporation Device 57c3
07:01.2 Processing accelerators: Intel Corporation Device 57c3
07:01.3 Processing accelerators: Intel Corporation Device 57c3
07:01.4 Processing accelerators: Intel Corporation Device 57c3
07:01.5 Processing accelerators: Intel Corporation Device 57c3
07:01.6 Processing accelerators: Intel Corporation Device 57c3
07:01.7 Processing accelerators: Intel Corporation Device 57c3
07:02.0 Processing accelerators: Intel Corporation Device 57c3
0a:00.0 Processing accelerators: Intel Corporation Device 57c2
$ lspci | grep -i net
19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
19:00.2 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
19:00.3 Ethernet controller: Broadcom Inc. and subsidiaries BCM57504 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 11)
51:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
51:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
51:01.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:01.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:01.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:01.3 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.2 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
51:11.3 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)

The SR-IOV Device Plugin uses a configMap containing a JSON file to define which hardware resources Kubernetes should expose. This configuration is based on two core concepts: selectors (for hardware discovery) and resources (for Kubernetes exposure).
A resource is the named entity that pods consume (e.g. rancher.io/intel_fec_5g). Resources can be defined as one of two types:
accelerator: Used for vRAN accelerator cards (like ACC100/vRAN Boost).
netdevice: Used for standard network interfaces (NICs).
You define the target devices using selectors to filter the hardware on the node:
vendors: 8086 (Intel)
devices: 57c3 (FEC VF), 1889 (NIC VF)
drivers: vfio-pci
pfNames: p2p1 (physical interface name)
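The vendor and device IDs used in these selectors can be read from lspci when it is run with the -nn option, which prints the numeric IDs in brackets. The following is a minimal sketch, not part of the device plugin: list_pci_ids is a hypothetical helper name, and the sample lines are illustrative text in the format that lspci -nn prints.

```shell
#!/bin/sh
# Sketch: list the distinct vendor:device ID pairs found in `lspci -nn` output.
# list_pci_ids is a hypothetical helper name used only for this illustration.
list_pci_ids() {
  # `lspci -nn` prints IDs as "[8086:57c3]". The class code, for example
  # "[1200]", has no colon and is therefore not matched.
  grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | sort -u
}

# Illustrative sample lines (assumed, not captured from a real host).
sample='07:00.0 Processing accelerators [1200]: Intel Corporation Device [8086:57c2]
07:00.1 Processing accelerators [1200]: Intel Corporation Device [8086:57c3]
07:00.2 Processing accelerators [1200]: Intel Corporation Device [8086:57c3]'

printf '%s\n' "$sample" | list_pci_ids

# On a real host: lspci -nn | grep -i acc | list_pci_ids
```

The first half of each pair goes into the vendors selector and the second half into the devices selector.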
For network cards, you can also select a subset of Virtual Functions (VFs) from a Physical Function:
pfNames: ["eth1#1,2,3,4,5,6"] or ["eth1#1-6"]
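To make the two notations concrete, the sketch below expands a pfNames entry into the list of VF indices it selects. It is an illustration only: expand_vf_selector is a hypothetical helper, and the device plugin performs its own parsing internally.

```shell
#!/bin/sh
# Sketch: expand "<pf>#<list-or-range>" into "<pf> <index> <index> ...".
# expand_vf_selector is a hypothetical helper, not device plugin code.
expand_vf_selector() {
  selector=$1
  pf=${selector%%#*}
  # Without '#', the entry selects every VF of the physical function.
  if [ "$pf" = "$selector" ]; then
    printf '%s all\n' "$pf"
    return 0
  fi
  out=$pf
  old_ifs=$IFS
  IFS=,
  for part in ${selector#*#}; do
    case $part in
      *-*)
        # A range such as 1-6: emit every index from first to last.
        i=${part%-*}
        last=${part#*-}
        while [ "$i" -le "$last" ]; do
          out="$out $i"
          i=$((i + 1))
        done
        ;;
      *)
        out="$out $part"
        ;;
    esac
  done
  IFS=$old_ifs
  printf '%s\n' "$out"
}

expand_vf_selector 'eth1#1,2,3,4,5,6'
expand_vf_selector 'eth1#1-6'
```

Both calls print the same line, which shows that the list form and the range form select the same six VFs.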
To allow pods to request the devices, each resource must have a name, which is composed of a prefix and a name:
resourceName: pci_sriov_net_bh_dpdk
resourcePrefix: rancher.io
Pods then request the combined resource name: rancher.io/pci_sriov_net_bh_dpdk.
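As an illustration of such a request, the sketch below prints a pod manifest that asks for one device of a given resource. The helper name render_pod, the pod name and the container image are placeholders chosen for this example and do not come from the product.

```shell
#!/bin/sh
# Sketch: print a pod manifest that requests an SR-IOV resource.
# render_pod, the pod name and the image are hypothetical placeholders.
render_pod() {
  resource=$1
  count=${2:-1}
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sriov-test-pod
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    resources:
      requests:
        ${resource}: "${count}"
      limits:
        ${resource}: "${count}"
EOF
}

render_pod rancher.io/pci_sriov_net_bh_dpdk 1

# To create the pod: render_pod rancher.io/pci_sriov_net_bh_dpdk 1 | kubectl apply -f -
```

Requests and limits carry the same value because Kubernetes requires them to be equal for extended resources.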
This document does not list all possible selectors. Different resource types use different sets of selectors. For comprehensive details, refer to the SR-IOV Network Device Plugin repository.
The ConfigMap below is an example that creates three resources: one for the vRAN Accelerator card (FEC) and two for two different NIC ports.
For the FEC card, you must first retrieve the device ID and VFIO token. Follow the instructions in Chapter 41, vRAN Acceleration (Intel ACC100/VRB1/VRB2) for the prerequisites.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourcePrefix": "rancher.io",
          "resourceName": "intel_fec_5g",
          "deviceType": "accelerator",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["57c3"]
          },
          "additionalInfo": {
            "*": {
              "VFIO_TOKEN": "00112233-4455-6677-8899-aabbccddeeff"
            }
          }
        },
        {
          "resourcePrefix": "rancher.io",
          "resourceName": "intel_sriov_odu",
          "deviceType": "netdevice",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1889"],
            "drivers": ["vfio-pci"],
            "pfNames": ["p2p1"]
          }
        },
        {
          "resourcePrefix": "rancher.io",
          "resourceName": "intel_sriov_oru",
          "deviceType": "netdevice",
          "selectors": {
            "vendors": ["8086"],
            "devices": ["1889"],
            "drivers": ["vfio-pci"],
            "pfNames": ["p2p2"]
          }
        }
      ]
    }

Prepare the daemonset file to deploy the device plugin.
The device plugin supports several architectures (arm, amd, ppc64le). The same file can be reused across architectures by deploying one daemonset per architecture.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sriov-device-plugin
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-sriov-device-plugin-amd64
  namespace: kube-system
  labels:
    tier: node
    app: sriovdp
spec:
  selector:
    matchLabels:
      name: sriov-device-plugin
  template:
    metadata:
      labels:
        name: sriov-device-plugin
        tier: node
        app: sriovdp
    spec:
      hostNetwork: true
      nodeSelector:
        kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: sriov-device-plugin
      containers:
      - name: kube-sriovdp
        image: registry.suse.com/rancher/hardened-sriov-network-device-plugin:v3.9.0-build20250425
        imagePullPolicy: IfNotPresent
        args:
        - --log-dir=sriovdp
        - --log-level=10
        securityContext:
          privileged: true
        resources:
          requests:
            cpu: "250m"
            memory: "40Mi"
          limits:
            cpu: 1
            memory: "200Mi"
        volumeMounts:
        - name: devicesock
          mountPath: /var/lib/kubelet/
          readOnly: false
        - name: log
          mountPath: /var/log
        - name: config-volume
          mountPath: /etc/pcidp
        - name: device-info
          mountPath: /var/run/k8s.cni.cncf.io/devinfo/dp
      volumes:
      - name: devicesock
        hostPath:
          path: /var/lib/kubelet/
      - name: log
        hostPath:
          path: /var/log
      - name: device-info
        hostPath:
          path: /var/run/k8s.cni.cncf.io/devinfo/dp
          type: DirectoryOrCreate
      - name: config-volume
        configMap:
          name: sriovdp-config
          items:
          - key: config.json
            path: config.json

After you apply the configMap and the daemonset, the device plugin is deployed, and the interfaces are discovered and made available to the pods.

$ kubectl get pods -n kube-system | grep sriov
kube-system kube-sriov-device-plugin-amd64-twjfl 1/1 Running 0 2m

Verify on all nodes that the interfaces were discovered and are available to the pods:

$ kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, allocatable: .status.allocatable}'
{
  "name": "node1.suse.edge.com",
  "allocatable": {
    "cpu": "64",
    "ephemeral-storage": "256196109726",
    "hugepages-1Gi": "40Gi",
    "hugepages-2Mi": "0",
    "rancher.io/intel_fec_5g": "16",
    "rancher.io/intel_sriov_odu": "4",
    "rancher.io/intel_sriov_oru": "4",
    "memory": "221396384Ki",
    "pods": "110"
  }
}

The resourceName for the FEC accelerator is rancher.io/intel_fec_5g, and 16 VFs are available for use.
The resourceName values for the NIC cards are rancher.io/intel_sriov_odu and rancher.io/intel_sriov_oru. Each resource provides 4 VFs.
If no interfaces appear as allocatable resources on the Kubernetes nodes, resolve this before you continue. A common cause is a malformed configMap, so review the configMap and its selectors first.
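A quick way to spot a missing resource is to compare the allocatable output with the resource names you expect. The sketch below assumes pretty-printed JSON with one "key": "value" pair per line, which is how kubectl and jq print it. The helper name missing_resources and the sample data are hypothetical.

```shell
#!/bin/sh
# Sketch: print every expected resource that is absent from the JSON on stdin
# or that is listed with a count of "0".
# missing_resources is a hypothetical helper name.
missing_resources() {
  json=$(cat)
  for name in "$@"; do
    # Take the quoted number that follows the resource name, if present.
    count=$(printf '%s\n' "$json" | grep -F "\"$name\":" | head -n 1 |
      sed -E 's/.*: *"([0-9]+)".*/\1/')
    case $count in
      '' | 0 | *[!0-9]*) printf '%s\n' "$name" ;;
    esac
  done
}

# Illustrative input (assumed): one resource is present, one is zero, one is absent.
sample='{
  "rancher.io/intel_fec_5g": "16",
  "rancher.io/intel_sriov_odu": "0"
}'

printf '%s\n' "$sample" | missing_resources \
  rancher.io/intel_fec_5g rancher.io/intel_sriov_odu rancher.io/intel_sriov_oru

# On a cluster, feed it the allocatable section of one node, for example:
#   kubectl get node <node-name> -o json | jq '.status.allocatable' | missing_resources rancher.io/intel_fec_5g
```

With the sample input, the helper reports rancher.io/intel_sriov_odu and rancher.io/intel_sriov_oru. A reported name points to a selector that matched no device on that node.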
39.2 Option 2 (Recommended): SR-IOV Network Operator
Get Helm if not present:
$ curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Install the SR-IOV Network Operator in the sriov-network-operator namespace:
helm install sriov-crd oci://registry.suse.com/edge/charts/sriov-crd -n sriov-network-operator
helm install sriov-network-operator oci://registry.suse.com/edge/charts/sriov-network-operator -n sriov-network-operator

Check the deployed CRDs and pods:
$ kubectl get crd
$ kubectl -n sriov-network-operator get pods

Check if the SR-IOV label is applied to the nodes.
With all resources running, the label appears automatically on your node:
$ kubectl get nodes -oyaml | grep feature.node.kubernetes.io/network-sriov.capable
feature.node.kubernetes.io/network-sriov.capable: "true"

Review the daemonset to see the new sriov-network-config-daemon and sriov-rancher-nfd-worker as active and ready:
$ kubectl get daemonset -n sriov-network-operator
NAMESPACE                NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                           AGE
sriov-network-operator   sriov-network-config-daemon   1         1         1       1            1           feature.node.kubernetes.io/network-sriov.capable=true   45m
sriov-network-operator   sriov-rancher-nfd-worker      1         1         1       1            1           <none>                                                  45m

In a few minutes, the nodes are detected and fully configured with SR-IOV capabilities. The update can sometimes take up to 10 minutes:
$ kubectl get sriovnetworknodestates -A
NAMESPACE NAME AGE
sriov-network-operator xr11-2 83s

Check if the interfaces were detected.
The discovered interfaces should match the PCI addresses of the network devices. Verify this with the lspci command on the host.
$ kubectl get sriovnetworknodestates -n sriov-network-operator -oyaml
apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2023-06-07T09:52:37Z"
    generation: 1
    name: xr11-2
    namespace: sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: 80b72499-e26b-4072-a75c-f9a6218ec357
    resourceVersion: "356603"
    uid: e1f1654b-92b3-44d9-9f87-2571792cc1ad
  spec:
    dpConfigVersion: "356507"
  status:
    interfaces:
    - deviceID: "1592"
      driver: ice
      eSwitchMode: legacy
      linkType: ETH
      mac: 40:a6:b7:9b:35:f0
      mtu: 1500
      name: p2p1
      pciAddress: "0000:51:00.0"
      totalvfs: 128
      vendor: "8086"
    - deviceID: "1592"
      driver: ice
      eSwitchMode: legacy
      linkType: ETH
      mac: 40:a6:b7:9b:35:f1
      mtu: 1500
      name: p2p2
      pciAddress: "0000:51:00.1"
      totalvfs: 128
      vendor: "8086"
    syncStatus: Succeeded
kind: List
metadata:
  resourceVersion: ""

If your interface is not detected here, ensure that it is present in the following config map:

$ kubectl get cm supported-nic-ids -oyaml -n sriov-network-operator

If your device is not listed, edit the config map and add the values required for discovery. Then restart the sriov-network-config-daemon pods on each node for the update to take effect.
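The sketch below formats one entry for that config map. It rests on an assumption that you should verify against the entries already present in your cluster: each value lists the vendor ID, the PF device ID and the VF device ID, separated by spaces. The helper names and the entry name Intel_ice_E810_example are hypothetical. The IDs are the ones shown earlier in this chapter.

```shell
#!/bin/sh
# Sketch: build a supported-nic-ids entry of the assumed form
#   <entry name>: "<vendor ID> <PF device ID> <VF device ID>"
# is_hex4 and nic_id_entry are hypothetical helper names.
is_hex4() {
  case $1 in
    [0-9a-f][0-9a-f][0-9a-f][0-9a-f]) return 0 ;;
    *) return 1 ;;
  esac
}

nic_id_entry() {
  # Reject anything that is not a four-digit lowercase hexadecimal PCI ID.
  for id in "$2" "$3" "$4"; do
    if ! is_hex4 "$id"; then
      echo "invalid PCI ID: $id" >&2
      return 1
    fi
  done
  printf '%s: "%s %s %s"\n' "$1" "$2" "$3" "$4"
}

nic_id_entry Intel_ice_E810_example 8086 1592 1889

# After editing the config map, one way to restart the config daemon pods:
#   kubectl -n sriov-network-operator rollout restart daemonset/sriov-network-config-daemon
```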
Create the SriovNetworkNodePolicy to configure the VFs
The following policy creates the resource intelnicsDpdk for pod consumption. It also binds the vfio-pci driver to the specified PCI device and creates 8 VFs with an MTU of 1500.
The resourceName field must not contain any special characters and must be unique across the cluster.
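A name can be checked before the policy is applied. The sketch below assumes that the accepted syntax is ^[a-zA-Z0-9_]+$, which is the pattern given in upstream operator documentation. Confirm it for your operator version. valid_resource_name is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: accept only names made of letters, digits and underscores.
# The pattern is an assumption taken from upstream operator documentation.
# valid_resource_name is a hypothetical helper name.
valid_resource_name() {
  case $1 in
    '' | *[!a-zA-Z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

for name in intelnicsDpdk intel-nics; do
  if valid_resource_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: contains characters outside [a-zA-Z0-9_]"
  fi
done
```

The check applies to the bare resourceName only. The rancher.io prefix is added by the operator and is not part of the field.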
The example uses the deviceType: vfio-pci because DPDK is used in combination with SR-IOV. If you don’t use DPDK, configure deviceType: netdevice (default value).
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-dpdk
  namespace: sriov-network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  resourceName: intelnicsDpdk
  deviceType: vfio-pci
  numVfs: 8
  mtu: 1500
  nicSelector:
    deviceID: "1592"
    vendor: "8086"
    rootDevices:
    - 0000:51:00.0

Validate configurations on all nodes:
With the predefined resourcePrefix rancher.io, a resource rancher.io/intelnicsDpdk with 8 VFs should be discovered.
$ kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, allocatable: .status.allocatable}'
{
  "name": "node1.suse.edge.com",
  "allocatable": {
    "cpu": "64",
    "ephemeral-storage": "256196109726",
    "hugepages-1Gi": "60Gi",
    "hugepages-2Mi": "0",
    "rancher.io/intel_fec_5g": "16",
    "memory": "200424836Ki",
    "pods": "110",
    "rancher.io/intelnicsDpdk": "8"
  }
}

(Optional) Create the sriovnetwork
This step is optional and only required for custom network definitions. Specify the resourceName to bind to the previously created node policy.
If the networkNamespace is set, the network is exposed to pods in that namespace. Otherwise, the network becomes available in the Network Operator’s installation namespace.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: network-dpdk
  namespace: sriov-network-operator # where SRIOV Operator is installed
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "192.168.0.0/24",
      "rangeStart": "192.168.0.20",
      "rangeEnd": "192.168.0.60",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "192.168.0.1"
    }
  vlan: 500
  resourceName: intelnicsDpdk
  networkNamespace: default # where workloads are deployed

If the update is successful, a NetworkAttachmentDefinition (NAD) is created in the target cluster.
$ kubectl get net-attach-def -A -oyaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    annotations:
      k8s.v1.cni.cncf.io/resourceName: rancher.io/intelnicsDpdk
    creationTimestamp: "2023-06-08T11:22:27Z"
    generation: 1
    name: network-dpdk
    namespace: default
    resourceVersion: "13124"
    uid: df7c89f5-177c-4f30-ae72-7aef3294fb15
  spec:
    config: '{ "cniVersion":"0.4.0", "name":"network-dpdk","type":"sriov","vlan":500,"vlanQoS":0,"ipam":{"type":"host-local","subnet":"192.168.0.0/24","rangeStart":"192.168.0.10","rangeEnd":"192.168.0.60","routes":[{"dst":"0.0.0.0/0"}],"gateway":"192.168.0.1"}
      }'
kind: List
metadata:
  resourceVersion: ""

Workload pods can request the resource rancher.io/intelnicsDpdk to use the VFs of the network interface.
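A workload pod typically names the NetworkAttachmentDefinition in a network annotation and requests the matching resource. The sketch below prints such a manifest. It assumes that a meta CNI plugin such as Multus is installed to process the k8s.v1.cni.cncf.io/networks annotation. The helper name render_sriov_pod, the pod name and the image are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print a pod manifest that attaches to a NetworkAttachmentDefinition
# and requests one VF from the matching resource pool.
# render_sriov_pod, the pod name and the image are hypothetical placeholders.
render_sriov_pod() {
  network=$1
  resource=$2
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-workload
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: ${network}
spec:
  containers:
  - name: app
    image: registry.example.com/dpdk-app:latest
    resources:
      requests:
        ${resource}: "1"
      limits:
        ${resource}: "1"
EOF
}

render_sriov_pod network-dpdk rancher.io/intelnicsDpdk

# To create the pod: render_sriov_pod network-dpdk rancher.io/intelnicsDpdk | kubectl apply -f -
```

The pod is placed in the default namespace because that is where the NetworkAttachmentDefinition was created. A real DPDK workload usually needs further settings, such as hugepages, that this sketch leaves out.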