36 CPU Pinning on Host
CPU pinning, also known as processor affinity, is the technique of binding a process or thread to a specific CPU core so that the operating system’s scheduler does not move it to other cores. Because the process always runs on the same core, the data it uses tends to stay in that core’s cache, which avoids the cost of refilling the cache after a migration. This practice is common in high-performance and low-latency computing environments because it can significantly improve performance and reduce scheduling overhead and latency jitter.
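As a minimal illustration of pinning, the sketch below uses `taskset` from util-linux to start a command bound to a single CPU and lets the child report the affinity the kernel applied (read from `Cpus_allowed_list` in `/proc/self/status`). The helper names `allowed_cpus` and `pin_run` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Minimal sketch: pin a command to one CPU with taskset (util-linux).
# allowed_cpus and pin_run are hypothetical helper names.
set -eu

# Print the CPU list the calling process may run on, e.g. "0-63".
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status
}

# Run the command "$2..." bound to the CPU list "$1"; its children inherit it.
pin_run() {
    cpu=$1
    shift
    taskset -c "$cpu" "$@"
}

# Demo: pin a child to the first CPU this shell may use, and let the child
# report the affinity the kernel actually applied to it.
first_cpu=$(allowed_cpus | cut -d, -f1 | cut -d- -f1)
pin_run "$first_cpu" awk '/^Cpus_allowed_list:/ { print "pinned to CPU " $2 }' /proc/self/status
```

For a process that is already running, `taskset -pc 2 <pid>` changes its affinity in place. Pinning only controls where your task runs; keeping other tasks off that core is what the isolation steps in the rest of this chapter are for.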
36.1 Isolating CPUs via TuneD
TuneD is a system tuning tool that monitors system conditions and optimizes performance using a set of predefined profiles. A key feature is its ability to isolate CPU cores for specific workloads, such as real-time applications. Isolation keeps general OS activity off these cores, because that activity could otherwise interrupt the workload and increase latency.
To enable and configure this feature, first define the CPU cores to isolate in the variables file of the cpu-partitioning profile, and then activate that profile. In this example, out of 64 cores, 60 cores (1-30,33-62) are dedicated to the application and the remaining 4 cores (0,31,32,63) are used for housekeeping. Note that the right split between isolated and housekeeping CPUs depends heavily on the real-time application.
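The 60/4 split above can be checked mechanically. The sketch below expands a kernel-style CPU list such as `1-30,33-62` and derives the housekeeping CPUs as its complement on a 64-core host. The helpers `expand_cpulist` and `housekeeping_cpus` are hypothetical and not part of TuneD; they only do the arithmetic.

```shell
#!/bin/sh
# Sketch: expand a kernel-style CPU list and derive the housekeeping CPUs
# as its complement. Helper names are hypothetical, not part of TuneD.
set -eu

# Print one CPU number per line for a list such as "1-30,33-62".
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

# Print, comma-separated, the CPUs in 0..total-1 that are not isolated.
housekeeping_cpus() {
    expand_cpulist "$1" | awk -v total="$2" '
        { iso[$1] = 1 }
        END {
            sep = ""
            for (c = 0; c < total; c++)
                if (!(c in iso)) { printf "%s%d", sep, c; sep = "," }
            print ""
        }'
}

isolated="1-30,33-62"
count=$(expand_cpulist "$isolated" | wc -l)
echo "isolated: $((count)) CPUs"                            # isolated: 60 CPUs
echo "housekeeping: $(housekeeping_cpus "$isolated" 64)"   # housekeeping: 0,31,32,63
```

The housekeeping list printed here is the same set used later for `irqaffinity`, so deriving it from the isolated list helps keep the two consistent when adapting the example to other hardware.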
$ echo "export tuned_params" >> /etc/grub.d/00_tuned
$ echo "isolated_cores=1-30,33-62" >> /etc/tuned/cpu-partitioning-variables.conf
$ tuned-adm profile cpu-partitioning
Tuned (re)started, changes applied.
36.2 Isolating CPUs via kernel arguments
Next, modify the GRUB options to isolate the CPU cores and set other parameters that affect CPU usage. Customize the following options to match your hardware:
| parameter | value | description |
|---|---|---|
| isolcpus | domain,nohz,managed_irq,1-30,33-62 | Isolates cores 1-30 and 33-62 from general scheduling (domain), the scheduler tick (nohz), and managed interrupts (managed_irq). |
| skew_tick | 1 | Offsets the periodic timer tick on each CPU so that the CPUs do not all process their tick at the same moment, which reduces lock contention. |
| nohz | on | Stops the kernel’s periodic timer interrupt (the 'tick') on any CPU core that is idle. This primarily benefits the housekeeping CPUs (0,31,32,63). |
| nohz_full | 1-30,33-62 | On the isolated cores, stops the tick even while the CPU is running a single active task, so the CPU runs in full tickless mode (or 'dyntick'). The kernel delivers timer interrupts only when they are actually needed. |
| rcu_nocbs | 1-30,33-62 | Offloads RCU callback processing from the specified CPU cores. |
| rcu_nocb_poll | (none) | When set, the RCU callback threads for the offloaded CPUs regularly 'poll' to see whether callback handling is required, rather than being explicitly woken up by other CPUs. This can reduce interrupt overhead. |
| irqaffinity | 0,31,32,63 | Sets the default interrupt affinity so that interrupts are handled on the housekeeping cores. |
| idle | poll | Minimizes the latency of exiting the idle state, at the cost of keeping the CPU running at full speed in the idle thread. |
| nmi_watchdog | 0 | Disables only the NMI (hard-lockup) watchdog. This can be omitted when nowatchdog is used. |
| nowatchdog | (none) | Disables both lockup detectors: the soft-lockup watchdog, which is implemented as a timer running in the timer hard-interrupt context, and the NMI watchdog. |
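To keep the isolation-related arguments consistent with one another, they can be generated from the isolated and housekeeping lists instead of being typed by hand. The sketch below assembles the arguments from the table and checks whether a command-line file (by default `/proc/cmdline`) contains each of them as an exact token. The helper names `isolation_args` and `check_cmdline` are hypothetical; only the argument names and values come from the table above.

```shell
#!/bin/sh
# Sketch: build the isolation arguments from the table for a given split
# and check a kernel command line for them. Helper names are hypothetical.
set -eu

# Print the CPU-isolation kernel arguments on one line.
isolation_args() {
    local iso=$1 hk=$2
    echo "isolcpus=domain,nohz,managed_irq,$iso skew_tick=1 nohz=on" \
         "nohz_full=$iso rcu_nocbs=$iso rcu_nocb_poll irqaffinity=$hk" \
         "idle=poll nmi_watchdog=0 nowatchdog"
}

# Print each expected argument missing from a command-line file
# (default /proc/cmdline); return 1 if any is missing.
check_cmdline() {
    local file=${3:-/proc/cmdline} arg missing=0
    for arg in $(isolation_args "$1" "$2"); do
        # One token per line, then require an exact whole-line match.
        if ! tr ' ' '\n' < "$file" | grep -qxF -- "$arg"; then
            echo "missing: $arg"
            missing=1
        fi
    done
    return "$missing"
}

isolation_args "1-30,33-62" "0,31,32,63"
```

After the reboot described below, `check_cmdline 1-30,33-62 0,31,32,63 && echo "isolation arguments present"` compares the running kernel against the intended split. The kernel also reports the effective sets in `/sys/devices/system/cpu/isolated` and `/sys/devices/system/cpu/nohz_full`.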
The following steps modify the GRUB configuration so that the parameters above take effect on the next boot.
Edit the /etc/default/grub file to add the parameters above. The resulting line looks like this:
GRUB_CMDLINE_LINUX="BOOT_IMAGE=/boot/vmlinuz-6.4.0-9-rt root=UUID=77b713de-5cc7-4d4c-8fc6-f5eca0a43cf9 skew_tick=1 rd.timeout=60 rd.retry=45 console=ttyS1,115200 console=tty0 default_hugepagesz=1G hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=0 ignition.platform.id=openstack net.ifnames=1 intel_iommu=on iommu=pt irqaffinity=0,31,32,63 isolcpus=domain,nohz,managed_irq,1-30,33-62 nohz_full=1-30,33-62 nohz=on mce=off nosoftlockup nowatchdog nmi_watchdog=0 quiet rcu_nocb_poll rcu_nocbs=1-30,33-62 rcupdate.rcu_cpu_stall_suppress=1 rcupdate.rcu_expedited=1 rcupdate.rcu_normal_after_boot=1 rcupdate.rcu_task_stall_timeout=0 rcutree.kthread_prio=99 security=selinux selinux=1 idle=poll"
Update the GRUB configuration:
$ transactional-update grub.cfg
$ reboot
To validate that the parameters are applied after the reboot, check the kernel command line:
$ cat /proc/cmdline
Another script can be used to tune the CPU configuration further. It performs the following steps:
Set the CPU governor to performance.
Disable timer migration to the isolated CPUs.
Migrate the kernel threads (kthreads) to the housekeeping CPUs.
Set the latency of the isolated CPUs to the lowest possible value.
Delay the vmstat updates to every 300 seconds.
The script is available in the SUSE Telco Cloud Examples repository.
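The sketch below shows how four of the five steps listed above map to standard kernel interfaces. It is an illustration, not the SUSE script: the per-CPU latency step is left out because the interface that script uses is not described here. It assumes the 0,31,32,63 housekeeping split from this chapter, uses the `kernel.timer_migration` and `vm.stat_interval` sysctls (the latter is in seconds), and treats the children of kthreadd (PID 2) as the kernel threads to move. Note that `kernel.timer_migration=0` disables timer migration system-wide. By default it runs in a dry-run mode that only prints what it would change; the `DRY_RUN` and `HOUSEKEEPING` variables and the `write_knob` helper are hypothetical.

```shell
#!/bin/sh
# Sketch of four of the tuning steps (not the SUSE script). With DRY_RUN=1
# (the default) it only prints; run with DRY_RUN=0 as root to apply.
set -eu

DRY_RUN=${DRY_RUN:-1}
HOUSEKEEPING=${HOUSEKEEPING:-0,31,32,63}   # assumed housekeeping CPUs

# Write a value to a procfs/sysfs file, or print it in dry-run mode.
write_knob() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would write '$1' to $2"
    else
        echo "$1" > "$2"
    fi
}

# 1. CPU frequency governor -> performance, where cpufreq is exposed.
for gov in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$gov" ] || continue
    write_knob performance "$gov"
done

# 2. Disable timer migration (kernel.timer_migration sysctl).
write_knob 0 /proc/sys/kernel/timer_migration

# 3. Move kernel threads (children of kthreadd, PID 2) to the housekeeping
#    CPUs. Per-CPU kthreads refuse to move; those errors are ignored.
for pid in $(pgrep -P 2 2>/dev/null || true); do
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: taskset -pc $HOUSEKEEPING $pid"
    else
        taskset -pc "$HOUSEKEEPING" "$pid" >/dev/null 2>&1 || true
    fi
done

# 4. vmstat update interval -> 300 seconds (vm.stat_interval sysctl).
write_knob 300 /proc/sys/vm/stat_interval
```

Keeping dry-run as the default makes it possible to review every change before applying it on a production host.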