.. _rt_perf_tips_rtvm:

ACRN Real-Time VM Performance Tips
##################################

Background
**********

The ACRN real-time VM (RTVM) is a special type of ACRN post-launched VM.
This document shows how you can configure RTVMs to potentially achieve
near bare-metal performance by configuring certain key technologies and
eliminating VM-exits within RT tasks, thereby avoiding this common
virtualization overhead.

Neighbor VMs, such as Service VMs, Human-Machine-Interface (HMI) VMs, or
other real-time VMs, may negatively affect the execution of real-time
tasks on an RTVM. This document also shows technologies used to isolate
potential runtime noise from neighbor VMs.

Here are some key technologies that can significantly improve
RTVM performance:

- LAPIC passthrough with core partitioning.
- PCIe Device Passthrough: Only MSI interrupt-capable PCI devices are
  supported for the RTVM.
- Enable CAT (Cache Allocation Technology)-based cache isolation: The RTVM
  uses a dedicated CLOS (Class of Service). Other VMs may share a CLOS, and
  the GPU uses a CLOS that does not overlap with the RTVM CLOS.
- PMD virtio: Both the virtio BE and FE work in polling mode, so
  interrupts and notifications between the Service VM and RTVM are not
  needed. All RTVM guest memory is hidden from the Service VM except for the
  virtio queue memory.

This document summarizes tips from issues encountered and
resolved during real-time development and performance tuning.

Mandatory Options for an RTVM
*****************************

An RTVM is a post-launched VM with LAPIC passthrough. Pay attention to
these options when you launch an ACRN RTVM:

Tip: Apply the acrn-dm option ``--lapic_pt``
   The LAPIC passthrough feature of ACRN is configured via the
   ``--lapic_pt`` option, but the feature is actually enabled when LAPIC is
   switched to X2APIC mode. Both conditions should be met to enable an
   RTVM.
   The ``--rtvm`` option will be automatically attached once
   ``--lapic_pt`` is applied.

Tip: Use virtio polling mode
   Polling mode avoids the VM-exit that the frontend would otherwise incur
   when sending a notification to the backend. We recommend that you pass
   through a physical peripheral device (such as a block or an Ethernet
   device) to an RTVM. If no physical device is available, ACRN supports
   virtio devices and enables polling mode to avoid a VM-exit at the
   frontend. Enable virtio polling mode via the option
   ``--virtio_poll [polling interval]``.

Avoid VM-exit Latency
*********************

VM-exits have a significant negative impact on virtualization performance.
A single VM-exit causes a latency of several microseconds or longer,
depending on what's done in VMX-root mode. VM-exits fall into two types:
those triggered by external CPU events and those triggered by operations
initiated by the vCPU.

ACRN eliminates almost all VM-exits triggered by external events by
using LAPIC passthrough. A few exceptions exist:

- SMI - This brings the processor into SMM, causing a much longer
  latency. The SMI should be handled in the BIOS.

- NMI - ACRN uses NMI for system-level notification.

You should avoid VM-exits triggered by operations initiated by the vCPU. Refer
to the `Intel 64 and IA-32 Architectures Software Developer's Manual (SDM)
<https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html>`_
"Instructions That Cause VM Exits Unconditionally" (SDM V3, 25.1.2) and
"Instructions That Cause VM Exits Conditionally" (SDM V3, 25.1.3).

Tip: Do not use CPUID in a real-time critical section.
   The CPUID instruction causes VM-exits unconditionally. You should
   detect CPU capability **before** entering an RT-critical section.
   CPUID can be executed at any privilege level, serializes instruction
   execution, and executes efficiently on bare metal.
   It's commonly used as a serializing instruction in an application,
   placed immediately before and after RDTSC. In this case, remove CPUID by
   using RDTSCP instead of RDTSC. RDTSCP waits until all previous
   instructions have been executed before reading the counter, and the
   instructions after the RDTSCP normally have a data dependency
   on it, so they must wait until the RDTSCP has been executed.

   RDMSR and WRMSR are instructions that cause VM-exits conditionally. On the
   ACRN RTVM, most MSRs are not intercepted by the hypervisor (HV), so they
   won't cause a VM-exit. But there are exceptions for security reasons:

   1) read from APICID and LDR;
   2) write to TSC_ADJUST if VMX_TSC_OFFSET_FULL is zero;
      otherwise, read and write to TSC_ADJUST and TSC_DEADLINE;
   3) write to ICR.

Tip: Do not use RDMSR to access APICID and LDR in an RT critical section.
   ACRN does not present a physical APICID to a guest, so APICID
   and LDR are virtualized even though LAPIC is passed through. As a result,
   access to APICID and LDR can cause a VM-exit.

Tip: Guarantee that VMX_TSC_OFFSET_FULL is zero; otherwise, do not access TSC_ADJUST and TSC_DEADLINE in the RT critical section.
   ACRN uses VMX_TSC_OFFSET_FULL as the offset between vTSC_ADJUST and
   pTSC_ADJUST. If VMX_TSC_OFFSET_FULL is zero, intercepting
   TSC_ADJUST and TSC_DEADLINE is not necessary. Otherwise, they should be
   intercepted to guarantee functionality.

Tip: Utilize Preempt-RT Linux mechanisms to reduce the access of ICR from the RT core.
   #. Add ``domain`` to ``isolcpus`` ( ``isolcpus=nohz,domain,1`` ) to the kernel parameters.
   #. Add ``idle=poll`` to the kernel parameters.
   #. Add ``rcu_nocb_poll`` along with ``rcu_nocbs=1`` to the kernel parameters.
   #. Disable the logging service such as ``journald`` or ``syslogd`` if possible.

   The parameters shown above are recommended for the guest Preempt-RT
   Linux.
   For a UP RTVM, ICR interception is not a problem. But for an SMP
   RTVM, IPIs may be needed between vCPUs. These tips are about reducing ICR
   access. The example above assumes a dual-core RTVM in which core 0
   is the housekeeping core and core 1 is the real-time core. The ``domain``
   flag strongly isolates the RT core from the general SMP
   balancing and scheduling algorithms. The parameters ``idle=poll`` and
   ``rcu_nocb_poll`` can prevent the RT core from sending reschedule IPIs
   to wake up tasks on core 0 in most cases. The logging service is disabled
   because an IPI may be issued to the housekeeping core to notify the
   logging service when kernel messages are output on the RT core.

   .. note::
      If an ICR access is inevitable within the RT critical section, be
      aware of the extra 3 to 4 microseconds of latency for each access.

Tip: Create and initialize the RT tasks at the beginning to avoid runtime access to control registers.
   Accessing control registers is another cause of a VM-exit. On ACRN, access
   to CR3 and CR8 does not cause a VM-exit. However, writes to CR0 and CR4
   may cause a VM-exit, which can happen when a new task is spawned and
   initialized.

Isolating the Impact of Neighbor VMs
************************************

ACRN makes use of several technologies and hardware features to avoid
performance impact on the RTVM by neighbor VMs:

Tip: Do not share CPUs allocated to the RTVM with other RT or non-RT VMs.
   ACRN enables CPU sharing to improve the utilization of CPU resources.
   However, for an RTVM, CPUs should be dedicated to it for determinism.

Tip: Use RDT such as CAT and MBA to allocate dedicated resources to the RTVM.
   ACRN supports Intel Resource Director Technology (RDT) features such as
   CAT and MBA, which can limit interference from neighbor VMs and from
   components such as the GPU through the memory hierarchy. The availability
   of RDT is hardware-specific. Refer to the :ref:`rdt_configuration`.

Tip: Lock the GPU to the lowest feasible frequency.
   A GPU can put a heavy load on the power/memory subsystem. Locking
   the GPU frequency as low as possible can help improve RT performance
   determinism. GPU frequency can usually be locked in the BIOS, but such
   BIOS support is platform-specific.

Miscellaneous
*************

Tip: Disable timer migration on Preempt-RT Linux.
   Because most tasks have their affinity set to the housekeeping core, a
   timer armed by an RT task might be migrated to the nearest busy CPU to
   save power. This hurts RT determinism because the timer interrupts raised
   on the housekeeping core need to be resent to the RT core. Timer
   migration can be disabled by the command::

      echo 0 > /proc/sys/kernel/timer_migration

Tip: Add ``mce=off`` to RTVM kernel parameters.
   This parameter disables the MCE periodic timer and avoids a VM-exit.

Tip: Disable the Intel processor C-state and P-state of the RTVM.
   Processor power management can save power, but it can also hurt
   RT performance because the power state changes at run time. The C-state
   and P-state PM mechanisms can be disabled by adding
   ``processor.max_cstate=0 intel_idle.max_cstate=0 intel_pstate=disable``
   to the kernel parameters.

Tip: Exercise caution when setting ``/proc/sys/kernel/sched_rt_runtime_us``.
   Setting ``/proc/sys/kernel/sched_rt_runtime_us`` to ``-1`` can be a
   problem. A value of ``-1`` allows RT tasks to monopolize a CPU, so that
   a mechanism such as ``nohz`` might get no chance to work, which can hurt
   RT performance or even (potentially) lock up a system.

Tip: Disable the software workaround for Machine Check Error on Page Size Change.
   By default, the software workaround for Machine Check Error on Page Size
   Change is conditionally applied to the models that may be affected by the
   issue.
   However, the software workaround has a negative impact on
   performance. If all guest OS kernels are trusted, you can disable the
   software workaround (by deselecting the :term:`Enable MCE workaround` option
   in the ACRN Configurator tool) to improve performance.

.. note::
   The tips for Preempt-RT Linux are mostly applicable to other Linux-based
   RTOSes as well, such as Xenomai.
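
Taken together, the guest kernel parameters recommended in the tips above
could be combined as in the following sketch. It assumes the dual-core layout
used in the ICR tip (core 0 for housekeeping, core 1 for real-time work);
adjust the CPU lists for your RTVM and treat this as a starting point rather
than a validated configuration::

   isolcpus=nohz,domain,1 idle=poll rcu_nocb_poll rcu_nocbs=1 mce=off
   processor.max_cstate=0 intel_idle.max_cstate=0 intel_pstate=disable

The parameters are wrapped across two lines here for readability; on a real
kernel command line they go on a single line. Timer migration is a run-time
setting rather than a boot parameter, so it is not part of this fragment.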