.. _sriov_virtualization:

Enable SR-IOV Virtualization
############################

SR-IOV (Single Root Input/Output Virtualization) isolates PCIe devices to
deliver performance close to bare-metal levels. SR-IOV consists of two basic
units: the PF (Physical Function), which supports the SR-IOV PCIe extended
capability and manages the entire physical device; and the VF (Virtual
Function), a "lightweight" PCIe function that is a passthrough device for
VMs.

For details, refer to Chapter 9 of PCI-SIG's
`PCI Express Base Specification Revision 4.0, Version 1.0
<https://pcisig.com/pci-express-architecture-configuration-space-test-specification-revision-40-version-10>`_.

SR-IOV Architectural Overview
*****************************

.. figure:: images/sriov-image1.png
   :align: center
   :name: SR-IOV-architecture-overview

   SR-IOV Architectural Overview

- **SI** - A System Image, also known as a VM.

- **VI** - A Virtualization Intermediary, also known as a hypervisor.

- **SR-PCIM** - A Single Root PCI Manager; it is a software entity for
  SR-IOV management.

- **PF** - A PCIe Function that supports the SR-IOV capability
  and is accessible to an SR-PCIM, a VI, or an SI.

- **VF** - A "lightweight" PCIe Function that is directly accessible by an
  SI.

SR-IOV Extended Capability
--------------------------

The SR-IOV Extended Capability defined here is a PCIe extended
capability that must be implemented in each PF device that supports the
SR-IOV feature. This capability is used to describe and control a PF's
SR-IOV capabilities.

.. figure:: images/sriov-image2.png
   :align: center
   :name: SR-IOV-extended-capability

   SR-IOV Extended Capability

- **PCIe Extended Capability ID** - 0010h.

- **SR-IOV Capabilities** - VF Migration-Capable and ARI-Capable.

- **SR-IOV Control** - Enable/Disable VFs; VF migration state query.

- **SR-IOV Status** - VF Migration Status.

- **Initial VFs** - Indicates to the SR-PCIM the number of VFs that are
  initially associated with the PF.

- **Total VFs** - Indicates the maximum number of VFs that can be
  associated with the PF.

- **Num VFs** - Controls the number of VFs that are visible. *Num VFs* <=
  *Initial VFs* = *Total VFs*.

- **Function Link Dependency** - The field used to describe
  dependencies between PFs. VF dependencies are the same as the
  dependencies of their associated PFs.

- **First VF Offset** - A constant that defines the Routing ID
  offset of the first VF that is associated with the PF that contains
  this Capability structure.

- **VF Stride** - Defines the Routing ID offset from one VF to the
  next one for all VFs associated with the PF that contains this
  Capability structure.

- **VF Device ID** - The field that contains the Device ID that should be
  presented for every VF to the SI.

- **Supported Page Sizes** - The field that indicates the page sizes
  supported by the PF.

- **System Page Size** - The field that defines the page size the system
  will use to map the VFs' memory addresses. Software must set the
  value of the *System Page Size* to one of the page sizes set in the
  *Supported Page Sizes* field.

- **VF BARs** - Fields that define the VF's Base Address
  Registers (BARs). These fields behave as normal PCI BARs.

- **VF Migration State Array Offset** - Register that contains a
  PF BAR relative pointer to the VF Migration State Array.

- **VF Migration State Array** - Located using the VF Migration
  State Array Offset register of the SR-IOV Capability block.

For details, refer to the *PCI Express Base Specification Revision 4.0, Version 1.0 Chapter 9.3.3*.

SR-IOV Architecture in ACRN
---------------------------

.. figure:: images/sriov-image3.png
   :align: center
   :name: SR-IOV-architecure-in-acrn

   SR-IOV Architecture in ACRN

1. The hypervisor detects an SR-IOV capable PCIe device in the physical PCI
   device enumeration phase.

2. The hypervisor intercepts the PF's SR-IOV capability and determines
   whether to enable or disable VF devices based on the ``VF_ENABLE`` state.
   All read/write requests for a PF device pass through to the PF physical
   device.

3. The hypervisor waits 100 ms after ``VF_ENABLE`` is set and then
   initializes the VF devices. A normal passthrough device and an
   SR-IOV VF device differ in physical device detection, BARs, and MSI-X
   initialization. The hypervisor uses ``Subsystem Vendor ID`` to detect the
   SR-IOV VF physical device instead of ``Vendor ID`` since no valid
   ``Vendor ID`` exists for the SR-IOV VF physical device. The VF BARs are
   initialized from the associated PF's SR-IOV capability, not from PCI
   standard BAR registers. The MSI-X mapping base address also comes from
   the PF's SR-IOV capability, not from PCI standard BAR registers.

SR-IOV Passthrough VF Architecture in ACRN
------------------------------------------

.. figure:: images/sriov-image4.png
   :align: center
   :name: SR-IOV-vf-passthrough

   SR-IOV VF Passthrough Architecture in ACRN

1. The SR-IOV VF device must be bound to the ``pci-stub`` driver instead of
   the vendor-specific VF driver before the device passthrough.

2. The user configures the ``acrn-dm`` boot parameter with the passthrough
   SR-IOV VF device. When the User VM starts, ``acrn-dm`` invokes a
   hypercall to set the *vdev-VF0* device in the User VM.

3. The hypervisor emulates ``Device ID/Vendor ID`` and ``Memory Space Enable
   (MSE)`` in the configuration space for an assigned SR-IOV VF device. The
   assigned VF ``Device ID`` comes from its associated PF's capability. The
   ``Vendor ID`` is the same as the PF's ``Vendor ID``, and the ``MSE`` is
   always set when reading the SR-IOV VF device's control register.

4. The vendor-specific VF driver in the target VM probes the assigned SR-IOV
   VF device.

SR-IOV Initialization Flow
--------------------------

.. figure:: images/sriov-image5.png
   :align: center
   :name: SR-IOV-init-flow

   SR-IOV Initialization Flow

When an SR-IOV capable device is initialized, all accesses to its
configuration space pass through directly to the physical device.
The Service VM can identify all capabilities of the device from the SR-IOV
extended capability and then create a *sysfs* node for SR-IOV management.

SR-IOV VF Enable Flow
---------------------

.. figure:: images/sriov-image6.png
   :align: center
   :width: 900px
   :name: SR-IOV-enable-flow

   SR-IOV VF Enable Flow

The application enables ``n`` VF devices via an SR-IOV PF device ``sysfs``
node. The hypervisor intercepts all SR-IOV capability accesses and checks the
``VF_ENABLE`` state. If ``VF_ENABLE`` is set, the hypervisor creates ``n``
virtual devices after 100 ms so that the VF physical devices have enough time
to be created. The Service VM waits 100 ms and then accesses only the first
VF device's configuration space, including Class Code, Revision ID, Subsystem
Vendor ID, and Subsystem ID. The Service VM uses the first VF device's
information to initialize subsequent VF devices.

SR-IOV VF Disable Flow
----------------------

.. figure:: images/sriov-image7.png
   :align: center
   :name: SR-IOV-disable-flow

   SR-IOV VF Disable Flow

The application disables SR-IOV VF devices by writing zero to the SR-IOV PF
device ``sysfs`` node. The hypervisor intercepts all SR-IOV capability
accesses and checks the ``VF_ENABLE`` state. If ``VF_ENABLE`` is clear, the
hypervisor makes the VF virtual devices invisible to the Service VM so that
all accesses to VF devices return ``0xFFFFFFFF`` as an error. The VF physical
devices are removed within 1 s after ``VF_ENABLE`` is cleared.

SR-IOV VF Assignment Policy
---------------------------

.. figure:: images/sriov-image8.png
   :align: center
   :name: SR-IOV-vf-assignment

   SR-IOV VF Assignment

1. All SR-IOV PF devices are managed by the Service VM.

2. An SR-IOV PF cannot be passed through to a User VM.

3. All VFs can be passed through to a User VM, but we do not recommend
   passing them through to high privilege VMs because the PF device may
   impact the assigned VFs' functionality and stability.

SR-IOV Usage Guide in ACRN
--------------------------

We use the Intel 82576 NIC as an example in the following instructions. Only
LaaG (Linux as a Guest) is supported.

1. Ensure that the 82576 VF driver is compiled into the User VM kernel
   (set ``CONFIG_IGBVF=y`` in the kernel config).

#. When the Service VM boots, the ``lspci -v`` command indicates
   that the Intel 82576 NIC devices have SR-IOV capability and their PF
   drivers are ``igb``.

   .. figure:: images/sriov-image9.png
      :align: center
      :name: 82576-pf

      82576 SR-IOV PF Devices

#. Run the ``echo n > /sys/class/net/enp109s0f0/device/sriov_numvfs``
   command in the Service VM to enable *n* VF devices for the first PF
   device (*enp109s0f0*). The number *n* cannot exceed *TotalVFs*, which
   is reported by the command
   ``cat /sys/class/net/enp109s0f0/device/sriov_totalvfs``. Here we
   use *n = 2* as an example.

   .. figure:: images/sriov-image10.png
      :align: center
      :name: 82576-vf

      82576 SR-IOV VF Devices

   .. figure:: images/sriov-image11.png
      :align: center
      :name: 82576-vf-nic

      82576 SR-IOV VF NIC

#. Pass through an SR-IOV VF device to the guest.

   a. Unbind the VF from the ``igbvf`` driver in the Service VM and bind it
      to ``pci-stub``.

      i. ``modprobe pci_stub``

      ii. ``echo "8086 10ca" > /sys/bus/pci/drivers/pci-stub/new_id``

      iii. ``echo "0000:6d:10.0" > /sys/bus/pci/devices/0000:6d:10.0/driver/unbind``

      iv. ``echo "0000:6d:10.0" > /sys/bus/pci/drivers/pci-stub/bind``

   b. Add the SR-IOV VF device parameter (``-s X,passthru,6d/10/0``) to
      the User VM launch script.

      .. figure:: images/sriov-image12.png
         :align: center
         :name: 82576-nic-passthru

         Configure 82576 NIC as a Passthrough Device

   c. Boot the User VM.

SR-IOV Limitations in ACRN
--------------------------

1. The SR-IOV migration feature is not supported.

2. If an SR-IOV PF device is detected during the enumeration phase, but
   not enough room exists for its total VF devices, the PF device will be
   dropped. The ``MAX_PCI_DEV_NUM`` ACRN configuration option sets the
   maximum number of PCI devices that the platform supports. Make sure
   ``MAX_PCI_DEV_NUM`` is more than the number of all PCI devices, including
   the total SR-IOV VF devices.
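
As a rough aid for sizing ``MAX_PCI_DEV_NUM``, the following sketch estimates
the device count described in the limitation above by walking the Linux PCI
``sysfs`` tree in the Service VM. It is an estimate under stated assumptions,
not an ACRN tool: the function name ``count_pci_slots`` and its optional
directory argument are hypothetical, and the Service VM's view of the PCI bus
may differ from what the hypervisor enumerates. It relies on the standard
``sriov_totalvfs`` attribute of a PF and the ``physfn`` link that Linux
creates for an already-enabled VF.

.. code-block:: bash

   #!/bin/sh
   # Estimate how many PCI devices the platform must account for:
   # every non-VF PCI function, plus the TotalVFs of each SR-IOV PF.
   # The optional argument (a sysfs-style device directory) is a
   # hypothetical convenience for trying the function on test data.
   count_pci_slots() {
       root="${1:-/sys/bus/pci/devices}"
       devs=0
       vfs=0
       for dev in "$root"/*; do
           # Skip the unexpanded glob when the directory is empty or missing.
           [ -e "$dev" ] || continue
           # Enabled VFs carry a "physfn" link; skip them so they are not
           # counted twice (sriov_totalvfs of the PF already covers them).
           [ -e "$dev/physfn" ] && continue
           devs=$((devs + 1))
           if [ -r "$dev/sriov_totalvfs" ]; then
               total=$(cat "$dev/sriov_totalvfs")
               vfs=$((vfs + total))
           fi
       done
       echo "$((devs + vfs))"
   }

Calling ``count_pci_slots`` with no argument reads ``/sys/bus/pci/devices``
and prints a single number; ``MAX_PCI_DEV_NUM`` should be larger than that
value.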