.. _partition-mode-hld:

Partition Mode
##############

ACRN is a type 1 hypervisor that supports running multiple guest operating
systems (OS). Typically, the platform BIOS/bootloader boots ACRN, and
ACRN loads one or more guest OSes. Refer to :ref:`hv-startup` for
details on the start-up flow of the ACRN hypervisor.

ACRN supports two modes of operation: sharing mode and partition mode.
This document describes ACRN's high-level design for partition mode
support.

.. contents::
   :depth: 2
   :local:

Introduction
************

In partition mode, ACRN provides guests with exclusive access to cores,
memory, cache, and peripheral devices. Partition mode enables developers
to statically dedicate resources to individual guests. However, neither
x86 hardware nor ACRN currently supports partitioning resources such as
peripheral buses (e.g., PCI). On x86 platforms that support Cache
Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA), developers
can partition Level 2 (L2) cache, Last Level Cache (LLC), and memory bandwidth
among the guests. Refer to :ref:`hv_rdt` for more details on the ACRN RDT
high-level design and :ref:`rdt_configuration` for RDT configuration.

ACRN expects static partitioning of resources, either by code
modification for guest configuration or through compile-time config
options. All the devices exposed to the guests are either physical
resources or are emulated in the hypervisor. There is no need for a
Device Model and Service VM. :numref:`pmode2vms` shows a partition mode
example of two VMs with exclusive access to physical resources.

.. figure:: images/partition-image3.png
   :align: center
   :name: pmode2vms

   Partition Mode Example with Two VMs

Guest Info
**********

ACRN uses the Multiboot info passed from the platform bootloader to find
the location of each guest kernel in memory. ACRN copies each guest
kernel into the corresponding guest's memory. The current implementation
of ACRN requires developers to specify kernel parameters for the guests
as part of the guest configuration. ACRN picks up the kernel parameters
from the guest configuration and copies them to the corresponding guest
memory.

.. figure:: images/partition-image18.png
   :align: center

   Guest Info

ACRN Setup for Guests
*********************

Cores
=====

ACRN requires the developer to specify the number of guests and the
cores dedicated to each guest. The developer also needs to specify the
physical core used as the bootstrap processor (BSP) for each guest. As
the hypervisor brings up the physical processors, it checks whether each
one is configured as the BSP of a guest. If a processor is the BSP of a
guest, ACRN proceeds to build the memory mapping, mptable, E820 entries,
and zero page for that guest. As described in `Guest Info`_, ACRN copies
the guest kernel and kernel parameters into guest memory.
:numref:`partBSPsetup` shows these events in chronological order.

.. figure:: images/partition-image7.png
   :align: center
   :name: partBSPsetup

   Event Order for Processor Setup
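
To make the static core assignment concrete, here is a minimal sketch of how
a per-guest configuration could be expressed in C. The structure and field
names (``pm_guest_config``, ``pcpu_bitmap``, ``bsp_pcpu_id``) are illustrative
assumptions, not ACRN's actual guest configuration interface:

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical compile-time guest configuration for partition mode. */
   struct pm_guest_config {
           uint64_t pcpu_bitmap;   /* physical cores dedicated to this guest */
           uint16_t bsp_pcpu_id;   /* physical core acting as the guest BSP  */
   };

   /* Two guests, each with exclusive access to two physical cores. */
   static const struct pm_guest_config guest_configs[2] = {
           { .pcpu_bitmap = (1UL << 0) | (1UL << 1), .bsp_pcpu_id = 0U },
           { .pcpu_bitmap = (1UL << 2) | (1UL << 3), .bsp_pcpu_id = 2U },
   };

When a physical processor comes online, the hypervisor can scan such a table
for a matching ``bsp_pcpu_id`` and, on a match, start building that guest's
memory mapping, mptable, E820 entries, and zero page.
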
Memory
======

For each guest in partition mode, the ACRN developer specifies the size of
the guest's memory and its starting address in the host physical address
space in the guest configuration. There is no support for HIGHMEM for
partition mode guests. The developer needs to take care of two aspects
when assigning host memory to the guests:

1) The sum of the guest PCI hole and the guest "System RAM" must be less
   than 4 GB.

2) Pick the starting host physical address and the size so that the
   region does not overlap with any reserved regions in the host E820.

ACRN creates an EPT mapping for the guest between GPA (0, memory size) and
HPA (starting address in the guest configuration, memory size).

E820 and Zero Page Info
=======================

A default E820 is used for all the guests in partition mode. The following
table shows the reference E820 layout. The zero page is created with this
E820 info for all the guests.

+------------------------+
| RAM                    |
|                        |
| 0x0 - 0xEFFFF          |
+------------------------+
| RESERVED (MPTABLE)     |
|                        |
| 0xF0000 - 0x100000     |
+------------------------+
| RAM                    |
|                        |
| 0x100000 - LOWMEM      |
+------------------------+
| RESERVED               |
+------------------------+
| PCI HOLE               |
+------------------------+
| RESERVED               |
+------------------------+

Platform Info - mptable
=======================

In partition mode, ACRN uses the mptable to convey platform info to each
guest. Based on the number of cores assigned to each guest and whether the
guest needs devices with INTx, ACRN builds the mptable and copies it into
the guest memory. In partition mode, ACRN passes physical APIC IDs to the
guests.

I/O - Virtual Devices
=====================

Port I/O is supported for the PCI config space ports 0xcf8 and 0xcfc, the
vUART at 0x3f8, the vRTC at 0x70 and 0x71, and the vPIC ranges 0x20/0x21,
0xa0/0xa1, and 0x4d0/0x4d1. MMIO is supported for the vIOAPIC. ACRN exposes
a virtual host bridge at BDF (Bus:Device.Function) 0:0.0 to each guest.
Access to the 256 bytes of config space of the virtual host bridge is
emulated.

I/O - Passthrough Devices
=========================

ACRN, in partition mode, supports passing through PCI devices on the
platform. All the passthrough devices are exposed as child devices under
the virtual host bridge. ACRN does not support passing through bridges or
emulating virtual bridges. Passthrough devices should be statically
allocated to each guest using the guest configuration. ACRN expects the
developer to provide the mapping from the virtual BDF to the physical
device's BDF for all the passthrough devices as part of each guest
configuration.
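
As an illustration of the static virtual-BDF-to-physical-BDF assignment, the
sketch below uses hypothetical names (``pm_ptdev_config``, ``vbdf``,
``pbdf``) and example device addresses; it is not ACRN's actual
configuration format:

.. code-block:: c

   #include <stdint.h>

   /* Encode bus:device.function into the usual 16-bit BDF layout. */
   #define BDF(bus, dev, func)  (uint16_t)(((bus) << 8) | ((dev) << 3) | (func))

   /* Hypothetical static passthrough-device entry for one guest. */
   struct pm_ptdev_config {
           uint16_t vbdf;   /* BDF the guest sees under the virtual host bridge */
           uint16_t pbdf;   /* BDF of the physical device on the platform       */
   };

   /* Example: expose two physical devices (say 00:17.0 and 00:1f.6) to the
    * guest as devices 00:01.0 and 00:02.0 under the virtual host bridge.
    */
   static const struct pm_ptdev_config guest0_ptdevs[] = {
           { .vbdf = BDF(0U, 1U, 0U), .pbdf = BDF(0U, 0x17U, 0U) },
           { .vbdf = BDF(0U, 2U, 0U), .pbdf = BDF(0U, 0x1fU, 6U) },
   };
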
Runtime ACRN Support for Guests
*******************************

ACRN, in partition mode, supports an option to pass through the LAPIC of
the physical CPUs to the guest. ACRN expects developers to specify whether
the guest needs LAPIC passthrough in the guest configuration. When the
guest configures its vLAPIC as x2APIC, and the guest configuration has
LAPIC passthrough enabled, ACRN passes the LAPIC through to the guest. The
guest can then access the LAPIC hardware directly without hypervisor
interception. At guest runtime, this option determines how ACRN handles
inter-processor interrupts and device interrupts, as discussed in detail
in the corresponding sections below.

.. figure:: images/partition-image16.png
   :align: center

   LAPIC Passthrough

Guest SMP Boot Flow
===================

The core APIC IDs are reported to the guest using the mptable info. The
SMP boot flow is similar to sharing mode. Refer to :ref:`vm-startup` for
the guest SMP boot flow in ACRN. Partition mode guest startup is the same
as Service VM startup in sharing mode.

Inter-Processor Interrupt (IPI) Handling
========================================

Guests Without LAPIC Passthrough
--------------------------------

For guests without LAPIC passthrough, IPIs between guest CPUs are handled
in the same way as in sharing mode. Refer to :ref:`virtual-interrupt-hld`
for more details.

Guests With LAPIC Passthrough
-----------------------------

ACRN supports LAPIC passthrough if and only if the guest is using x2APIC
mode for the vLAPIC. In LAPIC passthrough mode, writes to the Interrupt
Command Register (ICR) x2APIC MSR are intercepted. The guest writes the
IPI info, including the vector and the destination APIC IDs, to the ICR.
Upon an IPI request from the guest, ACRN does a sanity check on the
destination processors programmed into the ICR. If the destination is a
valid target for the guest, ACRN sends an IPI with the same vector from
the ICR to the physical CPUs corresponding to the destination processor
info in the ICR.

.. figure:: images/partition-image14.png
   :align: center

   IPI Handling for Guests With LAPIC Passthrough
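
A simplified sketch of that ICR write interception is shown below. The
helper names (``vm_owns_apic_id``, ``send_physical_ipi``), the ``pm_vm``
type, and the handler signature are assumptions for illustration only;
destination shorthands and logical destination mode are omitted:

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define MSR_X2APIC_ICR  0x830U   /* WRMSR to this MSR causes the vmexit */

   struct pm_vm;

   /* Assumed helpers: ownership check against the guest's dedicated cores,
    * and a wrapper that programs the physical LAPIC to send an IPI.
    */
   extern bool vm_owns_apic_id(const struct pm_vm *vm, uint32_t apic_id);
   extern void send_physical_ipi(uint32_t dest_apic_id, uint8_t vector);

   /* Called on a vmexit caused by a guest write to the x2APIC ICR. */
   int handle_x2apic_icr_write(struct pm_vm *vm, uint64_t icr_value)
   {
           uint8_t  vector = (uint8_t)(icr_value & 0xFFU);
           uint32_t dest   = (uint32_t)(icr_value >> 32U); /* x2APIC dest field */

           /* Sanity check: the destination must be a core owned by this guest. */
           if (!vm_owns_apic_id(vm, dest)) {
                   return -1;   /* drop the request */
           }

           /* Forward the IPI with the same vector to the physical CPU. */
           send_physical_ipi(dest, vector);
           return 0;
   }
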
Passthrough Device Support
==========================

Configuration Space Access
--------------------------

ACRN emulates the Configuration Space Address (0xcf8) I/O port and the
Configuration Space Data (0xcfc) I/O port for guests to access PCI device
configuration space. Within the config space of a device, the Base Address
Registers (BARs), at offsets 0x10 through 0x24, provide the information
about the resources (I/O and MMIO) used by the PCI device. ACRN
virtualizes the BAR registers and, for the rest of the config space,
forwards reads and writes to the physical config space of the passthrough
devices. Refer to the `I/O`_ section below for more details.

.. figure:: images/partition-image1.png
   :align: center

   Configuration Space Access

DMA
---

ACRN developers need to statically define the passthrough devices for each
guest using the guest configuration. For devices to DMA to/from guest
memory directly, ACRN parses the list of passthrough devices for each
guest and creates context entries in the VT-d remapping hardware. The EPT
page tables created for the guest are used as the VT-d page tables.

I/O
---

ACRN supports I/O for passthrough devices with two restrictions.

1) Only MMIO is supported. Developers therefore need to mark I/O BARs as
   not present in the guest configuration.

2) Only the 32-bit MMIO BAR type is supported.

As the guest PCI sub-system scans the PCI bus and assigns a Guest Physical
Address (GPA) to the MMIO BAR, ACRN maps the GPA to the address in the
physical BAR of the passthrough device using EPT. The following timeline
chart explains how PCI devices are assigned to the guest and how BARs are
mapped upon guest initialization.

.. figure:: images/partition-image13.png
   :align: center

   I/O for Passthrough Devices

Interrupt Configuration
-----------------------

ACRN supports both legacy (INTx) and MSI interrupts for passthrough
devices.

INTx Support
~~~~~~~~~~~~

ACRN expects developers to identify the interrupt line info (offset 0x3C
in the physical configuration space) of the passthrough device and build
an interrupt entry in the mptable for the corresponding guest. As the
guest configures the vIOAPIC for the interrupt RTE, ACRN writes the info
from the guest RTE into the physical IOAPIC RTE. When the guest masks the
RTE in the vIOAPIC, ACRN masks the corresponding interrupt RTE in the
physical IOAPIC. Level triggered interrupts are not supported.

MSI Support
~~~~~~~~~~~

The guest reads and writes the PCI configuration space to configure MSI
interrupts; the MSI address, data, and control registers are passed
through to the physical configuration space of the passthrough device.
Refer to `Configuration Space Access`_ for details on how the PCI
configuration space is emulated.

Virtual Device Support
======================

ACRN provides read-only vRTC support for partition mode guests. Writes
to the data port are discarded.

For port I/O to ports other than the vPIC, vRTC, or vUART, reads return
0xFF and writes are discarded.

Interrupt Delivery
==================

Guests Without LAPIC Passthrough
--------------------------------

In ACRN partition mode, interrupts stay disabled after a vmexit. The
processor does not take interrupts when it is executing in VMX root
mode. ACRN configures the processor to take a vmexit upon an external
interrupt when the processor is executing in VMX non-root mode. Upon an
external interrupt, after sending an EOI to the physical LAPIC, ACRN
injects the vector into the vLAPIC of the vCPU running on the
processor. Guests using a Linux kernel use vectors less than 0xEC
for device interrupts.

.. figure:: images/partition-image20.png
   :align: center

   Interrupt Delivery for Guests Without LAPIC Passthrough

Guests With LAPIC Passthrough
-----------------------------

For guests with LAPIC passthrough, ACRN does not configure vmexit upon
external interrupts. There is no vmexit upon device interrupts; they are
handled by the guest IDT.
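
The sketch below summarizes the external-interrupt vmexit path described
above for guests without LAPIC passthrough (for guests with LAPIC
passthrough this vmexit is never configured). The ``pm_vcpu`` type and the
helper names (``read_pending_vector``, ``physical_lapic_eoi``,
``vlapic_inject_vector``) are illustrative assumptions:

.. code-block:: c

   #include <stdint.h>

   struct pm_vcpu;

   /* Assumed helpers for the sketch. */
   extern uint8_t read_pending_vector(void);   /* vector from the exit info  */
   extern void    physical_lapic_eoi(void);    /* EOI to the physical LAPIC  */
   extern void    vlapic_inject_vector(struct pm_vcpu *vcpu, uint8_t vector);

   /* Invoked on a vmexit caused by an external interrupt while the vCPU
    * was running in VMX non-root mode. Interrupts remain disabled here,
    * so the hypervisor never takes the interrupt in VMX root mode.
    */
   void handle_external_interrupt_vmexit(struct pm_vcpu *vcpu)
   {
           uint8_t vector = read_pending_vector();

           /* Acknowledge the interrupt at the physical LAPIC first ... */
           physical_lapic_eoi();

           /* ... then inject the same vector into the vLAPIC of the vCPU
            * that was running on this processor.
            */
           vlapic_inject_vector(vcpu, vector);
   }
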
Hypervisor IPI Service
======================

ACRN needs IPIs for events such as flushing TLBs across CPUs, sending
virtual device interrupts (e.g., from the vUART to vCPUs), and others.

Guests Without LAPIC Passthrough
--------------------------------

Hypervisor IPIs work the same way as in sharing mode.

Guests With LAPIC Passthrough
-----------------------------

Since external interrupts are passed through to the guest IDT, regular
IPIs would not trigger a vmexit. ACRN therefore uses NMI delivery mode,
and NMI exiting is enabled for the vCPUs. When the NMI arrives at the
target processor and the processor is in non-root mode, a vmexit happens
on that processor and the event mask is checked for servicing the pending
events.

Debug Console
=============

For details on how the hypervisor console works, refer to
:ref:`hv-console`.

For a guest console in partition mode, ACRN provides an option to pass
``vmid`` as an argument to ``vm_console``. The ``vmid`` is the same as the
one developers use in the guest configuration.

Guests Without LAPIC Passthrough
--------------------------------

The guest console works the same way as in sharing mode.

Hypervisor Console
==================

ACRN uses the TSC deadline timer to provide a timer service. The
hypervisor console uses a timer on CPU0 to poll characters on the serial
device. To support LAPIC passthrough, the TSC deadline MSR is passed
through and the local timer interrupt is also delivered to the guest IDT.
Instead of the TSC deadline timer, ACRN therefore uses the VMX preemption
timer to poll the serial device.

Guest Console
=============

ACRN exposes a vUART to partition mode guests. The vUART uses the vPIC to
inject an interrupt into the guest BSP. If the guest has more than one
core, the vUART might need to inject an interrupt into the guest BSP from
a core other than the one running the BSP. As mentioned in `Hypervisor IPI
Service`_, ACRN uses NMI delivery mode to notify the CPU running the BSP
of the guest.
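
A simplified sketch of that cross-CPU notification decision is shown below.
The names (``pm_vuart_notify_bsp``, ``vpic_assert_irq``, ``send_nmi_kick``)
and the assumed COM1-style IRQ line are illustrative only, not ACRN's
actual vUART implementation:

.. code-block:: c

   #include <stdint.h>

   struct pm_vm;

   /* Assumed helpers for the sketch. */
   extern uint16_t current_pcpu_id(void);
   extern uint16_t vm_bsp_pcpu_id(const struct pm_vm *vm);
   extern void     vpic_assert_irq(struct pm_vm *vm, uint32_t irq);
   extern void     send_nmi_kick(uint16_t pcpu_id);

   #define VUART_IRQ  4U   /* assumed IRQ line for the vUART at 0x3f8 */

   /* Raise the vUART interrupt toward the guest BSP. If the caller is not
    * running on the BSP's physical core, kick that core with an NMI so it
    * exits non-root mode and picks up the pending vPIC interrupt.
    */
   void pm_vuart_notify_bsp(struct pm_vm *vm)
   {
           vpic_assert_irq(vm, VUART_IRQ);

           if (current_pcpu_id() != vm_bsp_pcpu_id(vm)) {
                   send_nmi_kick(vm_bsp_pcpu_id(vm));
           }
   }
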