.. _hld-overview:

ACRN High-Level Design Overview
###############################

ACRN is an open-source reference hypervisor (HV) that runs on
:ref:`Intel platforms <hardware>` for heterogeneous use cases such as
Software-Defined Cockpit (SDC) and In-Vehicle Experience (IVE) for
automotive, or human-machine interface (HMI) and real-time OS for industry.
ACRN provides embedded hypervisor vendors with a reference I/O mediation
solution with a permissive license, and provides auto makers and industry users
with a reference software stack for their use cases.

ACRN Use Cases
**************

Software-Defined Cockpit
========================

The SDC system consists of multiple systems: the instrument cluster (IC)
system, the in-vehicle infotainment (IVI) system, and one or more rear
seat entertainment (RSE) systems. Each system runs as a VM for better
isolation.

The instrument cluster (IC) system manages graphic displays of:

- driving speed, engine RPM, temperature, fuel level, odometer, and trip mileage
- alerts of low fuel or tire pressure
- rear-view camera (RVC) and surround-camera view for driving assistance

In-Vehicle Infotainment
=======================

A typical in-vehicle infotainment (IVI) system supports:

- Navigation systems
- Radio, audio, and video playback
- Mobile device connection for calls, music, and applications via voice
  recognition, gesture recognition, and/or touch
- Rear seat entertainment (RSE) services such as:

  - entertainment system
  - virtual office
  - connection to the IVI front system and mobile devices (cloud
    connectivity)

ACRN supports Linux and Android guest OSes. OEMs can use the ACRN hypervisor
and the Linux or Android guest OS reference code to implement their own VMs for
a customized IC/IVI/RSE.

Industry Usage
==============

A typical industry usage includes one Windows HMI VM plus one real-time VM (RTVM):

- a Windows HMI guest OS with a display to provide the human-machine interface
- an RTVM running a specific RTOS to handle real-time workloads such as PLC
  control

ACRN supports a Windows* guest OS for such HMI capability. ACRN continues to add
features that enhance its real-time performance to meet hard-RT key performance
indicators for its RTVM:

- Cache Allocation Technology (CAT)
- Memory Bandwidth Allocation (MBA)
- LAPIC passthrough
- Polling mode driver
- Always Running Timer (ART)
- Intel Time Coordinated Computing (TCC) features, such as split lock
  detection and cache locking


Hardware Requirements
*********************

Mandatory IA CPU features:

- Long mode
- MTRR
- TSC deadline timer
- NX, SMAP, SMEP
- Intel VT, including VMX, EPT, VT-d, APICv, VPID, INVEPT, and INVVPID

Recommended memory: 4 GB, with 8 GB preferred.


ACRN Architecture
*****************

ACRN is a type 1 hypervisor that runs directly on bare metal. It supports
certain :ref:`Intel platforms <hardware>` and can be easily extended to support
future platforms. ACRN implements a hybrid VMM architecture, using a privileged
Service VM to manage I/O devices and provide I/O mediation. Multiple User VMs
can be supported, running Ubuntu, Android, Windows, or an RTOS such as Zephyr.

ACRN 1.0
========

ACRN 1.0 is designed mainly for automotive use cases such as SDC and IVI.

Instrument cluster applications are critical in the SDC use case and may
require functional safety certification in the future.
Running the IC system in
a separate VM isolates it from other VMs and their applications, thereby
reducing the attack surface and minimizing potential interference. However,
running the IC system in a separate VM introduces additional latency for the IC
applications. Some country regulations require an IVE system to show the
rear-view camera (RVC) within 2 seconds, which is difficult to achieve if a
separate instrument cluster VM is started after the User VM is booted.

:numref:`overview-arch1.0` shows the architecture of ACRN 1.0 together with
the IC VM and Service VM. As shown, the Service VM owns most of the platform
devices and provides I/O mediation to VMs. Some PCIe devices are passed
through to User VMs according to the VM configuration. In addition,
the Service VM can run the IC applications and HV helper applications such
as the Device Model and the VM Manager, where the VM Manager is responsible
for VM start/stop/pause and virtual CPU pause/resume.

.. figure:: images/over-image34.png
   :align: center
   :name: overview-arch1.0

   ACRN 1.0 Architecture

ACRN 2.0
========

ACRN 2.0 extended ACRN to support a pre-launched VM (mainly for safety VMs)
and real-time (RT) VMs.

:numref:`overview-arch2.0` shows the architecture of ACRN 2.0; the main
differences from ACRN 1.0 are:

- ACRN 2.0 supports a pre-launched VM, with isolated resources,
  including CPU, memory, and hardware devices (illustrated in the sketch
  after :numref:`overview-arch2.0`).

- ACRN 2.0 adds a few necessary device emulations in the hypervisor, such as
  vPCI and vUART, to avoid interference between different VMs.

- ACRN 2.0 supports an RTVM as a post-launched User VM, with features such as
  LAPIC passthrough and a PMD virtio driver.

.. figure:: images/over-image35.png
   :align: center
   :name: overview-arch2.0

   ACRN 2.0 Architecture
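To give a feel for what "isolated resources" means for a pre-launched VM, the
following is a minimal sketch of a static VM configuration that dedicates a set
of physical CPUs, a fixed host memory region, and statically assigned
passthrough devices to one VM. The structure and field names are illustrative
assumptions for this document only; they are not the actual ACRN scenario
configuration structures.

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>

   /* Illustrative static configuration for a pre-launched VM; the real ACRN
    * scenario structures differ, this only models the partitioning idea. */
   struct prelaunched_vm_config {
       const char *name;
       uint64_t    pcpu_bitmap;      /* physical CPUs dedicated to this VM */
       uint64_t    mem_start_hpa;    /* host physical base of its memory */
       uint64_t    mem_size;         /* size of the isolated memory region */
       uint16_t    pt_dev_bdf[4];    /* statically assigned passthrough devices */
       unsigned    pt_dev_num;
   };

   static const struct prelaunched_vm_config safety_vm = {
       .name          = "safety-vm",
       .pcpu_bitmap   = 0x3U,               /* pCPU0 and pCPU1 */
       .mem_start_hpa = 0x100000000ULL,     /* 4 GB */
       .mem_size      = 0x20000000ULL,      /* 512 MB */
       .pt_dev_bdf    = { 0x00f8 },         /* e.g., PCI device 00:1f.0 */
       .pt_dev_num    = 1,
   };

   int main(void)
   {
       printf("%s: pCPUs 0x%llx, memory [0x%llx, 0x%llx), %u passthrough device(s)\n",
              safety_vm.name,
              (unsigned long long)safety_vm.pcpu_bitmap,
              (unsigned long long)safety_vm.mem_start_hpa,
              (unsigned long long)(safety_vm.mem_start_hpa + safety_vm.mem_size),
              safety_vm.pt_dev_num);
       return 0;
   }

Because the resources are fixed at configuration time, the hypervisor can start
such a VM without Service VM involvement, which is what makes it suitable for
safety workloads.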
.. _intro-io-emulation:

Device Emulation
================

ACRN adopts various approaches for emulating devices for the User VM:

- **Emulated device**: A virtual device using this approach is emulated in
  the Service VM by trapping accesses to the device in the User VM. Two
  sub-categories exist for emulated devices:

  - fully emulated, allowing native drivers to be used
    unmodified in the User VM, and
  - para-virtualized, requiring front-end drivers in
    the User VM to function.

- **Passthrough device**: A device passed through to the User VM is fully
  accessible to the User VM without interception. However, interrupts
  are first handled by the hypervisor before
  being injected into the User VM.

- **Mediated passthrough device**: A mediated passthrough device is a
  hybrid of the previous two approaches. Performance-critical
  resources (mostly data-plane related) are passed through to the User VMs, and
  other resources (mostly control-plane related) are emulated.


.. _ACRN-io-mediator:

I/O Emulation
-------------

The Device Model (DM) manages User VM devices: it allocates
memory for the User VMs, configures and initializes the devices shared by the
guest, loads the virtual BIOS, initializes the virtual CPU state, and
invokes the hypervisor service to execute the guest instructions.

The following diagram illustrates the control flow of emulating a port
I/O read from the User VM.

.. figure:: images/over-image29.png
   :align: center
   :name: overview-io-emu-path

   I/O (PIO/MMIO) Emulation Path

:numref:`overview-io-emu-path` shows an example I/O emulation flow path.

When a guest executes an I/O instruction (port I/O or MMIO), a VM exit
happens. The HV takes control and handles the request based on the VM exit
reason, for example ``VMX_EXIT_REASON_IO_INSTRUCTION`` for port I/O access.
The HV fetches and decodes the guest instruction, if needed, and processes
the port I/O access at its pre-configured port address (for example,
``in AL, 20h``). The HV places the decoded information, such as
the port I/O address, access size, read/write direction, and target register,
into the I/O request in the I/O request buffer (shown in
:numref:`overview-io-emu-path`) and then notifies/interrupts the Service VM
to process it.

The Hypervisor Service Module (HSM) in the Service VM intercepts HV interrupts
and accesses the I/O request buffer for the port I/O instructions. It
then checks whether any kernel device claims ownership of the
I/O port. If one does, that device executes the requested operations for the
VM. Otherwise, the HSM leaves the I/O request in the request buffer
and wakes up the DM thread for processing.

The DM follows the same mechanism as the HSM. The I/O processing thread of the
DM queries the I/O request buffer to get the PIO instruction details and
checks whether any (guest) device emulation module claims ownership of
the I/O port. If so, the owning module is invoked to handle the request.

When the DM completes the emulation (the port I/O 20h access in this example)
of a device such as uDev1, uDev1 puts the result into the request
buffer (register ``AL``). The DM returns control to the HV,
typically through the HSM and a hypercall, indicating completion of the I/O
instruction emulation. The HV then stores the result in the guest register
context, advances the guest IP to indicate the completion of instruction
execution, and resumes the guest.

The MMIO access path is similar, except that the VM exit reason is an
*EPT violation*; MMIO access is usually trapped through
``VMX_EXIT_REASON_EPT_VIOLATION`` in the hypervisor.
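The following is a minimal, user-space sketch of the ownership check described
above: a decoded port I/O request is matched against registered port ranges
and handed to the owning emulation callback, or left pending for the next
layer (the DM thread) to pick up. The structure layout and function names here
are simplified illustrations, not the actual I/O request format defined by the
ACRN hypervisor and HSM headers.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Simplified model of one decoded entry in the I/O request buffer. */
   struct pio_request {
       bool     is_read;    /* direction decoded by the HV */
       uint16_t port;       /* e.g., 0x20 for "in AL, 20h" */
       uint8_t  size;       /* access width in bytes */
       uint32_t value;      /* data to write, or result of a read */
   };

   /* A registered port range with its emulation callback. */
   struct pio_handler {
       uint16_t base;
       uint16_t len;
       void   (*emulate)(struct pio_request *req);
   };

   static void udev1_emulate(struct pio_request *req)
   {
       /* Trivial device model: reads return a fixed status value. */
       if (req->is_read)
           req->value = 0x5a;
   }

   static const struct pio_handler handlers[] = {
       { .base = 0x20, .len = 2, .emulate = udev1_emulate },
   };

   /* HSM/DM style dispatch: find the handler owning the port, or report that
    * the request must stay pending for the next layer to process. */
   static bool dispatch_pio(struct pio_request *req)
   {
       for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
           const struct pio_handler *h = &handlers[i];

           if (req->port >= h->base && req->port < h->base + h->len) {
               h->emulate(req);
               return true;
           }
       }
       return false;
   }

   int main(void)
   {
       /* The HV would fill this in after decoding the trapped instruction. */
       struct pio_request req = { .is_read = true, .port = 0x20, .size = 1 };

       if (dispatch_pio(&req))
           printf("port 0x%x read -> 0x%x\n", req.port, req.value);
       else
           printf("no owner; leave the request pending for the DM\n");
       return 0;
   }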
DMA Emulation
-------------

The only fully virtualized devices presented to the User VM are the USB xHCI
controller, the UART, and the Automotive I/O controller. None of these
require emulating DMA transactions. ACRN does not support virtual DMA.

Hypervisor
**********

ACRN takes advantage of Intel Virtualization Technology (Intel VT).
The ACRN HV runs in Virtual Machine Extension (VMX) root operation, also
called host mode or VMM mode, while the Service VM and User VM guests run
in VMX non-root operation, or guest mode. (We'll use "root mode"
and "non-root mode" for simplicity.)

Root mode has four privilege rings. ACRN
runs the HV in ring 0 only and leaves rings 1-3 unused. A guest
running in non-root mode has its own full set of rings (ring 0 to 3). The
guest kernel runs in ring 0 of guest mode, while the guest userland
applications run in ring 3 of guest mode (rings 1 and 2 are usually not
used by commercial OSes).

.. figure:: images/over-image11.png
   :align: center
   :name: overview-arch-hv

   Architecture of ACRN Hypervisor

:numref:`overview-arch-hv` shows an overview of the ACRN hypervisor architecture.

- A platform initialization layer provides an entry
  point, checking hardware capabilities and initializing the
  processors, memory, and interrupts. Relocation of the hypervisor
  image and derivation of encryption seeds are also supported by this
  component.

- A hardware management and utilities layer provides services for
  managing physical resources at runtime. Examples include handling
  physical interrupts and low-power state changes.

- A layer sitting on top of hardware management enables virtual
  CPUs (or vCPUs), leveraging Intel VT. A vCPU loop runs a vCPU in
  non-root mode and handles VM exit events triggered by the vCPU.
  This layer handles CPU- and memory-related VM
  exits and provides a way to inject exceptions or interrupts into a
  vCPU.

- On top of vCPUs are three components for device emulation: one for
  emulation inside the hypervisor, another for communicating with
  the Service VM for mediation, and a third for managing passthrough
  devices.

- The highest layer is a VM management module providing
  VM lifecycle and power operations.

- A library component provides basic utilities for the rest of the
  hypervisor, including encryption algorithms, mutual-exclusion
  primitives, etc.

The hypervisor interacts with the Service VM in three ways:
VM exits (including hypercalls), upcalls, and the I/O request buffer.
Interaction between the hypervisor and the User VM is more restricted,
limited to VM exits and hypercalls related to Trusty.

Service VM
**********

The Service VM is an important guest OS in the ACRN architecture. It
runs in non-root mode and contains many critical components, including the VM
Manager, the Device Model (DM), the ACRN Service, kernel mediators, and the
virtio and hypercall module (HSM). The DM manages the User VM and
provides device emulation for it. The Service VM also provides services
for system power lifecycle management through the ACRN Service and VM Manager,
and services for system debugging through the ACRN log/trace tools.

DM
==

The Device Model (DM) is a user-level, QEMU-like application in the Service VM
responsible for creating the User VM and then performing device emulation
based on command-line configuration.

Based on the HSM kernel module, the DM interacts with the VM Manager to create
the User VM. It then emulates devices through full virtualization at the DM
user level, para-virtualization based on kernel mediators (such as virtio or
GVT), or passthrough based on kernel HSM APIs.

Refer to :ref:`hld-devicemodel` for more details.

VM Manager
==========

The VM Manager is a user-level service in the Service VM handling User VM
creation and VM state management, according to the application requirements or
system power operations.

The VM Manager creates the User VM based on the DM application, and manages the
User VM state by interacting with the lifecycle service in the ACRN Service.

Refer to :ref:`hv-vm-management` for more details.

ACRN Service
============

The ACRN Service provides
system lifecycle management based on IOC polling. It communicates with the
VM Manager to handle User VM states, such as S3 and power-off.

HSM
===

The Hypervisor Service Module (HSM) kernel module is the Service VM kernel
driver supporting User VM management and device emulation. The Device Model
uses the standard Linux char device API (ioctl) to access HSM
functionality. The HSM communicates with the ACRN hypervisor through
hypercalls or upcall interrupts.

Refer to :ref:`hld-devicemodelhsm` for more details.
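As a minimal sketch of this interface, the code below shows the pattern of
opening the HSM character device and issuing an ioctl the way the DM does. The
device node name, the ioctl request number, and the parameter structure are
assumptions made for illustration; the real definitions come from the HSM
driver's UAPI header in the Service VM kernel.

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   /* Hypothetical VM creation parameters for this sketch only. */
   struct vm_params {
       unsigned long memsize;   /* requested guest memory in bytes */
       int           vcpu_num;  /* number of vCPUs */
   };

   /* Hypothetical ioctl request; real requests are defined by the HSM driver. */
   #define HSM_IOCTL_EXAMPLE_CREATE_VM  _IOW('A', 0x10, struct vm_params)

   int main(void)
   {
       struct vm_params params = { .memsize = 2UL << 30, .vcpu_num = 2 };
       int fd = open("/dev/acrn_hsm", O_RDWR);   /* device node name assumed */

       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* The DM issues ioctls like this to ask the HSM (and, through
        * hypercalls, the hypervisor) to create and manage a User VM. */
       if (ioctl(fd, HSM_IOCTL_EXAMPLE_CREATE_VM, &params) < 0)
           perror("ioctl");

       close(fd);
       return 0;
   }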
Kernel Mediators
================

Kernel mediators are kernel modules providing a para-virtualization method
for the User VMs, for example, the i915 GVT driver.

Log/Trace Tools
===============

The ACRN log/trace tools are user-level applications used to
capture ACRN hypervisor log and trace data. The HSM kernel module provides a
middle layer to support these tools.

Refer to :ref:`hld-trace-log` for more details.

User VM
*******

ACRN can boot Linux and Android guest OSes. For an Android guest OS, ACRN
provides a VM environment with two worlds: normal world and trusty
world. The Android OS runs in the normal world. The trusty OS and
security-sensitive applications run in the trusty world. The trusty
world can see the memory of the normal world, but the normal world cannot see
the trusty world.

Guest Physical Memory Layout - User VM E820
===========================================

The DM creates an E820 table for a User VM based on these simple rules:

- If the requested VM memory size is below the low memory limitation (2 GB,
  defined in the DM), then the low memory range is [0, requested VM memory
  size].

- If the requested VM memory size exceeds the low memory limitation, then the
  low memory range is [0, 2G], and the high memory range is
  [4G, 4G + requested VM memory size - 2G].

.. figure:: images/over-image13.png
   :align: center

   User VM Physical Memory Layout

User VM Memory Allocation
=========================

The DM allocates User VM memory based on the hugetlb mechanism by default.
The real memory mapping may be scattered in the Service VM physical
memory space, as shown in :numref:`overview-mem-layout`:

.. figure:: images/over-image15.png
   :align: center
   :name: overview-mem-layout

   User VM Physical Memory Layout Based on Hugetlb

The User VM's memory is allocated by the Service VM DM application; it may come
from different huge pages in the Service VM, as shown in
:numref:`overview-mem-layout`.

As the Service VM knows the size of these huge pages and their
GPA\ :sup:`service_vm` and GPA\ :sup:`user_vm` addresses, it works with the
hypervisor to complete the User VM's host-to-guest mapping using this pseudo
code:

.. code-block:: none

   for x in allocated huge pages do
      x.hpa = gpa2hpa_for_service_vm(x.service_vm_gpa)
      host2guest_map_for_user_vm(x.hpa, x.user_vm_gpa, x.size)
   end
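The two-range E820 rule described above can be expressed directly in code. The
following is a minimal sketch under the stated assumptions (a 2 GB low memory
limit and high memory starting at the 4 GB guest physical address); it is an
illustration, not the actual DM implementation.

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>

   #define GiB(x)        ((uint64_t)(x) << 30)
   #define LOWMEM_LIMIT  GiB(2)   /* low memory limitation defined in the DM */
   #define HIGHMEM_BASE  GiB(4)   /* high memory starts at the 4 GB GPA */

   struct mem_range {
       uint64_t base;
       uint64_t size;
   };

   /* Split the requested User VM memory into the low and high E820 RAM ranges. */
   static void split_user_vm_memory(uint64_t requested, struct mem_range *low,
                                    struct mem_range *high)
   {
       if (requested <= LOWMEM_LIMIT) {
           *low  = (struct mem_range){ 0, requested };
           *high = (struct mem_range){ 0, 0 };        /* no high range needed */
       } else {
           *low  = (struct mem_range){ 0, LOWMEM_LIMIT };
           *high = (struct mem_range){ HIGHMEM_BASE, requested - LOWMEM_LIMIT };
       }
   }

   int main(void)
   {
       struct mem_range low, high;

       /* A 6 GB request yields low = [0, 2G) and high = [4G, 8G). */
       split_user_vm_memory(GiB(6), &low, &high);
       printf("low:  [0x%llx, 0x%llx)\n", (unsigned long long)low.base,
              (unsigned long long)(low.base + low.size));
       printf("high: [0x%llx, 0x%llx)\n", (unsigned long long)high.base,
              (unsigned long long)(high.base + high.size));
       return 0;
   }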
OVMF Bootloader
===============

Open Virtual Machine Firmware (OVMF) is the virtual bootloader that supports
the EFI boot of the User VM on the ACRN hypervisor platform.

The VM Manager in the Service VM copies OVMF to the User VM memory while
creating the User VM virtual BSP. The Service VM passes the OVMF start address
and related information to the HV. The HV sets the guest RIP of the User VM
virtual BSP to the start of OVMF, sets up the related guest registers, and
launches the User VM virtual BSP. OVMF starts running in virtual real mode
within the User VM. Conceptually, OVMF is part of the User VM runtime.

Freedom From Interference
*************************

The hypervisor is critical for preventing inter-VM interference, using
the following mechanisms:

- Each physical CPU is dedicated to one vCPU.

  CPU sharing is on the TODO list; with respect to inter-VM interference,
  sharing a physical CPU among multiple vCPUs gives rise to multiple
  sources of interference, such as the vCPU of one VM flushing the
  L1 and L2 caches used by another, or a flood of interrupts for one VM
  delaying the execution of another. It also requires vCPU
  scheduling in the hypervisor to consider more complexities such as
  scheduling latency and vCPU priority, exposing more opportunities
  for one VM to interfere with another.

  To prevent such interference, the ACRN hypervisor can adopt static
  core partitioning by dedicating each physical CPU to one vCPU. The
  physical CPU loops in idle when the vCPU is paused by I/O
  emulation. This makes vCPU scheduling deterministic and minimizes
  physical resource sharing.

- Hardware mechanisms including EPT, VT-d, SMAP, and SMEP are leveraged
  to prevent unintended memory accesses.

  Memory corruption can be a common failure mode. The ACRN hypervisor properly
  sets up the memory-related hardware mechanisms to ensure that:

  1. The Service VM cannot access the memory of the hypervisor, unless
     explicitly allowed.

  2. The User VM cannot access the memory of the Service VM and the hypervisor.

  3. The hypervisor does not unintentionally access the memory of the Service
     or User VM.

- The destination of external interrupts is set to the physical core
  where the VM that handles them is running.

  External interrupts are always handled by the hypervisor in ACRN.
  Excessive interrupts to one VM (say VM A) could slow down another
  VM (VM B) if they are handled by the physical core running VM B
  instead of VM A. Two mechanisms are designed to mitigate such
  interference.

  1. The destination of an external interrupt is set to the physical core
     that runs the vCPU where virtual interrupts will be injected.

  2. The hypervisor maintains statistics on the total number of received
     interrupts, exposes them to the Service VM via a hypercall, and has a
     delay mechanism to temporarily block certain virtual interrupts from
     being injected. This allows the Service VM to detect an interrupt storm
     and control the interrupt injection rate when necessary.

Boot Flow
*********

.. figure:: images/over-image85.png
   :align: center

.. figure:: images/over-image134.png
   :align: center

   ACRN Boot Flow

Power Management
****************

CPU P-State & C-State
=====================

In ACRN, CPU P-states and C-states (Px/Cx) are controlled by the guest OS.
The corresponding governors are managed in the Service VM or User VM for
best power efficiency and simplicity.

Guests should be able to process the ACPI P-state and C-state requests from
OSPM. The ACPI objects needed for P-state and C-state management should be
available in an ACPI table.

The hypervisor can restrict a guest's P-state and C-state requests (per customer
requirement). MSR accesses for P-state requests can be intercepted by
the hypervisor and forwarded to the host directly if the requested
P-state is valid. Guest MWAIT or port I/O accesses for C-state control can be
passed through to the host with no hypervisor interception to minimize
performance impact.
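As a sketch of that interception path, the following shows how a
hypervisor-side MSR-write handler might validate a guest P-state request
against a table of allowed control values before forwarding it to the physical
MSR. The table contents and function names are illustrative assumptions, not
the ACRN implementation; the physical MSR write is stubbed out so the example
builds in user space.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define MSR_IA32_PERF_CTL  0x199U   /* P-state request MSR */

   /* Illustrative table of valid P-state control values for a platform,
    * e.g., derived offline from the ACPI _PSS package. */
   static const uint16_t valid_px_ctl[] = { 0x1a00, 0x1800, 0x1500, 0x0c00 };

   static bool px_request_is_valid(uint64_t val)
   {
       for (size_t i = 0; i < sizeof(valid_px_ctl) / sizeof(valid_px_ctl[0]); i++) {
           if ((uint16_t)(val & 0xffffU) == valid_px_ctl[i])
               return true;
       }
       return false;
   }

   /* Hypothetical handler for a trapped guest write to IA32_PERF_CTL. */
   static int handle_guest_perf_ctl_write(uint32_t msr, uint64_t val)
   {
       if (msr != MSR_IA32_PERF_CTL)
           return -1;

       if (!px_request_is_valid(val))
           return -1;                  /* reject or ignore an invalid request */

       /* Forward the validated request to the physical MSR, e.g.
        * msr_write(MSR_IA32_PERF_CTL, val); (stubbed out in this sketch). */
       return 0;
   }

   int main(void)
   {
       return handle_guest_perf_ctl_write(MSR_IA32_PERF_CTL, 0x1800) == 0 ? 0 : 1;
   }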
This diagram shows the CPU P-state and C-state management blocks:

.. figure:: images/over-image4.png
   :align: center

   CPU P-State and C-State Management Block Diagram

System Power State
==================

ACRN supports the ACPI standard-defined system-level power states S3 and S5.
For each guest, ACRN assumes the guest implements OSPM and controls its
own power state accordingly. ACRN doesn't interfere with the guest OSPM;
instead, it traps the power state transition requests from the guest and
emulates them.

.. figure:: images/over-image21.png
   :align: center
   :name: overview-pm-block

   ACRN Power Management Block Diagram

:numref:`overview-pm-block` shows the basic block diagram for ACRN power
management.
The OSPM in each guest manages the guest power state transitions. The
Device Model running in the Service VM traps and emulates the power state
transitions of the User VM (the Linux VM or Android VM in
:numref:`overview-pm-block`). The VM Manager knows all User VM power states and
notifies the OSPM of the Service VM once
the User VM is in the required power state.

The OSPM of the Service VM then starts the power state transition of the
Service VM, which is trapped by the "Sx Agency" in ACRN, and the hypervisor
carries out the power state transition.

Some details about the ACPI tables for the User VM and Service VM:

- The ACPI table in the User VM is emulated by the Device Model. The Device
  Model knows which register the User VM writes to trigger power state
  transitions, and it must register an I/O handler for that register.

- The ACPI table in the Service VM is passed through. There is no ACPI parser
  in the ACRN HV; the power-management-related ACPI table is
  generated offline and hard-coded in the ACRN HV.
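As an illustration of the first bullet, the following sketch shows how a
device model might decode a trapped guest write to a virtual PM1a control
register to detect an S3 or S5 request. The register bit layout follows the
ACPI PM1 control definition, but the SLP_TYP values and function names are
illustrative assumptions, not the actual ACRN DM code.

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>

   /* PM1a control register fields (ACPI PM1 control layout). */
   #define PM1A_CNT_SLP_EN        (1U << 13)          /* write 1 to enter sleep */
   #define PM1A_CNT_SLP_TYP(val)  (((val) >> 10) & 0x7U)

   /* Illustrative SLP_TYP values the virtual ACPI tables would advertise. */
   #define VSLP_TYP_S3  5U
   #define VSLP_TYP_S5  7U

   /* Handler the DM would register for the virtual PM1a control I/O port. */
   static void vpm1a_cnt_write(uint32_t val)
   {
       if (!(val & PM1A_CNT_SLP_EN))
           return;                              /* not a sleep request */

       switch (PM1A_CNT_SLP_TYP(val)) {
       case VSLP_TYP_S3:
           printf("User VM requested S3: suspend the VM\n");
           break;
       case VSLP_TYP_S5:
           printf("User VM requested S5: power off the VM\n");
           break;
       default:
           printf("unhandled sleep type: ignore\n");
           break;
       }
   }

   int main(void)
   {
       /* Simulate the guest OSPM writing SLP_TYP = S5 with SLP_EN set. */
       vpm1a_cnt_write((VSLP_TYP_S5 << 10) | PM1A_CNT_SLP_EN);
       return 0;
   }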