.. _hld-overview:

ACRN High-Level Design Overview
###############################

ACRN is an open-source reference hypervisor (HV) that runs on top of
:ref:`Intel platforms <hardware>` for heterogeneous use cases such as
Software-Defined Cockpit (SDC) and In-Vehicle Experience (IVE) for
automotive, or human-machine interface (HMI) and real-time OS for industry.
ACRN provides embedded hypervisor vendors with a reference I/O mediation
solution under a permissive license, and provides automakers and industrial
users with a reference software stack for these use cases.

ACRN Use Cases
**************

Software-Defined Cockpit
========================

The SDC system consists of multiple systems: the instrument cluster (IC)
system, the In-Vehicle Infotainment (IVI) system, and one or more rear
seat entertainment (RSE) systems. Each system runs as a VM for better
isolation.

The instrument cluster (IC) system manages graphic displays of:

- driving speed, engine RPM, temperature, fuel level, odometer, trip mileage, etc.
- alerts of low fuel or tire pressure
- rear-view camera (RVC) and surround camera view for driving assistance

In-Vehicle Infotainment
=======================

A typical In-Vehicle Infotainment (IVI) system supports:

- Navigation systems
- Radio, audio, and video playback
- Mobile device connections for calls, music, and applications via voice
  recognition, gesture recognition, and/or touch
- Rear seat entertainment (RSE) services such as:

  - entertainment system
  - virtual office
  - connection to the front IVI system and mobile devices (cloud
    connectivity)

ACRN supports Linux and Android guest OSes. OEMs can use the ACRN hypervisor
and the Linux or Android guest OS reference code to implement their own VMs for
a customized IC/IVI/RSE.

Industry Usage
==============

A typical industry usage includes one Windows HMI VM plus one real-time VM (RTVM):

- A Windows HMI guest OS with a display to provide the human-machine interface
- An RTVM running a specific RTOS to handle
  real-time workloads such as PLC control

ACRN supports a Windows* guest OS for such HMI capability. ACRN continues to add
features that enhance its real-time performance to meet hard real-time key
performance indicators for its RTVM:

- Cache Allocation Technology (CAT)
- Memory Bandwidth Allocation (MBA)
- LAPIC passthrough
- Polling mode driver
- Always Running Timer (ART)
- Intel Time Coordinated Computing (TCC) features, such as split lock
  detection and cache locking


Hardware Requirements
*********************

Mandatory IA CPU features (a feature-detection sketch appears at the end of
this section):

- Long mode
- MTRR
- TSC deadline timer
- NX, SMAP, SMEP
- Intel VT, including VMX, EPT, VT-d, APICv, VPID, INVEPT, and INVVPID

Recommended memory: 4 GB (8 GB preferred).
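
The snippet below is a minimal, illustrative sketch of how several of these CPU
features can be detected from user space with the ``CPUID`` instruction (using
GCC's ``cpuid.h``); it is not ACRN's boot-time capability check, which runs
inside the hypervisor.

.. code-block:: c

   #include <cpuid.h>
   #include <stdio.h>

   int main(void)
   {
       unsigned int eax, ebx, ecx, edx;

       /* CPUID leaf 1: VMX is ECX bit 5, TSC-deadline timer is ECX bit 24. */
       __get_cpuid(1, &eax, &ebx, &ecx, &edx);
       printf("VMX:          %s\n", (ecx & (1u << 5))  ? "yes" : "no");
       printf("TSC deadline: %s\n", (ecx & (1u << 24)) ? "yes" : "no");

       /* CPUID leaf 7, sub-leaf 0: SMEP is EBX bit 7, SMAP is EBX bit 20. */
       __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
       printf("SMEP:         %s\n", (ebx & (1u << 7))  ? "yes" : "no");
       printf("SMAP:         %s\n", (ebx & (1u << 20)) ? "yes" : "no");

       /* CPUID leaf 0x80000001: NX is EDX bit 20, long mode is EDX bit 29. */
       __get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
       printf("NX:           %s\n", (edx & (1u << 20)) ? "yes" : "no");
       printf("Long mode:    %s\n", (edx & (1u << 29)) ? "yes" : "no");

       return 0;
   }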


ACRN Architecture
*****************

ACRN is a type 1 hypervisor that runs directly on bare metal. It supports
certain :ref:`Intel platforms <hardware>` and can be easily extended to
support future platforms. ACRN implements a hybrid VMM architecture, using a
privileged Service VM to manage I/O devices and provide I/O mediation.
Multiple User VMs can be supported, running Ubuntu, Android, Windows, or an
RTOS such as Zephyr.

ACRN 1.0
========

ACRN 1.0 is designed mainly for automotive use cases such as SDC and IVI.

Instrument cluster applications are critical in the SDC use case and may
require functional safety certification in the future. Running the IC system in
a separate VM isolates it from the other VMs and their applications, thereby
reducing the attack surface and minimizing potential interference. However,
running the IC system in a separate VM introduces additional latency for the IC
applications. Some countries' regulations require an IVE system to show the
rear-view camera (RVC) within 2 seconds, which is difficult to achieve if a
separate instrument cluster VM is started only after the User VM is booted.

:numref:`overview-arch1.0` shows the architecture of ACRN 1.0 together with
the IC VM and Service VM. As shown, the Service VM owns most of the platform
devices and provides I/O mediation to VMs. Some PCIe devices can be passed
through to User VMs according to the VM configuration. In addition,
the Service VM can run the IC applications and HV helper applications such
as the Device Model and the VM Manager, where the VM Manager is responsible
for VM start/stop/pause, virtual CPU pause/resume, etc.

.. figure:: images/over-image34.png
   :align: center
   :name: overview-arch1.0

   ACRN 1.0 Architecture

ACRN 2.0
========

ACRN 2.0 extended ACRN to support pre-launched VMs (mainly for safety VMs)
and real-time (RT) VMs.

:numref:`overview-arch2.0` shows the architecture of ACRN 2.0; the main
differences compared to ACRN 1.0 are:

-  ACRN 2.0 supports a pre-launched VM, with isolated resources,
   including CPU, memory, and hardware devices.

-  ACRN 2.0 adds a few necessary device emulations in the hypervisor, such as
   vPCI and vUART, to avoid interference between different VMs.

-  ACRN 2.0 supports an RTVM as a post-launched User VM, with features such as
   LAPIC passthrough and a polling mode virtio driver (PMD).

.. figure:: images/over-image35.png
   :align: center
   :name: overview-arch2.0

   ACRN 2.0 Architecture

.. _intro-io-emulation:

Device Emulation
================

ACRN adopts various approaches for emulating devices for the User VM:

-  **Emulated device**: A virtual device using this approach is emulated in
   the Service VM by trapping accesses to the device in the User VM. Two
   sub-categories exist for emulated devices:

   -  fully emulated, allowing native drivers to be used
      unmodified in the User VM, and
   -  para-virtualized, requiring front-end drivers in
      the User VM to function.

-  **Passthrough device**: A device passed through to the User VM is fully
   accessible to the User VM without interception. However, interrupts
   are first handled by the hypervisor before
   being injected into the User VM.

-  **Mediated passthrough device**: A mediated passthrough device is a
   hybrid of the previous two approaches. Performance-critical
   resources (mostly data-plane related) are passed through to the User VMs, and
   other resources (mostly control-plane related) are emulated.


.. _ACRN-io-mediator:

I/O Emulation
-------------

The Device Model (DM) is where User VM devices are managed: it allocates
memory for the User VM, configures and initializes the devices shared by the
guest, loads the virtual BIOS, initializes the virtual CPU state, and
invokes the hypervisor service to execute the guest instructions.

The following diagram illustrates the control flow of emulating a port
I/O read from the User VM.

.. figure:: images/over-image29.png
   :align: center
   :name: overview-io-emu-path

   I/O (PIO/MMIO) Emulation Path

:numref:`overview-io-emu-path` shows an example I/O emulation flow path.

When a guest executes an I/O instruction (port I/O or MMIO), a VM exit
occurs. The HV takes control and handles the request based on the VM exit
reason, for example, ``VMX_EXIT_REASON_IO_INSTRUCTION`` for port I/O access.
The HV fetches any additional guest instructions needed and processes the
port I/O instruction at a pre-configured port address
(``in AL, 20h``, for example). The HV places the decoded information, such as
the port I/O address, size of access, read/write direction, and target
register, into the I/O request in the I/O request buffer (shown in
:numref:`overview-io-emu-path`) and then notifies/interrupts the Service VM
to process it.
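
As an illustration only, the decoded request can be pictured as a structure
like the one below; the field names are simplified assumptions and do not
reflect ACRN's actual shared I/O request page layout.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical, simplified port I/O request as the HV might place it in
    * the shared I/O request buffer (illustrative field names only). */
   enum io_direction { IO_READ, IO_WRITE };

   struct pio_request {
       uint16_t port;            /* decoded port address, e.g. 0x20 */
       uint8_t  size;            /* access width in bytes: 1, 2, or 4 */
       enum io_direction dir;    /* read or write */
       uint32_t value;           /* data written, or result of the read */
       volatile uint32_t state;  /* e.g. PENDING -> PROCESSING -> COMPLETE */
   };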

The Hypervisor Service Module (HSM) in the Service VM intercepts HV interrupts
and accesses the I/O request buffer for the port I/O instructions. It
then checks whether any kernel device claims ownership of the
I/O port. If one does, that device executes the requested operation on behalf
of the VM. Otherwise, the HSM leaves the I/O request in the request buffer
and wakes up the DM thread for processing.

The DM follows the same mechanism as the HSM. The I/O processing thread of the
DM queries the I/O request buffer to get the PIO instruction details and
checks whether any (guest) device emulation module claims ownership of
the I/O port. If one does, the owning module is invoked to execute the
requested operation.
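
A minimal sketch of this lookup-and-dispatch step is shown below. It assumes a
hypothetical table of registered port ranges and handler callbacks; ACRN's
real Device Model uses its own registration interfaces, so treat the names
here as illustrative only.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical handler signature for an emulated device's port I/O. */
   typedef bool (*pio_handler_fn)(void *dev, uint16_t port, uint8_t size,
                                  bool is_read, uint32_t *value);

   struct pio_range {
       uint16_t base;          /* first port claimed by the device */
       uint16_t len;           /* number of consecutive ports */
       void *dev;              /* emulated device instance, e.g. uDev1 */
       pio_handler_fn handler; /* callback that emulates the access */
   };

   /* Dispatch one decoded request to the owning emulation module, if any. */
   bool dispatch_pio(struct pio_range *table, size_t count,
                     uint16_t port, uint8_t size,
                     bool is_read, uint32_t *value)
   {
       for (size_t i = 0; i < count; i++) {
           struct pio_range *r = &table[i];

           if (port >= r->base && port - r->base < r->len)
               return r->handler(r->dev, port, size, is_read, value);
       }
       return false; /* no module claims this port */
   }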

When the DM completes the emulation of a device such as uDev1 (the port I/O
20h access in this example), uDev1 puts the result into the request
buffer (the AL register in this case). The DM returns control to the HV to
indicate completion of the I/O instruction emulation, typically through
the HSM or a hypercall. The HV then stores the result in the guest register
context, advances the guest IP to indicate the completion of instruction
execution, and resumes the guest.

The MMIO access path is similar, except that the VM exit reason is an *EPT
violation*: MMIO accesses are usually trapped in the hypervisor through
``VMX_EXIT_REASON_EPT_VIOLATION``.

DMA Emulation
-------------

The only fully virtualized devices for the User VM are the USB xHCI
controller, the UART, and the Automotive I/O controller (IOC). None of these
require emulation of DMA transactions. ACRN does not support virtual DMA.

Hypervisor
**********

ACRN takes advantage of Intel Virtualization Technology (Intel VT).
The ACRN HV runs in Virtual Machine Extension (VMX) root operation, also
called host mode or VMM mode, while the Service VM and User VM guests run
in VMX non-root operation, or guest mode. (We'll use "root mode"
and "non-root mode" for simplicity.)

Root mode has four privilege rings. ACRN runs the HV in ring 0 only and
leaves rings 1-3 unused. A guest running in non-root mode has its own full
set of rings (ring 0 to 3). The guest kernel runs in ring 0 of guest mode,
while guest userland applications run in ring 3 of guest mode (rings 1 and 2
are usually not used by commercial OSes).

.. figure:: images/over-image11.png
   :align: center
   :name: overview-arch-hv

   Architecture of ACRN Hypervisor

:numref:`overview-arch-hv` shows an overview of the ACRN hypervisor architecture.


-  A platform initialization layer provides an entry
   point, checking hardware capabilities and initializing the
   processors, memory, and interrupts. Relocation of the hypervisor
   image and derivation of encryption seeds are also handled by this
   component.

-  A hardware management and utilities layer provides services for
   managing physical resources at runtime. Examples include handling
   physical interrupts and low-power state changes.

-  A layer sitting on top of hardware management enables virtual
   CPUs (vCPUs), leveraging Intel VT. A vCPU loop runs a vCPU in
   non-root mode and handles VM exit events triggered by the vCPU.
   This layer handles CPU- and memory-related VM
   exits and provides a way to inject exceptions or interrupts into a
   vCPU (a simplified vCPU loop is sketched after this list).

-  On top of the vCPUs are three components for device emulation: one for
   emulation inside the hypervisor, another for communicating with
   the Service VM for mediation, and a third for managing passthrough
   devices.

-  The highest layer is a VM management module providing
   VM lifecycle and power operations.

-  A library component provides basic utilities for the rest of the
   hypervisor, including encryption algorithms, mutual-exclusion
   primitives, etc.
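
The following is a minimal, illustrative sketch of what such a vCPU loop looks
like conceptually; the function and type names are placeholders, not ACRN's
actual hypervisor API.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Placeholder types and helpers standing in for hypervisor internals. */
   struct acrn_vcpu;
   enum vm_exit_reason { EXIT_IO_INSTRUCTION, EXIT_EPT_VIOLATION, EXIT_OTHER };

   extern enum vm_exit_reason vcpu_run(struct acrn_vcpu *vcpu);  /* VMLAUNCH/VMRESUME */
   extern void handle_pio_exit(struct acrn_vcpu *vcpu);
   extern void handle_ept_violation(struct acrn_vcpu *vcpu);
   extern void inject_pending_events(struct acrn_vcpu *vcpu);    /* exceptions/interrupts */
   extern bool vcpu_is_running(const struct acrn_vcpu *vcpu);

   /* Conceptual vCPU loop: enter non-root mode, then service each VM exit. */
   void vcpu_thread(struct acrn_vcpu *vcpu)
   {
       while (vcpu_is_running(vcpu)) {
           inject_pending_events(vcpu);

           switch (vcpu_run(vcpu)) {
           case EXIT_IO_INSTRUCTION:
               handle_pio_exit(vcpu);       /* may forward to the Service VM */
               break;
           case EXIT_EPT_VIOLATION:
               handle_ept_violation(vcpu);  /* MMIO emulation path */
               break;
           default:
               break;
           }
       }
   }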

The hypervisor interacts with the Service VM in three ways: VM exits
(including hypercalls), upcalls, and the I/O request buffer. Interaction
between the hypervisor and the User VM is more restricted, including
only VM exits and hypercalls related to Trusty.

Service VM
**********

The Service VM is an important guest OS in the ACRN architecture. It
runs in non-root mode and contains many critical components, including the VM
Manager, the Device Model (DM), ACRN services, kernel mediators, and the
virtio and hypercall module (HSM). The DM manages the User VM and
provides device emulation for it. The Service VM also provides services
for system power lifecycle management through the ACRN service and the VM
Manager, and services for system debugging through the ACRN log/trace tools.

DM
==

The DM (Device Model) is a user-level, QEMU-like application in the Service VM
responsible for creating the User VM and then performing device emulation
based on command-line configuration.

Based on the HSM kernel module, the DM interacts with the VM Manager to create
the User VM. It then emulates devices through full virtualization at the DM
user level, through para-virtualization based on kernel mediators (such as
virtio and GVT), or through passthrough based on kernel HSM APIs.

Refer to :ref:`hld-devicemodel` for more details.

VM Manager
==========

The VM Manager is a user-level service in the Service VM that handles User VM
creation and VM state management, according to application requirements or
system power operations.

The VM Manager creates the User VM based on the DM application and manages
User VM states by interacting with the lifecycle service in the ACRN service.

Refer to :ref:`hv-vm-management` for more details.

ACRN Service
============

The ACRN service provides system lifecycle management based on IOC polling.
It communicates with the VM Manager to handle User VM states, such as S3 and
power-off.

HSM
===

The Hypervisor Service Module (HSM) kernel module is the Service VM kernel
driver supporting User VM management and device emulation. The Device Model
uses the standard Linux char device API (ioctl) to access HSM
functionalities. The HSM communicates with the ACRN hypervisor through
hypercalls or upcall interrupts.
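
As a rough illustration of this char-device pattern, a user-level application
opens the HSM device node and issues ioctls against it, much like the sketch
below; the device path and request code shown here are hypothetical
placeholders rather than ACRN's actual interface.

.. code-block:: c

   #include <fcntl.h>
   #include <stdio.h>
   #include <sys/ioctl.h>
   #include <unistd.h>

   /* Hypothetical placeholders; ACRN defines its own node name and ioctl codes. */
   #define HSM_DEV_PATH              "/dev/acrn_hsm_example"
   #define HSM_IOC_EXAMPLE_CREATE_VM 0x4101

   int main(void)
   {
       int fd = open(HSM_DEV_PATH, O_RDWR);
       if (fd < 0) {
           perror("open");
           return 1;
       }

       /* A real caller passes a request structure describing the VM to create. */
       if (ioctl(fd, HSM_IOC_EXAMPLE_CREATE_VM, NULL) < 0)
           perror("ioctl");

       close(fd);
       return 0;
   }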

Refer to :ref:`hld-devicemodelhsm` for more details.

Kernel Mediators
================

Kernel mediators are kernel modules that provide a para-virtualization method
for the User VMs, for example, the i915 GVT driver.

Log/Trace Tools
===============

ACRN log/trace tools are user-level applications used to
capture ACRN hypervisor log and trace data. The HSM kernel module provides a
middle layer to support these tools.

Refer to :ref:`hld-trace-log` for more details.

User VM
*******

ACRN can boot Linux and Android guest OSes. For an Android guest OS, ACRN
provides a VM environment with two worlds: the normal world and the trusty
world. The Android OS runs in the normal world. The trusty OS and
security-sensitive applications run in the trusty world. The trusty
world can see the memory of the normal world, but the normal world cannot see
the trusty world.

Guest Physical Memory Layout - User VM E820
===========================================

The DM creates an E820 table for a User VM based on these simple rules
(sketched in code after this list):

- If the requested VM memory size is below the low memory limit (2 GB,
  defined in the DM), then the low memory range = [0, requested VM memory
  size]

- If the requested VM memory size exceeds the low memory limit, then the low
  memory range = [0, 2 GB], and the high memory range =
  [4 GB, 4 GB + requested VM memory size - 2 GB]
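
The helper below is a small, self-contained sketch of these two rules; the
names and the 2 GB constant follow the description above and are illustrative,
not the DM's actual implementation.

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>

   #define LOWMEM_LIMIT   (2ULL << 30)   /* 2 GB low memory limit, per the rules above */
   #define HIGHMEM_START  (4ULL << 30)   /* high memory starts at 4 GB */

   struct mem_range { uint64_t start, end; };  /* [start, end) */

   /* Compute the low/high guest RAM ranges for a requested memory size. */
   static void compute_e820_ranges(uint64_t req_size,
                                   struct mem_range *low, struct mem_range *high)
   {
       if (req_size <= LOWMEM_LIMIT) {
           *low  = (struct mem_range){ 0, req_size };
           *high = (struct mem_range){ 0, 0 };           /* no high range */
       } else {
           *low  = (struct mem_range){ 0, LOWMEM_LIMIT };
           *high = (struct mem_range){ HIGHMEM_START,
                                       HIGHMEM_START + (req_size - LOWMEM_LIMIT) };
       }
   }

   int main(void)
   {
       struct mem_range low, high;

       compute_e820_ranges(6ULL << 30, &low, &high);     /* e.g. a 6 GB User VM */
       printf("low:  [0x%llx, 0x%llx)\n", (unsigned long long)low.start,
              (unsigned long long)low.end);
       printf("high: [0x%llx, 0x%llx)\n", (unsigned long long)high.start,
              (unsigned long long)high.end);
       return 0;
   }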

.. figure:: images/over-image13.png
   :align: center

   User VM Physical Memory Layout

User VM Memory Allocation
=========================

By default, the DM allocates User VM memory using the hugetlb mechanism.
The real memory mapping may be scattered across the Service VM physical
memory space, as shown in :numref:`overview-mem-layout`:

.. figure:: images/over-image15.png
   :align: center
   :name: overview-mem-layout

   User VM Physical Memory Layout Based on Hugetlb

The User VM's memory is allocated by the Service VM DM application; it may come
from different huge pages in the Service VM as shown in
:numref:`overview-mem-layout`.

Because the Service VM knows the size of these huge pages and both
GPA\ :sup:`service_vm` and GPA\ :sup:`user_vm`, it works with the hypervisor
to complete the User VM's host-to-guest mapping using this pseudocode:

.. code-block:: none

   for x in allocated huge pages do
      x.hpa = gpa2hpa_for_service_vm(x.service_vm_gpa)
      host2guest_map_for_user_vm(x.hpa, x.user_vm_gpa, x.size)
   end

OVMF Bootloader
=======================

Open Virtual Machine Firmware (OVMF) is the virtual bootloader that supports
EFI boot of the User VM on the ACRN hypervisor platform.

The VM Manager in the Service VM copies OVMF to the User VM memory while
creating the User VM virtual BSP. The Service VM passes the start address of
OVMF and related information to the HV. The HV sets the guest RIP of the User
VM virtual BSP to the start of OVMF, sets up the related guest registers, and
launches the User VM virtual BSP. OVMF starts running in virtual real mode
within the User VM. Conceptually, OVMF is part of the User VM runtime.
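
Conceptually, the launch sequence described above reduces to a few steps,
sketched below with placeholder helpers; the real flow spans the DM, the VM
Manager, and the hypervisor, so none of these names are ACRN APIs.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Placeholder helpers; the real work is split across the DM and the HV. */
   extern size_t load_ovmf_image(const char *path, void *dst, size_t dst_size);
   extern void   set_vbsp_entry(int vmid, uint64_t guest_rip);  /* guest RIP -> OVMF start */
   extern void   launch_vbsp(int vmid);

   void start_user_vm_firmware(int vmid, void *guest_mem, size_t guest_size,
                               uint64_t ovmf_load_gpa)
   {
       /* 1. Copy the OVMF image into the User VM's memory. */
       load_ovmf_image("OVMF.fd", (uint8_t *)guest_mem + ovmf_load_gpa,
                       guest_size - ovmf_load_gpa);

       /* 2. Point the virtual BSP at the start of OVMF (virtual real mode). */
       set_vbsp_entry(vmid, ovmf_load_gpa);

       /* 3. Launch the virtual BSP; OVMF then boots the guest via EFI. */
       launch_vbsp(vmid);
   }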

Freedom From Interference
*************************

The hypervisor is critical for preventing inter-VM interference, using
the following mechanisms:

-  Each physical CPU is dedicated to one vCPU.

   CPU sharing is on the roadmap, but from an interference standpoint,
   sharing a physical CPU among multiple vCPUs gives rise to multiple
   sources of interference, such as the vCPU of one VM flushing the
   L1 and L2 caches used by another, or a flood of interrupts for one VM
   delaying the execution of another. It also requires the vCPU
   scheduler in the hypervisor to consider more complexities such as
   scheduling latency and vCPU priority, exposing more opportunities
   for one VM to interfere with another.

   To prevent such interference, the ACRN hypervisor can adopt static
   core partitioning by dedicating each physical CPU to one vCPU. The
   physical CPU loops in idle when the vCPU is paused by I/O
   emulation. This makes vCPU scheduling deterministic and minimizes
   physical resource sharing.

-  Hardware mechanisms including EPT, VT-d, SMAP, and SMEP are leveraged
   to prevent unintended memory accesses.

   Memory corruption can be a common failure mode. The ACRN hypervisor
   sets up the memory-related hardware mechanisms to ensure that:

   1. The Service VM cannot access the memory of the hypervisor, unless
      explicitly allowed.

   2. The User VM cannot access the memory of the Service VM or the hypervisor.

   3. The hypervisor does not unintentionally access the memory of the Service
      VM or User VMs.

-  The destination of external interrupts is set to the physical core
   where the VM that handles them is running.

   External interrupts are always handled by the hypervisor in ACRN.
   Excessive interrupts to one VM (say VM A) could slow down another
   VM (VM B) if they are handled by the physical core running VM B
   instead of VM A. Two mechanisms are designed to mitigate such
   interference:

   1. The destination of an external interrupt is set to the physical core
      that runs the vCPU where the corresponding virtual interrupts will be
      injected.

   2. The hypervisor exposes statistics on the total number of received
      interrupts to the Service VM via a hypercall, and has a delay mechanism to
      temporarily block certain virtual interrupts from being injected.
      This allows the Service VM to detect an interrupt storm and
      control the interrupt injection rate when necessary.

Boot Flow
*********

.. figure:: images/over-image85.png
   :align: center

.. figure:: images/over-image134.png
   :align: center

   ACRN Boot Flow

Power Management
****************

CPU P-State & C-State
=====================

In ACRN, CPU P-states and C-states (Px/Cx) are controlled by the guest OS.
The corresponding governors are managed in the Service VM or User VM for
best power efficiency and simplicity.

Guests should be able to process the ACPI P-state and C-state requests from
OSPM. The ACPI objects needed for P-state and C-state management should be
available in an ACPI table.

The hypervisor can restrict a guest's P-state and C-state requests (per
customer requirements). MSR accesses for P-state requests can be intercepted
by the hypervisor and forwarded to the host directly if the requested
P-state is valid. Guest MWAIT or port I/O accesses for C-state control can
be passed through to the host with no hypervisor interception to minimize
the performance impact.
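
The fragment below sketches only the P-state interception idea: on a guest
write to the ``IA32_PERF_CTL`` MSR, the requested value is validated against a
list of allowed P-states before being written to the hardware. The helper
names and validation policy are illustrative assumptions, not ACRN's actual
MSR emulation code.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define MSR_IA32_PERF_CTL  0x199U   /* architectural P-state control MSR */

   /* Placeholder for the platform's valid P-state control values (from ACPI _PSS). */
   extern const uint64_t valid_perf_ctl[];
   extern const size_t   valid_perf_ctl_count;

   extern void wrmsr_host(uint32_t msr, uint64_t value);   /* placeholder host write */

   /* Handle a guest WRMSR: forward only validated P-state requests to the host. */
   bool handle_guest_wrmsr(uint32_t msr, uint64_t value)
   {
       if (msr != MSR_IA32_PERF_CTL)
           return false;                       /* not handled here */

       for (size_t i = 0; i < valid_perf_ctl_count; i++) {
           if (valid_perf_ctl[i] == value) {
               wrmsr_host(msr, value);         /* valid request: pass through */
               return true;
           }
       }
       return true;                            /* invalid request: silently dropped */
   }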

This diagram shows CPU P-state and C-state management blocks:

.. figure:: images/over-image4.png
   :align: center

   CPU P-State and C-State Management Block Diagram

System Power State
==================

ACRN supports the ACPI standard defined system-level power states S3 and S5.
For each guest, ACRN assumes the guest implements OSPM and controls its
own power state accordingly. ACRN doesn't involve itself in guest OSPM;
instead, it traps the power state transition requests from the guest and
emulates them.

.. figure:: images/over-image21.png
   :align: center
   :name: overview-pm-block

   ACRN Power Management Diagram Block

:numref:`overview-pm-block` shows the basic diagram block for ACRN PM.
The OSPM in each guest manages the guest power state transitions. The
Device Model running in the Service VM traps and emulates the power state
transitions of the User VM (the Linux VM or Android VM in
:numref:`overview-pm-block`). The VM Manager knows all User VM power states
and notifies the OSPM of the Service VM once the User VMs are in the
required power state.

The OSPM of the Service VM then starts the Service VM's power state
transition, which is trapped by the "Sx Agency" in ACRN, and the hypervisor
carries out the system power state transition.

Some details about the ACPI tables for the User VM and Service VM:

-  The ACPI table in the User VM is emulated by the Device Model. The Device
   Model knows which register the User VM writes to trigger power state
   transitions and must register an I/O handler for it (see the sketch after
   this list).

-  The ACPI table in the Service VM is passed through; there is no ACPI parser
   in the ACRN HV. The power-management-related ACPI table is
   generated offline and hard-coded in the ACRN HV.

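As an illustration of that I/O handler, the sketch below watches writes to a
virtual ACPI PM1a control register and reacts when the guest sets the
``SLP_EN`` bit; the register port, handler registration, and helper names are
assumptions made for the example rather than the Device Model's actual code.

.. code-block:: c

   #include <stdint.h>

   #define PM1A_CNT_PORT   0x404U               /* assumed virtual PM1a control port */
   #define PM1A_SLP_EN     (1U << 13)           /* ACPI SLP_EN bit */
   #define PM1A_SLP_TYP(v) (((v) >> 10) & 0x7U) /* ACPI SLP_TYPx field */

   extern void request_vm_suspend(int vmid);    /* placeholder: S3 entry */
   extern void request_vm_poweroff(int vmid);   /* placeholder: S5 entry */

   /* Port I/O write handler for the emulated PM1a control register. */
   void pm1a_cnt_write(int vmid, uint16_t port, uint32_t value)
   {
       if (port != PM1A_CNT_PORT || !(value & PM1A_SLP_EN))
           return;                              /* not a sleep request */

       /* SLP_TYP encodes which sleep state the guest asked for; the mapping
        * from SLP_TYP values to S3/S5 comes from the virtual ACPI tables. */
       if (PM1A_SLP_TYP(value) == 5U)
           request_vm_poweroff(vmid);           /* e.g. treat as S5 */
       else
           request_vm_suspend(vmid);            /* e.g. treat as S3 */
   }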