.. _partition-mode-hld:

Partition Mode
##############

ACRN is a type 1 hypervisor that supports running multiple guest operating
systems (OS). Typically, the platform BIOS/bootloader boots ACRN, and
ACRN loads one or more guest OSes. Refer to :ref:`hv-startup` for
details on the start-up flow of the ACRN hypervisor.

ACRN supports two modes of operation: sharing mode and partition mode.
This document describes ACRN's high-level design for partition mode
support.

.. contents::
   :depth: 2
   :local:

Introduction
************

In partition mode, ACRN provides guests with exclusive access to cores,
memory, cache, and peripheral devices. Partition mode enables developers
to dedicate resources exclusively to individual guests. However, neither
x86 hardware nor ACRN currently supports partitioning resources such as
peripheral buses (e.g., PCI). On x86 platforms that support Cache
Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA), developers
can partition Level 2 (L2) cache, Last Level Cache (LLC), and memory bandwidth
among the guests. Refer to
:ref:`hv_rdt` for more details on the ACRN RDT high-level design and
:ref:`rdt_configuration` for RDT configuration.

ACRN expects static partitioning of resources, either through code
modification of the guest configuration or through compile-time config
options. All the devices exposed to the guests are either physical
resources or are emulated in the hypervisor. There is no need for a
Device Model or a Service VM. :numref:`pmode2vms` shows a partition mode
example of two VMs with exclusive access to physical resources.

.. figure:: images/partition-image3.png
   :align: center
   :name: pmode2vms

   Partition Mode Example with Two VMs
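
As a concrete illustration, the static configuration for the two VMs in
:numref:`pmode2vms` could look like the sketch below. The structure and field
names (``struct guest_config``, ``pcpu_bitmap``, ``pt_devs``, and so on) are
hypothetical and do not match ACRN's actual configuration code; they only show
the kind of information a partition mode guest configuration carries.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Illustrative only: a static, compile-time description of two
    * partition mode guests. Field names are hypothetical and do not
    * match ACRN's real guest configuration structures. */
   struct guest_config {
       uint64_t pcpu_bitmap;       /* physical cores dedicated to this guest */
       uint16_t bsp_pcpu_id;       /* physical core used as the guest BSP */
       uint64_t start_hpa;         /* host physical start address of guest RAM */
       uint64_t mem_size;          /* guest memory size (no HIGHMEM support) */
       bool     lapic_passthrough; /* pass the physical LAPIC through or not */
       const char *bootargs;       /* kernel parameters copied to guest memory */
       /* virtual BDF -> physical BDF map for passthrough devices */
       struct { uint16_t vbdf; uint16_t pbdf; } pt_devs[4];
   };

   static const struct guest_config guest_configs[2] = {
       {
           .pcpu_bitmap = 0x3UL,          /* cores 0-1 */
           .bsp_pcpu_id = 0U,
           .start_hpa   = 0x100000000UL,  /* 4 GB */
           .mem_size    = 0x20000000UL,   /* 512 MB */
           .lapic_passthrough = false,
           .bootargs    = "console=ttyS0 root=/dev/sda rw",
           .pt_devs     = { { .vbdf = 0x0008U, .pbdf = 0x00f0U } },
       },
       {
           .pcpu_bitmap = 0xcUL,          /* cores 2-3 */
           .bsp_pcpu_id = 2U,
           .start_hpa   = 0x120000000UL,
           .mem_size    = 0x20000000UL,
           .lapic_passthrough = true,
           .bootargs    = "console=ttyS0 root=/dev/sdb rw",
           .pt_devs     = { { .vbdf = 0x0008U, .pbdf = 0x00f8U } },
       },
   };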

Guest Info
**********

ACRN uses the multiboot info passed from the platform bootloader to find
the location of each guest kernel in memory. ACRN copies each guest
kernel into that guest's memory. The current implementation of ACRN
requires developers to specify kernel parameters for the guests as part
of the guest configuration. ACRN picks up the kernel parameters from the
guest configuration and copies them to the corresponding guest memory.

.. figure:: images/partition-image18.png
   :align: center

   Guest Info
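
A minimal sketch of this copy step is shown below. The helper
``gpa_to_hva()``, the load addresses, and the ``guest_image`` structure are
assumptions made for illustration, not names from the ACRN sources.

.. code-block:: c

   #include <stdint.h>
   #include <string.h>

   /* Illustrative sketch: copy a guest kernel image and its kernel
    * parameters into the guest's memory. Guest memory is reached through
    * a hypervisor mapping of the host physical range reserved for the
    * guest. */

   #define GUEST_KERNEL_LOAD_GPA  0x1000000UL   /* example load address: 16 MB */
   #define GUEST_BOOTARGS_GPA     0x24000UL     /* example cmdline location */

   struct guest_image {
       const void *kernel_src;   /* multiboot module holding the kernel image */
       uint64_t    kernel_size;
       const char *bootargs;     /* kernel parameters from the guest config */
   };

   /* Assumed helper: return a hypervisor-virtual pointer for a guest
    * physical address, given the hypervisor mapping of guest RAM. */
   static inline void *gpa_to_hva(uint8_t *guest_mem_base, uint64_t gpa)
   {
       return guest_mem_base + gpa;
   }

   void prepare_guest_memory(uint8_t *guest_mem_base,
                             const struct guest_image *img)
   {
       /* Copy the kernel image to its load address inside guest RAM. */
       memcpy(gpa_to_hva(guest_mem_base, GUEST_KERNEL_LOAD_GPA),
              img->kernel_src, img->kernel_size);

       /* Copy the kernel parameters taken from the guest configuration. */
       strcpy(gpa_to_hva(guest_mem_base, GUEST_BOOTARGS_GPA), img->bootargs);
   }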

ACRN Setup for Guests
*********************

Cores
=====

ACRN requires the developer to specify the number of guests and the
cores dedicated to each guest. The developer also needs to specify the
physical core used as the bootstrap processor (BSP) for each guest. As
each processor is brought up, the hypervisor checks whether it is
configured as the BSP for any of the guests. If a processor is the BSP
of a guest, ACRN proceeds to build the memory mapping for the guest,
along with the mptable, E820 entries, and zero page for the guest. As
described in `Guest Info`_, ACRN copies the guest kernel and kernel
parameters into guest memory. :numref:`partBSPsetup` shows these events
in chronological order.

.. figure:: images/partition-image7.png
   :align: center
   :name: partBSPsetup

   Event Order for Processor Setup
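
A simplified sketch of this flow follows. Every helper name is an assumption
made for illustration (the real ACRN functions differ), and
``struct guest_config`` refers to the earlier configuration sketch.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define NUM_GUESTS 2U

   /* Hypothetical per-guest configuration; see the earlier sketch. */
   struct guest_config;

   /* Assumed helpers, named for illustration only. */
   const struct guest_config *get_guest_config(uint16_t vm_id);
   bool is_guest_bsp(uint16_t pcpu_id, const struct guest_config *cfg);
   void create_ept_mapping(const struct guest_config *cfg);  /* GPA 0..size -> HPA start..start+size */
   void build_mptable(const struct guest_config *cfg);
   void build_e820_and_zero_page(const struct guest_config *cfg);
   void copy_kernel_and_bootargs(const struct guest_config *cfg);
   void start_guest_bsp(const struct guest_config *cfg);

   /* Called on each physical CPU as it is brought up in the hypervisor. */
   void pcpu_guest_setup(uint16_t pcpu_id)
   {
       for (uint16_t vm_id = 0U; vm_id < NUM_GUESTS; vm_id++) {
           const struct guest_config *cfg = get_guest_config(vm_id);

           if (!is_guest_bsp(pcpu_id, cfg)) {
               continue;   /* this CPU is not the BSP of this guest */
           }

           create_ept_mapping(cfg);           /* guest memory mapping */
           build_mptable(cfg);                /* platform info for the guest */
           build_e820_and_zero_page(cfg);     /* default E820 and zero page */
           copy_kernel_and_bootargs(cfg);     /* see `Guest Info`_ */
           start_guest_bsp(cfg);              /* launch the guest's BSP vCPU */
       }
   }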

Memory
======

For each guest in partition mode, the developer specifies in the guest
configuration the size of guest memory and its starting address in the
host physical address space. There is no support for HIGHMEM for
partition mode guests. The developer needs to take care of two aspects
when assigning host memory to the guests:

1) The sum of the guest PCI hole and guest "System RAM" must be less
   than 4 GB.

2) The starting host physical address and the size must be chosen so
   that the region does not overlap with any reserved regions in the
   host E820.

ACRN creates an EPT mapping for the guest between GPA (0, memory size)
and HPA (starting address in the guest configuration, memory size).

E820 and Zero Page Info
=======================

A default E820 is used for all the guests in partition mode. The
following table shows the reference E820 layout. The zero page is
created with this E820 info for all the guests.

+------------------------+
| RAM                    |
|                        |
| 0x0 - 0xEFFFF          |
+------------------------+
| RESERVED (MPTABLE)     |
|                        |
| 0xF0000 - 0x100000     |
+------------------------+
| RAM                    |
|                        |
| 0x100000 - LOWMEM      |
+------------------------+
| RESERVED               |
+------------------------+
| PCI HOLE               |
+------------------------+
| RESERVED               |
+------------------------+
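
The same layout can be expressed as the static E820 table that the zero page
is built from. The entry types follow the standard E820 encoding; the boundary
macros (``LOWMEM_END``, ``PCI_HOLE_START``, and so on) are placeholders that
would be derived from the guest memory size in the guest configuration.

.. code-block:: c

   #include <stdint.h>

   #define E820_TYPE_RAM       1U   /* usable RAM */
   #define E820_TYPE_RESERVED  2U   /* reserved, not usable by the guest */

   struct e820_entry {
       uint64_t baseaddr;   /* start of the region (guest physical) */
       uint64_t length;     /* length of the region in bytes */
       uint32_t type;       /* E820_TYPE_* */
   } __attribute__((packed));

   /* Placeholder boundaries, derived from the guest configuration. */
   #define LOWMEM_END      0x20000000UL    /* example: 512 MB of guest RAM */
   #define PCI_HOLE_START  0xE0000000UL
   #define PCI_HOLE_END    0xF0000000UL
   #define ADDR_4G         0x100000000UL

   /* Default guest E820 for partition mode, mirroring the table above. */
   static const struct e820_entry guest_e820[] = {
       { 0x0UL,          0xF0000UL,                     E820_TYPE_RAM      },
       { 0xF0000UL,      0x10000UL,                     E820_TYPE_RESERVED }, /* mptable */
       { 0x100000UL,     LOWMEM_END - 0x100000UL,       E820_TYPE_RAM      },
       { LOWMEM_END,     PCI_HOLE_START - LOWMEM_END,   E820_TYPE_RESERVED },
       { PCI_HOLE_START, PCI_HOLE_END - PCI_HOLE_START, E820_TYPE_RESERVED }, /* PCI hole */
       { PCI_HOLE_END,   ADDR_4G - PCI_HOLE_END,        E820_TYPE_RESERVED },
   };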

Platform Info - mptable
=======================

ACRN, in partition mode, uses an mptable to convey platform info to each
guest. Based on the guest configuration, such as the number of cores
used for each guest and whether the guest needs devices with INTx, ACRN
builds the mptable and copies it to the guest memory. In partition mode,
ACRN passes physical APIC IDs to the guests.
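
For example, the mptable processor entries could be filled with the physical
LAPIC IDs of the cores dedicated to the guest. The entry layout below follows
the Intel MultiProcessor Specification processor entry; the helper and the
signature/version values are illustrative only.

.. code-block:: c

   #include <stdint.h>

   /* MP specification processor entry (20 bytes, entry type 0). */
   struct mpc_cpu {
       uint8_t  type;          /* 0 = processor entry */
       uint8_t  apic_id;       /* LAPIC ID reported to the guest */
       uint8_t  apic_ver;      /* LAPIC version */
       uint8_t  cpu_flag;      /* bit 0: enabled, bit 1: bootstrap processor */
       uint32_t cpu_signature;
       uint32_t feature_flag;
       uint32_t reserved[2];
   } __attribute__((packed));

   #define MP_ENTRY_PROCESSOR  0U
   #define CPU_FLAG_ENABLED    (1U << 0)
   #define CPU_FLAG_BSP        (1U << 1)

   /* Sketch: fill one processor entry per core dedicated to the guest,
    * using the *physical* APIC ID of each core. `phys_apic_ids` and
    * `bsp_index` would come from the guest configuration. */
   void fill_mptable_cpus(struct mpc_cpu *entries,
                          const uint8_t *phys_apic_ids,
                          uint8_t num_cpus, uint8_t bsp_index)
   {
       for (uint8_t i = 0U; i < num_cpus; i++) {
           entries[i] = (struct mpc_cpu){
               .type          = MP_ENTRY_PROCESSOR,
               .apic_id       = phys_apic_ids[i],   /* physical APIC ID */
               .apic_ver      = 0x14U,              /* example xAPIC version */
               .cpu_flag      = CPU_FLAG_ENABLED |
                                ((i == bsp_index) ? CPU_FLAG_BSP : 0U),
               .cpu_signature = 0x000600FDU,        /* placeholder signature */
               .feature_flag  = 0U,
           };
       }
   }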

I/O - Virtual Devices
=====================

Port I/O is supported for the PCI config space access ports 0xcf8 and
0xcfc, vUART 0x3f8, vRTC 0x70 and 0x71, and vPIC ranges 0x20/0x21,
0xa0/0xa1, and 0x4d0/0x4d1. MMIO is supported for the vIOAPIC. ACRN
exposes a virtual host bridge at BDF (Bus:Device.Function) 0:0.0 to each
guest. Access to the 256 bytes of config space for the virtual host
bridge is emulated.
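
Emulation of the 0xcf8/0xcfc pair follows the standard PCI configuration
mechanism: the guest writes a 32-bit address to 0xcf8 and then accesses the
data through 0xcfc. A sketch of the address decoding is shown below; the
structure and function names are illustrative.

.. code-block:: c

   #include <stdint.h>

   /* Standard PCI configuration mechanism address format written to port
    * 0xcf8: bit 31 enable, bits 23:16 bus, 15:11 device, 10:8 function,
    * 7:2 register (dword aligned). */
   struct pci_cfg_addr {
       uint8_t bus;
       uint8_t dev;
       uint8_t func;
       uint8_t reg;        /* byte offset into config space */
       uint8_t enabled;
   };

   struct pci_cfg_addr decode_cf8(uint32_t val)
   {
       struct pci_cfg_addr a;

       a.enabled = (uint8_t)((val >> 31) & 0x1U);
       a.bus     = (uint8_t)((val >> 16) & 0xFFU);
       a.dev     = (uint8_t)((val >> 11) & 0x1FU);
       a.func    = (uint8_t)((val >> 8) & 0x7U);
       a.reg     = (uint8_t)(val & 0xFCU);
       return a;
   }

A subsequent access to 0xcfc is then dispatched either to the emulated virtual
host bridge or to a passthrough device's virtualized config space, based on
the decoded BDF.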

I/O - Passthrough Devices
=========================

ACRN, in partition mode, supports passing through PCI devices on the
platform. All the passthrough devices are exposed as child devices under
the virtual host bridge. ACRN does not support passing through bridges
or emulating virtual bridges. Passthrough devices must be statically
allocated to each guest in the guest configuration. ACRN expects the
developer to provide the mapping from each virtual BDF to the physical
device's BDF for all the passthrough devices as part of each guest
configuration.

Runtime ACRN Support for Guests
*******************************

ACRN, in partition mode, supports an option to pass through the LAPIC of
the physical CPUs to the guest. ACRN expects developers to specify
whether the guest needs LAPIC passthrough in the guest configuration.
When the guest configures the vLAPIC in x2APIC mode, and the guest
configuration has LAPIC passthrough enabled, ACRN passes the LAPIC
through to the guest. The guest can then access the LAPIC hardware
directly without hypervisor interception. During guest runtime, this
option determines how ACRN handles inter-processor interrupts and device
interrupts, as discussed in detail in the corresponding sections below.

.. figure:: images/partition-image16.png
   :align: center

   LAPIC Passthrough

Guest SMP Boot Flow
===================

The core APIC IDs are reported to the guest using the mptable info. The
SMP boot flow is similar to sharing mode. Refer to :ref:`vm-startup`
for the guest SMP boot flow in ACRN. Partition mode guest startup is the
same as Service VM startup in sharing mode.

Inter-Processor Interrupt (IPI) Handling
========================================

Guests Without LAPIC Passthrough
--------------------------------

For guests without LAPIC passthrough, IPIs between guest CPUs are handled
the same way as in sharing mode. Refer to :ref:`virtual-interrupt-hld`
for more details.

Guests With LAPIC Passthrough
-----------------------------

ACRN supports LAPIC passthrough if and only if the guest is using x2APIC
mode for the vLAPIC. In LAPIC passthrough mode, writes to the Interrupt
Command Register (ICR) x2APIC MSR are intercepted. The guest writes the
IPI info, including the vector and destination APIC IDs, to the ICR.
Upon an IPI request from the guest, ACRN does a sanity check on the
destination processors programmed into the ICR. If the destination is a
valid target for the guest, ACRN sends an IPI with the same vector from
the ICR to the physical CPUs corresponding to the destination processor
info in the ICR.

.. figure:: images/partition-image14.png
   :align: center

   IPI Handling for Guests With LAPIC Passthrough
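
A sketch of the intercepted ICR write is shown below, assuming hypothetical
helpers for destination validation, virtual-to-physical APIC ID translation,
and the MSR write; none of these names come from the ACRN code base.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define MSR_X2APIC_ICR  0x830U   /* x2APIC Interrupt Command Register MSR */

   struct guest_vm;   /* opaque VM handle (illustrative) */

   /* Assumed helpers (illustrative names). */
   bool     is_valid_guest_target(const struct guest_vm *vm, uint32_t dest_apic_id);
   uint32_t virt_to_phys_apic_id(const struct guest_vm *vm, uint32_t dest_apic_id);
   void     msr_write(uint32_t msr, uint64_t value);

   /* Intercepted guest write to the x2APIC ICR when LAPIC passthrough is
    * enabled: check the destination, then resend the IPI with the same
    * vector to the corresponding physical CPU. Broadcast and shorthand
    * destinations are omitted from this sketch. */
   int handle_guest_icr_write(const struct guest_vm *vm, uint64_t icr_value)
   {
       uint32_t dest = (uint32_t)(icr_value >> 32);   /* x2APIC: destination in bits 63:32 */
       uint32_t low  = (uint32_t)icr_value;           /* vector, delivery mode, ... */

       if (!is_valid_guest_target(vm, dest)) {
           return -1;   /* drop IPIs aimed outside the guest's own CPUs */
       }

       /* Replace the destination with the physical APIC ID and deliver
        * the IPI through the physical ICR, keeping the vector unchanged. */
       msr_write(MSR_X2APIC_ICR,
                 ((uint64_t)virt_to_phys_apic_id(vm, dest) << 32) | (uint64_t)low);
       return 0;
   }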

Passthrough Device Support
==========================

Configuration Space Access
--------------------------

ACRN emulates the Configuration Space Address (0xcf8) I/O port and the
Configuration Space Data (0xcfc) I/O port for guests to access the
configuration space of PCI devices. Within the config space of a device,
the Base Address Registers (BARs), at offsets 0x10 through 0x24, provide
information about the resources (I/O and MMIO) used by the PCI device.
ACRN virtualizes the BAR registers; for the rest of the config space, it
forwards reads and writes to the physical config space of the
passthrough device. Refer to the `I/O`_ section below for more details.

.. figure:: images/partition-image1.png
   :align: center

   Configuration Space Access
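
A condensed sketch of that split between virtualized BARs and forwarded
accesses is shown below, with hypothetical helper and structure names; a real
implementation also has to handle access sizes, BAR sizing writes, and
registers such as the command register.

.. code-block:: c

   #include <stdint.h>

   #define PCI_BAR_FIRST  0x10U   /* offset of BAR0 */
   #define PCI_BAR_LAST   0x24U   /* offset of BAR5 */

   /* Hypothetical per-device state: the guest-visible (virtual) BAR
    * values and the physical BDF of the passthrough device. */
   struct pt_dev {
       uint16_t pbdf;
       uint32_t vbar[6];
   };

   /* Assumed helper: dword read from the physical config space. */
   uint32_t phys_cfg_read32(uint16_t pbdf, uint8_t offset);

   /* Dword-aligned config space read for a passthrough device: BARs are
    * served from the virtualized copy, everything else is forwarded to
    * the physical device. */
   uint32_t pt_dev_cfg_read32(const struct pt_dev *dev, uint8_t offset)
   {
       if ((offset >= PCI_BAR_FIRST) && (offset <= PCI_BAR_LAST)) {
           return dev->vbar[(offset - PCI_BAR_FIRST) / 4U];
       }
       return phys_cfg_read32(dev->pbdf, offset);
   }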

DMA
---

ACRN developers need to statically define the passthrough devices for
each guest in the guest configuration. For devices to DMA to/from guest
memory directly, ACRN parses the list of passthrough devices for each
guest and creates context entries in the VT-d remapping hardware. The
EPT page tables created for the guest are reused as the VT-d page
tables.
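
Conceptually, device assignment walks the passthrough device list and attaches
each device to the guest's DMA remapping domain, whose translation root is the
guest's EPT. The sketch below is purely illustrative and does not reflect
ACRN's VT-d code or the hardware context-entry format.

.. code-block:: c

   #include <stdint.h>

   /* Illustrative types: a guest's address-translation state and its
    * statically configured passthrough devices. */
   struct guest_dma_cfg {
       uint64_t ept_root_hpa;        /* root of the guest's EPT hierarchy */
       uint16_t num_pt_devs;
       struct { uint16_t vbdf; uint16_t pbdf; } pt_devs[8];
   };

   /* Assumed helper: program a VT-d context entry so that DMA from the
    * device at `pbdf` is translated through the page tables rooted at
    * `translation_root_hpa`. */
   void iommu_add_device(uint16_t pbdf, uint64_t translation_root_hpa);

   /* Attach every passthrough device of a guest to the guest's EPT, so
    * the device can DMA directly to/from guest memory using guest
    * physical addresses. */
   void assign_pt_devices(const struct guest_dma_cfg *cfg)
   {
       for (uint16_t i = 0U; i < cfg->num_pt_devs; i++) {
           iommu_add_device(cfg->pt_devs[i].pbdf, cfg->ept_root_hpa);
       }
   }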

I/O
---

ACRN supports I/O for passthrough devices with two restrictions:

1) Only MMIO is supported. Developers therefore need to expose I/O BARs
   as not present in the guest configuration.

2) Only 32-bit MMIO BARs are supported.

As the guest PCI sub-system scans the PCI bus and assigns a Guest
Physical Address (GPA) to an MMIO BAR, ACRN maps the GPA to the address
in the physical BAR of the passthrough device using EPT. The following
timeline chart explains how PCI devices are assigned to the guest and
how BARs are mapped upon guest initialization.

.. figure:: images/partition-image13.png
   :align: center

   I/O for Passthrough Devices
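
A sketch of the BAR write handling that establishes this EPT mapping is shown
below; the EPT helpers and the ``pt_bar`` bookkeeping are assumptions for
illustration, and BAR sizing is handled only in a simplified way.

.. code-block:: c

   #include <stdint.h>

   /* Illustrative per-BAR state for a passthrough device. */
   struct pt_bar {
       uint32_t size;       /* BAR size in bytes (power of two) */
       uint64_t base_hpa;   /* host physical address of the physical BAR */
       uint64_t base_gpa;   /* guest physical address currently programmed */
   };

   /* Assumed helpers: add/remove uncached EPT mappings for a guest. */
   void ept_map_mmio(uint16_t vm_id, uint64_t gpa, uint64_t hpa, uint64_t size);
   void ept_unmap(uint16_t vm_id, uint64_t gpa, uint64_t size);

   /* Called when the guest writes a new base address into a virtual
    * 32-bit MMIO BAR. BAR sizing writes (all 1s) are ignored here. */
   void pt_dev_bar_write(uint16_t vm_id, struct pt_bar *bar, uint32_t new_base)
   {
       if (new_base == 0xFFFFFFFFU) {
           return;   /* guest is sizing the BAR, not assigning it */
       }

       /* Drop the mapping for the previous guest address, if any. */
       if (bar->base_gpa != 0UL) {
           ept_unmap(vm_id, bar->base_gpa, bar->size);
       }

       /* Map the new GPA window onto the device's physical BAR. */
       bar->base_gpa = new_base & ~(uint64_t)(bar->size - 1U);
       ept_map_mmio(vm_id, bar->base_gpa, bar->base_hpa, bar->size);
   }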

Interrupt Configuration
-----------------------

ACRN supports both legacy (INTx) and MSI interrupts for passthrough
devices.

INTx Support
~~~~~~~~~~~~

ACRN expects developers to identify the interrupt line info (config
space offset 0x3C) of the passthrough device and build an interrupt
entry in the mptable for the corresponding guest. As the guest
configures the vIOAPIC RTE for the interrupt, ACRN writes the info from
the guest RTE into the physical IOAPIC RTE. When the guest masks the RTE
in the vIOAPIC, ACRN masks the interrupt RTE in the physical IOAPIC.
Level-triggered interrupts are not supported.
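
A sketch of how a guest RTE write could be reflected into the physical IOAPIC
follows. The RTE mask bit and vector field follow the IOAPIC register layout;
the helpers and the physical pin lookup are assumed for illustration.

.. code-block:: c

   #include <stdint.h>

   #define IOAPIC_RTE_MASK_BIT   (1UL << 16)   /* 1 = interrupt masked */

   /* Assumed helpers: read/write a 64-bit redirection table entry (RTE)
    * of the physical IOAPIC for a given pin. */
   uint64_t ioapic_get_rte(uint32_t pin);
   void     ioapic_set_rte(uint32_t pin, uint64_t rte);

   /* Propagate a guest vIOAPIC RTE write to the physical IOAPIC pin that
    * the passthrough device's interrupt line is wired to. Only the mask
    * bit and the vector are forwarded in this sketch. */
   void vioapic_rte_update(uint32_t phys_pin, uint64_t guest_rte)
   {
       uint64_t rte = ioapic_get_rte(phys_pin);

       if ((guest_rte & IOAPIC_RTE_MASK_BIT) != 0UL) {
           rte |= IOAPIC_RTE_MASK_BIT;            /* guest masked the pin */
       } else {
           rte &= ~IOAPIC_RTE_MASK_BIT;           /* guest unmasked the pin */
           rte = (rte & ~0xFFUL) | (guest_rte & 0xFFUL);   /* take the guest vector */
       }
       ioapic_set_rte(phys_pin, rte);
   }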

MSI Support
~~~~~~~~~~~

The guest reads and writes the PCI configuration space to configure MSI
interrupts. The MSI address, data, and control registers are passed
through to the physical config space of the passthrough device. Refer to
`Configuration Space Access`_ for details on how the PCI configuration
space is emulated.

Virtual Device Support
======================

ACRN provides read-only vRTC support for partition mode guests. Writes
to the data port are discarded.

For port I/O to ports other than the vPIC, vRTC, or vUART, reads return
0xFF and writes are discarded.

Interrupt Delivery
==================

Guests Without LAPIC Passthrough
--------------------------------

In ACRN partition mode, interrupts stay disabled after a vmexit. The
processor does not take interrupts while it is executing in VMX root
mode. ACRN configures the processor to take a vmexit upon external
interrupt while the processor is executing in VMX non-root mode. Upon an
external interrupt, after sending an EOI to the physical LAPIC, ACRN
injects the vector into the vLAPIC of the vCPU currently running on the
processor. Guests using a Linux kernel use vectors less than 0xEC for
device interrupts.

.. figure:: images/partition-image20.png
   :align: center

   Interrupt Delivery for Guests Without LAPIC Passthrough
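
A sketch of this external-interrupt exit path is shown below, assuming
hypothetical helpers for reading the exit vector, issuing the physical EOI,
and injecting into the vLAPIC.

.. code-block:: c

   #include <stdint.h>

   struct vcpu;   /* opaque vCPU handle (illustrative) */

   /* Assumed helpers (illustrative names). */
   uint32_t vmexit_interrupt_vector(void);    /* from the VM-exit interruption info */
   void     lapic_send_eoi(void);             /* EOI to the physical LAPIC */
   void     vlapic_inject_vector(struct vcpu *vcpu, uint32_t vector);

   /* External-interrupt vmexit handler for a guest without LAPIC
    * passthrough: acknowledge the interrupt at the physical LAPIC, then
    * inject the same vector into the vLAPIC of the vCPU that was running
    * on this processor. */
   void handle_external_interrupt(struct vcpu *vcpu)
   {
       uint32_t vector = vmexit_interrupt_vector();

       lapic_send_eoi();                    /* EOI the physical LAPIC first */
       vlapic_inject_vector(vcpu, vector);  /* delivered to the guest on VM entry */
   }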

Guests With LAPIC Passthrough
-----------------------------

For guests with LAPIC passthrough, ACRN does not configure vmexit upon
external interrupts. There is no vmexit upon device interrupts; they are
handled by the guest IDT.

Hypervisor IPI Service
======================

ACRN needs IPIs for events such as flushing TLBs across CPUs, sending
virtual device interrupts (e.g., from the vUART to vCPUs), and others.

Guests Without LAPIC Passthrough
--------------------------------

Hypervisor IPIs work the same way as in sharing mode.

Guests With LAPIC Passthrough
-----------------------------

Since external interrupts are passed through to the guest IDT, regular
IPIs would not trigger a vmexit. ACRN therefore uses NMI delivery mode,
and NMI exiting is enabled for these vCPUs. When the NMI arrives at the
target processor and the processor is in non-root mode, a vmexit occurs
and the pending event mask is checked to service the events.
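
A sketch of this notification path follows; the per-CPU event bitmap and the
helper names are assumptions for illustration. The NMI itself would be sent
through the physical LAPIC ICR using NMI delivery mode.

.. code-block:: c

   #include <stdint.h>

   #define MAX_PCPUS        8U
   #define EVENT_FLUSH_TLB  (1UL << 0)   /* remote TLB flush requested */
   #define EVENT_VUART_INTR (1UL << 1)   /* vUART interrupt for the guest BSP */

   /* Per-CPU bitmap of pending hypervisor events (illustrative). */
   static volatile uint64_t pending_events[MAX_PCPUS];

   /* Assumed helpers. */
   void send_nmi_ipi(uint16_t pcpu_id);   /* LAPIC ICR write, NMI delivery mode */
   void flush_ept_tlb(void);
   void vuart_inject_intr(void);

   /* Notify a CPU that runs a LAPIC-passthrough guest: set the event bit
    * and send an NMI. With NMI exiting enabled, the NMI forces a vmexit
    * even though ordinary external interrupts do not. */
   void notify_pcpu(uint16_t pcpu_id, uint64_t event)
   {
       __atomic_fetch_or(&pending_events[pcpu_id], event, __ATOMIC_SEQ_CST);
       send_nmi_ipi(pcpu_id);
   }

   /* NMI-exit handler on the target CPU: service whatever was requested. */
   void handle_nmi_exit(uint16_t pcpu_id)
   {
       uint64_t events = __atomic_exchange_n(&pending_events[pcpu_id], 0UL,
                                             __ATOMIC_SEQ_CST);

       if ((events & EVENT_FLUSH_TLB) != 0UL) {
           flush_ept_tlb();
       }
       if ((events & EVENT_VUART_INTR) != 0UL) {
           vuart_inject_intr();
       }
   }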

Debug Console
=============

For details on how the hypervisor console works, refer to
:ref:`hv-console`.

For a guest console in partition mode, ACRN provides an option to pass
``vmid`` as an argument to ``vm_console``. The ``vmid`` is the same as
the one used in the guest configuration.

Guests Without LAPIC Passthrough
--------------------------------

The console works the same way as in sharing mode.

Hypervisor Console
==================

ACRN uses the TSC deadline timer to provide a timer service. The
hypervisor console uses a timer on CPU0 to poll characters on the serial
device. To support LAPIC passthrough, the TSC deadline MSR is passed
through, and the local timer interrupt is also delivered to the guest
IDT. Instead of the TSC deadline timer, ACRN therefore uses the VMX
preemption timer to poll the serial device.

Guest Console
=============

ACRN exposes a vUART to partition mode guests. The vUART uses the vPIC
to inject an interrupt into the guest BSP. If the guest has more than
one core, the vUART might need to inject an interrupt into the guest BSP
from a core other than the BSP at runtime. As mentioned in
`Hypervisor IPI Service`_, ACRN uses NMI delivery mode to notify the CPU
running the guest's BSP.