.. _interrupt-hld:

Physical Interrupt High-Level Design
####################################

Overview
********

The ACRN hypervisor implements a simple but fully functional framework
to manage interrupts and exceptions, as shown in
:numref:`interrupt-modules-overview`. In its native layer, it configures
the physical PIC, IOAPIC, and LAPIC to support different interrupt
sources from the local timer/IPI to the external INTx/MSI. In its virtual guest
layer, it emulates virtual PIC, virtual IOAPIC, and virtual LAPIC/passthrough
LAPIC. It provides full APIs, allowing virtual interrupt injection from
emulated or passthrough devices. This section does not cover
the passthrough LAPIC case. For the passthrough LAPIC, refer to
:ref:`lapic_passthru`.

.. figure:: images/interrupt-image3.png
   :align: center
   :width: 600px
   :name: interrupt-modules-overview

   ACRN Interrupt Modules Overview

In the software modules view shown in :numref:`interrupt-sw-modules`,
the ACRN hypervisor sets up the physical interrupt in its basic
interrupt modules (e.g., IOAPIC/LAPIC/IDT). It dispatches the interrupt
in the hypervisor interrupt flow control layer to the corresponding
handlers; these could be predefined IPI notification, timer, or runtime
registered passthrough devices. The ACRN hypervisor then uses its VM
interfaces, based on the vPIC, vIOAPIC, and vMSI modules, to inject the
necessary virtual interrupt into the specific VM, or directly delivers
the interrupt to the specific RT VM with passthrough LAPIC.

.. figure:: images/interrupt-image2.png
   :align: center
   :width: 600px
   :name: interrupt-sw-modules

   ACRN Interrupt Software Modules Overview


The hypervisor implements the following functionalities for handling
physical interrupts:

-  Configure interrupt-related hardware including IDT, PIC, LAPIC, and
   IOAPIC on startup.

-  Provide APIs to manipulate the registers of LAPIC and IOAPIC.

-  Acknowledge physical interrupts.

-  Set up a callback mechanism for the other components in the
   hypervisor to request an interrupt vector and register a
   handler for that interrupt.

HV owns all native physical interrupts and manages 256 vectors per CPU.
All physical interrupts are first handled in VMX root-mode. The
"external-interrupt exiting" bit in the VM-execution controls field is set
to support this. The ACRN hypervisor also initializes all the
interrupt-related modules such as IDT, PIC, IOAPIC, and LAPIC.

HV does not own any host devices (except UART). All devices are by
default assigned to the Service VM. Any interrupts received by VM
(Service VM or User VM) device drivers are virtual interrupts injected
by HV (via vLAPIC).
HV manages a Host-to-Guest mapping. When a native IRQ/interrupt occurs,
HV decides whether this IRQ/interrupt should be forwarded to a VM and
which VM to forward to (if any). Refer to
:ref:`virt-interrupt-injection` and :ref:`interrupt-remapping` for
more information.

HV does not own any exceptions. Guest VMCS are configured so that no VM
Exit happens on exceptions, apart from a few cases such as #INT3 and #MC.
This simplifies the design because HV does not support any exception
handling itself. HV supports only static memory mapping, so there should
be no #PF or #GP. If HV receives an exception indicating an error, an
assert function is executed with an error message printout, and the
system then halts.

Native interrupts can be generated from one of the following
sources:

-  GSI interrupts

   -  PIC or Legacy devices IRQ (0~15)
   -  IOAPIC pin

-  PCI MSI/MSI-X vectors
-  Inter CPU IPI
-  LAPIC timer

.. _physical-interrupt-initialization:

Physical Interrupt Initialization
*********************************

After the ACRN hypervisor gets control from the bootloader, it
initializes all physical interrupt-related modules for all the CPUs. The ACRN
hypervisor creates a framework to manage the physical interrupts for
hypervisor local devices, passthrough devices, and IPI between CPUs, as
shown in :numref:`hv-interrupt-init`:

.. figure:: images/interrupt-image66.png
   :align: center
   :name: hv-interrupt-init

   Physical Interrupt Initialization

IDT Initialization
==================

The ACRN hypervisor builds its native IDT (interrupt descriptor table)
during interrupt initialization and sets up the following handlers:

-  On an exception, the hypervisor dumps its context and halts the current
   physical processor (because physical exceptions are not expected).

-  For external interrupts, HV may mask the interrupt (depending on the
   trigger mode), followed by interrupt acknowledgement and dispatch
   to the registered handler, if any.

Most interrupts and exceptions are handled without a stack switch,
except for machine-check, double fault, and stack fault exceptions, which
have their own stack set in TSS.

PIC/IOAPIC Initialization
=========================

The ACRN hypervisor masks all interrupts from the PIC. All legacy interrupts
from the PIC (<16) are linked to the IOAPIC, as shown in the connections in
:numref:`hv-pic-config`.

ACRN pre-allocates vectors for these legacy interrupts and sets them
in the IOAPIC RTEs. For the others (>= 16), ACRN sets vector 0 in the
RTEs, and valid vectors are dynamically allocated on demand.
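
The initial vector choice described above can be sketched as follows. This
is a minimal illustration rather than the ACRN implementation:
``NR_LEGACY_IRQ``, ``LEGACY_VECTOR_BASE``, ``VECTOR_INVALID``, and
``initial_rte_vector()`` are hypothetical names, and the base vector 0x20 is
assumed from the vector layout described later in this document.

.. code-block:: c

   #include <stdint.h>

   #define NR_LEGACY_IRQ      16U    /* legacy IRQ0-15 */
   #define LEGACY_VECTOR_BASE 0x20U  /* assumed first statically allocated vector */
   #define VECTOR_INVALID     0U     /* placeholder until a vector is allocated */

   /* Return the vector initially written to the RTE of the given GSI. */
   static inline uint32_t initial_rte_vector(uint32_t gsi)
   {
       if (gsi < NR_LEGACY_IRQ) {
           /* Legacy interrupts get a fixed, pre-allocated vector. */
           return LEGACY_VECTOR_BASE + gsi;
       }
       /* Other pins start with vector 0 and are assigned one on demand. */
       return VECTOR_INVALID;
   }

With these assumptions, GSI 0 starts with vector 0x20 and GSI 15 with 0x2F,
while GSI 16 and above start with vector 0.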

All external IOAPIC pins are categorized as GSI interrupts according to
the ACPI definition. HV supports multiple IOAPIC components. IRQ PIN to GSI
mappings are maintained internally to determine the source IOAPIC of a GSI.
Native PIC is not used in the system.

.. figure:: images/interrupt-image46.png
   :align: center
   :name: hv-pic-config

   Hypervisor PIC/IOAPIC/LAPIC Configuration

LAPIC Initialization
====================

Physical LAPICs are in x2APIC mode in the ACRN hypervisor. The hypervisor
initializes the LAPIC for each physical CPU by masking all interrupts in the
local vector table (LVT), clearing all ISRs, and enabling the LAPIC.

APIs are provided for the other components in the hypervisor to access
the LAPIC, for purposes such as programming the local timer (TSC
Deadline) and sending IPI notifications. See :ref:`hv_interrupt-data-api`
for a complete list.

HV Interrupt Vectors and Delivery Mode
======================================

The interrupt vectors are assigned as shown here:

**Vector: 0x00-0x1F**
   are exceptions that are not handled by HV. If
   such an exception does occur, the system then halts.

**Vector: 0x20-0x2F**
   are allocated statically for legacy IRQ0-15.

**Vector: 0x30-0xDF**
   are dynamically allocated vectors for PCI devices
   INTx or MSI/MSI-X usage. Depending on the interrupt delivery mode
   (FLAT or PER_CPU mode), an interrupt is assigned to a vector for
   all the CPUs or for a particular CPU.

**Vector: 0xE0-0xFE**
   are high priority vectors reserved by HV for
   dedicated purposes. For example, 0xEF is used for timer, 0xF0 is used
   for IPI.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Vectors
     - Usage

   * - 0x0-0x14
     - Exceptions: NMI, INT3, page fault, GP, debug.

   * - 0x15-0x1F
     - Reserved

   * - 0x20-0x2F
     - Statically allocated for external IRQ (IRQ0-IRQ15)

   * - 0x30-0xDF
     - Dynamically allocated for IOAPIC IRQ from PCI INTx/MSI

   * - 0xE0-0xFE
     - Statically allocated for HV

   * - 0xEF
     - Timer

   * - 0xF0
     - IPI

   * - 0xF2
     - Posted Interrupt

   * - 0xF3
     - Hypervisor Callback HSM

   * - 0xF4
     - Performance Monitoring Interrupt

   * - 0xFF
     - SPURIOUS_APIC_VECTOR
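
The ranges in the table can be expressed as a small classification helper.
The sketch below only restates the table; the enum and the function
``classify_vector()`` are hypothetical names introduced for illustration
and are not part of the ACRN API.

.. code-block:: c

   #include <stdint.h>

   enum vector_class {
       VECTOR_CLASS_EXCEPTION,   /* 0x00-0x1F: exceptions and reserved */
       VECTOR_CLASS_LEGACY_IRQ,  /* 0x20-0x2F: static, IRQ0-IRQ15 */
       VECTOR_CLASS_DYNAMIC,     /* 0x30-0xDF: dynamic, PCI INTx/MSI */
       VECTOR_CLASS_HV_RESERVED, /* 0xE0-0xFE: static, HV dedicated */
       VECTOR_CLASS_SPURIOUS     /* 0xFF: spurious APIC vector */
   };

   /* Map a vector number to the range it belongs to. */
   static inline enum vector_class classify_vector(uint8_t vector)
   {
       if (vector <= 0x1FU) {
           return VECTOR_CLASS_EXCEPTION;
       }
       if (vector <= 0x2FU) {
           return VECTOR_CLASS_LEGACY_IRQ;
       }
       if (vector <= 0xDFU) {
           return VECTOR_CLASS_DYNAMIC;
       }
       if (vector <= 0xFEU) {
           return VECTOR_CLASS_HV_RESERVED;
       }
       return VECTOR_CLASS_SPURIOUS;
   }

For example, the timer vector 0xEF and the IPI vector 0xF0 both fall into
the HV-reserved range.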

Interrupts from either IOAPIC or MSI can be delivered to a target CPU.
By default, they are configured as Lowest Priority (FLAT mode), meaning they
are delivered to a CPU core that is idle or executing the lowest
priority ISR. There is no guarantee a device's interrupt will be
delivered to a specific Guest's CPU. Timer interrupts are an exception:
they are always delivered to the CPU that programs the LAPIC timer.

x86-64 supports per-CPU IDTs, but ACRN uses a global shared IDT,
so the interrupt/IRQ to vector mapping is the same on all CPUs. Vector
allocation for CPUs is shown here:

.. figure:: images/interrupt-image89.png
   :align: center

   FLAT Mode Vector Allocation

IRQ Descriptor Table
====================

The ACRN hypervisor maintains a global IRQ Descriptor Table shared among the
physical CPUs, so the same vector links to the same IRQ number for
all CPUs.

The index of the *irq_desc[]* array represents the IRQ number. *handle_irq*
is called from *dispatch_interrupt* to handle both edge-triggered and
level-triggered IRQs in a common way and to call the registered *action_fn*.

In addition to the IRQ descriptor table, which maintains the mapping from
IRQ to vector, a reverse mapping from vector to IRQ is maintained.

On initialization, the descriptors of the legacy IRQs are initialized with
proper vectors and the corresponding reverse mapping is set up.
The descriptors of the other IRQs are filled with an invalid
vector, which is updated on IRQ allocation.

For example, if the local timer registers an interrupt with IRQ number 254 and
vector 0xEF, then this data will be set up:

.. code-block:: c

   irq_desc[254].irq = 254;
   irq_desc[254].vector = 0xEF;
   vector_to_irq[0xEF] = 254;
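
A minimal sketch of the two tables and how they are kept in sync is shown
below. It is illustrative only: the table sizes, the ``VECTOR_INVALID`` and
``IRQ_INVALID`` markers, and the helpers ``init_irq_tables()`` and
``bind_irq_vector()`` are assumptions made for this example, and the real
descriptor carries more fields than the two shown here.

.. code-block:: c

   #include <stdint.h>

   #define NR_IRQS        256U         /* assumed table size */
   #define NR_VECTORS     256U         /* 256 vectors per CPU */
   #define VECTOR_INVALID 0U           /* assumed "no vector yet" marker */
   #define IRQ_INVALID    0xFFFFFFFFU  /* assumed "no IRQ" marker */

   struct irq_desc {
       uint32_t irq;     /* index of this descriptor */
       uint32_t vector;  /* vector assigned to this IRQ */
   };

   static struct irq_desc irq_desc[NR_IRQS];
   static uint32_t vector_to_irq[NR_VECTORS];

   /* Start with no IRQ bound to any vector. */
   static void init_irq_tables(void)
   {
       uint32_t i;

       for (i = 0U; i < NR_IRQS; i++) {
           irq_desc[i].irq = i;
           irq_desc[i].vector = VECTOR_INVALID;
       }
       for (i = 0U; i < NR_VECTORS; i++) {
           vector_to_irq[i] = IRQ_INVALID;
       }
   }

   /* Record the IRQ-to-vector mapping and its reverse mapping together. */
   static int bind_irq_vector(uint32_t irq, uint32_t vector)
   {
       if ((irq >= NR_IRQS) || (vector >= NR_VECTORS)) {
           return -1;
       }
       irq_desc[irq].vector = vector;
       vector_to_irq[vector] = irq;
       return 0;
   }

Calling ``bind_irq_vector(254U, 0xEFU)`` after ``init_irq_tables()``
produces the state of the local timer example above.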

External Interrupt Handling
***************************

CPU runs under VMX non-root mode and inside Guest VMs.
``MSR_IA32_VMX_PINBASED_CTLS.bit[0]`` and
``MSR_IA32_VMX_EXIT_CTLS.bit[15]`` are set to allow vCPU VM Exit to HV
whenever there are interrupts to that physical CPU under
non-root mode. On such a VM Exit, the processor acknowledges the interrupt
and saves the interrupt vector to the relevant VM Exit field for HV IRQ
processing.
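
The vector can then be read back from the VM-exit interruption-information
field. The sketch below assumes the field layout from the Intel SDM (vector
in bits 7:0, interruption type in bits 10:8 with type 0 meaning external
interrupt, valid flag in bit 31). The macro names and
``ext_intr_vector()`` are hypothetical and are not ACRN identifiers.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define INTR_INFO_VECTOR_MASK 0x000000FFU  /* bits 7:0: vector */
   #define INTR_INFO_TYPE_MASK   0x00000700U  /* bits 10:8: interruption type */
   #define INTR_INFO_TYPE_EXT    0x00000000U  /* type 0: external interrupt */
   #define INTR_INFO_VALID       0x80000000U  /* bit 31: field is valid */

   /*
    * Extract the vector of an acknowledged external interrupt.
    * Returns false if the field is invalid or describes another event type.
    */
   static inline bool ext_intr_vector(uint32_t intr_info, uint8_t *vector)
   {
       if ((intr_info & INTR_INFO_VALID) == 0U) {
           return false;
       }
       if ((intr_info & INTR_INFO_TYPE_MASK) != INTR_INFO_TYPE_EXT) {
           return false;
       }
       *vector = (uint8_t)(intr_info & INTR_INFO_VECTOR_MASK);
       return true;
   }
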

Note that as discussed above, an external interrupt causing vCPU VM Exit
to HV does not mean that the interrupt belongs to that Guest VM. When
CPU executes VM Exit into root-mode, interrupt handling will be enabled
and the interrupt will be delivered and processed as quickly as possible
inside HV. HV may emulate a virtual interrupt and inject it into the Guest
if necessary.

Interrupt and IRQ processing flow diagrams are shown below:

.. figure:: images/interrupt-image48.png
   :align: center
   :name: phy-interrupt-processing

   Processing of Physical Interrupts

When a physical interrupt is raised and delivered to a physical CPU, the
CPU may be running under either VMX root mode or non-root mode.

- If the CPU is running under VMX root mode, the interrupt is handled
  following the standard native IRQ flow: interrupt gate to
  dispatch_interrupt(), IRQ handler, and finally the registered callback.
- If the CPU is running under VMX non-root mode, an external interrupt
  causes a VM exit for reason "external-interrupt", and then the VM
  exit processing flow calls dispatch_interrupt() to dispatch and
  handle the interrupt.

After an interrupt occurs from either path shown in
:numref:`phy-interrupt-processing`, the ACRN hypervisor jumps to
dispatch_interrupt(). This function gets the vector of the generated
interrupt from the context, gets the IRQ number from vector_to_irq[], and
then gets the corresponding irq_desc.
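
The lookup chain from vector to IRQ to descriptor can be sketched as
follows. This is a simplified model under stated assumptions, not the ACRN
source: the table sizes, the ``action`` and ``priv_data`` fields, and the
function ``dispatch_vector()`` are hypothetical, and masking and
acknowledgement are left out.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define NR_IRQS    256U  /* assumed table size */
   #define NR_VECTORS 256U

   typedef void (*irq_action_t)(uint32_t irq, void *priv_data);

   struct irq_desc {
       uint32_t irq;
       uint32_t vector;
       irq_action_t action;  /* registered callback, NULL if none */
       void *priv_data;      /* argument passed to the callback */
   };

   static struct irq_desc irq_desc[NR_IRQS];
   static uint32_t vector_to_irq[NR_VECTORS];

   /* Example callback: count how often the interrupt fired. */
   static void count_irq(uint32_t irq, void *priv_data)
   {
       (void)irq;
       (*(uint32_t *)priv_data)++;
   }

   /* Return 0 if a registered action ran, -1 for an unexpected vector. */
   static int dispatch_vector(uint32_t vector)
   {
       uint32_t irq;
       struct irq_desc *desc;

       if (vector >= NR_VECTORS) {
           return -1;
       }
       irq = vector_to_irq[vector];      /* reverse mapping */
       if (irq >= NR_IRQS) {
           return -1;
       }
       desc = &irq_desc[irq];            /* descriptor for that IRQ */
       if ((desc->vector != vector) || (desc->action == NULL)) {
           return -1;
       }
       desc->action(irq, desc->priv_data);
       return 0;
   }

The check of ``desc->vector`` against the incoming vector guards against a
stale reverse mapping; whether the real code performs the same check is not
stated here.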

Though there is only one generic IRQ handler for registered interrupts,
there are three different handling flows according to flags:

-  ``!IRQF_LEVEL``
-  ``IRQF_LEVEL && !IRQF_PT``

   To avoid continuous interrupt triggers, it masks the IOAPIC pin and
   unmasks it only after the IRQ action callback is executed.

-  ``IRQF_LEVEL && IRQF_PT``

   For passthrough devices, to avoid continuous interrupt triggers, it masks
   the IOAPIC pin and leaves it masked until the corresponding vIOAPIC
   pin gets an explicit EOI ACK from the guest.
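
The three flows can be modeled with a small state machine. The sketch below
is an illustration under assumptions: the flag bit positions, the
``irq_line`` structure (which stands in for the IOAPIC pin state), and the
functions ``handle_irq_sketch()`` and ``on_guest_eoi()`` are hypothetical,
and treating ``!IRQF_LEVEL`` as "no masking" is inferred from the list
above.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define IRQF_LEVEL (1U << 1U)  /* assumed bit: level triggered */
   #define IRQF_PT    (1U << 2U)  /* assumed bit: passthrough device */

   struct irq_line {
       uint32_t flags;
       bool masked;       /* models the mask bit of the IOAPIC pin */
       uint32_t handled;  /* number of times the action callback ran */
   };

   /* Stand-in for the registered IRQ action callback. */
   static void run_action(struct irq_line *line)
   {
       line->handled++;
   }

   static void handle_irq_sketch(struct irq_line *line)
   {
       bool level = (line->flags & IRQF_LEVEL) != 0U;
       bool pt = (line->flags & IRQF_PT) != 0U;

       if (level) {
           line->masked = true;   /* stop the pin from re-triggering */
       }
       run_action(line);
       if (level && !pt) {
           line->masked = false;  /* unmask right after the callback */
       }
       /* level && pt: stay masked until the guest sends an EOI */
   }

   /* Called when the vIOAPIC pin receives an explicit EOI from the guest. */
   static void on_guest_eoi(struct irq_line *line)
   {
       if ((line->flags & (IRQF_LEVEL | IRQF_PT)) == (IRQF_LEVEL | IRQF_PT)) {
           line->masked = false;
       }
   }
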

Since interrupts are not shared among multiple devices, there is only one
IRQ action registered for each interrupt.

The IRQ number inside HV is a software concept to identify GSIs and
vectors. Each GSI is mapped to one IRQ. The GSI number is usually the same
as the IRQ number. IRQ numbers greater than the max GSI (nr_gsi) number are
dynamically assigned. For example, when HV allocates an interrupt vector to
a PCI device, an IRQ number is then assigned to that vector. When the vector
later reaches a CPU, the corresponding IRQ action function is located and
executed.
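
The split between GSI-backed and dynamically assigned IRQ numbers can be
sketched as follows. The sketch is hypothetical: ``alloc_irq_num()``,
``IRQ_INVALID``, the bookkeeping array, and the value of ``nr_gsi`` (24,
as for a single 24-pin IOAPIC) are assumptions chosen for the example.

.. code-block:: c

   #include <stdint.h>

   #define NR_IRQS     256U         /* assumed table size */
   #define IRQ_INVALID 0xFFFFFFFFU  /* "let the allocator choose" / failure */

   static uint32_t nr_gsi = 24U;    /* assumed number of GSIs */
   static uint8_t irq_in_use[NR_IRQS];

   /*
    * Reserve an IRQ number. A GSI-backed request passes its own number;
    * a request with IRQ_INVALID gets the first free number at or above
    * nr_gsi.
    */
   static uint32_t alloc_irq_num(uint32_t req_irq)
   {
       uint32_t irq;

       if (req_irq != IRQ_INVALID) {
           if (req_irq >= NR_IRQS) {
               return IRQ_INVALID;
           }
           irq_in_use[req_irq] = 1U;
           return req_irq;
       }
       for (irq = nr_gsi; irq < NR_IRQS; irq++) {
           if (irq_in_use[irq] == 0U) {
               irq_in_use[irq] = 1U;
               return irq;
           }
       }
       return IRQ_INVALID;  /* no free IRQ number left */
   }
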

See :numref:`request-irq` for the request IRQ control flow under different
conditions:

.. figure:: images/interrupt-image76.png
   :align: center
   :name: request-irq

   Request IRQ for Different Conditions

.. _ipi-management:

IPI Management
**************

HV uses IPIs only to kick a vCPU out of non-root mode so that the physical
CPU enters HV (root) mode. I/O requests and virtual interrupt
injection use different IPI vectors: the I/O request uses
IPI vector 0xF3 (upcall), and the virtual interrupt injection uses IPI
vector 0xF0.

0xF3 upcall
   A Guest vCPU VM Exit occurs due to an EPT violation or an I/O instruction
   trap, which requires the Device Model to emulate the MMIO/port I/O
   instruction. However, the Service VM vCPU0 may still be in non-root
   mode. So an IPI (0xF3 upcall vector) is sent to physical CPU0
   (running in non-root mode as vCPU0 of the Service VM) to force vCPU0 to
   VM Exit due to the external interrupt. The virtual upcall vector is then
   injected into the Service VM, and vCPU0 inside the Service VM picks up
   the I/O request and performs the emulation for the other Guest.

0xF0 IPI flow
   If the Device Model inside the Service VM needs to inject an interrupt
   into another Guest vCPU such as vCPU1, it first issues an IPI to kick CPU1
   (assuming vCPU1 is running on CPU1) to root-mode. CPU1 then injects the
   interrupt before VM Enter.

.. _hv_interrupt-data-api:

Data Structures and Interfaces
******************************

IOAPIC
======

The following APIs are external interfaces for IOAPIC related
operations.

.. doxygengroup:: ioapic_ext_apis
   :project: Project ACRN
   :content-only:


LAPIC
=====

The following APIs are external interfaces for LAPIC related operations.

.. doxygengroup:: lapic_ext_apis
   :project: Project ACRN
   :content-only:


IPI
===

The following APIs are external interfaces for IPI related operations.

.. doxygengroup:: ipi_ext_apis
   :project: Project ACRN
   :content-only:


Physical Interrupt
==================

The following APIs are external interfaces for physical interrupt
related operations.

.. doxygengroup:: phys_int_ext_apis
   :project: Project ACRN
   :content-only: