.. _hv-startup:

Hypervisor Startup
##################

This section is an overview of the ACRN hypervisor startup. The ACRN
hypervisor compiles to a 32-bit multiboot-compliant ELF file. The bootloader
(ABL/SBL or GRUB) loads the hypervisor according to the addresses specified in
the ELF header. After the bootloader prepares the full platform configuration,
including ACPI and E820, the bootstrap processor (BSP) starts the hypervisor
with an initial state compliant with the Multiboot 1 specification.

The HV startup has two parts: the native startup followed by the VM startup.

Multiboot Header
****************

The ACRN hypervisor is built with a multiboot header, which presents
``MULTIBOOT_HEADER_MAGIC`` and ``MULTIBOOT_HEADER_FLAGS`` at the beginning
of the image. ``MULTIBOOT_HEADER_FLAGS`` sets bit 1, which requests that the
bootloader pass memory information through the Multiboot Information (MBI)
structure. When a memory map (such as E820 entries) is available, the
bootloader reports it in the MBI ``mmap_*`` fields and sets bit 6 in the MBI
``flags`` field.
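
The following C sketch illustrates these header and MBI checks. Under the
Multiboot 1 specification, the header's magic, flags, and checksum fields must
sum to zero modulo 2^32. The header flag that requests memory information is
bit 1, and bit 6 of the MBI ``flags`` field marks the ``mmap_*`` fields as
valid. The ``MB_*`` macro names and the ``mb_*`` functions are hypothetical
and do not come from the ACRN sources.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Values defined by the Multiboot 1 specification. */
   #define MB_HEADER_MAGIC      0x1BADB002U  /* stored in the image header */
   #define MB_BOOTLOADER_MAGIC  0x2BADB002U  /* passed by the loader in EAX */
   #define MB_HDR_FLAG_MEM_INFO (1U << 1)    /* header: request mem_* (and mmap_*) */
   #define MB_INFO_FLAG_MMAP    (1U << 6)    /* MBI: mmap_length/mmap_addr valid */

   /* Hypothetical flags value for a header that only requests memory info. */
   #define MB_HEADER_FLAGS_SKETCH MB_HDR_FLAG_MEM_INFO

   /* magic + flags + checksum must wrap to zero in 32-bit arithmetic. */
   static uint32_t mb_header_checksum(uint32_t magic, uint32_t flags)
   {
       return (uint32_t)0U - (magic + flags);
   }

   /* True if the loader's EAX magic is valid and it supplied a memory map. */
   static bool mb_has_memory_map(uint32_t eax_magic, uint32_t mbi_flags)
   {
       return (eax_magic == MB_BOOTLOADER_MAGIC) &&
              ((mbi_flags & MB_INFO_FLAG_MMAP) != 0U);
   }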

Native Startup
**************

.. figure:: images/hld-image107.png
   :align: center
   :name: hvstart-nativeflow

   Hypervisor Native Startup Flow

Native startup sets up a baseline environment for HV, including basic
memory and interrupt initialization, as shown in
:numref:`hvstart-nativeflow`. Here is a short description of the flow:

-  **BSP Startup:** The starting point for the bootstrap processor.

-  **Relocation:** Relocate the hypervisor image if it is not placed at the
   assumed base address.

-  **UART Init:** Initialize a pre-configured UART device used
   as the base physical console for HV and the Service VM.

-  **Memory Init:** Initialize memory type and cache policy, and create
   the MMU page table mapping for HV.

-  **Scheduler Init:** Initialize the scheduler framework. The framework lets
   a physical CPU switch between threads (such as a vCPU thread and the idle
   thread) and supports CPU sharing.

-  **Interrupt Init:** Initialize interrupts and exceptions for the native HV,
   including the IDT and the ``do_IRQ`` infrastructure. A timer interrupt
   framework is then built on top of it. Native (physical) interrupts go
   through the ``do_IRQ`` infrastructure and are then distributed to their
   targets (HV or VMs).

-  **Start AP:** The BSP sends the ``INIT-SIPI-SIPI`` IPI sequence to start
   the other native application processors (APs). Each AP initializes its
   own memory and interrupts, notifies the BSP on completion, and
   enters the default idle loop.

-  **Shell Init:** Start a command shell for HV, accessible via the UART.
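
The **Start AP** step can be sketched in C. The ICR encoding follows the
Intel SDM: the vector is in bits 0-7, the delivery mode in bits 8-10, and the
level in bit 14. A SIPI vector is the 4-KB page number of a real-mode
trampoline below 1 MB. The function names, the ``ap_ready_mask`` bitmap, and
the completion handshake are hypothetical simplifications. Destination
fields, the delays between IPIs, and error handling are omitted.

.. code-block:: c

   #include <stdatomic.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define ICR_DELIVERY_INIT    (5U << 8)    /* delivery mode 101b: INIT */
   #define ICR_DELIVERY_STARTUP (6U << 8)    /* delivery mode 110b: Start-up */
   #define ICR_LEVEL_ASSERT     (1U << 14)

   /* Low 32 bits of the ICR for an INIT IPI. */
   static uint32_t icr_init(void)
   {
       return ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
   }

   /*
    * Low 32 bits of the ICR for a SIPI. An AP starts in real mode at
    * vector * 4 KB, so the trampoline must be 4-KB aligned and below 1 MB.
    * Returns 0 if the address cannot be encoded.
    */
   static uint32_t icr_sipi(uint32_t trampoline_pa)
   {
       if (((trampoline_pa & 0xFFFU) != 0U) || (trampoline_pa >= 0x100000U)) {
           return 0U;
       }
       return ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | (trampoline_pa >> 12);
   }

   /* Hypothetical handshake: each AP (pcpu_id < 64) sets its bit after
    * initializing its own memory and interrupts; the BSP waits until every
    * expected bit is set. */
   static _Atomic uint64_t ap_ready_mask;

   static void ap_mark_ready(unsigned int pcpu_id)
   {
       atomic_fetch_or(&ap_ready_mask, (uint64_t)1U << pcpu_id);
   }

   static bool all_aps_ready(uint64_t expected_mask)
   {
       return (atomic_load(&ap_ready_mask) & expected_mask) == expected_mask;
   }

As the name ``INIT-SIPI-SIPI`` indicates, the BSP sends one INIT IPI followed
by two SIPIs carrying the same vector.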

Symbols in the hypervisor are placed with an assumed base address, but
the bootloader may not load the hypervisor at that base. In this case, the
hypervisor relocates itself to the address where the bootloader loaded it.
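
A minimal sketch of this self-relocation follows. It assumes the image
carries ELF64 ``R_X86_64_RELATIVE`` relocation entries (type 8) whose
``r_offset`` values are link-time addresses based on the assumed base. Each
entry is patched by writing ``r_addend + delta`` into its slot, where
``delta`` is the difference between the actual and the assumed load address.
The ``rela_sketch`` structure and ``relocate_image`` function are
hypothetical and simplified, and they do not reproduce ACRN's relocation
code.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define R_X86_64_RELATIVE_SKETCH 8U   /* ELF type R_X86_64_RELATIVE */

   /* Same field layout as Elf64_Rela. */
   struct rela_sketch {
       uint64_t r_offset;   /* link-time address of the 64-bit slot to patch */
       uint64_t r_info;     /* low 32 bits hold the relocation type */
       int64_t  r_addend;   /* link-time value the slot should contain */
   };

   /*
    * Patch the RELATIVE slots of an image linked at assumed_base but loaded
    * at actual_base; image points at the loaded bytes. Returns the number of
    * slots patched.
    */
   static size_t relocate_image(uint8_t *image, size_t image_size,
                                uint64_t assumed_base, uint64_t actual_base,
                                const struct rela_sketch *rela, size_t count)
   {
       uint64_t delta = actual_base - assumed_base;   /* wraps if loaded lower */
       size_t patched = 0U;

       if ((delta == 0U) || (image_size < sizeof(uint64_t))) {
           return 0U;   /* already at the assumed base, or nothing to patch */
       }
       for (size_t i = 0U; i < count; i++) {
           uint64_t off = rela[i].r_offset - assumed_base;
           uint64_t val;

           if (((uint32_t)rela[i].r_info != R_X86_64_RELATIVE_SKETCH) ||
               (off > image_size - sizeof(val))) {
               continue;   /* skip other types and out-of-range slots */
           }
           val = (uint64_t)rela[i].r_addend + delta;
           memcpy(image + off, &val, sizeof(val));   /* slot may be unaligned */
           patched++;
       }
       return patched;
   }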

Here is a summary of the initial CPU and memory states that are set up after
the native startup.

CPU
   The ACRN hypervisor brings all physical processors to 64-bit IA32e
   mode. It assumes that the BSP starts in protected mode, where
   segmentation and paging set up an identity mapping of the first 4 GB
   of address space without permission restrictions. The control registers
   and some MSRs are set as follows:

   -  ``cr0``: The following features are enabled: paging, write protection,
      protected mode, numeric error, and co-processor monitoring.

   -  ``cr3``: Refer to the initial state of memory.

   -  ``cr4``: The following features are enabled: physical address extension,
      machine-check, FXSAVE/FXRSTOR, SMEP, VMX operation, and unmasked
      SIMD floating-point exceptions. The other features are disabled.

   -  ``MSR_IA32_EFER``: Only IA32e mode is enabled.

   -  ``MSR_IA32_FS_BASE``: The address of the stack canary, used for
      detecting stack smashing.

   -  ``MSR_IA32_TSC_AUX``: A unique logical ID is set for each physical
      processor.

   -  ``stack``: Each physical processor has a separate stack.

Memory
   All physical processors are in 64-bit IA32e mode after
   startup. The GDT holds four entries: an unused (null) entry, a code
   segment, a data segment, and a 64-bit TSS. The code and data segments
   both have a base of all 0's and a limit of all 1's. In its interrupt
   stack table (IST), the TSS holds only three stack pointers, for
   machine-check, double fault, and stack fault. These stacks differ across
   physical processors. The LDT is disabled.
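
The register settings and GDT layout above can be expressed in C. The bit
positions and the segment-descriptor layout are architectural (Intel SDM).
The macro names, the ``gdt_desc`` helper, and the chosen access and flag
values are hypothetical. They restate the lists above as one valid encoding,
not values copied from the ACRN sources. The 16-byte 64-bit TSS descriptor is
omitted for brevity.

.. code-block:: c

   #include <stdint.h>

   /* CR0 bits listed above. */
   #define CR0_PE (1ULL << 0)    /* protected mode */
   #define CR0_MP (1ULL << 1)    /* co-processor monitoring */
   #define CR0_NE (1ULL << 5)    /* numeric error */
   #define CR0_WP (1ULL << 16)   /* write protection */
   #define CR0_PG (1ULL << 31)   /* paging */

   /* CR4 bits listed above. */
   #define CR4_PAE        (1ULL << 5)    /* physical address extension */
   #define CR4_MCE        (1ULL << 6)    /* machine-check */
   #define CR4_OSFXSR     (1ULL << 9)    /* FXSAVE/FXRSTOR */
   #define CR4_OSXMMEXCPT (1ULL << 10)   /* unmasked SIMD FP exceptions */
   #define CR4_VMXE       (1ULL << 13)   /* VMX operation */
   #define CR4_SMEP       (1ULL << 20)   /* SMEP */

   /* IA32_EFER (MSR 0xC0000080): LME; the CPU sets LMA (bit 10) itself. */
   #define EFER_LME (1ULL << 8)

   #define CR0_SKETCH  (CR0_PG | CR0_WP | CR0_PE | CR0_NE | CR0_MP)
   #define CR4_SKETCH  (CR4_PAE | CR4_MCE | CR4_OSFXSR | CR4_OSXMMEXCPT | \
                        CR4_VMXE | CR4_SMEP)
   #define EFER_SKETCH EFER_LME

   /*
    * Encode an 8-byte code/data segment descriptor.
    *   access: P, DPL, S, and type bits (descriptor byte 5)
    *   flags:  G, D/B, L, and AVL bits (high nibble of byte 6)
    */
   static uint64_t gdt_desc(uint32_t base, uint32_t limit, uint8_t access,
                            uint8_t flags)
   {
       uint64_t d = 0U;

       d |= (uint64_t)(limit & 0xFFFFU);               /* limit 15:0 */
       d |= (uint64_t)(base & 0xFFFFFFU) << 16;        /* base 23:0 */
       d |= (uint64_t)access << 40;                    /* access byte */
       d |= (uint64_t)((limit >> 16) & 0xFU) << 48;    /* limit 19:16 */
       d |= (uint64_t)(flags & 0xFU) << 52;            /* G, D/B, L, AVL */
       d |= (uint64_t)((base >> 24) & 0xFFU) << 56;    /* base 31:24 */
       return d;
   }

With a base of 0 and a 20-bit limit of 0xFFFFF at 4-KB granularity,
``gdt_desc(0, 0xFFFFF, 0x9A, 0xA)`` yields the flat 64-bit code descriptor
0x00AF9A000000FFFF. ``gdt_desc(0, 0xFFFFF, 0x92, 0xC)`` yields the flat data
descriptor 0x00CF92000000FFFF.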

Refer to :ref:`physical-interrupt-initialization` for a detailed description of
interrupt-related initial states, including the IDT and physical PICs.

After the BSP detects that all APs are up, it proceeds to enter guest mode.
Likewise, after an AP completes its initialization, it also starts entering
guest mode. When the BSP and APs enter guest mode, each tries to launch the
predefined VMs whose vBSP is associated with its physical core. These
predefined VMs are configured in ``vm config`` and may be a
pre-launched Safety VM or the Service VM.

.. _vm-startup:

VM Startup
**********

The Service VM or a pre-launched VM is created and launched on the physical
CPU that is configured as its vBSP. Meanwhile, the physical CPUs that are
configured as vAPs of these VMs enter the default idle loop (refer to
:ref:`VCPU_lifecycle` for details) and wait for a vCPU to be scheduled to
them.
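
The per-CPU decision described above can be sketched in C. As a
simplification, the sketch assumes that a VM's configuration carries a
physical CPU affinity bitmap and that the lowest-numbered CPU in that bitmap
hosts the vBSP. ``vm_config_sketch``, ``pcpu_role``, and the role names are
hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical subset of a static VM configuration. */
   struct vm_config_sketch {
       const char *name;
       uint64_t cpu_affinity;   /* bit n set: the VM uses physical CPU n */
   };

   enum pcpu_role_sketch { PCPU_NOT_USED, PCPU_VBSP, PCPU_VAP };

   /* Index of the lowest set bit, or -1 for an empty mask. */
   static int lowest_pcpu(uint64_t mask)
   {
       for (int i = 0; i < 64; i++) {
           if ((mask & (1ULL << i)) != 0U) {
               return i;
           }
       }
       return -1;
   }

   /*
    * Role of physical CPU pcpu_id in VM cfg: the vBSP's CPU creates and
    * launches the VM; vAP CPUs enter the idle loop and wait for a vCPU.
    */
   static enum pcpu_role_sketch pcpu_role(const struct vm_config_sketch *cfg,
                                          int pcpu_id)
   {
       if ((pcpu_id < 0) || (pcpu_id >= 64) ||
           ((cfg->cpu_affinity & (1ULL << pcpu_id)) == 0U)) {
           return PCPU_NOT_USED;
       }
       return (lowest_pcpu(cfg->cpu_affinity) == pcpu_id) ? PCPU_VBSP : PCPU_VAP;
   }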

:numref:`hvstart-vmflow` illustrates the high-level execution flow of creating
and launching a VM. The flow applies to pre-launched User VMs, the Service VM,
and post-launched User VMs. One major difference is that the hypervisor
creates pre-launched User VMs and the Service VM, while the Device Model (DM)
in the Service VM creates post-launched User VMs. The main steps include:

-  **Create VM:** A VM structure is allocated and initialized. The
   hypervisor then:

   -  picks a unique VM ID
   -  initializes the EPT and prepares the E820 table for this VM
   -  sets up the I/O bitmap
   -  initializes the virtual PIC/IOAPIC/PCI/UART
   -  prepares the EPC for virtual SGX
   -  sets up guest PM I/O
   -  enables the IOMMU for passthrough device support
   -  fills the virtual CPUID entries
   -  prepares the vCPUs configured in this VM's ``vm config``

   For a post-launched User VM, the DM prepares the EPT page table and E820
   table instead of the hypervisor.

-  **Prepare vCPUs:** Create the vCPUs. Give each vCPU an ID that is unique
   within the VM and a globally unique VPID, and pin it to its physical
   processor. Then initialize its virtual LAPIC and MTRR, and set up its vCPU
   thread object for vCPU scheduling. The corresponding ``vm config`` defines
   the number of vCPUs and their affinity for this VM.

-  **Build vACPI:** For the Service VM, the hypervisor customizes a virtual ACPI
   table based on the native ACPI table (this is still a TODO). For a
   pre-launched User VM, the hypervisor builds a simple ACPI table with
   necessary information such as the MADT. For a post-launched User VM, the DM
   builds its ACPI table dynamically.

-  **Software Load:** Prepare each VM's software configuration according to
   guest OS requirements. It may include the kernel entry address, ramdisk
   address, bootargs, or zero page for launching a bzImage. The hypervisor
   does this for pre-launched User VMs and the Service VM, and the DM does it
   for post-launched User VMs. The VM starts in standard real mode or
   protected mode, independent of the native environment.

-  **Start VM:** The vBSP of this VM is triggered to start scheduling.

-  **Schedule vCPUs:** The vCPUs are scheduled to the corresponding
   physical processors for execution.

-  **Init VMCS:** Initialize the vCPU's VMCS for its host state, guest
   state, execution controls, entry controls, and exit controls. This is
   the last configuration step before the vCPU runs.

-  **vCPU thread:** The vCPU starts to run. The vBSP starts running the
   configured kernel image, and each vAP waits for the ``INIT-SIPI-SIPI`` IPI
   sequence from its vBSP.

.. figure:: images/hld-image104.png
   :align: center
   :name: hvstart-vmflow

   Hypervisor VM Startup Flow
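
The following sketch illustrates the **Prepare vCPUs** step. It derives a
vCPU's physical CPU from the VM's affinity bitmap and computes a VPID that is
unique across VMs. It makes two assumptions. First, vCPU *n* is pinned to the
*n*-th set bit of the affinity bitmap. Second, VPIDs are assigned as
``1 + vm_id * MAX_VCPUS_PER_VM + vcpu_id``. Numbering starts at 1 because
VPID 0 is reserved for VMX root operation. ``MAX_VCPUS_PER_VM_SKETCH`` and the
function names are hypothetical.

.. code-block:: c

   #include <stdint.h>

   #define MAX_VCPUS_PER_VM_SKETCH 8U   /* hypothetical per-VM vCPU limit */

   /* Physical CPU for vCPU vcpu_id: the vcpu_id-th set bit (0-based) of the
    * affinity bitmap. Returns -1 if the bitmap has too few bits set. */
   static int vcpu_to_pcpu(uint64_t affinity, uint32_t vcpu_id)
   {
       uint32_t seen = 0U;

       for (int pcpu = 0; pcpu < 64; pcpu++) {
           if ((affinity & (1ULL << pcpu)) != 0U) {
               if (seen == vcpu_id) {
                   return pcpu;
               }
               seen++;
           }
       }
       return -1;
   }

   /* Globally unique, nonzero VPID for (vm_id, vcpu_id). */
   static uint16_t vcpu_vpid(uint16_t vm_id, uint16_t vcpu_id)
   {
       return (uint16_t)(1U + ((uint32_t)vm_id * MAX_VCPUS_PER_VM_SKETCH) +
                         vcpu_id);
   }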

Software configuration for the Service VM (using a bzImage software load as
an example):

-  **ACPI:** HV passes the entire ACPI table from the bootloader directly to
   the Service VM. Legacy mode is supported because the ACPI table is loaded
   at the F-segment.

-  **E820:** HV passes the E820 table from the bootloader to the Service VM
   through the zero page. It first filters out the HV reserved memory (32 MB,
   for example) and the memory owned by pre-launched User VMs.

-  **Zero Page:** HV prepares the zero page at the high end of the Service
   VM memory, which is determined by the Service VM guest FIT binary build.
   The zero page includes the configuration for the ramdisk, bootargs, and
   E820 entries. The zero page address is written to the vBSP's RSI register
   before the vCPU runs.

-  **Entry address:** HV copies the Service VM OS kernel image to
   ``kernel_load_addr``, which it gets from the ``pref_addr`` field in the
   bzImage header. HV calculates the entry address from ``kernel_load_addr``
   and writes it to the vBSP's RIP register before the vCPU runs.
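
The bzImage fields used here are defined by the Linux x86 boot protocol. The
sketch below assumes the protocol's field offsets: ``setup_sects`` at 0x1F1,
the ``HdrS`` signature at 0x202, the protocol version at 0x206, and
``pref_address`` (Linux's name for the field) at 0x258, which is available
from protocol version 2.10. It also assumes the protocol's legacy 64-bit
entry point, 0x200 bytes into the loaded protected-mode kernel. The structure
and function names are hypothetical, and ACRN's loader may apply additional
checks.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Read an n-byte little-endian field without unaligned loads. */
   static uint64_t rd_le(const uint8_t *p, size_t n)
   {
       uint64_t v = 0U;

       for (size_t i = n; i > 0U; i--) {
           v = (v << 8) | p[i - 1U];
       }
       return v;
   }

   struct bzimage_info_sketch {
       size_t   kernel_offset;   /* file offset of the protected-mode kernel */
       uint64_t load_addr;       /* pref_address from the setup header */
       uint64_t entry64;         /* 64-bit entry point: load_addr + 0x200 */
   };

   /* Parse the setup header of a bzImage held in memory. */
   static bool parse_bzimage(const uint8_t *img, size_t len,
                             struct bzimage_info_sketch *out)
   {
       uint32_t setup_sects;

       /* Header signature "HdrS" at 0x202; pref_address ends at 0x25F. */
       if ((len < 0x260U) || (rd_le(img + 0x202U, 4U) != 0x53726448U)) {
           return false;
       }
       /* pref_address exists from boot protocol 2.10 (version at 0x206). */
       if (rd_le(img + 0x206U, 2U) < 0x020AU) {
           return false;
       }
       setup_sects = img[0x1F1U];
       if (setup_sects == 0U) {
           setup_sects = 4U;   /* the boot protocol treats 0 as 4 */
       }
       out->kernel_offset = (size_t)(setup_sects + 1U) * 512U;
       out->load_addr = rd_le(img + 0x258U, 8U);
       out->entry64 = out->load_addr + 0x200U;
       return true;
   }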

Software configuration for post-launched User VMs (using an OVMF software
load as an example):

-  **ACPI:** The DM builds the virtual ACPI table and puts it at the User VM's
   F-segment. Refer to :ref:`hld-io-emulation` for details.

-  **E820:** The DM builds the virtual E820 table and passes it to
   the virtual bootloader. Refer to :ref:`hld-io-emulation` for details.

-  **Entry address:** The DM copies the virtual bootloader (OVMF) image to
   ``OVMF_NVSTORAGE_OFFSET``, which is normally 4 GB - 2 MB, and sets the entry
   address to 0xFFFFFFF0. Because the vBSP starts the virtual bootloader
   (OVMF) in real mode, its CS base is set to 0xFFFF0000 and its RIP register
   is set to 0xFFF0.
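
The OVMF entry values follow from x86 real-mode addressing. The linear
address of the first instruction is the hidden CS base plus RIP, so
0xFFFF0000 + 0xFFF0 = 0xFFFFFFF0, which is 16 bytes below 4 GB. The sketch
below checks that arithmetic. It also locates the reset vector inside a
firmware image assumed to end exactly at 4 GB, such as a 2-MB image loaded at
4 GB - 2 MB. The macro and function names are hypothetical.

.. code-block:: c

   #include <stdint.h>

   #define FOUR_GB         0x100000000ULL
   #define RESET_CS_BASE   0xFFFF0000ULL                    /* hidden CS base */
   #define RESET_RIP       0xFFF0ULL
   #define RESET_VECTOR_LA (RESET_CS_BASE + RESET_RIP)      /* 0xFFFFFFF0 */

   /* Load base of a firmware image of image_size bytes that ends at 4 GB. */
   static uint64_t ovmf_load_base(uint64_t image_size)
   {
       return FOUR_GB - image_size;
   }

   /* Offset of the reset vector inside that image. */
   static uint64_t reset_vector_offset(uint64_t image_size)
   {
       return RESET_VECTOR_LA - ovmf_load_base(image_size);
   }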

Software configuration for pre-launched User VMs (using a raw software load
as an example):

-  **ACPI:** The hypervisor builds the virtual ACPI table and puts it at
   this VM's F-segment.

-  **E820:** The hypervisor builds the virtual E820 table and passes it to
   the VM in a way that depends on the software loader. A raw software load
   does not use it.

-  **Entry address:** The hypervisor copies the User VM OS kernel image to
   ``kernel_load_addr``, which is set in ``vm config``. It sets the entry
   address to ``kernel_entry_addr``, which is also set in ``vm config``.
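
For a raw image, software load reduces to a copy plus setting the entry
point. The sketch below models guest physical memory as a flat byte array.
This is a hypothetical simplification, because the hypervisor actually writes
through the VM's memory mappings. The ``raw_load_cfg_sketch`` fields are
hypothetical stand-ins for the ``kernel_load_addr`` and ``kernel_entry_addr``
values from ``vm config``.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   struct raw_load_cfg_sketch {
       uint64_t kernel_load_addr;    /* guest-physical copy destination */
       uint64_t kernel_entry_addr;   /* initial vBSP RIP */
   };

   /*
    * Copy a raw kernel image into guest memory and return the entry point
    * through *rip. Fails if the image does not fit in guest memory.
    */
   static bool raw_load(uint8_t *guest_mem, uint64_t guest_size,
                        const uint8_t *image, uint64_t image_size,
                        const struct raw_load_cfg_sketch *cfg, uint64_t *rip)
   {
       if ((image_size > guest_size) ||
           (cfg->kernel_load_addr > guest_size - image_size)) {
           return false;
       }
       memcpy(guest_mem + cfg->kernel_load_addr, image, (size_t)image_size);
       *rip = cfg->kernel_entry_addr;
       return true;
   }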

The initial mode of each vCPU is as follows:

+----------------------------------+----------------------------------------------------------+
|  VM and Processor Type           |    Initial Mode                                          |
+=======================+==========+==========================================================+
| Service VM            |   BSP    |   Same as physical BSP, or Real Mode if                  |
|                       |          |   Service VM boots with OVMF                             |
|                       +----------+----------------------------------------------------------+
|                       |     AP   |   Real Mode                                              |
+-----------------------+----------+----------------------------------------------------------+
| Post-launched User VM |    BSP   |   Real Mode                                              |
|                       +----------+----------------------------------------------------------+
|                       |    AP    |   Real Mode                                              |
+-----------------------+----------+----------------------------------------------------------+
| Pre-launched User VM  |    BSP   |   Real Mode or Protected Mode                            |
|                       +----------+----------------------------------------------------------+
|                       |     AP   |   Real Mode                                              |
+-----------------------+----------+----------------------------------------------------------+
