.. _hv-startup:

Hypervisor Startup
##################

This section is an overview of the ACRN hypervisor startup. The ACRN
hypervisor compiles to a 32-bit multiboot-compliant ELF file. The bootloader
(ABL/SBL or GRUB) loads the hypervisor according to the addresses specified in
the ELF header. The bootstrap processor (BSP) starts the hypervisor with an
initial state compliant with the multiboot 1 specification, after the
bootloader prepares the full configuration, including ACPI, E820, etc.

The HV startup has two parts: the native startup followed by
VM startup.

Multiboot Header
****************

The ACRN hypervisor is built with a multiboot header, which presents
``MULTIBOOT_HEADER_MAGIC`` and ``MULTIBOOT_HEADER_FLAGS`` at the beginning
of the image. It sets the memory-information request flag (bit 1) in
``MULTIBOOT_HEADER_FLAGS``, which requests that the bootloader pass memory map
information (such as E820 entries) through the Multiboot Information (MBI)
structure.
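
For illustration, here is a minimal sketch of such a header, based on the
Multiboot 1 specification; the structure, section, and symbol names are
illustrative and not taken from the ACRN sources:

.. code-block:: c

   #include <stdint.h>

   #define MULTIBOOT_HEADER_MAGIC  0x1BADB002U
   /* Bit 1 (memory information): ask the bootloader to pass memory map
    * data through the MBI structure. */
   #define MULTIBOOT_HEADER_FLAGS  0x00000002U

   /* A Multiboot 1 header is three 32-bit fields that must appear within
    * the first 8 KiB of the image; the checksum makes the three fields
    * sum to zero (modulo 2^32). */
   struct multiboot_header {
           uint32_t magic;
           uint32_t flags;
           uint32_t checksum;
   };

   /* Hypothetical placement in a dedicated section that the linker script
    * would keep at the very beginning of the image. */
   static const struct multiboot_header mb_header
   __attribute__((section(".multiboot_header"), aligned(4), used)) = {
           .magic    = MULTIBOOT_HEADER_MAGIC,
           .flags    = MULTIBOOT_HEADER_FLAGS,
           .checksum = 0U - (MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS),
   };
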
Native Startup
**************

.. figure:: images/hld-image107.png
   :align: center
   :name: hvstart-nativeflow

   Hypervisor Native Startup Flow

Native startup sets up a baseline environment for the HV, including basic
memory and interrupt initialization, as shown in
:numref:`hvstart-nativeflow`. Here is a short description of the flow:

- **BSP Startup:** The starting point for the bootstrap processor.

- **Relocation:** Relocate the hypervisor image if the hypervisor image
  is not placed at the assumed base address.

- **UART Init:** Initialize a pre-configured UART device used
  as the base physical console for the HV and Service VM.

- **Memory Init:** Initialize memory type and cache policy, and create
  the MMU page table mapping for the HV.

- **Scheduler Init:** Initialize the scheduler framework, which provides the
  capability to switch between threads (such as a vCPU thread vs. the idle
  thread) on a physical CPU, and to support CPU sharing.

- **Interrupt Init:** Initialize interrupts and exceptions for the native HV,
  including the IDT and the ``do_IRQ`` infrastructure; a timer interrupt
  framework is then built. Native/physical interrupts go through this
  ``do_IRQ`` infrastructure and are then distributed to their targets
  (HV or VMs).

- **Start AP:** The BSP triggers the ``INIT-SIPI-SIPI`` IPI sequence to start
  the other native APs (application processors). Each AP initializes its
  own memory and interrupts, notifies the BSP on completion, and
  enters the default idle loop.

- **Shell Init:** Start a command shell for the HV, accessible via the UART.

Symbols in the hypervisor are placed with an assumed base address, but
the bootloader may not place the hypervisor at that specified base. In
this case, the hypervisor will relocate itself to where the bootloader
loads it.

Here is a summary of the CPU and memory initial states that are set up after
the native startup.

CPU
   The ACRN hypervisor brings all physical processors to 64-bit IA32e
   mode, with the assumption that the BSP starts in protected mode, where
   segmentation and paging set an identity mapping of the first 4G of
   addresses without permission restrictions. The control registers and
   some MSRs are set as follows (see the sketch below):

   - ``cr0``: The following features are enabled: paging, write protection,
     protected mode, numeric error, and co-processor monitoring.

   - ``cr3``: Refer to the initial state of memory.

   - ``cr4``: The following features are enabled: physical address extension,
     machine-check, FXSAVE/FXRSTOR, SMEP, VMX operation, and unmasked
     SIMD FP exceptions. The other features are disabled.

   - ``MSR_IA32_EFER``: Only IA32e mode is enabled.

   - ``MSR_IA32_FS_BASE``: The address of the stack canary, used for detecting
     stack smashing.

   - ``MSR_IA32_TSC_AUX``: A unique logical ID is set for each physical
     processor.

   - ``stack``: Each physical processor has a separate stack.

Memory
   All physical processors are in 64-bit IA32e mode after
   startup. The GDT holds four entries: one unused, one for code and one for
   data (both with a base of all 0's and a limit of all 1's), and one for the
   64-bit TSS. The TSS only holds three stack pointers (for machine-check,
   double fault, and stack fault) in the interrupt stack table (IST), which
   differ across physical processors. The LDT is disabled.
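
The control-register and MSR settings listed above correspond to architectural
bit positions and MSR indexes defined in the Intel SDM. Here is a minimal,
illustrative sketch; the constant and variable names are local to this example
rather than taken from the ACRN sources:

.. code-block:: c

   #include <stdint.h>

   /* CR0: protected mode (PE), co-processor monitoring (MP), numeric
    * error (NE), write protection (WP), and paging (PG). */
   #define CR0_PE          (1UL << 0U)
   #define CR0_MP          (1UL << 1U)
   #define CR0_NE          (1UL << 5U)
   #define CR0_WP          (1UL << 16U)
   #define CR0_PG          (1UL << 31U)

   /* CR4: physical address extension (PAE), machine-check (MCE),
    * FXSAVE/FXRSTOR (OSFXSR), unmasked SIMD FP exceptions (OSXMMEXCPT),
    * VMX operation (VMXE), and SMEP. */
   #define CR4_PAE         (1UL << 5U)
   #define CR4_MCE         (1UL << 6U)
   #define CR4_OSFXSR      (1UL << 9U)
   #define CR4_OSXMMEXCPT  (1UL << 10U)
   #define CR4_VMXE        (1UL << 13U)
   #define CR4_SMEP        (1UL << 20U)

   /* MSRs referenced in the list above. */
   #define MSR_IA32_EFER     0xC0000080U  /* EFER.LME/LMA enable IA32e mode */
   #define MSR_IA32_FS_BASE  0xC0000100U  /* holds the stack-canary address */
   #define MSR_IA32_TSC_AUX  0xC0000103U  /* holds the per-CPU logical ID   */

   /* Values a per-CPU startup path would program, per the list above. */
   static const uint64_t startup_cr0 = CR0_PE | CR0_MP | CR0_NE | CR0_WP | CR0_PG;
   static const uint64_t startup_cr4 = CR4_PAE | CR4_MCE | CR4_OSFXSR |
                                       CR4_OSXMMEXCPT | CR4_VMXE | CR4_SMEP;
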
Refer to :ref:`physical-interrupt-initialization` for a detailed description of
interrupt-related initial states, including the IDT and physical PICs.

After the BSP detects that all APs are up, it continues to enter guest mode.
Likewise, after one AP completes its initialization, it starts entering guest
mode as well. When the BSP and APs enter guest mode, they try to launch
predefined VMs whose vBSP is associated with this physical core. These
predefined VMs are configured in ``vm config`` and may be a
pre-launched Safety VM or the Service VM.

.. _vm-startup:

VM Startup
**********

The Service VM or a pre-launched VM is created and launched on the physical
CPU that is configured as its vBSP. Meanwhile, the physical CPUs that are
configured as vAPs for dedicated VMs enter the default idle loop
(refer to :ref:`VCPU_lifecycle` for details), waiting for any vCPU to be
scheduled to them.

:numref:`hvstart-vmflow` illustrates a high-level execution flow of creating and
launching a VM, applicable to pre-launched User VMs, the Service VM, and
post-launched User VMs. One major difference in the creation of post-launched
User VMs vs. pre-launched User VMs or the Service VM is that pre-launched User
VMs and the Service VM are created by the hypervisor, while post-launched User
VMs are created by the Device Model (DM) in the Service VM. The main steps
include:

- **Create VM:** A VM structure is allocated and initialized. A unique
  VM ID is picked, the EPT is initialized, the E820 table for this VM is
  prepared, the I/O bitmap is set up, the virtual PIC/IOAPIC/PCI/UART is
  initialized, EPC for virtual SGX is prepared, guest PM I/O is set up, the
  IOMMU for passthrough (PT) device support is enabled, virtual CPUID entries
  are filled, and the vCPUs configured in this VM's ``vm config`` are prepared.
  For a post-launched User VM, the EPT page table and E820 table are prepared
  by the DM instead of the hypervisor.

- **Prepare vCPUs:** Create the vCPUs, assign the physical processor that each
  vCPU is pinned to (along with a unique-per-VM vCPU ID and a globally unique
  VPID), initialize its virtual LAPIC and MTRR, and set up its vCPU thread
  object for vCPU scheduling. The vCPU number and affinity are defined in the
  corresponding ``vm config`` for this VM.

- **Build vACPI:** For the Service VM, the hypervisor customizes a virtual ACPI
  table based on the native ACPI table (this is still a TODO). For a
  pre-launched User VM, the hypervisor builds a simple ACPI table with the
  necessary information, such as the MADT. For a post-launched User VM, the DM
  builds its ACPI table dynamically.

- **Software Load:** Prepare each VM's software configuration according to the
  guest OS requirements, which may include the kernel entry address, ramdisk
  address, bootargs, or zero page for launching a bzImage. This is done by the
  hypervisor for pre-launched User VMs and the Service VM. The VM will start
  from standard real mode or protected mode, which is not related to the
  native environment. For post-launched User VMs, the VM's software
  configuration is done by the DM.

- **Start VM:** The vBSP of this VM is triggered to start scheduling.

- **Schedule vCPUs:** The vCPUs are scheduled to the corresponding
  physical processors for execution.

- **Init VMCS:** Initialize the vCPU's VMCS for its host state, guest
  state, execution control, entry control, and exit control. This is
  the last configuration before the vCPU runs.

- **vCPU thread:** The vCPU starts to run. The vBSP starts running the
  configured kernel image, while each vAP waits for the ``INIT-SIPI-SIPI``
  IPI sequence triggered from its vBSP.

.. figure:: images/hld-image104.png
   :align: center
   :name: hvstart-vmflow

   Hypervisor VM Startup Flow

Software configuration for the Service VM (bzImage software load as an
example); a sketch of this flow follows the list:

- **ACPI**: The HV passes the entire ACPI table from the bootloader to the
  Service VM directly. Legacy mode is supported as the ACPI table
  is loaded at the F-Segment.

- **E820**: The HV passes the E820 table from the bootloader through the zero
  page after the HV reserved memory (32M, for example) and pre-launched User
  VM owned memory are filtered out.

- **Zero Page**: The HV prepares the zero page at the high end of Service
  VM memory, which is determined by the Service VM guest FIT binary build. The
  zero page includes the configuration for the ramdisk, bootargs, and E820
  entries. The zero page address will be set to the vBSP RSI register
  before the vCPU runs.

- **Entry address**: The HV copies the Service VM OS kernel image to
  ``kernel_load_addr``, which it can get from the ``pref_addr`` field in the
  bzImage header. The entry address will be calculated based on
  ``kernel_load_addr``, and will be set to the vBSP RIP register before the
  vCPU runs.
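
Below is a minimal sketch of the bzImage software-load flow described above.
The helper functions and the register-context type are hypothetical
placeholders rather than the hypervisor's actual API:

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical per-vCPU register context; only the fields used here. */
   struct vcpu_regs {
           uint64_t rip;
           uint64_t rsi;
   };

   /* Placeholder helpers standing in for the hypervisor's copy and
    * zero-page build routines. */
   extern void copy_to_gpa(uint64_t gpa, const void *hva, uint64_t size);
   extern void build_zero_page(uint64_t zeropage_gpa, uint64_t ramdisk_gpa,
                               uint64_t bootargs_gpa);

   /*
    * Sketch of a bzImage software load for the Service VM:
    *  - copy the kernel image to kernel_load_addr (derived from the
    *    pref_addr field of the bzImage header),
    *  - build the zero page (ramdisk, bootargs, filtered E820) near the
    *    top of Service VM memory,
    *  - point RSI at the zero page and RIP at the kernel entry before
    *    the vBSP runs.
    */
   void bzimage_sw_load(struct vcpu_regs *vbsp_regs,
                        const void *kernel_img, uint64_t kernel_size,
                        uint64_t kernel_load_addr, uint64_t kernel_entry_addr,
                        uint64_t zeropage_gpa, uint64_t ramdisk_gpa,
                        uint64_t bootargs_gpa)
   {
           copy_to_gpa(kernel_load_addr, kernel_img, kernel_size);
           build_zero_page(zeropage_gpa, ramdisk_gpa, bootargs_gpa);

           vbsp_regs->rsi = zeropage_gpa;
           vbsp_regs->rip = kernel_entry_addr;
   }
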
Software configuration for post-launched User VMs (OVMF software load as an
example):

- **ACPI**: The DM builds the virtual ACPI table and puts it at the User VM's
  F-Segment. Refer to :ref:`hld-io-emulation` for details.

- **E820**: The DM builds the virtual E820 table and passes it to
  the virtual bootloader. Refer to :ref:`hld-io-emulation` for details.

- **Entry address**: The DM copies the User VM OS kernel (OVMF) image to
  ``OVMF_NVSTORAGE_OFFSET``, normally at (4G - 2M), and sets the entry
  address to 0xFFFFFFF0. As the vBSP will trigger the virtual bootloader
  (OVMF) to run from real mode, its CS base will be set to 0xFFFF0000, and
  its RIP register will be set to 0xFFF0.

Software configuration for pre-launched User VMs (raw software load as an
example):

- **ACPI**: The hypervisor builds the virtual ACPI table and puts it at
  this VM's F-Segment.

- **E820**: The hypervisor builds the virtual E820 table and passes it to
  the VM according to the software loader used. For a raw software load, it
  is not used.

- **Entry address**: The hypervisor copies the User VM OS kernel image to
  ``kernel_load_addr``, which is set by ``vm config``, and sets the entry
  address to ``kernel_entry_addr``, which is set by ``vm config`` as well.

Here is the initial mode of vCPUs:

+-----------------------+----------+----------------------------------------+
| VM Type               | vCPU     | Initial Mode                           |
+=======================+==========+========================================+
| Service VM            | BSP      | Same as physical BSP, or Real Mode if  |
|                       |          | Service VM boots with OVMF             |
|                       +----------+----------------------------------------+
|                       | AP       | Real Mode                              |
+-----------------------+----------+----------------------------------------+
| Post-launched User VM | BSP      | Real Mode                              |
|                       +----------+----------------------------------------+
|                       | AP       | Real Mode                              |
+-----------------------+----------+----------------------------------------+
| Pre-launched User VM  | BSP      | Real Mode or Protected Mode            |
|                       +----------+----------------------------------------+
|                       | AP       | Real Mode                              |
+-----------------------+----------+----------------------------------------+
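
The table can be read as a small decision function. The following sketch
mirrors it; the enum and function names are hypothetical and only illustrate
the selection summarized above:

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical types mirroring the table above. */
   enum vm_kind   { SERVICE_VM, POST_LAUNCHED_USER_VM, PRE_LAUNCHED_USER_VM };
   enum vcpu_role { VCPU_BSP, VCPU_AP };
   enum vcpu_mode { REAL_MODE, PROTECTED_MODE, SAME_AS_PHYSICAL_BSP };

   /*
    * Initial mode selection, as summarized in the table:
    *  - every AP starts in Real Mode;
    *  - the Service VM BSP inherits the physical BSP state, or starts in
    *    Real Mode when the Service VM boots with OVMF;
    *  - a post-launched User VM BSP starts in Real Mode;
    *  - a pre-launched User VM BSP starts in Real Mode or Protected Mode,
    *    depending on its software load.
    */
   enum vcpu_mode initial_vcpu_mode(enum vm_kind vm, enum vcpu_role role,
                                    bool service_vm_boots_with_ovmf,
                                    bool prelaunched_protected_mode)
   {
           if (role == VCPU_AP) {
                   return REAL_MODE;
           }

           switch (vm) {
           case SERVICE_VM:
                   return service_vm_boots_with_ovmf ?
                          REAL_MODE : SAME_AS_PHYSICAL_BSP;
           case PRE_LAUNCHED_USER_VM:
                   return prelaunched_protected_mode ? PROTECTED_MODE : REAL_MODE;
           case POST_LAUNCHED_USER_VM:
           default:
                   return REAL_MODE;
           }
   }
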