# VM interface

This page provides an overview of the interface Hafnium provides to VMs. Hafnium
makes a distinction between the 'primary VM', which controls scheduling and has
more direct access to some hardware, and 'secondary VMs' which exist mostly to
provide services to the primary VM, and have a more paravirtualised interface.
The intention is that the primary VM can run a mostly unmodified operating
system (such as Linux) with the addition of a Hafnium driver which
[fulfils certain expectations](SchedulerExpectations.md), while secondary VMs
will run more specialised trusted OSes or bare-metal code which is designed with
Hafnium in mind.

The interface documented here is what is planned for the first release of
Hafnium, not necessarily what is currently implemented.

[TOC]

## CPU scheduling

The primary VM will have one vCPU for each physical CPU, and control the
scheduling.

Secondary VMs will have a configurable number of vCPUs, scheduled on arbitrary
physical CPUs at the whims of the primary VM scheduler.

All VMs will start with a single active vCPU. Subsequent vCPUs can be started
through PSCI.

## PSCI

The primary VM will be able to control the physical CPUs through the following
PSCI 1.1 calls, which will be forwarded to the underlying implementation in EL3:

* PSCI_VERSION
* PSCI_FEATURES
* PSCI_SYSTEM_OFF
* PSCI_SYSTEM_RESET
* PSCI_AFFINITY_INFO
* PSCI_CPU_SUSPEND
* PSCI_CPU_OFF
* PSCI_CPU_ON

All other PSCI calls are unsupported.

Secondary VMs will be able to control their vCPUs through the following PSCI 1.1
calls, which will be implemented by Hafnium:

* PSCI_VERSION
* PSCI_FEATURES
* PSCI_AFFINITY_INFO
* PSCI_CPU_SUSPEND
* PSCI_CPU_OFF
* PSCI_CPU_ON

All other PSCI calls are unsupported.
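
The PSCI routing rules in the two lists above can be summarised as a small lookup. The following is a minimal sketch in Python for illustration only; `route_psci` and the result strings are hypothetical names, not part of Hafnium's code or API.

```python
# Illustrative model of the PSCI rules; all names here are hypothetical.

# Calls the primary VM may make. Hafnium forwards these to EL3.
PRIMARY_CALLS = frozenset({
    "PSCI_VERSION", "PSCI_FEATURES", "PSCI_SYSTEM_OFF", "PSCI_SYSTEM_RESET",
    "PSCI_AFFINITY_INFO", "PSCI_CPU_SUSPEND", "PSCI_CPU_OFF", "PSCI_CPU_ON",
})

# Calls a secondary VM may make. Hafnium implements these itself.
SECONDARY_CALLS = frozenset({
    "PSCI_VERSION", "PSCI_FEATURES",
    "PSCI_AFFINITY_INFO", "PSCI_CPU_SUSPEND", "PSCI_CPU_OFF", "PSCI_CPU_ON",
})


def route_psci(is_primary: bool, call: str) -> str:
    """Return where a PSCI call is handled, or "not_supported"."""
    if is_primary:
        return "forward_to_el3" if call in PRIMARY_CALLS else "not_supported"
    return "handled_by_hafnium" if call in SECONDARY_CALLS else "not_supported"
```

Under this model, `route_psci(False, "PSCI_SYSTEM_OFF")` yields `"not_supported"`, reflecting that only the primary VM may power off or reset the system.
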

## Hardware timers

The primary VM will have access to both the physical and virtual EL1 timers
through the usual control registers (`CNT[PV]_TVAL_EL0` and `CNT[PV]_CTL_EL0`).

Secondary VMs will have access to the virtual timer only, which will be emulated
with help from the kernel driver in the primary VM.

## Interrupts

The primary VM will have direct access to control the physical GIC, and receive
all interrupts (other than anything already trapped by TrustZone). It will be
responsible for forwarding any necessary interrupts to secondary VMs. The
Interrupt Translation Service (ITS) will be disabled by Hafnium so that it
cannot be used to circumvent access controls.

Secondary VMs will have access to a simple paravirtualized interrupt controller
through two hypercalls: one to enable or disable a given virtual interrupt ID,
and one to get and acknowledge the next pending interrupt. There is no concept
of interrupt priorities or a distinction between edge and level triggered
interrupts. Secondary VMs may also inject interrupts into their own vCPUs.

## Performance counters

VMs will be blocked from accessing performance counter registers (for the
performance monitor extensions described in chapter D5 of the Armv8-A reference
manual) in production, to prevent them from being used as a side channel to leak
data between VMs.

Hafnium may allow VMs to use them in debug builds.

## Debug registers

VMs will be blocked from accessing debug registers in production builds, to
prevent them from being used to circumvent access controls.

Hafnium may allow VMs to use these registers in debug builds.

## RAS Extension registers

Secondary VMs will be blocked from using registers associated with the RAS
Extension.

## Asynchronous message passing

VMs will be able to send messages of up to 4 KiB to each other asynchronously,
with no queueing, as specified by FF-A.
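
To illustrate what "no queueing" implies for a sender, here is a minimal sketch of a single-slot mailbox in Python. The `Mailbox` class and its status strings are hypothetical placeholders for illustration; they are not FF-A error codes or Hafnium interfaces.

```python
# Illustrative model of single-slot message passing; names are hypothetical.
MAX_MESSAGE_SIZE = 4096  # the 4 KiB limit stated above


class Mailbox:
    """Receive buffer for one VM, holding at most one message."""

    def __init__(self):
        self._slot = None  # (sender_id, payload), or None when empty

    def deliver(self, sender_id: int, payload: bytes) -> str:
        """Try to place a message in the mailbox and report the outcome."""
        if len(payload) > MAX_MESSAGE_SIZE:
            return "invalid"
        if self._slot is not None:
            # No queueing: the sender has to retry once the slot is free.
            return "busy"
        self._slot = (sender_id, bytes(payload))
        return "ok"

    def receive(self):
        """Take the pending message and free the slot; None if empty."""
        message, self._slot = self._slot, None
        return message
```

In this model a second `deliver` before the receiver calls `receive` returns `"busy"` rather than being stored, so any buffering or retry policy is the sender's responsibility.
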

## Memory

VMs will statically be given access to mutually-exclusive regions of the
physical address space at boot. This includes MMIO space for controlling
devices, plus a fixed amount of RAM for secondaries, and all remaining address
space to the primary. Note that this means that only one VM can control any
given page of MMIO registers for a device.

VMs may choose to donate or share their memory with other VMs at runtime. Any
given page may be shared with at most 2 VMs at once (including the original
owning VM). Memory which has been donated or shared may not be forcefully
reclaimed, but the VM with which it was shared may choose to return it.

## Cache

VMs will be blocked from using cache maintenance instructions that operate by
set/way. These operations are difficult to virtualize, and could expose the
system to side-channel attacks.

## Logging

VMs may send a character to a shared log by means of a hypercall or SMC call.
These log messages will be buffered per VM to make complete lines, then output
to a Hafnium-owned UART and saved in a shared ring buffer which may be extracted
from RAM dumps. VM IDs will be prepended to these logs.

This log API is intended for use in early bringup and low-level debugging. No
sensitive data should be logged through it. Higher level logs can be sent to the
primary VM through the asynchronous message passing mechanism described above,
or through shared memory.

## Configuration

Hafnium will read configuration from a flattened device tree blob (FDT). This
may either be the same device tree used for the other details of the system or a
separate minimal one just for Hafnium. This will include at least:

* The available RAM.
* The number of secondary VMs, how many vCPUs each should have, how much
  memory to assign to each of them, and where to load their initial images.
  (Most likely the initial image will be a minimal loader supplied with
  Hafnium which will validate and load the rest of the image from the primary
  later on.)
* Which devices exist on the system, their details (MMIO regions, interrupts
  and SYSMMU details), and which VM each is assigned to.
    * A single physical device may be split into multiple logical ‘devices’
      from Hafnium’s point of view if necessary to have different VMs own
      different parts of it.
* A whitelist of which SMC calls each VM is allowed to make.

## Failure handling

If a secondary VM tries to do something it shouldn't, Hafnium will either inject
a fault or kill it and inform the primary VM. The primary VM may choose to
restart the system or to continue without the secondary VM.

If the primary VM tries to do something it shouldn't, Hafnium will either inject
a fault or restart the system.

## TrustZone communication

The primary VM will be able to communicate with a TEE running in TrustZone
either through FF-A messages or through whitelisted SMC calls, and through
shared memory.

## Other SMC calls

Other than the PSCI calls described above and those used to communicate with
Hafnium, all other SMC calls will be blocked by default. Hafnium will allow SMC
calls to be whitelisted on a per-VM, per-function ID basis, as part of the
static configuration described above. These whitelisted SMC calls will be
forwarded to the EL3 handler with the client ID (as described by the SMCCC) set
to the calling VM's ID.
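
The per-VM, per-function ID whitelist can be pictured as a lookup followed by tagging the forwarded call with the caller's ID. This is a minimal sketch in Python under stated assumptions: `filter_smc`, the whitelist layout, and the function ID values are hypothetical placeholders, and PSCI calls and Hafnium's own calls (which are handled separately) are out of scope.

```python
# Illustrative model of SMC whitelisting; names and IDs are hypothetical.
from typing import Dict, Optional, Set, Tuple

# Static configuration: VM ID -> set of SMC function IDs it may call.
# The function ID values below are placeholders, not real services.
EXAMPLE_WHITELIST: Dict[int, Set[int]] = {
    1: {0xC2000001, 0xC2000002},
    2: {0xC2000002},
}


def filter_smc(
    whitelist: Dict[int, Set[int]], vm_id: int, function_id: int
) -> Optional[Tuple[int, int]]:
    """Return (function_id, client_id) to forward to EL3, or None if blocked.

    The client ID of a forwarded call is set to the calling VM's ID.
    """
    if function_id in whitelist.get(vm_id, set()):
        return (function_id, vm_id)
    return None  # blocked by default
```

A VM with no entry in the whitelist has every such call blocked, which matches the default-deny behaviour described above.
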