###########################
Hyperlaunch Design Document
###########################

.. sectnum:: :depth: 4

This post is a Request for Comment on the included v4 of a design document that
describes Hyperlaunch: a new method of launching the Xen hypervisor, relating
to dom0less and work from the Hyperlaunch project. We invite discussion of this
on this list, at the monthly Xen Community Calls, and at dedicated meetings on
this topic in the Xen Working Group which will be announced in advance on the
Xen Development mailing list.


.. contents:: :depth: 3


Introduction
============

This document describes the design and motivation for the funded development of
a new, flexible system for launching the Xen hypervisor and virtual machines
named: "Hyperlaunch".

The design enables seamless transition for existing systems that require a
dom0, and provides a new general capability to build and launch alternative
configurations of virtual machines, including support for static partitioning
and accelerated start of VMs during host boot, while adhering to the principles
of least privilege. It incorporates the existing dom0less functionality,
extended to fold in the new developments from the Hyperlaunch project, with
support for both x86 and Arm platform architectures, building upon and
replacing the earlier 'late hardware domain' feature for disaggregation of
dom0.

Hyperlaunch is designed to be flexible and reusable across multiple use cases,
and our aim is to ensure that it is capable, widely exercised, comprehensively
tested, and well understood by the Xen community.

Document Structure
==================

This is the primary design document for Hyperlaunch, to provide an overview of
the feature.
Separate additional documents will cover specific aspects of
Hyperlaunch in further detail, including:

  - The Device Tree specification for Hyperlaunch metadata
  - New Domain Roles for Xen and the Xen Security Modules (XSM) policy
  - Passthrough of PCI devices with Hyperlaunch

Approach
========

Born out of improving support for Dynamic Root of Trust for Measurement (DRTM),
the Hyperlaunch project is focused on restructuring the system launch of Xen.
The Hyperlaunch design provides a security architecture that builds on the
principles of Least Privilege and Strong Isolation, achieving this through the
disaggregation of system functions. It enables this with the introduction of a
boot domain that works in conjunction with the hypervisor to provide the
ability to launch multiple domains as part of host boot while maintaining a
least privilege implementation.

While the Hyperlaunch project inception was and continues to be driven by a
focus on security through disaggregation, there are multiple use cases with a
non-security focus that require or benefit from the ability to launch multiple
domains at host boot. This was proven by the need that drove the implementation
of the dom0less capability in the Arm branch of Xen.

Hyperlaunch is designed to be flexible and reusable across multiple use cases,
and our aim is to ensure that it is capable, widely exercised, comprehensively
tested, and provides a robust foundation for current and emerging system launch
requirements of the Xen community.


Objectives
----------

* In general strive to maintain compatibility with existing Xen behavior
* A default build of the hypervisor should be capable of booting both legacy-compatible and new styles of launch:

  * classic Xen boot: starting a single, privileged Dom0
  * classic Xen boot with late hardware domain: starting a Dom0 that transitions hardware access/control to another domain
  * a dom0less boot: starting multiple domains without privilege assignment controls
  * Hyperlaunch: starting one or more VMs, with flexible configuration

* Preferred that it be managed via KCONFIG options to govern inclusion of support for each style
* The selection between classic boot and Hyperlaunch boot should be automatic

  * Preferred that it not require a kernel command line parameter for selection

* It should not require modification to boot loaders
* It should provide a user friendly interface for its configuration and management
* It must provide a method for building systems that fall back to console access in the event of misconfiguration
* It should be able to boot an x86 Xen environment without the need for a Dom0 domain


Requirements and Design
=======================

Hyperlaunch is defined as the ability of a hypervisor to construct and start
one or more virtual machines at system launch in a specific way. A hypervisor
can support one or both modes of configuration, Hyperlaunch Static and
Hyperlaunch Dynamic. The Hyperlaunch Static mode functions as a static
partitioning hypervisor ensuring only the virtual machines started at system
launch are running on the system. The Hyperlaunch Dynamic mode functions as a
dynamic hypervisor allowing for additional virtual machines to be started after
the initial virtual machines have started.
The Xen hypervisor is capable of
both modes of configuration from the same binary and, when paired with its XSM
flask, provides strong controls that enable fine grained system partitioning.

Hypervisor Launch Landscape
---------------------------

This comparison table presents the distinctive capabilities of Hyperlaunch with
reference to existing launch configurations currently available in Xen and
other hypervisors.

::

 +---------------+-----------+------------+-----------+-------------+---------------------+
 | **Xen Dom0**  | **Linux** | **Late**   | **Jail**  | **Xen**     | **Xen Hyperlaunch** |
 | **(Classic)** | **KVM**   | **HW Dom** | **house** | **dom0less**+---------+-----------+
 |               |           |            |           |             | Static  | Dynamic   |
 +===============+===========+============+===========+=============+=========+===========+
 | Hypervisor able to launch multiple VMs during host boot                                |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |     Y     |      Y      |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Hypervisor supports Static Partitioning                                                |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |     Y     |      Y      |    Y    |           |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Able to launch VMs dynamically after host boot                                         |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |       Y       |     Y     |     Y*     |     Y     |      Y*     |         |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Supports strong isolation between all VMs started at host boot                         |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |     Y     |      Y      |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Enables flexible sequencing of VM start during host boot                               |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |             |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Prevent all-powerful static root domain being launched at boot                         |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |      Y*     |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Operates without a Highly-privileged management VM (e.g. Dom0)                         |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |     Y*     |           |      Y*     |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Operates without a privileged toolstack VM (Control Domain)                            |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |      Y*     |    Y    |           |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Extensible VM configuration applied before launch of VMs at host boot                  |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |             |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Flexible granular assignment of permissions and functions to VMs                       |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |             |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | Supports extensible VM measurement architecture for DRTM and attestation               |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |             |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 | PCI passthrough configured at host boot                                                |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+
 |               |           |            |           |             |    Y    |     Y     |
 +---------------+-----------+------------+-----------+-------------+---------+-----------+


Domain Construction
-------------------

An important aspect of the Hyperlaunch architecture is that the hypervisor
performs domain construction for all the Initial Domains, i.e. it builds each
domain that is described in the Launch Control Module. More specifically, the
hypervisor will perform the function of *domain creation* for each Initial
Domain: it allocates the unique domain identifier assigned to the virtual
machine and records essential metadata about it in the internal data structure
that enables scheduling the domain to run. It will also perform *basic domain
construction*: build the initial page tables with data from the kernel and
initial ramdisk supplied, and as appropriate for the domain type, populate the
p2m table and ACPI tables.

Subsequent to this, the boot domain can apply additional configuration to the
initial domains from the data in the LCM, in *extended domain construction*.

The benefits of this structure include:

* Security: Constrains the permissions required by the boot domain: it does not
  require the capability to create domains in this structure. This aligns with
  the principles of least privilege.
* Flexibility: Enables policy-based dynamic assignment of hardware by the boot
  domain, customizable according to use-case and able to adapt to hardware
  discovery
* Compatibility: Supports reuse of familiar tools with use-case customized boot
  domains.
* Commonality: Reuses the same logic for initial basic domain building across
  diverse Xen deployments.

  * It aligns the x86 initial domain construction with the existing Arm
    dom0less feature for construction of multiple domains at boot.

  * The boot domain implementation may vary significantly with different
    deployment use cases, whereas the hypervisor implementation is common.

* Correctness: Increases confidence in the implementation of domain
  construction, since it is performed by the hypervisor in well maintained and
  centrally tested logic.
* Performance: Enables launch for configurations where a fast start of
  multiple domains at boot is a requirement.
* Capability: Supports launch of advanced configurations where a sequenced
  start of multiple domains is required, or multiple domains are involved in
  startup of the running system configuration

  * e.g. for PCI passthrough on systems where the toolstack runs in a separate
    domain to the hardware management.

Please see the ‘Hyperlaunch Device Tree’ design document, which describes the
configuration module that is provided to the hypervisor by the bootloader.

The hypervisor determines how these domains are started as host boot completes:
in some systems the Boot Domain acts upon the extended boot configuration
supplied as part of launch, performing configuration tasks for preparing the
other domains for the hypervisor to commence running them.

Common Boot Configurations
--------------------------

When looking across those that have expressed interest or discussed a need for
launching multiple domains at host boot, the Hyperlaunch approach is to provide
the means to start nearly any combination of domains. Below is an enumerated
selection of common boot configurations for reference in the following section.

Dynamic Launch with a Highly-Privileged Domain 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hyperlaunch Classic: Dom0
    This configuration mimics the classic Xen start and domain construction
    where a single domain is constructed with all privileges and functions for
    managing hardware and running virtualization toolstack software.

Hyperlaunch Classic: Extended Launch Dom0
    This configuration is where a Dom0 is started via a Boot Domain that runs
    first. This is for cases where some preprocessing in a less privileged domain
    is required before starting the all-privileged Domain 0.

Hyperlaunch Classic: Basic Cloud
    This configuration constructs a Dom0 that is started in parallel with some
    number of workload domains.

Hyperlaunch Classic: Cloud
    This configuration builds a Dom0 and some number of workload domains, launched
    via a Boot Domain that runs first.


Static Launch Configurations: without a Domain 0 or a Control Domain
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hyperlaunch Static: Basic
    Simple static partitioning where all domains that can be run on this system are
    built and started during host boot and where no domain is started with the
    Control Domain permissions, thus making it not possible to create/start any
    further new domains.

Hyperlaunch Static: Standard
    This is a variation of the “Hyperlaunch Static: Basic” static partitioning
    configuration with the introduction of a Boot Domain. This configuration allows
    for use of a Boot Domain to be able to apply extended configuration
    to the Initial Domains before they are started and
    sequence the order in which they start.

Hyperlaunch Static: Disaggregated
    This is a variation of the “Hyperlaunch Static: Standard” configuration with
    the introduction of a Boot Domain and an illustration that some functions can
    be disaggregated to dedicated domains.

Dynamic Launch of Disaggregated System Configurations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hyperlaunch Dynamic: Hardware Domain
    This configuration mimics the existing Xen feature late hardware domain with
    the one difference being that the hardware domain is constructed by the
    hypervisor at startup instead of later by Dom0.

Hyperlaunch Dynamic: Flexible Disaggregation
    This configuration is similar to the “Hyperlaunch Classic: Dom0” configuration
    except that it includes starting a separate hardware domain during Xen startup.
    It is also similar to “Hyperlaunch Dynamic: Hardware Domain” configuration, but
    it launches via a Boot Domain that runs first.

Hyperlaunch Dynamic: Full Disaggregation
    This configuration demonstrates how to start a fully disaggregated system:
    the virtualization toolstack runs in a Control Domain, separate from the
    domains responsible for managing hardware, XenStore, the Xen Console and
    Crash functions, each launched via a Boot Domain.


Example Use Cases and Configurations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following example use cases can be matched to configurations listed in the
previous section.

Use case: Modern cloud hypervisor
"""""""""""""""""""""""""""""""""

**Option:** Hyperlaunch Classic: Cloud

This configuration will support strong isolation for virtual TPM domains and
measured launch in support of attestation to infrastructure management, while
allowing the use of existing Dom0 virtualization toolstack software.

Use case: Edge device with security or safety requirements
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

**Option:** Hyperlaunch Static: Standard

This configuration runs without requiring a highly-privileged Dom0, and enables
extended VM configuration to be applied to the Initial VMs prior to launching
them, optionally in a sequenced start.

Use case: Client hypervisor
"""""""""""""""""""""""""""

**Option:** Hyperlaunch Dynamic: Flexible Disaggregation

**Option:** Hyperlaunch Dynamic: Full Disaggregation

These configurations enable dynamic client workloads, strong isolation for the
domain running the virtualization toolstack software and each domain managing
hardware, with PCI passthrough performed during host boot and support for
measured launch.

Hyperlaunch Disaggregated Launch
--------------------------------


Existing in Xen today are two primary permissions, *control domain* and
*hardware domain*, and two functions, *console domain* and *xenstore domain*,
that can be assigned to a domain. Traditionally all of these permissions and
functions are assigned to Dom0 at start and can then be delegated to other
domains created by the toolstack in Dom0. With Hyperlaunch it becomes possible
to assign these permissions and functions to any domain for which there is a
definition provided at startup.

Additionally, two further functions are introduced: the *recovery domain*,
intended to assist with recovery from failures encountered starting VMs during
host boot, and the *boot domain*, for performing aspects of domain construction
during startup.

Supporting the booting of each of the above common boot configurations is
accomplished by considering the set of initial domains and the assignment of
Xen’s permissions and functions, including the ones introduced by Hyperlaunch,
to these domains.
These will be discussed later, but for now they
are laid out in a table with a mapping to the common boot configurations. This
table is not intended to be an exhaustive list of configurations and does not
account for flask policy specified functions that are use case specific.

In the table each number represents a separate domain being
constructed by the Hyperlaunch construction path as Xen starts, and the
designator ``{n}`` signifies that there may be “n” additional domains that may
be constructed that do not have any special role for a general Xen system.

::

 +-------------------+------------------+-----------------------------------+
 | Configuration     |    Permission    |            Function               |
 |                   +------+------+----+------+--------+--------+----------+
 |                   | None | Ctrl | HW | Boot |Recovery| Console| Xenstore |
 +===================+======+======+====+======+========+========+==========+
 | Classic: Dom0     |      |  0   | 0  |      |   0    |   0    |    0     |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Classic: Extended |      |  1   | 1  |  0   |   1    |   1    |    1     |
 | Launch Dom0       |      |      |    |      |        |        |          |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Classic:          | {n}  |  0   | 0  |      |   0    |   0    |    0     |
 | Basic Cloud       |      |      |    |      |        |        |          |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Classic: Cloud    | {n}  |  1   | 1  |  0   |   1    |   1    |    1     |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Static: Basic     | {n}  |      | 0  |      |   0    |   0    |    0     |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Static: Standard  | {n}  |      | 1  |  0   |   1    |   1    |    1     |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Static:           | {n}  |      | 2  |  0   |   3    |   4    |    1     |
 | Disaggregated     |      |      |    |      |        |        |          |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Dynamic:          |      |  0   | 1  |      |   0    |   0    |    0     |
 | Hardware Domain   |      |      |    |      |        |        |          |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Dynamic: Flexible | {n}  |  1   | 2  |  0   |   1    |   1    |    1     |
 | Disaggregation    |      |      |    |      |        |        |          |
 +-------------------+------+------+----+------+--------+--------+----------+
 | Dynamic: Full     | {n}  |  2   | 3  |  0   |   4    |   5    |    1     |
 | Disaggregation    |      |      |    |      |        |        |          |
 +-------------------+------+------+----+------+--------+--------+----------+

Overview of Hyperlaunch Flow
----------------------------

Before delving into Hyperlaunch, a good basis to start with is an understanding
of the current process to create a domain. A way to view this process starts
with the core configuration which is the information the hypervisor requires to
make the call to ``domain_create``, followed by basic construction to provide the
memory image to run, including the kernel and ramdisk. A subsequent step
applies the extended configuration used by the toolstack to provide a domain
with any additional configuration information. Until the extended configuration
is completed, a domain has access to no resources except its allocated vcpus
and memory. The exception to this is Dom0, which the hypervisor explicitly
grants control and access to all system resources, except for those that only
the hypervisor should have control over. This exception for Dom0 is driven by
the system structure with a monolithic Dom0 domain predating introduction of
support for disaggregation into Xen, and the corresponding default assignment
of multiple roles within the Xen system to Dom0.

While not a different domain creation path, there does exist the Hardware
Domain (hwdom), sometimes also referred to as late-Dom0. It is an early effort
to disaggregate Dom0’s roles into a separate control domain and hardware
domain.
This capability is activated by the passing of a domain id to the
``hardware_dom`` kernel command line parameter, and the Xen hypervisor will then
flag that domain id as the hardware domain. Later when the toolstack constructs
a domain with that domain id as the requested domid, the hypervisor will
transfer all device I/O from Dom0 to this domain. In addition it will also
transfer the “host shutdown on domain shutdown” flag from Dom0 to the hardware
domain. It is worth mentioning that this approach for disaggregation was
created in this manner due to the inability of Xen to launch more than one
domain at startup.

Hyperlaunch Xen startup
^^^^^^^^^^^^^^^^^^^^^^^

The Hyperlaunch approach’s primary focus is on how to assign the roles
traditionally granted to Dom0 to one or more domains at host boot. While the
statement is simple to make, the implications are not trivial by any means.
This also explains why the Hyperlaunch approach is orthogonal to the existing
dom0less capability. The dom0less capability focuses on enabling the launch of
multiple domains in parallel with Dom0 at host boot. A corollary for dom0less
is that systems that don’t require Dom0 after all guest domains have started
are able to do the host boot without a Dom0, though it may be possible to
start Dom0 at a later point. With Hyperlaunch, by contrast, the approach of
separating Dom0’s roles requires the ability to launch multiple domains at
host boot. The direct consequences from this approach are profound and provide
a myriad of possible configurations, of which a sample of common boot
configurations was already presented.

To enable the Hyperlaunch approach a new alternative path for host boot within
the hypervisor must be introduced. This alternative path effectively branches
just before the current point of Dom0 construction and begins an alternate
means of system construction.
Whether this alternate path is taken is determined
by inspecting the boot chain. If the bootloader has
loaded a specific configuration, as described later, it will enable Xen to
detect that a Hyperlaunch configuration has been provided. Once a Hyperlaunch
configuration is detected, this alternate path can be thought of as occurring
in phases: domain creation, domain preparation, and launch finalization.

Domain Creation
"""""""""""""""

The domain creation phase begins with Xen parsing the bootloader provided
material, to understand the content of the modules provided. It will then load
any microcode or XSM policy it discovers. For each domain configuration Xen
finds, it parses the configuration to construct the necessary domain definition
to instantiate an instance of the domain and leave it in a paused state. When
all domain configurations have been instantiated as domains, if one of them is
flagged as the Boot Domain, that domain will be unpaused, starting the domain
preparation phase. If there is no Boot Domain defined, then the domain
preparation phase will be skipped and Xen will trigger the launch finalization
phase.

Domain Preparation Phase
""""""""""""""""""""""""

The domain preparation phase is an optional check point for the execution of a
workload specific domain, the Boot Domain. While the Boot Domain is the first
domain to run and has some degree of control over the system, it is extremely
restricted in both system resource access and hypervisor operations. Its
purpose is to:

* Access the configuration provided by the bootloader
* Finalize the configuration of the domains
* Conduct any setup and launch related operations
* Do an ordered unpause of domains that require an ordered start

When the Boot Domain has completed, it will notify the hypervisor that it is
done, triggering the launch finalization phase.
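
The creation and preparation phases described above can be illustrated with a
small simulation. This is a minimal Python sketch for explanation only, not
Xen code: the ``Domain`` class, the ``start_order`` field and the
``run_boot_phases`` helper are hypothetical names introduced here, and the
actual hypervisor interfaces are not specified in this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    name: str
    is_boot_domain: bool = False
    start_order: Optional[int] = None  # hypothetical: lower values start first
    paused: bool = True                # every domain is created paused


def run_boot_phases(domains: List[Domain]) -> List[str]:
    """Walk the creation and preparation phases; return the unpause order."""
    unpaused: List[str] = []

    # Domain creation: each configured domain is instantiated in a paused state.
    for dom in domains:
        dom.paused = True

    boot = next((d for d in domains if d.is_boot_domain), None)
    if boot is None:
        # No Boot Domain: preparation is skipped, finalization comes next.
        return unpaused

    # Domain preparation: the Boot Domain is the first domain to run.
    boot.paused = False
    unpaused.append(boot.name)

    # The Boot Domain unpauses, in order, the domains that need sequencing.
    ordered = sorted(
        (d for d in domains
         if d.start_order is not None and not d.is_boot_domain),
        key=lambda d: d.start_order,
    )
    for dom in ordered:
        dom.paused = False
        unpaused.append(dom.name)
    return unpaused
```

Domains that are still paused when the function returns are left for the
launch finalization phase to handle.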


Launch Finalization
"""""""""""""""""""

The hypervisor handles the launch finalization phase which is equivalent to the
clean up phase. As such the steps taken by the hypervisor, not necessarily in
implementation order, are as follows:

* Free the boot module chain
* If a Boot Domain was used, reclaim Boot Domain resources
* Unpause any domains still in a paused state
* Boot Domain uses a reserved function thus can never be respawned

While the focus thus far has been on how the Hyperlaunch capability will work,
it is worth mentioning what it does not do or limit from occurring. It does not
stop or inhibit the assigning of the control domain role which gives the domain
the ability to create, start, stop, restart, and destroy domains or the
hardware domain role which gives access to all I/O devices except those that
the hypervisor has reserved for itself. In particular it is still possible to
construct a domain with all the privileged roles, i.e. a Dom0, with or without
the domain id being zero. In fact what limitations are imposed now become fully
configurable without the risk of circumvention by an all privileged domain.

Structuring of Hyperlaunch
--------------------------

The structure of Hyperlaunch is built around the existing capabilities of the
host boot protocol. This approach was driven by the objective not to require
modifications to the boot loader. The only requirement is that the boot loader
supports the Multiboot2 (MB2) protocol. For UEFI boot, our recommendation is to
use GRUB.efi to load Xen and the initial domain materials via the multiboot2
method. On Arm platforms, Hyperlaunch is compatible with the existing interface
for boot into the hypervisor.


x86 Multiboot2
^^^^^^^^^^^^^^

The MB2 protocol has no concept of a manifest to tell the initial kernel what
is contained in the chain, leaving it to the kernel to impose a loading
convention, use magic number identification, or both. When considering the
passing of multiple kernels, ramdisks, and domain configuration along with any
existing modules already passed, there is no sane convention that could be
imposed and magic number identification is nearly impossible when considering
the objective not to impose unnecessary complication to the hypervisor.

As alluded to previously, a manifest describing the contents in the MB2
chain and how they relate within a Xen context is needed. To address this need
the Launch Control Module (LCM) was designed to provide such a manifest. The
LCM was designed to have a specific set of properties:

* minimize the complexity of the parsing logic required by the hypervisor
* allow for expanding and optional configuration fragments without breaking
  backwards compatibility

To enable automatic detection of a Hyperlaunch configuration, the LCM must be
the first MB2 module in the MB2 module chain. The LCM is implemented using the
Device Tree as defined in the Hyperlaunch Device Tree design document. With the
LCM implemented in Device Tree, it has a magic number that enables the
hypervisor to detect its presence when used in a Multiboot2 module chain. The
hypervisor can confirm that it is a proper LCM Device Tree by checking for a
compliant Hyperlaunch Device Tree. The Hyperlaunch Device Tree nodes are
designed to allow:

* for the hypervisor to parse only those entries it understands,
* for packing custom information for a custom boot domain,
* the ability to use a new LCM with an older hypervisor,
* and the ability to use an older LCM with a new hypervisor.
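
The magic-number check on the first module can be sketched as follows. This is
an illustrative Python sketch rather than hypervisor code: it relies only on
the Flattened Device Tree header layout (a big-endian ``0xd00dfeed`` magic
followed by a big-endian ``totalsize`` field), and the helper names
``looks_like_device_tree`` and ``select_boot_path`` are hypothetical. It covers
only the first detection step; confirming a compliant Hyperlaunch Device Tree
would still be required afterwards.

```python
import struct
from typing import Sequence

FDT_MAGIC = 0xD00DFEED  # Flattened Device Tree magic, stored big-endian
FDT_HEADER_MIN = 8      # magic + totalsize, two big-endian 32-bit fields


def looks_like_device_tree(blob: bytes) -> bool:
    """Return True if blob starts with a plausible FDT header."""
    if len(blob) < FDT_HEADER_MIN:
        return False
    magic, totalsize = struct.unpack_from(">II", blob, 0)
    # totalsize describes the whole blob, so it cannot exceed what was loaded.
    return magic == FDT_MAGIC and FDT_HEADER_MIN <= totalsize <= len(blob)


def select_boot_path(modules: Sequence[bytes]) -> str:
    """Pick a boot path from the module chain (hypothetical helper)."""
    # Only the first module is inspected, matching the rule that the LCM
    # must be the first module in the chain.
    if modules and looks_like_device_tree(modules[0]):
        return "hyperlaunch"
    return "classic"
```

Inspecting only the first module keeps the selection automatic without
scanning or guessing at the contents of the remaining modules.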

Arm Device Tree
^^^^^^^^^^^^^^^

As discussed the LCM is in Device Tree format and was designed to co-exist in
the Device Tree ecosystem, and in particular in parallel with dom0less Device
Tree entries. On Arm, Xen is already designed to boot from a host Device Tree
description (dtb) file and the LCM entries can be embedded into this host dtb
file. This makes detecting the LCM entries and supporting Hyperlaunch on Arm
relatively straightforward. Relative to the described x86 approach, at the
point where Xen inspects the first MB2 module, on Arm Xen will check if the top
level LCM node exists in the host dtb file. If the LCM node does exist, then at
that point it will enter into the same code path as the x86 entry would go.

Xen hypervisor
^^^^^^^^^^^^^^

The new host boot flow was discussed previously at a high level. Within this
new flow is the configuration parsing and domain creation phase, which is
expanded upon here. The hypervisor will
inspect the LCM for a config node and if found will iterate through all module
nodes. The module nodes are used to identify if any modules contain microcode
or an XSM policy. As it processes domain nodes, it will construct the domain
using the node properties and the module nodes. Once it has completed
iterating through all the entries in the LCM, if a constructed domain has the
Boot Domain attribute, it will then be unpaused. Otherwise the hypervisor will
start the launch finalization phase.

Boot Domain
^^^^^^^^^^^

Traditionally domain creation was controlled by the user within the Dom0
environment whereby custom toolstacks could be implemented to impose
requirements on the process. The Boot Domain is a means to enable the user to
continue to maintain a degree of that control over domain creation but within a
limited privilege environment.
The Boot Domain will have access to the LCM and
the boot chain along with access to a subset of the hypercall operations. When
the Boot Domain is finished, it will notify the hypervisor through a hypercall
op.

Recovery Domain
^^^^^^^^^^^^^^^

With the existing Dom0 host boot path, when a failure occurs there are several
assumptions that can safely be made to get the user to a console for
troubleshooting. With the Hyperlaunch host boot path those assumptions can no
longer be made, thus a means is needed to get the user to a console in the case
of a recoverable failure. The Recovery Domain is configured by a domain
configuration entry in the LCM, in the same manner as the other initial
domains, and it will not be unpaused at launch finalization unless a failure is
encountered starting the initial domains.

Xen has existing support for a Crash Environment where memory can be reserved
at host boot and a kernel loaded into it, to be jumped into at any point while
the system is running when a crash is detected. The Recovery Domain
functionality is a separate, complementary capability. The Crash Environment
replaces the previously active hypervisor and running guests, and enables a
process for mounting disks to write out log information prior to rebooting the
system. In contrast, the Recovery Domain is able to use the functionality of
the Xen hypervisor, which is still present and running, to perform recovery
handling for errors encountered with starting the initial domains.
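
The pause and unpause handling described above for the Boot Domain and the
Recovery Domain can be sketched as follows. This is an illustrative model
only: the ``Domain`` class, ``next_phase``, ``launch_finalization`` and the
``start_failed`` flag are hypothetical names, and how a failure is detected
and what happens to the other initial domains in that case are still open
design questions, so the sketch simply assumes a boolean failure indicator.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in for a domain constructed from an LCM entry."""
    name: str
    is_boot: bool = False
    is_recovery: bool = False
    paused: bool = True  # constructed domains start out paused


def next_phase(domains):
    """After all LCM entries are processed, pick the next boot phase."""
    for dom in domains:
        if dom.is_boot:
            dom.paused = False  # the Boot Domain runs before anything else
            return "boot-domain"
    return "launch-finalization"


def launch_finalization(domains, start_failed=False):
    """Unpause the launched domains; hold the Recovery Domain in reserve."""
    for dom in domains:
        if dom.is_boot:
            continue  # Boot Domain resources are reclaimed before this point
        if dom.is_recovery:
            # Assumption: a simple flag stands in for failure detection.
            dom.paused = not start_failed
        else:
            dom.paused = False
    return [d.name for d in domains if not d.paused and not d.is_boot]
```

In this model a system without a Boot Domain proceeds directly to launch
finalization, and the Recovery Domain stays paused unless the failure flag is
set.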

Deferred Design
"""""""""""""""

To be determined:

* Define what is detected as a crash
* Explain how crash detection is performed and which components are involved
* Explain how the recovery domain is unpaused
* Explain how and when the resources assigned to the recovery domain are reclaimed
* Define what the recovery domain is able to do
* Determine what permissions the recovery domain requires to perform its job


Control Domain
^^^^^^^^^^^^^^

The concept of the Control Domain already exists within Xen as a boolean,
`is_privileged`, that governs access to many of the privileged interfaces of
the hypervisor that support a domain running a virtualization system toolstack.
Hyperlaunch will allow the `is_privileged` flag to be set on any domain that is
created at launch, rather than only on Dom0. It may potentially be set on
multiple domains.

Hardware Domain
^^^^^^^^^^^^^^^

The Hardware Domain is also an existing concept for Xen that is enabled through
the `is_hardware_domain` check. With Hyperlaunch, the previous process of I/O
accesses being assigned to Dom0 for later transfer to the hardware domain would
no longer be required. Instead, during the configuration phase the Xen
hypervisor would directly assign the I/O accesses to the domain with the
hardware domain permission bit enabled.

Console Domain
^^^^^^^^^^^^^^

Traditionally the Xen console is assigned to the control domain and can then be
reassigned by the toolstack to another domain. With Hyperlaunch it becomes
possible to construct a boot configuration where there is no control domain, or
to have a use case where the Xen console needs to be isolated. As such it
becomes necessary to be able to designate which of the initial domains should
be assigned the Xen console.
Therefore Hyperlaunch introduces the ability to
specify an initial domain to which the console is assigned, along with a
convention of ordered assignment for when there is no explicit assignment.

Communication of Domain Configurations
======================================

There are several standard methods for an Operating System to access machine
configuration and environment information: ACPI is common on x86 systems,
whereas Device Tree is more typical on Arm platforms. There are currently
implementations of both in Xen.

* For dom0less, guest Device Trees are dynamically constructed by the
  hypervisor to convey domain configuration data

* For PVH dom0 on x86, ACPI tables are built by the hypervisor before the
  domain is started

Note that both of these mechanisms convey static data that is fixed prior to
the point of domain construction. Hyperlaunch will retain both the existing
ACPI and Device Tree methods.

Communication of data between a Boot Domain and a Control Domain is of note
since they may not be running concurrently: the method used will depend on
their specific implementations, but one option available is to use Xen’s hypfs
for transfer of basic data to support system bootstrap.

-------------------------------------------------------------------------------

Appendix
========

Appendix 1: Flow Sequence of Steps of a Hyperlaunch Boot
--------------------------------------------------------

Provided here is an ordered flow of a Hyperlaunch boot with its logic decision
points highlighted. Not all branch points are recorded, specifically for the
variety of error conditions that may occur. ::

  1. Hypervisor Startup:
  2a. (x86) Inspect first module provided by the bootloader
      a. Is the module an LCM
          i. YES: proceed with the Hyperlaunch host boot path
          ii. NO: proceed with a Dom0 host boot path
  2b. (Arm) Inspect host dtb for `/chosen/hypervisor` node
      a. Is the LCM present
          i. YES: proceed with the Hyperlaunch host boot path
          ii. NO: proceed with a Dom0/dom0less host boot path
  3. Iterate through the LCM entries looking for the module description
     entry
      a. Check if any of the modules are microcode or policy and if so,
         load
  4. Iterate through the LCM entries processing all domain description
     entries
      a. Use the details from the Basic Configuration to call
         `domain_create`
      b. Record if a domain is flagged as the Boot Domain
      c. Record if a domain is flagged as the Recovery Domain
  5. Was a Boot Domain created
      a. YES:
          i. Attach console to Boot Domain
          ii. Unpause Boot Domain
          iii. Goto Boot Domain (step 6)
      b. NO: Goto Launch Finalization (step 10)
  6. Boot Domain:
  7. Boot Domain comes online and may do any of the following actions
      a. Process the LCM
      b. Validate the MB2 chain
      c. Make additional configuration settings for staged domains
      d. Unpause any precursor domains
      e. Set any runtime configurations
  8. Boot Domain does any necessary cleanup
  9. Boot Domain makes a hypercall op call to signal it is finished
      i. Hypervisor reclaims all Boot Domain resources
      ii. Hypervisor records that the Boot Domain ran
      iii. Goto Launch Finalization (step 10)
  10. Launch Finalization
  11. If a configured domain was flagged to have the console, the
      hypervisor assigns it
  12. The hypervisor clears the LCM and bootloader loaded module,
      reclaiming the memory
  13. The hypervisor iterates through domains unpausing any domain not
      flagged as the recovery domain


Appendix 2: Considerations in Naming the Hyperlaunch Feature
------------------------------------------------------------

* The term “Launch” is preferred over “Boot”

  * Multiple individual component boots can occur in the new system start
    process; Launch is preferable for describing the whole process
  * Fortunately there is consensus in the current group of stakeholders
    that the term “Launch” is good and appropriate

* The names we define must support becoming meaningful and simple to use
  outside the Xen community

  * They must be able to be resolved quickly via search engine to a clear
    explanation (e.g. Xen marketing material, documentation or wiki)
  * We prefer that the terms be helpful for marketing communications
  * Consequence: avoid the term “domain” which is Xen-specific and
    requires a definition to be provided each time when used elsewhere

* There is a need to communicate that Xen is capable of being used as a Static
  Partitioning hypervisor

  * The community members using and maintaining dom0less are the current
    primary stakeholders for this

* There is a need to communicate that the new launch functionality provides new
  capabilities not available elsewhere, and is more than just supporting Static
  Partitioning

  * No other hypervisor known to the authors of this document is capable
    of providing what Hyperlaunch will be able to do.
    The launch sequence is
    designed to:

    * Remove dependency on a single, highly-privileged initial domain
    * Allow the initial domains started to be independent and fully
      isolated from each other
    * Support configurations where no further VMs can be launched
      once the initial domains have started
    * Use a standard, extensible format for conveying VM
      configuration data
    * Ensure that domain building of all initial domains is
      performed by the hypervisor from materials supplied by the
      bootloader
    * Enable flexible configuration to be applied to all initial
      domains by an optional Boot Domain, that runs with limited
      privilege, before any other domain starts and obtains the VM
      configuration data from the bootloader materials via the
      hypervisor
    * Enable measurements of all of the boot materials prior to
      their use, in a sequence with minimized privilege
    * Support use-case-specific customized Boot Domains
    * Complement the hypervisor’s existing ability to enforce
      policy-based Mandatory Access Control

* “Static” and “Dynamic” have different and important meanings in different
  communities

  * Static and Dynamic Partitioning describe the ability to create new
    virtual machines, or not, after the initial host boot process
    completes
  * Static and Dynamic Root of Trust describe the nature of the trust
    chain for a measured launch. In this case Static refers to the
    fact that the trust chain is fixed and non-repeatable until the next
    host reboot or shutdown. Dynamic, in contrast, refers to the
    ability to conduct the measured launch at any time and potentially
    multiple times before the next host reboot or shutdown.

    * We will be using Hyperlaunch with both Static and Dynamic
      Roots of Trust, to launch both Statically and Dynamically
      Partitioned systems, and being clear about exactly which
      combination is being started will be very important (e.g. for
      certification processes)

  * Consequence: uses of “Static” and “Dynamic” need to be qualified if
    they are incorporated into the naming of this functionality

    * This can be done by adding the preceding, stronger branded
      term: “Hyperlaunch”, before “Static” or “Dynamic”
    * i.e. “Hyperlaunch Static” describes launch of a
      Statically Partitioned system
    * and “Hyperlaunch Dynamic” describes launch of a
      Dynamically Partitioned system.
    * In practice, this means that “Hyperlaunch Static” describes
      starting a Statically Partitioned system where no new domains can
      be started later (i.e. no VM has the Control Domain
      permission), whereas “Hyperlaunch Dynamic” will launch some
      VM with the Control Domain permission, able to create VMs
      dynamically at a later point.

**Naming Proposal:**

* New Term: “Hyperlaunch”: the ability of a hypervisor to construct and start
  one or more virtual machines at system launch, in the following manner:

  * The hypervisor must build all of the domains that it starts at host
    boot

    * Similar to the way the dom0 domain is built by the hypervisor
      today, and how dom0less works: it will run a loop to build
      them all, driven from the configuration provided
    * This is a requirement for ensuring that there is Strong
      Isolation between each of the initial VMs

  * A single file containing the VM configs (“Launch Control Module”: LCM,
    in Device Tree binary format) is provided to the hypervisor

    * The hypervisor parses it and builds domains
    * If the LCM config says that a Boot Domain should run first,
      then the LCM file itself is made available to the Boot Domain
      for it to parse and act on, to invoke operations via the
      hypervisor to apply additional configuration to the other VMs
      (i.e. executing a privilege-constrained toolstack)

* New Term: “Hyperlaunch Static”: starts a Statically Partitioned system, where
  only the virtual machines started at system launch are running on the system

* New Term: “Hyperlaunch Dynamic”: starts a system where virtual machines may
  be dynamically added after the initial virtual machines have started.


In the default configuration, Xen will be capable of both styles of Hyperlaunch
from the same hypervisor binary which, when paired with its XSM flask, provides
strong controls that enable fine grained system partitioning.


* Retiring Term: “DomB”: will no longer be used to describe the optional first
  domain that is started. It is replaced with the more general term: “Boot
  Domain”.

* Retiring Term: “Dom0less”: it is to be replaced with “Hyperlaunch Static”


Appendix 3: Terminology
-----------------------

To help ensure clarity in reading this document, the following are the
definitions of the terminology used within it.


Basic Configuration
    the minimal information the hypervisor requires to instantiate a domain instance


Boot Domain
    a domain with limited privileges launched by the hypervisor during a
    Multiple Domain Boot that runs as the first domain started. In the Hyperlaunch
    architecture, it is responsible for assisting with higher level operations of
    the domain setup process.


Classic Launch
    a backwards-compatible host boot that ends with the launch of a single domain (Dom0)


Console Domain
    a domain that has the Xen console assigned to it


Control Domain
    a privileged domain that has been granted Control Domain permissions which
    are those that are required by the Xen toolstack for managing other domains.
    These permissions are a subset of those that are granted to Dom0.


Device Tree
    a standardized data structure, with defined file formats, for describing
    initial system configuration


Disaggregation
    the separation of system roles and responsibilities across multiple
    connected components that work together to provide functionality


Dom0
    the highly-privileged, first and only domain started at host boot on a
    conventional Xen system


Dom0less
    an existing feature of Xen on Arm that provides Multiple Domain Boot


Domain
    a running instance of a virtual machine (as the term is commonly used in
    the Xen Community)

DomB
    the former name for Hyperlaunch


Extended Configuration
    any configuration options for a domain beyond its Basic Configuration


Hardware Domain
    a privileged domain that has been granted permissions to access and manage
    host hardware. These permissions are a subset of those that are granted to
    Dom0.


Host Boot
    the system startup of Xen using the configuration provided by the bootloader


Hyperlaunch
    a flexible host boot that ends with the launch of one or more domains


Initial Domain
    a domain that is described in the LCM that is run as part of a multiple
    domain boot. This includes the Boot Domain, Recovery Domain and all Launched
    Domains.


Late Hardware Domain
    a Hardware Domain that is launched after host boot has already completed
    with a running Dom0. When the Late Hardware Domain is started, Dom0
    relinquishes and transfers the permissions to access and manage host hardware
    to it.


Launch Control Module (LCM)
    A file supplied to the hypervisor by the bootloader that contains
    configuration data for the hypervisor and the initial set of virtual machines
    to be run at boot


Launched Domain
    a domain, aside from the boot domain and recovery domain, that is started as
    part of a multiple domain boot and remains running once the boot process is
    complete


Multiple Domain Boot
    a system configuration where the hypervisor and multiple virtual machines
    are all launched when the host system hardware boots


Recovery Domain
    an optional fallback domain that the hypervisor may start in the event of a
    detectable error encountered during the multiple domain boot process


System Device Tree
    this is the product of an Arm community project to extend Device Tree to
    cover more aspects of initial system configuration


Appendix 4: Copyright License
-----------------------------

This work is licensed under a Creative Commons Attribution 4.0 International
License. A copy of this license may be obtained from the Creative Commons
website (https://creativecommons.org/licenses/by/4.0/legalcode).

| Contributions by:
| Christopher Clark are Copyright © 2021 Star Lab Corporation
| Daniel P. Smith are Copyright © 2021 Apertus Solutions, LLC