1###########################
2Hyperlaunch Design Document
3###########################
4
5.. sectnum:: :depth: 4
6
7This post is a Request for Comment on the included v4 of a design document that
8describes Hyperlaunch: a new method of launching the Xen hypervisor, relating
9to dom0less and work from the Hyperlaunch project. We invite discussion of this
10on this list, at the monthly Xen Community Calls, and at dedicated meetings on
11this topic in the Xen Working Group which will be announced in advance on the
12Xen Development mailing list.
13
14
15.. contents:: :depth: 3
16
17
18Introduction
19============
20
This document describes the design and motivation for the funded development of
a new, flexible system for launching the Xen hypervisor and virtual machines,
named "Hyperlaunch".
24
25The design enables seamless transition for existing systems that require a
26dom0, and provides a new general capability to build and launch alternative
27configurations of virtual machines, including support for static partitioning
28and accelerated start of VMs during host boot, while adhering to the principles
29of least privilege. It incorporates the existing dom0less functionality,
30extended to fold in the new developments from the Hyperlaunch project, with
31support for both x86 and Arm platform architectures, building upon and
32replacing the earlier 'late hardware domain' feature for disaggregation of
33dom0.
34
35Hyperlaunch is designed to be flexible and reusable across multiple use cases,
36and our aim is to ensure that it is capable, widely exercised, comprehensively
37tested, and well understood by the Xen community.
38
39Document Structure
40==================
41
42This is the primary design document for Hyperlaunch, to provide an overview of
43the feature. Separate additional documents will cover specific aspects of
44Hyperlaunch in further detail, including:
45
46  - The Device Tree specification for Hyperlaunch metadata
47  - New Domain Roles for Xen and the Xen Security Modules (XSM) policy
48  - Passthrough of PCI devices with Hyperlaunch
49
50Approach
51========
52
53Born out of improving support for Dynamic Root of Trust for Measurement (DRTM),
54the Hyperlaunch project is focused on restructuring the system launch of Xen.
55The Hyperlaunch design provides a security architecture that builds on the
56principles of Least Privilege and Strong Isolation, achieving this through the
57disaggregation of system functions. It enables this with the introduction of a
58boot domain that works in conjunction with the hypervisor to provide the
59ability to launch multiple domains as part of host boot while maintaining a
60least privilege implementation.
61
62While the Hyperlaunch project inception was and continues to be driven by a
63focus on security through disaggregation, there are multiple use cases with a
64non-security focus that require or benefit from the ability to launch multiple
65domains at host boot. This was proven by the need that drove the implementation
66of the dom0less capability in the Arm branch of Xen.
67
68Hyperlaunch is designed to be flexible and reusable across multiple use cases,
69and our aim is to ensure that it is capable, widely exercised, comprehensively
70tested, and provides a robust foundation for current and emerging system launch
71requirements of the Xen community.
72
73
74Objectives
75----------
76
77* In general strive to maintain compatibility with existing Xen behavior
78* A default build of the hypervisor should be capable of booting both legacy-compatible and new styles of launch:
79
80        * classic Xen boot: starting a single, privileged Dom0
81        * classic Xen boot with late hardware domain: starting a Dom0 that transitions hardware access/control to another domain
82        * a dom0less boot: starting multiple domains without privilege assignment controls
83        * Hyperlaunch: starting one or more VMs, with flexible configuration
84
* Preferably, inclusion of support for each style should be governed by Kconfig options
86* The selection between classic boot and Hyperlaunch boot should be automatic
87
88        * Preferred that it not require a kernel command line parameter for selection
89
90* It should not require modification to boot loaders
* It should provide a user-friendly interface for its configuration and management
* It must provide a method for building systems that fall back to console access in the event of misconfiguration
93* It should be able to boot an x86 Xen environment without the need for a Dom0 domain
94
95
96Requirements and Design
97=======================
98
Hyperlaunch is defined as the ability of a hypervisor to construct and start
one or more virtual machines at system launch in a specific way. A hypervisor
can support one or both modes of configuration: Hyperlaunch Static and
Hyperlaunch Dynamic. The Hyperlaunch Static mode functions as a static
partitioning hypervisor, ensuring that only the virtual machines started at
system launch are running on the system. The Hyperlaunch Dynamic mode functions
as a dynamic hypervisor, allowing additional virtual machines to be started
after the initial virtual machines have started. The Xen hypervisor is capable
of both modes of configuration from the same binary and, when paired with XSM
Flask, provides strong controls that enable fine-grained system partitioning.
109
110Hypervisor Launch Landscape
111---------------------------
112
113This comparison table presents the distinctive capabilities of Hyperlaunch with
114reference to existing launch configurations currently available in Xen and
115other hypervisors.
116
117::
118
119 +---------------+-----------+------------+-----------+-------------+---------------------+
120 | **Xen Dom0**  | **Linux** | **Late**   | **Jail**  | **Xen**     | **Xen Hyperlaunch** |
121 | **(Classic)** | **KVM**   | **HW Dom** | **house** | **dom0less**+---------+-----------+
122 |               |           |            |           |             | Static  | Dynamic   |
123 +===============+===========+============+===========+=============+=========+===========+
124 | Hypervisor able to launch multiple VMs during host boot                                |
125 +---------------+-----------+------------+-----------+-------------+---------+-----------+
126 |               |           |            |     Y     |       Y     |    Y    |     Y     |
127 +---------------+-----------+------------+-----------+-------------+---------+-----------+
128 | Hypervisor supports Static Partitioning                                                |
129 +---------------+-----------+------------+-----------+-------------+---------+-----------+
130 |               |           |            |     Y     |       Y     |    Y    |           |
131 +---------------+-----------+------------+-----------+-------------+---------+-----------+
132 | Able to launch VMs dynamically after host boot                                         |
133 +---------------+-----------+------------+-----------+-------------+---------+-----------+
134 |       Y       |     Y     |      Y*    |     Y     |       Y*    |         |     Y     |
135 +---------------+-----------+------------+-----------+-------------+---------+-----------+
136 | Supports strong isolation between all VMs started at host boot                         |
137 +---------------+-----------+------------+-----------+-------------+---------+-----------+
138 |               |           |            |     Y     |       Y     |    Y    |     Y     |
139 +---------------+-----------+------------+-----------+-------------+---------+-----------+
140 | Enables flexible sequencing of VM start during host boot                               |
141 +---------------+-----------+------------+-----------+-------------+---------+-----------+
142 |               |           |            |           |             |    Y    |     Y     |
143 +---------------+-----------+------------+-----------+-------------+---------+-----------+
144 | Prevent all-powerful static root domain being launched at boot                         |
145 +---------------+-----------+------------+-----------+-------------+---------+-----------+
146 |               |           |            |           |       Y*    |    Y    |     Y     |
147 +---------------+-----------+------------+-----------+-------------+---------+-----------+
148 | Operates without a Highly-privileged management VM (eg. Dom0)                          |
149 +---------------+-----------+------------+-----------+-------------+---------+-----------+
150 |               |           |      Y*    |           |       Y*    |    Y    |     Y     |
151 +---------------+-----------+------------+-----------+-------------+---------+-----------+
152 | Operates without a privileged toolstack VM (Control Domain)                            |
153 +---------------+-----------+------------+-----------+-------------+---------+-----------+
154 |               |           |            |           |       Y*    |    Y    |           |
155 +---------------+-----------+------------+-----------+-------------+---------+-----------+
156 | Extensible VM configuration applied before launch of VMs at host boot                  |
157 +---------------+-----------+------------+-----------+-------------+---------+-----------+
158 |               |           |            |           |             |    Y    |     Y     |
159 +---------------+-----------+------------+-----------+-------------+---------+-----------+
160 | Flexible granular assignment of permissions and functions to VMs                       |
161 +---------------+-----------+------------+-----------+-------------+---------+-----------+
162 |               |           |            |           |             |    Y    |     Y     |
163 +---------------+-----------+------------+-----------+-------------+---------+-----------+
164 | Supports extensible VM measurement architecture for DRTM and attestation               |
165 +---------------+-----------+------------+-----------+-------------+---------+-----------+
166 |               |           |            |           |             |    Y    |     Y     |
167 +---------------+-----------+------------+-----------+-------------+---------+-----------+
168 | PCI passthrough configured at host boot                                                |
169 +---------------+-----------+------------+-----------+-------------+---------+-----------+
170 |               |           |            |           |             |    Y    |     Y     |
171 +---------------+-----------+------------+-----------+-------------+---------+-----------+
172
173
174Domain Construction
175-------------------
176
An important aspect of the Hyperlaunch architecture is that the hypervisor
performs domain construction for all the Initial Domains, i.e. it builds each
domain that is described in the Launch Control Module (LCM). More specifically,
the hypervisor will perform the function of *domain creation* for each Initial
Domain: it allocates the unique domain identifier assigned to the virtual
machine and records essential metadata about it in the internal data structure
that enables scheduling the domain to run. It will also perform *basic domain
construction*: build the initial page tables with data from the kernel and
initial ramdisk supplied and, as appropriate for the domain type, populate the
p2m table and ACPI tables.
187
188Subsequent to this, the boot domain can apply additional configuration to the
189initial domains from the data in the LCM, in *extended domain construction*.
190
191The benefits of this structure include:
192
* Security: Constrains the permissions required by the boot domain: it does not
  require the capability to create domains in this structure. This aligns with
  the principles of least privilege.
* Flexibility: Enables policy-based dynamic assignment of hardware by the boot
  domain, customizable according to use-case and able to adapt to hardware
  discovery.
199* Compatibility: Supports reuse of familiar tools with use-case customized boot
200  domains.
201* Commonality: Reuses the same logic for initial basic domain building across
202  diverse Xen deployments.
203
204  * It aligns the x86 initial domain construction with the existing Arm
205    dom0less feature for construction of multiple domains at boot.
206
207  * The boot domain implementation may vary significantly with different
208    deployment use cases, whereas the hypervisor implementation is common.
209
210* Correctness: Increases confidence in the implementation of domain
211  construction, since it is performed by the hypervisor in well maintained and
212  centrally tested logic.
213* Performance: Enables launch for configurations where a fast start of
214  multiple domains at boot is a requirement.
* Capability: Supports launch of advanced configurations where a sequenced
  start of multiple domains is required, or multiple domains are involved in
  startup of the running system configuration.

  * e.g. for PCI passthrough on systems where the toolstack runs in a separate
    domain to the hardware management.
221
Please see the ‘Hyperlaunch Device Tree’ design document, which describes the
configuration module that is provided to the hypervisor by the bootloader.
224
225The hypervisor determines how these domains are started as host boot completes:
226in some systems the Boot Domain acts upon the extended boot configuration
227supplied as part of launch, performing configuration tasks for preparing the
228other domains for the hypervisor to commence running them.
229
230Common Boot Configurations
231--------------------------
232
Considering the range of parties that have expressed interest in, or discussed
a need for, launching multiple domains at host boot, the Hyperlaunch approach
is to provide the means to start nearly any combination of domains. Below is an
enumerated selection of common boot configurations for reference in the
following section.
237
238Dynamic Launch with a Highly-Privileged Domain 0
239^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
240
241Hyperlaunch Classic: Dom0
242        This configuration mimics the classic Xen start and domain construction
243        where a single domain is constructed with all privileges and functions for
244        managing hardware and running virtualization toolstack software.
245
246Hyperlaunch Classic: Extended Launch Dom0
247        This configuration is where a Dom0 is started via a Boot Domain that runs
248        first. This is for cases where some preprocessing in a less privileged domain
249        is required before starting the all-privileged Domain 0.
250
251Hyperlaunch Classic: Basic Cloud
252        This configuration constructs a Dom0 that is started in parallel with some
253        number of workload domains.
254
255Hyperlaunch Classic: Cloud
256        This configuration builds a Dom0 and some number of workload domains, launched
257        via a Boot Domain that runs first.
258
259
260Static Launch Configurations: without a Domain 0 or a Control Domain
261^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
262
263Hyperlaunch Static: Basic
        Simple static partitioning where all domains that can be run on this system are
        built and started during host boot and where no domain is started with the
        Control Domain permissions, making it impossible to create or start any
        further domains.
268
269Hyperlaunch Static: Standard
        This is a variation of the “Hyperlaunch Static: Basic” static partitioning
        configuration with the introduction of a Boot Domain. The Boot Domain can
        apply extended configuration to the Initial Domains before they are started
        and sequence the order in which they start.
275
276Hyperlaunch Static: Disaggregated
        This is a variation of the “Hyperlaunch Static: Standard” configuration, which
        likewise uses a Boot Domain, and illustrates that some functions can be
        disaggregated to dedicated domains.
280
281Dynamic Launch of Disaggregated System Configurations
282^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
283
284Hyperlaunch Dynamic: Hardware Domain
285        This configuration mimics the existing Xen feature late hardware domain with
286        the one difference being that the hardware domain is constructed by the
287        hypervisor at startup instead of later by Dom0.
288
289Hyperlaunch Dynamic: Flexible Disaggregation
290        This configuration is similar to the “Hyperlaunch Classic: Dom0” configuration
291        except that it includes starting a separate hardware domain during Xen startup.
292        It is also similar to “Hyperlaunch Dynamic: Hardware Domain” configuration, but
293        it launches via a Boot Domain that runs first.
294
295Hyperlaunch Dynamic: Full Disaggregation
        This configuration demonstrates how to start a fully disaggregated system:
        the virtualization toolstack runs in a Control Domain, separate from the
        domains responsible for managing hardware, XenStore, the Xen Console and
        Crash functions, each launched via a Boot Domain.
300
301
302Example Use Cases and Configurations
303^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
304
305The following example use cases can be matched to configurations listed in the
306previous section.
307
308Use case: Modern cloud hypervisor
309"""""""""""""""""""""""""""""""""
310
311**Option:** Hyperlaunch Classic: Cloud
312
313This configuration will support strong isolation for virtual TPM domains and
314measured launch in support of attestation to infrastructure management, while
315allowing the use of existing Dom0 virtualization toolstack software.
316
317Use case: Edge device with security or safety requirements
318""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
319
**Option:** Hyperlaunch Static: Standard
321
322This configuration runs without requiring a highly-privileged Dom0, and enables
323extended VM configuration to be applied to the Initial VMs prior to launching
324them, optionally in a sequenced start.
325
326Use case: Client hypervisor
327"""""""""""""""""""""""""""
328
329**Option:** Hyperlaunch Dynamic: Flexible Disaggregation
330
331**Option:** Hyperlaunch Dynamic: Full Disaggregation
332
333These configurations enable dynamic client workloads, strong isolation for the
334domain running the virtualization toolstack software and each domain managing
335hardware, with PCI passthrough performed during host boot and support for
336measured launch.
337
338Hyperlaunch Disaggregated Launch
339--------------------------------
340
341
Xen today has two primary permissions, *control domain* and
*hardware domain*, and two functions, *console domain* and *xenstore domain*,
that can be assigned to a domain. Traditionally, all of these permissions and
functions are assigned to Dom0 at start and can then be delegated to other
domains created by the toolstack in Dom0. With Hyperlaunch it becomes possible
to assign these permissions and functions to any domain for which there is a
definition provided at startup.
349
350Additionally, two further functions are introduced: the *recovery domain*,
351intended to assist with recovery from failures encountered starting VMs during
352host boot, and the *boot domain*, for performing aspects of domain construction
353during startup.
354
Booting each of the above common boot configurations is accomplished by
considering the set of initial domains and the assignment of Xen’s permissions
and functions, including the ones introduced by Hyperlaunch, to these domains.
These are discussed later; for now they are laid out in a table with a mapping
to the common boot configurations. This table is not intended to be an
exhaustive list of configurations and does not account for use-case-specific
functions defined by Flask policy.
362
In the table, each number represents a separate domain constructed by the
Hyperlaunch construction path as Xen starts, and the designator ``{n}``
signifies that “n” additional domains may be constructed that have no special
role in a general Xen system.
367
368::
369
370 +-------------------+------------------+-----------------------------------+
371 | Configuration     |    Permission    |            Function               |
372 |                   +------+------+----+------+--------+--------+----------+
373 |                   | None | Ctrl | HW | Boot |Recovery| Console| Xenstore |
374 +===================+======+======+====+======+========+========+==========+
375 | Classic: Dom0     |      |  0   | 0  |      |   0    |   0    |    0     |
376 +-------------------+------+------+----+------+--------+--------+----------+
377 | Classic: Extended |      |  1   | 1  |  0   |   1    |   1    |    1     |
378 | Launch Dom0       |      |      |    |      |        |        |          |
379 +-------------------+------+------+----+------+--------+--------+----------+
380 | Classic:          | {n}  |  0   | 0  |      |   0    |   0    |    0     |
381 | Basic Cloud       |      |      |    |      |        |        |          |
382 +-------------------+------+------+----+------+--------+--------+----------+
383 | Classic: Cloud    | {n}  |  1   | 1  |  0   |   1    |   1    |    1     |
384 +-------------------+------+------+----+------+--------+--------+----------+
385 | Static: Basic     | {n}  |      | 0  |      |   0    |   0    |    0     |
386 +-------------------+------+------+----+------+--------+--------+----------+
387 | Static: Standard  | {n}  |      | 1  |  0   |   1    |   1    |    1     |
388 +-------------------+------+------+----+------+--------+--------+----------+
389 | Static:           | {n}  |      | 2  |  0   |   3    |   4    |    1     |
390 | Disaggregated     |      |      |    |      |        |        |          |
391 +-------------------+------+------+----+------+--------+--------+----------+
392 | Dynamic:          |      |  0   | 1  |      |   0    |   0    |    0     |
393 | Hardware Domain   |      |      |    |      |        |        |          |
394 +-------------------+------+------+----+------+--------+--------+----------+
395 | Dynamic: Flexible | {n}  |  1   | 2  |  0   |   1    |   1    |    1     |
396 | Disaggregation    |      |      |    |      |        |        |          |
397 +-------------------+------+------+----+------+--------+--------+----------+
398 | Dynamic: Full     | {n}  |  2   | 3  |  0   |   4    |   5    |    1     |
399 | Disaggregation    |      |      |    |      |        |        |          |
400 +-------------------+------+------+----+------+--------+--------+----------+
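
One way to read the table is as a mapping from role to domain index. The sketch
below copies three rows of the table into Python and derives two facts from
them. The dictionary layout and the helper names are assumptions made for
illustration; they are not part of any Xen interface.

.. code-block:: python

    # Role assignments copied from three rows of the table above.
    # Keys are roles; values are the index of the domain holding that role.
    CONFIGS = {
        "Classic: Dom0": {
            "ctrl": 0, "hw": 0, "recovery": 0, "console": 0, "xenstore": 0,
        },
        "Static: Disaggregated": {
            "hw": 2, "boot": 0, "recovery": 3, "console": 4, "xenstore": 1,
        },
        "Dynamic: Full Disaggregation": {
            "ctrl": 2, "hw": 3, "boot": 0, "recovery": 4, "console": 5,
            "xenstore": 1,
        },
    }

    def role_domains(name):
        """Number of distinct domains that hold at least one role."""
        return len(set(CONFIGS[name].values()))

    def has_control_domain(name):
        """True if some domain is assigned the control permission."""
        return "ctrl" in CONFIGS[name]

A configuration for which ``has_control_domain`` is false corresponds to the
static launch family, where no further domains can be created after host boot.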
401
402Overview of Hyperlaunch Flow
403----------------------------
404
Before delving into Hyperlaunch, it helps to understand the current process for
creating a domain. One way to view this process is that it starts with the core
configuration, which is the information the hypervisor requires to make the
call to ``domain_create``, followed by basic construction to provide the memory
image to run, including the kernel and ramdisk. A subsequent step applies the
extended configuration used by the toolstack to provide a domain with any
additional configuration information. Until the extended configuration is
completed, a domain has access to no resources except its allocated vCPUs and
memory. The exception to this is Dom0, to which the hypervisor explicitly
grants control of and access to all system resources, except for those that
only the hypervisor should control. This exception for Dom0 is driven by the
system structure with a monolithic Dom0 domain predating the introduction of
support for disaggregation into Xen, and the corresponding default assignment
of multiple roles within the Xen system to Dom0.
419
While not a different domain creation path, there does exist the Hardware
Domain (hwdom), sometimes also referred to as late-Dom0. It is an early effort
to disaggregate Dom0’s roles into a separate control domain and hardware
domain. This capability is activated by passing a domain id to the
``hardware_dom`` Xen command line parameter, and the Xen hypervisor will then
flag that domain id as the hardware domain. Later, when the toolstack
constructs a domain with that domain id as the requested domid, the hypervisor
will transfer all device I/O from Dom0 to this domain. It will also transfer
the “host shutdown on domain shutdown” flag from Dom0 to the hardware domain.
This approach to disaggregation was chosen because Xen was unable to launch
more than one domain at startup.
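
The late hardware domain hand-over described above can be modelled in a few
lines. This is a simplified sketch rather than Xen's implementation: the
``Hypervisor`` class and its attributes are hypothetical, and only the
``hardware_dom`` parameter name comes from the text.

.. code-block:: python

    class Hypervisor:
        def __init__(self, hardware_dom=None):
            self.pending_hwdom = hardware_dom  # domid flagged via hardware_dom
            self.io_owner = 0                  # Dom0 owns device I/O at start
            self.shutdown_owner = 0            # host shuts down with this domain
            self.domains = {0}

        def create_domain(self, domid):
            """Called when the toolstack in Dom0 constructs a domain."""
            self.domains.add(domid)
            if domid == self.pending_hwdom:
                # Transfer device I/O and the "host shutdown on domain
                # shutdown" flag from Dom0 to the new hardware domain.
                self.io_owner = domid
                self.shutdown_owner = domid

Creating any other domain leaves ownership with Dom0; the transfer happens only
when the flagged domain id is requested.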
432
433Hyperlaunch Xen startup
434^^^^^^^^^^^^^^^^^^^^^^^
435
The Hyperlaunch approach’s primary focus is on how to assign the roles
traditionally granted to Dom0 to one or more domains at host boot. While the
statement is simple to make, the implications are far from trivial. This also
explains why the Hyperlaunch approach is orthogonal to the existing dom0less
capability. The dom0less capability focuses on enabling the launch of multiple
domains in parallel with Dom0 at host boot. A corollary for dom0less is that
systems that don’t require Dom0 after all guest domains have started are able
to do the host boot without a Dom0, though it may be possible to start Dom0 at
a later point. Hyperlaunch’s approach of separating Dom0’s roles, by contrast,
requires the ability to launch multiple domains at host boot. The direct
consequences of this approach are profound and provide a myriad of possible
configurations, of which a sample of common boot configurations was presented
above.
449
To enable the Hyperlaunch approach, a new alternative path for host boot within
the hypervisor must be introduced. This alternative path effectively branches
just before the current point of Dom0 construction and begins an alternate
means of system construction. Whether this alternate path is taken is
determined by inspecting the boot chain. If the bootloader has loaded a
specific configuration, as described later, Xen can detect that a Hyperlaunch
configuration has been provided. Once a Hyperlaunch configuration is detected,
this alternate path can be thought of as occurring in phases: domain creation,
domain preparation, and launch finalization.
459
460Domain Creation
461"""""""""""""""
462
The domain creation phase begins with Xen parsing the bootloader-provided
material to understand the content of the modules provided. It will then load
any microcode or XSM policy it discovers. For each domain configuration Xen
finds, it parses the configuration to construct the domain definition needed to
instantiate the domain, which is left in a paused state. When all domain
configurations have been instantiated as domains, if one of them is flagged as
the Boot Domain, that domain will be unpaused, starting the domain preparation
phase. If there is no Boot Domain defined, then the domain preparation phase
will be skipped and Xen will trigger the launch finalization phase.
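
The branching at the end of this phase can be sketched as follows. This is a
minimal model under stated assumptions: domains are plain dictionaries, and the
``boot_domain`` and ``paused`` keys are hypothetical names chosen for the
example.

.. code-block:: python

    def domain_creation_phase(domain_configs):
        """Instantiate every configured domain paused and pick the next phase."""
        domains = [dict(cfg, paused=True) for cfg in domain_configs]
        boot = [d for d in domains if d.get("boot_domain")]
        if boot:
            boot[0]["paused"] = False      # unpause the Boot Domain
            return domains, "domain preparation"
        # No Boot Domain: skip preparation entirely.
        return domains, "launch finalization"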
473
474Domain Preparation Phase
475""""""""""""""""""""""""
476
The domain preparation phase is an optional checkpoint for the execution of a
workload-specific domain, the Boot Domain. While the Boot Domain is the first
domain to run and has some degree of control over the system, it is extremely
restricted in both system resource access and hypervisor operations. Its
purpose is to:
482
483* Access the configuration provided by the bootloader
484* Finalize the configuration of the domains
485* Conduct any setup and launch related operations
486* Do an ordered unpause of domains that require an ordered start
487
When the Boot Domain has completed, it will notify the hypervisor that it is
done, triggering the launch finalization phase.
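
The ordered unpause performed by the Boot Domain might look like the sketch
below. The function name and the dictionary-based domain model are assumptions
for illustration only; the real Boot Domain would use hypervisor operations
that this document does not specify.

.. code-block:: python

    def boot_domain_run(domains, start_order):
        """Unpause, in order, the domains that need a sequenced start.

        Returns the order in which domains were started. Any domain not
        listed in ``start_order`` stays paused for the hypervisor to release
        during launch finalization.
        """
        started = []
        for name in start_order:
            dom = domains[name]
            if dom["paused"]:
                dom["paused"] = False
                started.append(name)
        return started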
490
491
492Launch Finalization
493"""""""""""""""""""
494
The hypervisor handles the launch finalization phase, which is equivalent to a
clean-up phase. The steps taken by the hypervisor, not necessarily in
implementation order, are as follows:

* Free the boot module chain
* If a Boot Domain was used, reclaim Boot Domain resources
* Unpause any domains still in a paused state

The Boot Domain uses a reserved function and thus can never be respawned.
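
The clean-up steps listed above can be expressed against the same kind of
dictionary model. This is a sketch under assumptions: the ``state`` layout is
hypothetical, and reclaiming Boot Domain resources is modelled simply as
removing the Boot Domain from the set of domains.

.. code-block:: python

    def launch_finalization(state):
        """Apply the clean-up steps to a dict-based model of the system."""
        state["boot_modules"] = []                 # free the boot module chain
        boot = state.pop("boot_domain", None)      # was a Boot Domain used?
        if boot is not None:
            state["domains"].pop(boot, None)       # reclaim its resources
        for dom in state["domains"].values():      # unpause what is still paused
            dom["paused"] = False
        return state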
503
While the focus thus far has been on how the Hyperlaunch capability will work,
it is worth mentioning what it does not do or prevent. It does not stop or
inhibit the assignment of the control domain role, which gives a domain the
ability to create, start, stop, restart, and destroy domains, or the hardware
domain role, which gives access to all I/O devices except those that the
hypervisor has reserved for itself. In particular, it is still possible to
construct a domain with all the privileged roles, i.e. a Dom0, with or without
the domain id being zero. In fact, the limitations that are imposed now become
fully configurable without the risk of circumvention by an all-privileged
domain.
513
514Structuring of Hyperlaunch
515--------------------------
516
517The structure of Hyperlaunch is built around the existing capabilities of the
518host boot protocol. This approach was driven by the objective not to require
519modifications to the boot loader. The only requirement is that the boot loader
520supports the Multiboot2 (MB2) protocol. For UEFI boot, our recommendation is to
521use GRUB.efi to load Xen and the initial domain materials via the multiboot2
522method. On Arm platforms, Hyperlaunch is compatible with the existing interface
523for boot into the hypervisor.
524
525
526x86 Multiboot2
527^^^^^^^^^^^^^^
528
The MB2 protocol has no concept of a manifest to tell the initial kernel what
is contained in the chain, leaving it to the kernel to impose a loading
convention, use magic number identification, or both. When considering the
passing of multiple kernels, ramdisks, and domain configuration along with any
existing modules already passed, there is no sane convention that could be
imposed, and magic number identification is nearly impossible given the
objective not to impose unnecessary complication on the hypervisor.
536
As alluded to previously, a manifest describing the contents in the MB2
chain and how they relate within a Xen context is needed. To address this need
the Launch Control Module (LCM) was designed to provide such a manifest. The
LCM was designed to have a specific set of properties:
541
542* minimize the complexity of the parsing logic required by the hypervisor
543* allow for expanding and optional configuration fragments without breaking
544  backwards compatibility
545
546To enable automatic detection of a Hyperlaunch configuration, the LCM must be
547the first MB2 module in the MB2 module chain. The LCM is implemented using the
548Device Tree as defined in the Hyperlaunch Device Tree design document. With the
549LCM implemented in Device Tree, it has a magic number that enables the
550hypervisor to detect its presence when used in a Multiboot2 module chain. The
551hypervisor can confirm that it is a proper LCM Device Tree by checking for a
552compliant Hyperlaunch Device Tree. The Hyperlaunch Device Tree nodes are
553designed to allow,
554
555* for the hypervisor to parse only those entries it understands,
556* for packing custom information for a custom boot domain,
557* the ability to use a new LCM with an older hypervisor,
558* and the ability to use an older LCM with a new hypervisor.
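
The magic-number check on the first module can be sketched as follows. This is
a minimal illustration in Python rather than hypervisor code: a flattened
Device Tree blob begins with the big-endian magic value `0xd00dfeed`, while the
function name and the representation of the module chain as a list of byte
strings are assumptions made for this example. A real implementation would go
on to confirm that the blob is a compliant Hyperlaunch Device Tree.

.. code-block:: python

    import struct

    FDT_MAGIC = 0xD00DFEED  # magic number at offset 0 of a flattened Device Tree


    def first_module_is_lcm(modules):
        """Return True if the first module starts with the Device Tree magic.

        `modules` is a hypothetical list of byte strings, one per MB2 module,
        in the order supplied by the boot loader.
        """
        if not modules or len(modules[0]) < 4:
            return False
        (magic,) = struct.unpack(">I", modules[0][:4])  # big-endian 32-bit
        return magic == FDT_MAGIC

Only the first module is examined, so a Device Tree blob placed later in the
chain does not select the Hyperlaunch path in this sketch.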

Arm Device Tree
^^^^^^^^^^^^^^^

As discussed, the LCM is in Device Tree format and was designed to co-exist in
the Device Tree ecosystem, and in particular in parallel with dom0less Device
Tree entries. On Arm, Xen is already designed to boot from a host Device Tree
description (dtb) file, and the LCM entries can be embedded into this host dtb
file. This makes detecting the LCM entries and supporting Hyperlaunch on Arm
relatively straightforward. At the point where the x86 approach inspects the
first MB2 module, Xen on Arm will instead check whether the top-level LCM node
exists in the host dtb file. If the LCM node does exist, Xen will then enter
the same code path as the x86 entry.
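
A minimal sketch of the Arm check, assuming the host dtb has already been
parsed into nested dictionaries (a hypothetical representation chosen for
brevity). The `/chosen/hypervisor` path is the node named in Appendix 1; the
helper names and the returned labels are illustrative only.

.. code-block:: python

    def find_node(tree, path):
        """Walk a nested-dict Device Tree and return the node at `path`."""
        node = tree
        for name in path.strip("/").split("/"):
            if not isinstance(node, dict) or name not in node:
                return None  # path does not exist in this tree
            node = node[name]
        return node


    def host_boot_path(host_dtb):
        """Choose the boot path from the presence of the top-level LCM node."""
        if find_node(host_dtb, "/chosen/hypervisor") is not None:
            return "hyperlaunch"
        return "dom0/dom0less"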

Xen hypervisor
^^^^^^^^^^^^^^

The new host boot flow that will be introduced was discussed previously at a
higher level. Within this new flow is the configuration parsing and domain
creation phase, which is expanded upon here. The hypervisor will inspect the
LCM for a config node and, if found, will iterate through all module nodes. The
module nodes are used to identify whether any modules contain microcode or an
XSM policy. As it processes domain nodes, it will construct each domain using
the node properties and the module nodes. Once it has completed iterating
through all the entries in the LCM, if a constructed domain has the Boot Domain
attribute, that domain will then be unpaused. Otherwise the hypervisor will
start the launch finalization phase.
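
The two passes described above can be modelled roughly as shown below. This is
a simplified sketch, not the hypervisor implementation: the dictionary layout
standing in for the LCM, the `kind` and `boot_domain` keys, and the phase
labels are all assumptions made for illustration.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Domain:
        name: str
        is_boot_domain: bool = False
        paused: bool = True  # every domain is constructed in the paused state


    def process_lcm(lcm):
        """Model the configuration parsing and domain creation phase.

        `lcm` is a hypothetical dict with "modules" and "domains" lists that
        stand in for the module and domain nodes under the config node.
        Returns (loaded_modules, domains, next_phase).
        """
        # First pass: pick out the modules the hypervisor itself consumes.
        loaded = [m["name"] for m in lcm.get("modules", [])
                  if m.get("kind") in ("microcode", "xsm-policy")]

        # Second pass: construct each described domain.
        domains = [Domain(d["name"], d.get("boot_domain", False))
                   for d in lcm.get("domains", [])]

        # Only a Boot Domain is unpaused here; otherwise finalization starts.
        for dom in domains:
            if dom.is_boot_domain:
                dom.paused = False
                return loaded, domains, "boot-domain"
        return loaded, domains, "launch-finalization"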

Boot Domain
^^^^^^^^^^^

Traditionally, domain creation was controlled by the user within the Dom0
environment, whereby custom toolstacks could be implemented to impose
requirements on the process. The Boot Domain is a means to enable the user to
continue to maintain a degree of that control over domain creation, but within
a limited-privilege environment. The Boot Domain will have access to the LCM
and the boot chain, along with access to a subset of the hypercall operations.
When the Boot Domain is finished, it will notify the hypervisor through a
hypercall op.
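
One way to picture the restricted hypercall surface is an allow-list that is
closed once the Boot Domain signals completion. The sketch below is only a
model: the operation names are hypothetical, since this document does not
define which hypercall operations make up the permitted subset.

.. code-block:: python

    # Hypothetical operation names; the real subset is not defined here.
    BOOT_DOMAIN_OPS = frozenset(
        {"configure_domain", "unpause_domain", "boot_finished"}
    )


    class BootDomainSession:
        """Track what a Boot Domain may do until it signals completion."""

        def __init__(self):
            self.finished = False
            self.log = []

        def hypercall(self, op):
            # Reject anything outside the subset, or anything after completion.
            if self.finished or op not in BOOT_DOMAIN_OPS:
                return False
            self.log.append(op)
            if op == "boot_finished":
                self.finished = True  # launch finalization may now proceed
            return True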

Recovery Domain
^^^^^^^^^^^^^^^

With the existing Dom0 host boot path, when a failure occurs there are several
assumptions that can safely be made to get the user to a console for
troubleshooting. With the Hyperlaunch host boot path those assumptions can no
longer be made, thus a means is needed to get the user to a console in the case
of a recoverable failure. The Recovery Domain is configured by a domain
configuration entry in the LCM, in the same manner as the other initial
domains, and it will not be unpaused at launch finalization unless a failure is
encountered starting the initial domains.

Xen has existing support for a Crash Environment where memory can be reserved
at host boot and a kernel loaded into it, to be jumped into at any point while
the system is running when a crash is detected. The Recovery Domain
functionality is a separate, complementary capability. The Crash Environment
replaces the previously active hypervisor and running guests, and enables a
process for mounting disks to write out log information prior to rebooting the
system. In contrast, the Recovery Domain is able to use the functionality of
the Xen hypervisor, which is still present and running, to perform recovery
handling for errors encountered with starting the initial domains.

Deferred Design
"""""""""""""""

To be determined:

* Define what is detected as a crash
* Explain how crash detection is performed and which components are involved
* Explain how the recovery domain is unpaused
* Explain how and when the resources assigned to the recovery domain are reclaimed
* Define what the recovery domain is able to do
* Determine what permissions the recovery domain requires to perform its job


Control Domain
^^^^^^^^^^^^^^

The concept of the Control Domain already exists within Xen as a boolean,
`is_privileged`, that governs access to many of the privileged interfaces of
the hypervisor that support a domain running a virtualization system toolstack.
Hyperlaunch will allow the `is_privileged` flag to be set on any domain that is
created at launch, rather than only on Dom0. It may potentially be set on
multiple domains.

Hardware Domain
^^^^^^^^^^^^^^^

The Hardware Domain is also an existing concept for Xen that is enabled through
the `is_hardware_domain` check. With Hyperlaunch, the previous process of I/O
accesses being assigned to Dom0 for later transfer to the hardware domain would
no longer be required. Instead, during the configuration phase the Xen
hypervisor would directly assign the I/O accesses to the domain with the
hardware domain permission bit enabled.
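
The Control Domain and Hardware Domain permissions can be thought of as
independent bits that the launch configuration places on any initial domain.
The sketch below models that idea and, following the naming discussion in
Appendix 2, classifies a configuration as "Hyperlaunch Static" when no domain
holds the Control Domain permission. The `Role` type, the function names and
the mapping from domain name to roles are assumptions made for illustration.

.. code-block:: python

    from enum import Flag, auto


    class Role(Flag):
        NONE = 0
        CONTROL = auto()   # corresponds to the `is_privileged` flag
        HARDWARE = auto()  # corresponds to the `is_hardware_domain` check


    def launch_style(domain_roles):
        """Classify a configuration by whether any domain holds CONTROL."""
        if any(Role.CONTROL in roles for roles in domain_roles.values()):
            return "Hyperlaunch Dynamic"
        return "Hyperlaunch Static"


    def hardware_domain(domain_roles):
        """Return the name of the domain that receives I/O access, if any."""
        for name, roles in domain_roles.items():
            if Role.HARDWARE in roles:
                return name
        return None

For example, `{"ctl-a": Role.CONTROL, "ctl-b": Role.CONTROL, "hw":
Role.HARDWARE}` describes two control domains alongside a separate hardware
domain, which this model classifies as a dynamic configuration.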

Console Domain
^^^^^^^^^^^^^^

Traditionally, the Xen console is assigned to the control domain and can then
be reassigned by the toolstack to another domain. With Hyperlaunch it becomes
possible to construct a boot configuration where there is no control domain, or
where the Xen console needs to be isolated. As such, it becomes necessary to be
able to designate which of the initial domains should be assigned the Xen
console. Therefore Hyperlaunch introduces the ability to specify an initial
domain to which the console is assigned, along with a convention of ordered
assignment for when there is no explicit assignment.
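
The selection logic could look something like the sketch below. An explicit
designation always wins. The fallback order used here (control domain, then
hardware domain, then the first domain listed) is purely an illustrative
assumption, as this document does not define the ordering convention; the
dictionary keys are likewise hypothetical.

.. code-block:: python

    def pick_console_domain(domains):
        """Choose which initial domain receives the Xen console.

        `domains` is a list of dicts in LCM order. The fallback order below
        is an assumption for illustration, not the defined convention.
        """
        # An explicit assignment takes precedence over any convention.
        for dom in domains:
            if dom.get("console"):
                return dom["name"]
        # Otherwise fall back to an ordered convention (hypothetical order).
        for role in ("control", "hardware"):
            for dom in domains:
                if dom.get(role):
                    return dom["name"]
        return domains[0]["name"] if domains else None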

Communication of Domain Configurations
======================================

There are several standard methods for an Operating System to access machine
configuration and environment information: ACPI is common on x86 systems,
whereas Device Tree is more typical on Arm platforms. Both are currently
implemented in Xen:

* For dom0less, guest Device Trees are dynamically constructed by the
  hypervisor to convey domain configuration data

* For PVH dom0 on x86, ACPI tables are built by the hypervisor before the
  domain is started

Note that both of these mechanisms convey static data that is fixed prior to
the point of domain construction. Hyperlaunch will retain both the existing
ACPI and Device Tree methods.

Communication of data between a Boot Domain and a Control Domain is of note
since they may not be running concurrently: the method used will depend on
their specific implementations, but one option available is to use Xen’s hypfs
for transfer of basic data to support system bootstrap.

-------------------------------------------------------------------------------

Appendix
========

Appendix 1: Flow Sequence of Steps of a Hyperlaunch Boot
--------------------------------------------------------

Provided here is an ordered flow of a Hyperlaunch boot, highlighting the logic
decision points. Not all branch points are recorded, specifically those for
the variety of error conditions that may occur. ::

  1. Hypervisor Startup:
  2a. (x86) Inspect first module provided by the bootloader
      a. Is the module an LCM
          i. YES: proceed with the Hyperlaunch host boot path
          ii. NO: proceed with a Dom0 host boot path
  2b. (Arm) Inspect host dtb for `/chosen/hypervisor` node
      a. Is the LCM present
          i. YES: proceed with the Hyperlaunch host boot path
          ii. NO: proceed with a Dom0/dom0less host boot path
  3. Iterate through the LCM entries looking for the module description
     entry
      a. Check if any of the modules are microcode or policy and if so,
         load
  4. Iterate through the LCM entries processing all domain description
     entries
      a. Use the details from the Basic Configuration to call
         `domain_create`
      b. Record if a domain is flagged as the Boot Domain
      c. Record if a domain is flagged as the Recovery Domain
  5. Was a Boot Domain created
      a. YES:
          i. Attach console to Boot Domain
          ii. Unpause Boot Domain
          iii. Goto Boot Domain (step 6)
      b. NO: Goto Launch Finalization (step 10)
  6. Boot Domain:
  7. Boot Domain comes online and may do any of the following actions
      a. Process the LCM
      b. Validate the MB2 chain
      c. Make additional configuration settings for staged domains
      d. Unpause any precursor domains
      e. Set any runtime configurations
  8. Boot Domain does any necessary cleanup
  9. Boot Domain makes a hypercall op call to signal it is finished
      i. Hypervisor reclaims all Boot Domain resources
      ii. Hypervisor records that the Boot Domain ran
      iii. Goto Launch Finalization (step 10)
  10. Launch Finalization
  11. If a configured domain was flagged to have the console, the
      hypervisor assigns it
  12. The hypervisor clears the LCM and bootloader-loaded modules,
      reclaiming the memory
  13. The hypervisor iterates through domains unpausing any domain not
      flagged as the recovery domain
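
Steps 10 to 13 can be summarised in a few lines of code. The sketch below is a
model only: the function name, the `recovery` key and the `failed` parameter
are assumptions. The Recovery Domain section states that the recovery domain
is unpaused only when a failure is encountered; what happens to the other
domains in that case is deferred design, so this sketch simply leaves them
paused.

.. code-block:: python

    def launch_finalization(domains, console_domain=None, failed=False):
        """Return (console owner, names of the domains to unpause).

        `domains` maps a domain name to a dict with an optional "recovery"
        flag. Leaving the other domains paused on failure is an assumption.
        """
        # Step 11: assign the console only to a domain that actually exists.
        console = console_domain if console_domain in domains else None
        # Step 12 (reclaiming LCM and module memory) is not modelled here.
        # Step 13: unpause every domain not flagged as the recovery domain,
        # or only the recovery domain if starting the initial domains failed.
        if failed:
            unpause = [n for n, d in domains.items() if d.get("recovery")]
        else:
            unpause = [n for n, d in domains.items() if not d.get("recovery")]
        return console, unpause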


Appendix 2: Considerations in Naming the Hyperlaunch Feature
------------------------------------------------------------

* The term “Launch” is preferred over “Boot”

        * Multiple individual component boots can occur in the new system start
          process; Launch is preferable for describing the whole process
        * Fortunately there is consensus in the current group of stakeholders
          that the term “Launch” is good and appropriate

* The names we define must support becoming meaningful and simple to use
  outside the Xen community

        * They must be able to be resolved quickly via search engine to a clear
          explanation (e.g. Xen marketing material, documentation or wiki)
        * We prefer that the terms be helpful for marketing communications
        * Consequence: avoid the term “domain” which is Xen-specific and
          requires a definition to be provided each time when used elsewhere

* There is a need to communicate that Xen is capable of being used as a Static
  Partitioning hypervisor

        * The community members using and maintaining dom0less are the current
          primary stakeholders for this

* There is a need to communicate that the new launch functionality provides new
  capabilities not available elsewhere, and is more than just supporting Static
  Partitioning

        * No other hypervisor known to the authors of this document is capable
          of providing what Hyperlaunch will be able to do. The launch sequence
          is designed to:

                * Remove dependency on a single, highly-privileged initial domain
                * Allow the initial domains started to be independent and fully
                  isolated from each other
                * Support configurations where no further VMs can be launched
                  once the initial domains have started
                * Use a standard, extensible format for conveying VM
                  configuration data
                * Ensure that domain building of all initial domains is
                  performed by the hypervisor from materials supplied by the
                  bootloader
                * Enable flexible configuration to be applied to all initial
                  domains by an optional Boot Domain, that runs with limited
                  privilege, before any other domain starts and obtains the VM
                  configuration data from the bootloader materials via the
                  hypervisor
                * Enable measurements of all of the boot materials prior to
                  their use, in a sequence with minimized privilege
                * Support use-case-specific customized Boot Domains
                * Complement the hypervisor’s existing ability to enforce
                  policy-based Mandatory Access Control

* “Static” and “Dynamic” have different and important meanings in different
  communities

        * Static and Dynamic Partitioning describe the ability to create new
          virtual machines, or not, after the initial host boot process
          completes
        * Static and Dynamic Root of Trust describe the nature of the trust
          chain for a measured launch. In this case Static refers to the fact
          that the trust chain is fixed and non-repeatable until the next host
          reboot or shutdown, whereas Dynamic refers to the ability to conduct
          the measured launch at any time and potentially multiple times before
          the next host reboot or shutdown.

                * We will be using Hyperlaunch with both Static and Dynamic
                  Roots of Trust, to launch both Statically and Dynamically
                  Partitioned Systems, and being clear about exactly which
                  combination is being started will be very important (e.g. for
                  certification processes)

        * Consequence: uses of “Static” and “Dynamic” need to be qualified if
          they are incorporated into the naming of this functionality

                * This can be done by adding the preceding, stronger branded
                  term: “Hyperlaunch”, before “Static” or “Dynamic”
                * i.e. “Hyperlaunch Static” describes launch of a
                  Statically Partitioned system
                * and “Hyperlaunch Dynamic” describes launch of a
                  Dynamically Partitioned system.
                * In practice, this means that “Hyperlaunch Static” describes
                  starting a Statically Partitioned system where no new domains
                  can be started later (i.e. no VM has the Control Domain
                  permission), whereas “Hyperlaunch Dynamic” will launch some
                  VM with the Control Domain permission, able to create VMs
                  dynamically at a later point.

**Naming Proposal:**

* New Term: “Hyperlaunch” : the ability of a hypervisor to construct and start
  one or more virtual machines at system launch, in the following manner:

        * The hypervisor must build all of the domains that it starts at host
          boot

                * Similar to the way the dom0 domain is built by the hypervisor
                  today, and how dom0less works: it will run a loop to build
                  them all, driven from the configuration provided
                * This is a requirement for ensuring that there is Strong
                  Isolation between each of the initial VMs

        * A single file containing the VM configs (“Launch Control Module”:
          LCM, in Device Tree binary format) is provided to the hypervisor

                * The hypervisor parses it and builds domains
                * If the LCM config says that a Boot Domain should run first,
                  then the LCM file itself is made available to the Boot Domain
                  for it to parse and act on, to invoke operations via the
                  hypervisor to apply additional configuration to the other VMs
                  (i.e. executing a privilege-constrained toolstack)

* New Term: “Hyperlaunch Static”: starts a Statically Partitioned system, where
  only the virtual machines started at system launch are running on the system

* New Term: “Hyperlaunch Dynamic”: starts a system where virtual machines may
  be dynamically added after the initial virtual machines have started.


In the default configuration, Xen will be capable of both styles of Hyperlaunch
from the same hypervisor binary which, when paired with its XSM Flask, provides
strong controls that enable fine-grained system partitioning.


* Retiring Term: “DomB”: will no longer be used to describe the optional first
  domain that is started. It is replaced with the more general term: “Boot
  Domain”.

* Retiring Term: “Dom0less”: it is to be replaced with “Hyperlaunch Static”


Appendix 3: Terminology
-----------------------

To help ensure clarity in reading this document, the following are the
definitions of the terminology used within it.


Basic Configuration
    the minimal information the hypervisor requires to instantiate a domain instance


Boot Domain
    a domain with limited privileges launched by the hypervisor during a
    Multiple Domain Boot that runs as the first domain started. In the Hyperlaunch
    architecture, it is responsible for assisting with higher level operations of
    the domain setup process.


Classic Launch
    a backwards-compatible host boot that ends with the launch of a single domain (Dom0)


Console Domain
    a domain that has the Xen console assigned to it


Control Domain
    a privileged domain that has been granted Control Domain permissions which
    are those that are required by the Xen toolstack for managing other domains.
    These permissions are a subset of those that are granted to Dom0.


Device Tree
    a standardized data structure, with defined file formats, for describing
    initial system configuration


Disaggregation
    the separation of system roles and responsibilities across multiple
    connected components that work together to provide functionality


Dom0
    the highly-privileged, first and only domain started at host boot on a
    conventional Xen system


Dom0less
    an existing feature of Xen on Arm that provides Multiple Domain Boot


Domain
    a running instance of a virtual machine (as the term is commonly used in
    the Xen Community)


DomB
    the former name for Hyperlaunch


Extended Configuration
    any configuration options for a domain beyond its Basic Configuration


Hardware Domain
    a privileged domain that has been granted permissions to access and manage
    host hardware. These permissions are a subset of those that are granted to
    Dom0.


Host Boot
    the system startup of Xen using the configuration provided by the bootloader


Hyperlaunch
    a flexible host boot that ends with the launch of one or more domains


Initial Domain
    a domain that is described in the LCM that is run as part of a Multiple
    Domain Boot. This includes the Boot Domain, Recovery Domain and all Launched
    Domains.


Late Hardware Domain
    a Hardware Domain that is launched after host boot has already completed
    with a running Dom0. When the Late Hardware Domain is started, Dom0
    relinquishes and transfers the permissions to access and manage host hardware
    to it.


Launch Control Module (LCM)
    a file supplied to the hypervisor by the bootloader that contains
    configuration data for the hypervisor and the initial set of virtual machines
    to be run at boot


Launched Domain
    a domain, aside from the Boot Domain and Recovery Domain, that is started as
    part of a Multiple Domain Boot and remains running once the boot process is
    complete


Multiple Domain Boot
    a system configuration where the hypervisor and multiple virtual machines
    are all launched when the host system hardware boots


Recovery Domain
    an optional fallback domain that the hypervisor may start in the event of a
    detectable error encountered during the Multiple Domain Boot process


System Device Tree
    the product of an Arm community project to extend Device Tree to cover more
    aspects of initial system configuration


Appendix 4: Copyright License
-----------------------------

This work is licensed under a Creative Commons Attribution 4.0 International
License. A copy of this license may be obtained from the Creative Commons
website (https://creativecommons.org/licenses/by/4.0/legalcode).

| Contributions by:
| Christopher Clark are Copyright © 2021 Star Lab Corporation
| Daniel P. Smith are Copyright © 2021 Apertus Solutions, LLC