1# Xen Hypervisor Command Line Options
2
This document covers the command line options which the Xen
Hypervisor accepts.
5
6## Types of parameter
7
8Most parameters take the form `option=value`.  Different options on
9the command line should be space delimited.  All options are case
10sensitive, as are all values unless explicitly noted.
11
12### Boolean (`<boolean>`)
13
All boolean options may be explicitly enabled using a `value` of
15> `yes`, `on`, `true`, `enable` or `1`
16
17They may be explicitly disabled using a `value` of
18> `no`, `off`, `false`, `disable` or `0`
19
20In addition, a boolean option may be enabled by simply stating its
21name, and may be disabled by prefixing its name with `no-`.
22
#### Examples
24
25Enable noreboot mode
26> `noreboot=true`
27
28Disable x2apic support (if present)
29> `x2apic=off`
30
31Enable synchronous console mode
32> `sync_console`
33
34Explicitly specifying any value other than those listed above is
35undefined, as is stacking a `no-` prefix with an explicit value.
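
The rules above can be sketched in a few lines of Python.  This is an
illustration only, not Xen's parser: `parse_boolean_option` is a hypothetical
helper, and raising an error for the undefined cases is a choice made for the
sketch.

```python
TRUE_VALUES = {"yes", "on", "true", "enable", "1"}
FALSE_VALUES = {"no", "off", "false", "disable", "0"}


def parse_boolean_option(token):
    """Return (name, enabled) for one command line token (hypothetical helper)."""
    name, sep, value = token.partition("=")
    if not sep:
        # A bare name enables the option; a "no-" prefix disables it.
        if name.startswith("no-"):
            return name[3:], False
        return name, True
    if name.startswith("no-"):
        # Undefined per the text above; the sketch rejects it.
        raise ValueError("'no-' prefix combined with an explicit value")
    if value in TRUE_VALUES:
        return name, True
    if value in FALSE_VALUES:
        return name, False
    # Any other value is undefined; the sketch rejects it as well.
    raise ValueError(f"unrecognised boolean value: {value!r}")
```

For example, `parse_boolean_option("sync_console")` yields
`("sync_console", True)` and `parse_boolean_option("no-x2apic")` yields
`("x2apic", False)`.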
36
37### Integer (`<integer>`)
38
39An integer parameter will default to decimal and may be prefixed with
40a `-` for negative numbers.  Alternatively, a hexadecimal number may be
41used by prefixing the number with `0x`, or an octal number may be used
42if a leading `0` is present.
43
44Providing a string which does not validly convert to an integer is
45undefined.
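
As an illustration of the base selection described above, the following
Python sketch converts a string to an integer.  `parse_integer` is a
hypothetical helper, and rejecting invalid input with an error is an
assumption of the sketch, since the real behaviour is undefined.

```python
import string


def parse_integer(text):
    """Convert text using the decimal / 0x hex / leading-0 octal rules."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body.startswith("0x"):
        base, digits, allowed = 16, body[2:], string.hexdigits
    elif len(body) > 1 and body[0] == "0":
        base, digits, allowed = 8, body[1:], string.octdigits
    else:
        base, digits, allowed = 10, body, string.digits
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"not a valid integer: {text!r}")
    value = int(digits, base)
    return -value if negative else value
```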
46
47### Size (`<size>`)
48
49A size parameter may be any integer, with a single size suffix
50
51* `T` or `t`: TiB (2^40)
52* `G` or `g`: GiB (2^30)
53* `M` or `m`: MiB (2^20)
54* `K` or `k`: KiB (2^10)
55* `B` or `b`: Bytes
56
Without a size suffix, the value is interpreted as KiB.  Providing a
suffix other than those listed above is undefined.
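
The suffix handling can be sketched as follows.  `parse_size` is a
hypothetical helper that returns a byte count; it assumes the number is
consumed greedily before the suffix is examined, so a trailing `b` that is
part of a hexadecimal number is not treated as the Bytes suffix.

```python
import re

SIZE_SHIFTS = {"t": 40, "g": 30, "m": 20, "k": 10, "b": 0}
SIZE_RE = re.compile(r"(0x[0-9a-fA-F]+|[0-9]+)([tgmkbTGMKB]?)")


def parse_size(text):
    """Return the size in bytes (hypothetical helper, not Xen's parser)."""
    match = SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid size: {text!r}")
    number, suffix = match.groups()
    if number.startswith("0x"):
        value = int(number, 16)
    elif len(number) > 1 and number[0] == "0":
        value = int(number, 8)
    else:
        value = int(number)
    # No suffix: the value is taken to be in KiB.
    shift = SIZE_SHIFTS[suffix.lower()] if suffix else 10
    return value << shift
```

With this sketch, `parse_size("128")` and `parse_size("128k")` give the same
result, while `parse_size("128b")` is 128 bytes.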
59
60### String
61
62Many parameters are more complicated and require more intricate
configuration.  The detailed description of each individual parameter
specifies which values are valid.
65
66### List
67
68Some options take a comma separated list of values.
69
70### Combination
71
72Some parameters act as combinations of the above, most commonly a mix
73of Boolean and String.  These are noted in the relevant sections.
74
75## Parameter details
76
77### acpi
78> `= force | ht | noirq | <boolean> | verbose`
79
80**String**, or **Boolean** to disable.
81
82By default, Xen will scan the DMI data and blacklist certain systems
83which are known to have broken ACPI setups.  Providing `acpi=force`
84will cause Xen to ignore the blacklist and attempt to use all ACPI
85features.
86
87Using `acpi=ht` causes Xen to parse the ACPI tables enough to
88enumerate all CPUs, but will not use other ACPI features.  This is not
89common, and only has an effect if your system is blacklisted.
90
91The `acpi=noirq` option causes Xen to not parse the ACPI MADT table
92looking for IO-APIC entries.  This is also not common, and any system
93which requires this option to function should be blacklisted.
94Additionally, this will not prevent Xen from finding IO-APIC entries
95from the MP tables.
96
97Further, any of the boolean false options can be used to disable ACPI
98usage entirely.
99
100Because responsibility for ACPI processing is shared between Xen and
101the domain 0 kernel this option is automatically propagated to the
102domain 0 command line.
103
104Finally, `acpi=verbose` will enable per-processor information logging
which may otherwise be too noisy, in particular on large systems.
106
107### acpi_apic_instance
108> `= <integer>`
109
110Specify which ACPI MADT table to parse for APIC information, if more
111than one is present.
112
113### acpi_pstate_strict (x86)
114> `= <boolean>`
115
116> Default: `false`
117
Enforce checking that P-state transitions by the ACPI cpufreq driver
actually result in the nominated frequency being established. A warning
message will be logged if that isn't the case.
121
122### acpi_skip_timer_override (x86)
123> `= <boolean>`
124
125Instruct Xen to ignore timer-interrupt override.
126
127### acpi_sleep (x86)
128> `= s3_bios | s3_mode`
129
130`s3_bios` instructs Xen to invoke video BIOS initialization during S3
131resume.
132
133`s3_mode` instructs Xen to set up the boot time (option `vga=`) video
134mode during S3 resume.
135
136### allow_unsafe (x86)
137> `= <boolean>`
138
139> Default: `false`
140
141Force boot on potentially unsafe systems. By default Xen will refuse
142to boot on systems with the following errata:
143
144* AMD Erratum 121. Processors with this erratum are subject to a guest
145  triggerable Denial of Service. Override only if you trust all of
146  your PV guests.
147
148### altp2m (Intel)
149> `= <boolean>`
150
151> Default: `false`
152
153Permit multiple copies of host p2m.
154
155### apic (x86)
156> `= bigsmp | default`
157
158Override Xen's logic for choosing the APIC driver.  By default, if
159there are more than 8 CPUs, Xen will switch to `bigsmp` over
160`default`.
161
162### apicv (Intel)
163> `= <boolean>`
164
165> Default: `true`
166
167Permit Xen to use APIC Virtualisation Extensions.  This is an optimisation
available as part of VT-x, and allows hardware to take care of the guest's APIC
169handling, rather than requiring emulation in Xen.
170
171### apic_verbosity (x86)
172> `= verbose | debug`
173
174Increase the verbosity of the APIC code from the default value.
175
176### arat (x86)
177> `= <boolean>`
178
179> Default: `true`
180
181Permit Xen to use "Always Running APIC Timer" support on compatible hardware
182in combination with cpuidle.  This option is only expected to be useful for
183developers wishing Xen to fall back to older timing methods on newer hardware.
184
185### argo
186    = List of [ <bool>, mac-permissive=<bool> ]
187
188Controls for the Argo hypervisor-mediated interdomain communication service.
189
The functionality that this option controls is only available when Xen has been
compiled with Argo enabled in the build configuration.

Argo is an interdomain communication mechanism, where Xen acts as the central
point of authority.  Guests may register memory rings to receive messages,
query the status of other domains, and send messages by hypercall, all subject
to appropriate auditing by Xen.  Argo is disabled by default.
197
198*   The `mac-permissive` boolean controls whether wildcard receive rings may be
199    registered (`mac-permissive=1`) or may not be registered
200    (`mac-permissive=0`).
201
202    This option is disabled by default, to protect domains from a DoS by a
203    buggy or malicious other domain spamming the ring.
204
205### asid (x86)
206> `= <boolean>`
207
208> Default: `true`
209
210Permit Xen to use Address Space Identifiers.  This is an optimisation which
211tags the TLB entries with an ID per vcpu.  This allows for guest TLB flushes
212to be performed without the overhead of a complete TLB flush.
213
214### async-show-all (x86)
215> `= <boolean>`
216
217> Default: `false`
218
219Forces all CPUs' full state to be logged upon certain fatal asynchronous
220exceptions (watchdog NMIs and unexpected MCEs).
221
222### ats (x86)
223> `= <boolean>`
224
225> Default: `false`
226
227Permits Xen to set up and use PCI Address Translation Services.  This is a
228performance optimisation for PCI Passthrough.
229
230**WARNING: Xen cannot currently safely use ATS because of its synchronous wait
231loops for Queued Invalidation completions.**
232
233### availmem
234> `= <size>`
235
236> Default: `0` (no limit)
237
238Specify a maximum amount of available memory, to which Xen will clamp
239the e820 table.
240
241### badpage
242> `= List of [ <integer> | <integer>-<integer> ]`
243
244Specify that certain pages, or certain ranges of pages contain bad
245bytes and should not be used.  For example, if your memory tester says
246that byte `0x12345678` is bad, you would place `badpage=0x12345` on
247Xen's command line.
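
The conversion from a bad byte address to the page number used in the example
is a right shift by the page size.  The sketch below assumes 4 KiB pages
(a shift of 12), which is what the example above implies; `badpage_option` is
a hypothetical helper.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as implied by the example above


def badpage_option(bad_addresses):
    """Build a badpage= string from byte addresses reported by a memory tester."""
    # Several bad bytes in one page collapse into a single page number.
    pages = sorted({addr >> PAGE_SHIFT for addr in bad_addresses})
    return "badpage=" + ",".join(hex(page) for page in pages)
```

Calling `badpage_option([0x12345678])` reproduces the `badpage=0x12345`
setting from the example.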
248
249### bootscrub
250> `= idle | <boolean>`
251
252> Default: `idle`
253
254Scrub free RAM during boot.  This is a safety feature to prevent
255accidentally leaking sensitive VM data into other VMs if Xen crashes
256and reboots.
257
In `idle` mode, RAM is scrubbed in the background on all CPUs during the idle
loop, with a guarantee that memory allocations always provide scrubbed pages.
This option reduces boot time on machines with a large amount of RAM while
still providing security benefits.
262
263### bootscrub_chunk
264> `= <size>`
265
266> Default: `128M`
267
Maximum size of the RAM chunks to be scrubbed whilst holding the page heap lock
269and not running softirqs. Reduce this if softirqs are not being run frequently
270enough. Setting this to a high value may cause boot failure, particularly if
271the NMI watchdog is also enabled.
272
273### cet
274    = List of [ shstk=<bool>, ibt=<bool> ]
275
276    Applicability: x86
277
Controls for the use of Control-flow Enforcement Technology.  CET is a group
of hardware features designed to combat Return-oriented Programming (ROP, also
call/jmp COP/JOP) attacks.
281
282CET is incompatible with 32bit PV guests.  If any CET sub-options are active,
283they will override the `pv=32` boolean to `false`.  Backwards compatibility
284can be maintained with the pv-shim mechanism.
285
286*   The `shstk=` boolean controls whether Xen uses Shadow Stacks for its own
287    protection.
288
289    The option is available when `CONFIG_XEN_SHSTK` is compiled in, and
290    generally defaults to `true` on hardware supporting CET-SS.  Specifying
291    `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support
292    is available in hardware.
293
294    Some hardware suffers from an issue known as Supervisor Shadow Stack
295    Fracturing.  On such hardware, Xen will default to not using Shadow Stacks
296    when virtualised.  Specifying `cet=shstk` will override this heuristic and
297    enable Shadow Stacks unilaterally.
298
299*   The `ibt=` boolean controls whether Xen uses Indirect Branch Tracking for
300    its own protection.
301
302    The option is available when `CONFIG_XEN_IBT` is compiled in, and defaults
303    to `true` on hardware supporting CET-IBT.  Specifying `cet=no-ibt` will
304    cause Xen not to use Indirect Branch Tracking even when support is
305    available in hardware.
306
307### clocksource (x86)
308> `= pit | hpet | acpi | tsc`
309
310If set, override Xen's default choice for the platform timer.
Using TSC as the platform timer requires it to be selected explicitly. This is because
312TSC can only be safely used if CPU hotplug isn't performed on the system. On
313some platforms, the "maxcpus" option may need to be used to further adjust
314the number of allowed CPUs.  When running on platforms that can guarantee a
315monotonic TSC across sockets you may want to adjust the "tsc" command line
316parameter to "stable:socket".
317
318### cmci-threshold (Intel)
319> `= <integer>`
320
321> Default: `2`
322
323Specify the event count threshold for raising Corrected Machine Check
324Interrupts.  Specifying zero disables CMCI handling.
325
326### cmos-rtc-probe (x86)
327> `= <boolean>`
328
329> Default: `false`
330
331Flag to indicate whether to probe for a CMOS Real Time Clock irrespective of
332ACPI indicating none to be there.
333
334### com1 (x86)
335### com2 (x86)
336> `= <baud>[/<base-baud>][,[DPS][,[<io-base>|pci|amt][,[<irq>|msi][,[<port-bdf>][,[<bridge-bdf>]]]]]]`
337
Both options `com1` and `com2` follow the same format.
339
340* `<baud>` may be either an integer baud rate, or the string `auto` if
341  the bootloader or other earlier firmware has already set it up.
342* Optionally, the base baud rate (usually the highest baud rate the
343  device can communicate at) can be specified.
344* `DPS` represents the number of data bits, the parity, and the number
345  of stop bits.
346  * `D` is an integer between 5 and 8 for the number of data bits.
347  * `P` is a single character representing the type of parity:
348      * `n` No
349      * `o` Odd
350      * `e` Even
351      * `m` Mark
352      * `s` Space
353  * `S` is an integer 1 or 2 for the number of stop bits.
354* `<io-base>` is an integer which specifies the IO base port for UART
355  registers.
356* `<irq>` is the IRQ number to use, or `0` to use the UART in poll
357  mode only, or `msi` to set up a Message Signaled Interrupt.
358* `<port-bdf>` is the PCI location of the UART, in
359  `<bus>:<device>.<function>` notation.
360* `<bridge-bdf>` is the PCI bridge behind which is the UART, in
361  `<bus>:<device>.<function>` notation.
362* `pci` indicates that Xen should scan the PCI bus for the UART,
363  avoiding Intel AMT devices.
* `amt` indicates that Xen should scan the PCI bus for the UART,
365  including Intel AMT devices if present.
366
367A typical setup for most situations might be `com1=115200,8n1`
368
369In addition to the above positional specification for UART parameters,
name=value pair specifications are also supported. This is used to add
371flexibility for UART devices which require additional UART parameter
372configurations.
373
374The comma separation still delineates positional parameters. Hence,
375unless the parameter is explicitly specified with name=value option, it
376will be considered a positional parameter.
377
378The syntax consists of
379com1=(comma-separated positional parameters),(comma separated name-value pairs)
380
381The accepted name keywords for name=value pairs are:
382
383* `baud` - accepts integer baud rate (eg. 115200) or `auto`
384* `bridge`- Similar to bridge-bdf in positional parameters.
385            Used to determine the PCI bridge to access the UART device.
386            Notation is xx:xx.x `<bus>:<device>.<function>`
387* `clock-hz`- accepts large integers to setup UART clock frequencies.
388              Do note - these values are multiplied by 16.
389* `data-bits` - integer between 5 and 8
* `dev` - accepted values are `pci` OR `amt`. This option is used to specify
          that the serial device is PCI-based. The `io-base` cannot be
          specified when `dev=pci` or `dev=amt` is used.
* `io-base` - accepts an integer which specifies the IO base port for UART registers
394* `irq` - IRQ number to use
395* `parity` - accepted values are same as positional parameters
396* `port` - Used to specify which port the PCI serial device is located on
397           Notation is xx:xx.x `<bus>:<device>.<function>`
398* `reg-shift` - register shifts required to set UART registers
399* `reg-width` - register width required to set UART registers
400                (only accepts 1 and 4)
401* `stop-bits` - only accepts 1 or 2 for the number of stop bits
402
403The following are examples of correct specifications:
404
405    com1=115200,8n1,0x3f8,4
406    com1=115200,8n1,0x3f8,4,reg-width=4,reg-shift=2
407    com1=baud=115200,parity=n,stop-bits=1,io-base=0x3f8,reg-width=4
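
To make the `DPS` field of the positional form concrete, the following Python
sketch decodes a value such as `8n1` into its three components.  `parse_dps`
is a hypothetical helper used only for illustration; it is not how Xen parses
the option.

```python
PARITY_NAMES = {"n": "none", "o": "odd", "e": "even", "m": "mark", "s": "space"}


def parse_dps(dps):
    """Decode a DPS string into (data bits, parity name, stop bits)."""
    if len(dps) != 3:
        raise ValueError(f"expected three characters, got {dps!r}")
    data, parity, stop = dps
    if data not in "5678" or parity not in PARITY_NAMES or stop not in "12":
        raise ValueError(f"invalid DPS value: {dps!r}")
    return int(data), PARITY_NAMES[parity], int(stop)
```

For the typical `com1=115200,8n1` setup, `parse_dps("8n1")` gives 8 data
bits, no parity and 1 stop bit.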
408
409### conring_size
410> `= <size>`
411
412> Default: `conring_size=16k`
413
414Specify the size of the console ring buffer.
415
416### console
417> `= List of [ vga | com1[H,L] | com2[H,L] | pv | dbgp | ehci | xhci | none ]`
418
419> Default: `console=com1,vga`
420
421Specify which console(s) Xen should use.
422
423`vga` indicates that Xen should try and use the vga graphics adapter.
424
`com1` and `com2` indicate that Xen should use serial ports 1 and 2
426respectively.  Optionally, these arguments may be followed by an `H` or
427`L`.  `H` indicates that transmitted characters will have their MSB
428set, while received characters must have their MSB set.  `L` indicates
429the converse; transmitted and received characters will have their MSB
430cleared.  This allows a single port to be shared by two subsystems
431(e.g. console and debugger).
432
433`pv` indicates that Xen should use Xen's PV console. This option is
434only available when used together with `pv-in-pvh`.
435
436`dbgp` or `ehci` indicates that Xen should use a USB2 debug port.
437
438`xhci` indicates that Xen should use a USB3 debug port.
439
440`none` indicates that Xen should not use a console.  This option only
441makes sense on its own.
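
The effect of the `H` and `L` suffixes on a shared serial port can be sketched
as bit operations on each character.  The helpers below are hypothetical and
assume 8-bit characters with the MSB at bit 7.

```python
MSB = 0x80  # assumption: 8-bit characters


def encode_tx(byte, high):
    """Set the MSB for an `H` console, clear it for an `L` console."""
    return (byte | MSB) if high else (byte & ~MSB & 0xFF)


def accepts_rx(byte, high):
    """An `H` console takes characters with the MSB set, `L` the others."""
    return bool(byte & MSB) == high
```

A debugger sharing the port would use the opposite setting, so that each
received character belongs to exactly one of the two subsystems.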
442
443### console_timestamps
444> `= none | date | datems | boot | raw`
445
446> Default: `none`
447
448> Can be modified at runtime
449
450Specify which timestamp format Xen should use for each console line.
451
452* `none`: No timestamps
453* `date`: Date and time information
454    * `[YYYY-MM-DD HH:MM:SS]`
455* `datems`: Date and time, with milliseconds
456    * `[YYYY-MM-DD HH:MM:SS.mmm]`
457* `boot`: Seconds and microseconds since boot
458    * `[SSSSSS.uuuuuu]`
* `raw`: Raw platform ticks, architecture and implementation dependent
460    * `[XXXXXXXXXXXXXXXX]`
461
462For compatibility with the older boolean parameter, specifying
463`console_timestamps` alone will enable the `date` option.
464
465### console_to_ring
466> `= <boolean>`
467
468> Default: `false`
469
470Flag to indicate whether all guest console output should be copied
471into the console ring buffer.
472
473### conswitch
474> `= <switch char>[x]`
475
476> Default: `conswitch=a`
477
478> Can be modified at runtime
479
480Specify which character should be used to switch serial input between
481Xen and dom0.  The required sequence is CTRL-&lt;switch char&gt; three
482times.
483
484The optional trailing `x` indicates that Xen should not automatically
485switch the console input to dom0 during boot.  Any other value,
486including omission, causes Xen to automatically switch to the dom0
487console during dom0 boot.  Use `conswitch=ax` to keep the default switch
character, but for Xen to keep the console.
489
490### core_parking
491> `= power | performance`
492
493> Default: `power`
494
495### cpu_type (x86)
496> `= arch_perfmon`
497
498If set, force use of the performance counters for oprofile, rather than detecting
499available support.
500
501### cpufreq
502> `= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]]`
503
504> Default: `xen`
505
506Indicate where the responsibility for driving power states lies.  Note that the
507choice of `dom0-kernel` is deprecated and not supported by all Dom0 kernels.
508
509* Default governor policy is ondemand.
510* `<maxfreq>` and `<minfreq>` are integers which represent max and min processor frequencies
511  respectively.
512* `verbose` option can be included as a string or also as `verbose=<integer>`
513  for `xen`.  It is a boolean for `hwp`.
514* `hwp` selects Hardware-Controlled Performance States (HWP) on supported Intel
515  hardware.  HWP is a Skylake+ feature which provides better CPU power
516  management.  The default is disabled.  If `hwp` is selected, but hardware
  support is not available, Xen will fall back to `cpufreq=xen`.
518* `<hdc>` is a boolean to enable Hardware Duty Cycling (HDC).  HDC enables the
519  processor to autonomously force physical package components into idle state.
520  The default is enabled, but the option only applies when `hwp` is enabled.
521
522There is also support for `;`-separated fallback options:
523`cpufreq=hwp;xen,verbose`.  This first tries `hwp` and falls back to `xen` if
524unavailable.  Note: The `verbose` suboption is handled globally.  Setting it
525for either the primary or fallback option applies to both irrespective of where
526it is specified.
527
Note: grub2 requires ';' to be escaped or quoted, so `"cpufreq=hwp;xen"` should be
specified within double quotes inside grub.cfg.  Refer to the grub2
documentation for more information.
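
The `;`-separated fallback form and the global handling of `verbose` can be
pictured with the following Python sketch.  `split_cpufreq` is a hypothetical
helper which only recognises the bare `verbose` suboption and ignores
governor, frequency and `<hdc>` settings.

```python
def split_cpufreq(value):
    """Return (driver choices in order of preference, verbose flag)."""
    choices, verbose = [], False
    for alternative in value.split(";"):
        fields = alternative.split(",")
        # The driver name precedes any ':'-introduced suboptions.
        choices.append(fields[0].partition(":")[0])
        # `verbose` applies to every alternative, wherever it is given.
        if "verbose" in fields[1:]:
            verbose = True
    return choices, verbose
```

For `cpufreq=hwp;xen,verbose` this yields the choices `hwp` then `xen`, with
verbose logging enabled for both.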
531
532### cpuid (x86)
533> `= List of comma separated booleans`
534
535This option allows for fine tuning of the facilities Xen will use, after
536accounting for hardware capabilities as enumerated via CPUID.
537
538Unless otherwise noted, options only have any effect in their negative form,
539to hide the named feature(s).  Ignoring a feature using this mechanism will
540cause Xen not to use the feature, nor offer them as usable to guests.
541
542Currently accepted:
543
544The Speculation Control hardware features `srbds-ctrl`, `md-clear`, `ibrsb`,
545`stibp`, `ibpb`, `l1d-flush` and `ssbd` are used by default if available and
546applicable.  They can all be ignored.
547
548`rdrand` and `rdseed` have multiple interactions.
549
550*   For Special Register Buffer Data Sampling (SRBDS, XSA-320, CVE-2020-0543),
551    RDRAND and RDSEED can be ignored.
552
553    Due to the absence of microcode to address SRBDS on IvyBridge client
554    hardware, the RDRAND feature is hidden by default for guests, unless
555    `rdrand` is used in its positive form.  Irrespective of the setting here,
556    VMs can use RDRAND if explicitly enabled in guest config file, and VMs
557    already using RDRAND can migrate in.
558
559*   The RDRAND feature is disabled by default on AMD Fam15/16 systems, due to
560    possible malfunctions after ACPI S3 suspend/resume.  `rdrand` may be used
561    in its positive form to override Xen's default behaviour on these systems,
562    and make the feature fully usable.
563
564### cpuid_mask_cpu
565> `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
566
567> Applicability: AMD
568
569If none of the other **cpuid_mask_\*** options are given, Xen has a set of
570pre-configured masks to make the current processor appear to be
571family/revision specified.
572
573See below for general information on masking.
574
575**Warning: This option is not fully effective on Family 15h processors or
576later.**
577
578### cpuid_mask_ecx
579### cpuid_mask_edx
580### cpuid_mask_ext_ecx
581### cpuid_mask_ext_edx
582### cpuid_mask_l7s0_eax
583### cpuid_mask_l7s0_ebx
584### cpuid_mask_thermal_ecx
585### cpuid_mask_xsave_eax
586> `= <integer>`
587
588> Applicability: x86.  Default: `~0` (all bits set)
589
The availability of these options is model specific.  Some processors don't
591support any of them, and no processor supports all of them.  Xen will ignore
592options on processors which are lacking support.
593
594These options can be used to alter the features visible via the `CPUID`
595instruction.  Settings applied here take effect globally, including for Xen
596and all guests.
597
598Note: Since Xen 4.7, it is no longer necessary to mask a host to create
599migration safety in heterogeneous scenarios.  All necessary CPUID settings
600should be provided in the VM configuration file.  Furthermore, it is
601recommended not to use this option, as doing so causes an unnecessary
602reduction of features at Xen's disposal to manage guests.
603
604### cpuidle (x86)
605> `= <boolean>`
606
607### cpuinfo (x86)
608> `= <boolean>`
609
610### crash-debug-debugkey
611### crash-debug-hwdom
612### crash-debug-kexeccmd
613### crash-debug-panic
614### crash-debug-watchdog
615> `= <string>`
616
617> Can be modified at runtime
618
619Specify debug-key actions in cases of crashes. Each of the parameters applies
620to a different crash reason. The `<string>` is a sequence of debug key
621characters, with `+` having the special meaning of a 10 millisecond pause.
622
623`crash-debug-debugkey` will be used for crashes induced by the `C` debug
624key (i.e. manually induced crash).
625
626`crash-debug-hwdom` denotes a crash of dom0.
627
628`crash-debug-kexeccmd` is an explicit request of dom0 to continue with the
629kdump kernel via kexec. Only available on hypervisors built with CONFIG_KEXEC.
630
631`crash-debug-panic` is a crash of the hypervisor.
632
633`crash-debug-watchdog` is a crash due to the watchdog timer expiring.
634
635It should be noted that dumping diagnosis data to the console can fail in
636multiple ways (missing data, hanging system, ...) depending on the reason
637of the crash, which might have left the hypervisor in a bad state. In case
a debug-key action leads to another crash, recursion will be avoided, so no
639additional debug-key actions will be performed in this case. A crash in the
640early boot phase will not result in any debug-key action, as the system
641might not yet be in a state where the handlers can work.
642
643So e.g. `crash-debug-watchdog=0+0r` would dump dom0 state twice with 10
644milliseconds between the two state dumps, followed by the run queues of the
645hypervisor, if the system crashes due to a watchdog timeout.
646
647Depending on the reason of the system crash it might happen that triggering
648some debug key action will result in a hang instead of dumping data and then
649doing a reboot or crash dump.
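
As an illustration of how a `<string>` value is read, the sketch below expands
it into an ordered list of steps.  `expand_crash_debug` is a hypothetical
helper; the step names are invented for the example.

```python
PAUSE_MS = 10  # `+` stands for a 10 millisecond pause


def expand_crash_debug(actions):
    """Turn a crash-debug string into (kind, value) steps, in order."""
    steps = []
    for char in actions:
        if char == "+":
            steps.append(("pause_ms", PAUSE_MS))
        else:
            steps.append(("debug_key", char))
    return steps
```

For `0+0r` this gives debug key `0`, a 10 ms pause, debug key `0` again, and
finally debug key `r`, matching the description of the example above.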
650
651### crashinfo_maxaddr
652> `= <size>`
653
654> Default: `4G`
655
656Specify the maximum address to allocate certain structures, if used in
657combination with the **low_crashinfo** command line option.
658
659### crashkernel
660> `= <ramsize-range>:<size>[,...][{@,<}<offset>]`
661> `= <size>[{@,<}<offset>]`
662> `= <size>,below=offset`
663
664Specify sizes and optionally placement of the crash kernel reservation
665area.  The `<ramsize-range>:<size>` pairs indicate how much memory to
666set aside for a crash kernel (`<size>`) for a given range of installed
667RAM (`<ramsize-range>`).  Each `<ramsize-range>` is of the form
668`<start>-[<end>]`.
669
670A trailing `@<offset>` specifies the exact address this area should be
671placed at, whereas `<` in place of `@` just specifies an upper bound of
672the address range the area should fall into.
673
`<` and `below` are synonymous, the latter being useful for grub2 systems
which would otherwise require escaping of the `<` option.
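
The selection among `<ramsize-range>:<size>` pairs can be sketched as follows.
This is an illustration under stated assumptions, not Xen's implementation:
`crash_area_size` is a hypothetical helper, each range is treated as including
`<start>` and excluding `<end>`, every size must carry an explicit suffix, the
placement suffix (`@`, `<` or `below=`) is not handled, and 0 is returned when
no range matches.

```python
import re

SHIFTS = {"k": 10, "m": 20, "g": 30, "t": 40}


def _size(text):
    """Parse a decimal size with a mandatory K/M/G/T suffix into bytes."""
    match = re.fullmatch(r"([0-9]+)([kmgtKMGT])", text)
    if match is None:
        raise ValueError(f"unsupported size in this sketch: {text!r}")
    return int(match.group(1)) << SHIFTS[match.group(2).lower()]


def crash_area_size(spec, installed_ram):
    """Pick the reservation size (bytes) for the installed RAM (bytes)."""
    for pair in spec.split(","):
        ram_range, _, size = pair.partition(":")
        start, _, end = ram_range.partition("-")
        low = _size(start)
        high = _size(end) if end else None  # open-ended range
        if installed_ram >= low and (high is None or installed_ram < high):
            return _size(size)
    return 0
```

With the specification `512M-2G:64M,2G-:128M`, a host with 1 GiB of RAM would
reserve 64 MiB and a host with 4 GiB would reserve 128 MiB.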
676
677
678### credit2_balance_over
679> `= <integer>`
680
681### credit2_balance_under
682> `= <integer>`
683
684### credit2_cap_period_ms
685> `= <integer>`
686
687> Default: `10`
688
689Domains subject to a cap receive a replenishment of their runtime budget
690once every cap period interval. Default is 10 ms. The amount of budget
691they receive depends on their cap. For instance, a domain with a 50% cap
692will receive 50% of 10 ms, so 5 ms.
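
The arithmetic can be written out as a one-line calculation.
`budget_per_period_us` is a hypothetical helper that returns the budget in
microseconds.

```python
def budget_per_period_us(cap_percent, period_ms=10):
    """Runtime budget, in microseconds, granted once per cap period."""
    return cap_percent * period_ms * 1000 // 100
```

A 50% cap with the default period gives 5000 us (5 ms); the same cap with
`credit2_cap_period_ms=20` gives 10000 us.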
693
694### credit2_load_precision_shift
695> `= <integer>`
696
697> Default: `18`
698
699Specify the number of bits to use for the fractional part of the
700values involved in Credit2 load tracking and load balancing math.
701
702### credit2_load_window_shift
703> `= <integer>`
704
705> Default: `30`
706
707Specify the number of bits to use to represent the length of the
708window (in nanoseconds) we use for load tracking inside Credit2.
709This means that, with the default value (30), we use
7102^30 nsec ~= 1 sec long window.
711
712Load tracking is done by means of a variation of exponentially
713weighted moving average (EWMA). The window length defined here
714is what tells for how long we give value to previous history
715of the load itself. In fact, after a full window has passed,
716what happens is that we discard all previous history entirely.
717
718A short window will make the load balancer quick at reacting
719to load changes, but also short-sighted about previous history
720(and hence, e.g., long term load trends). A long window will
721make the load balancer thoughtful of previous history (and
722hence capable of capturing, e.g., long term load trends), but
723also slow in responding to load changes.
724
725The default value of `1 sec` is rather long.
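
The relationship between the shift and the window length is a power of two in
nanoseconds.  The hypothetical helper below converts a shift into seconds.

```python
def load_window_seconds(shift=30):
    """Length of the load tracking window, in seconds, for a given shift."""
    return (1 << shift) / 1e9
```

The default shift of 30 gives a window of roughly 1.07 seconds, and each
decrement of the shift halves the window.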
726
727### credit2_runqueue
728> `= cpu | core | socket | node | all`
729
730> Default: `socket`
731
732Specify how host CPUs are arranged in runqueues. Runqueues are kept
733balanced with respect to the load generated by the vCPUs running on
them. Smaller runqueues (as with `core`) mean more accurate load
735balancing (for instance, it will deal better with hyperthreading),
736but also more overhead.
737
738Available alternatives, with their meaning, are:
* `cpu`: one runqueue per each logical pCPU of the host;
740* `core`: one runqueue per each physical core of the host;
741* `socket`: one runqueue per each physical socket (which often,
742            but not always, matches a NUMA node) of the host;
743* `node`: one runqueue per each NUMA node of the host;
744* `all`: just one runqueue shared by all the logical pCPUs of
745         the host
746
747Regardless of the above choice, Xen attempts to respect
748`sched_credit2_max_cpus_runqueue` limit, which may mean more than one runqueue
749for the `all` value. If that isn't intended, raise
750the `sched_credit2_max_cpus_runqueue` value.
751
752### dbgp
753> `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
754> `= xhci[ <integer> | @pci<bus>:<slot>.<func> ][,share=<bool>|hwdom]`
755
756Specify the USB controller to use, either by instance number (when going
757over the PCI busses sequentially) or by PCI device (must be on segment 0).
758
759Use `ehci` for EHCI debug port, use `xhci` for XHCI debug capability.
760XHCI driver will wait indefinitely for the debug host to connect - make sure
761the cable is connected.
762The `share` option for xhci controls who else can use the controller:
763* `no`: use the controller exclusively for console, even hardware domain
764  (dom0) cannot use it
765* `hwdom`: hardware domain may use the controller too, ports not used for debug
766  console will be available for normal devices; this is the default
767* `yes`: the controller can be assigned to any domain; it is not safe to assign
  the controller to an untrusted domain
769
770Choosing `share=hwdom` (the default) or `share=yes` allows a domain to reset the
controller, which may cause a small portion of the console output to be lost.
772
773The `share=yes` configuration is not security supported.
774
775### debug_stack_lines
776> `= <integer>`
777
778> Default: `20`
779
Limits the number of lines printed in Xen stack traces.
781
782### debugtrace
783> `= [cpu:]<size>`
784
785> Default: `128`
786
787Specify the size of the console debug trace buffer. By specifying `cpu:`
788additionally a trace buffer of the specified size is allocated per cpu.
789The debug trace feature is only enabled in debugging builds of Xen.
790
791### dit (x86/Intel)
792> `= <boolean>`
793
794> Default: `CONFIG_DIT_DEFAULT`
795
796Specify whether Xen and guests should operate in Data Independent Timing
797mode (Intel calls this DOITM, Data Operand Independent Timing Mode). Note
798that enabling this option cannot guarantee anything beyond what underlying
799hardware guarantees (with, where available and known to Xen, respective
800tweaks applied).
801
802### dma_bits
803> `= <integer>`
804
805Specify the bit width of the DMA heap.
806
807### dom0
808    = List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
809                cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86)
810
811    = List of [ sve=<integer> ] (Arm64)
812
813Controls for how dom0 is constructed on x86 systems.
814
815*   The `pv` and `pvh` options select the virtualisation mode of dom0.
816
817    The `pv` option is only available when `CONFIG_PV` is compiled in.  The
818    `pvh` option is only available when `CONFIG_HVM` is compiled in.  When
819    both options are compiled in, the default is PV.
820
821    In addition, the following requirements must be met:
822
823    *   The dom0 kernel selected by the boot loader must be capable of the
824        selected mode.
825    *   For a PVH dom0, the hardware must have VT-x/SVM extensions available.
826
827*   The `shadow` boolean allows dom0 to be explicitly constructed using shadow
828    paging.  This option is unavailable when `CONFIG_SHADOW_PAGING` is
829    disabled.
830
831    For PVH, dom0 defaults to using HAP on capable hardware, and falls back to
832    shadow paging otherwise.  A PVH dom0 cannot be used if Xen is compiled
833    without shadow paging support, and the hardware lacks HAP support.
834
835    For PV, the use of dom0 shadow mode is only for development purposes.  PV
    guests do not require any paging support by default.
837
838*   The `verbose` boolean is intended for diagnostics, and prints out extra
839    information during the dom0 build.  It defaults to the compile time choice
840    of `CONFIG_VERBOSE_DEBUG`.
841
842*   The `cpuid-faulting` boolean is an interim option, is only applicable to
843    PV dom0, and defaults to true.
844
845    Before Xen 4.13, the domain builder logic for guest construction depended
846    on seeing host CPUID values to function correctly.  As a result, CPUID
847    Faulting was never activated for PV dom0's, even on capable hardware.
848
849    In Xen 4.13, the domain builder logic has been fixed, and no longer has
850    this dependency.  As a consequence, CPUID Faulting is activated by default
851    even for PV dom0's.
852
853    However, as PV dom0's have always seen host CPUID data in the past, there
854    is a chance that further dependencies exist.  This boolean can be used to
855    restore the pre-4.13 behaviour.  If specifying `no-cpuid-faulting` fixes
856    an issue in dom0, please report a bug.
857
858*   The `msr-relaxed` boolean is an interim option, and defaults to false.
859
860    In Xen 4.15, the default behaviour for unhandled MSRs has been changed,
861    to avoid leaking host data into guests, and to avoid breaking guest
862    logic which uses \#GP probing to identify the availability of MSRs.
863
864    However, this new stricter behaviour has the possibility to break
865    guests, and a more 4.14-like behaviour can be selected by specifying
866    `dom0=msr-relaxed`.
867
868    If using this option is necessary to fix an issue, please report a bug.
869
870Enables features on dom0 on Arm systems.
871
*   The `sve` integer parameter enables Arm SVE usage for Dom0 and sets the
    maximum SVE vector length.  The option is applicable only to Arm64 Dom0
    kernels.

    *   A value of 0 disables the feature.  This is the default.
    *   A value below 0 selects the maximum SVE vector length supported by
        the hardware, if SVE is supported.
    *   A value above 0 explicitly sets the maximum SVE vector length for
        Dom0.  Allowed values are multiples of 128, from 128 up to 2048.

    When a value is specified explicitly and it exceeds the maximum SVE
    vector length supported by the hardware, domain creation fails and the
    system stops.  The same happens if a positive value is given but the
    platform doesn't support SVE.
885
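The rules for `sve` above can be sketched as a small check.  This is only an
illustration of the documented behaviour, not Xen's implementation: the
function name and the `hw_max_vl` argument (hardware maximum vector length in
bits, 0 when SVE is absent) are hypothetical.  Rejecting a malformed positive
value, and returning 0 for a negative request on hardware without SVE, are
assumptions that the text does not spell out.

```python
def effective_sve_vl(requested, hw_max_vl):
    """Return the maximum SVE vector length (bits) Dom0 would get; 0 = disabled.

    hw_max_vl is the hardware maximum in bits, or 0 if SVE is unsupported.
    """
    if requested == 0:
        return 0                      # default: feature disabled
    if requested < 0:
        return hw_max_vl              # use the hardware maximum, if any (assumption: 0 if none)
    # Positive values must be a multiple of 128 in the range 128..2048.
    if requested % 128 != 0 or not 128 <= requested <= 2048:
        raise ValueError("sve must be a multiple of 128 between 128 and 2048")
    # An explicit value above the hardware maximum, or any positive value
    # without SVE support, makes domain creation fail.
    if hw_max_vl == 0 or requested > hw_max_vl:
        raise RuntimeError("domain creation would fail")
    return requested
```

For instance, `effective_sve_vl(256, 512)` yields 256, whereas
`effective_sve_vl(1024, 512)` raises, mirroring the failure case described
above.
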
886### dom0-cpuid
887    = List of comma separated booleans
888
889    Applicability: x86
890
891This option allows for fine tuning of the facilities dom0 will use, after
892accounting for hardware capabilities and Xen settings as enumerated via CPUID.
893
894Options are accepted in positive and negative form, to enable or disable
895specific features.  All selections via this mechanism are subject to normal
896CPU Policy safety and dependency logic.
897
898This option is intended for developers to opt dom0 into non-default features,
899and is not intended for use in production circumstances.  If using this option
900is necessary to fix an issue, please report a bug.
901
902### dom0-iommu
903    = List of [ passthrough=<bool>, strict=<bool>, map-inclusive=<bool>,
904                map-reserved=<bool>, none ]
905
906Controls for the dom0 IOMMU setup.
907
908*   The `passthrough` boolean controls whether IOMMU translation functionality
909    is disabled for devices in dom0 (`passthrough=1`) or whether the IOMMU is
910    used to ensure that dom0 can only DMA to its permitted areas of RAM
911    (`passthrough=0`).
912
913    This option is only applicable to x86 PV dom0's, and defaults to false.
914
915    Some older Intel VT-d hardware isn't capable of disabling translation
916    functionality on a per-device basis, and will cause this option to be
917    ignored and assumed to be 0.  Similar behaviour on such systems is only
918    available by fully disabling all IOMMUs.
919
920    This option is hardwired to false for x86 PVH dom0's (where a non-identity
921    transform is required for dom0 to function), and is ignored for ARM.
922
923*   The `strict` boolean is applicable to x86 PV dom0's only and defaults to
924    false.  It controls whether dom0 can have IOMMU mappings for all domain
925    RAM in the system, or only for its allocated RAM (and grant mappings etc.)
926
927    This option is hardwired to true for x86 PVH dom0's (as RAM belonging to
    other domains in the system doesn't live in a compatible address space), and
929    is ignored for ARM.
930
931*   The `map-inclusive` boolean is applicable to x86 PV dom0's, and sets up
932    identity IOMMU mappings for all non-RAM regions below 4GB except for
933    unusable ranges, and ranges belonging to Xen.
934
935    Typically, some devices in a system use bits of RAM for communication, and
936    these areas should be listed as reserved in the E820 table and identified
937    via RMRR or IVMD entries in the ACPI tables, so Xen can ensure that they
938    are identity-mapped in the IOMMU.  However, some firmware makes mistakes,
939    and this option is a coarse-grain workaround for those errors.
940
941    Where possible, finer grain corrections should be made with the `rmrr=`,
942    `ivmd=`, `ivrs_hpet[]=`, or `ivrs_ioapic[]=` command line options.
943
944    This option is disabled by default, and deprecated and intended for
945    removal in future versions of Xen.  If specifying `map-inclusive` is the
946    only way to make your system boot, please report a bug.
947
948*   The `map-reserved` functionality is very similar to `map-inclusive`.
949
950    The differences from `map-inclusive` are that `map-reserved` is applicable
951    to both x86 PV and PVH dom0's, is enabled by default, and represents a
952    subset of the correction by only mapping reserved memory regions rather
953    than all non-RAM regions.
954
955*   The `none` option is intended for development purposes only, and skips
956    certain safety checks pertaining to the correct IOMMU configuration for
957    dom0 to boot.
958
959    Incorrect use of this option may result in a malfunctioning system.
960
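As an illustration of how a list of boolean sub-options such as the one above
is read, the following sketch applies the boolean rules from "Types of
parameter" (explicit value, bare name, or `no-` prefix) to each comma
separated item.  The function name is hypothetical and this is not Xen's
parser; note that it would also report the non-boolean `none` keyword as
simply "present".

```python
TRUE_VALUES = {"yes", "on", "true", "enable", "1"}
FALSE_VALUES = {"no", "off", "false", "disable", "0"}

def parse_bool_list(value):
    """Map each sub-option name in a comma separated list to True or False."""
    result = {}
    for item in value.split(","):
        name, has_value, text = item.partition("=")
        if has_value:
            # Explicit form: name=<boolean>
            if text in TRUE_VALUES:
                result[name] = True
            elif text in FALSE_VALUES:
                result[name] = False
            else:
                raise ValueError(f"not a boolean: {text!r}")
        elif name.startswith("no-"):
            result[name[3:]] = False  # negative form: no-name
        else:
            result[name] = True       # bare name enables the option
    return result

print(parse_bool_list("passthrough=0,strict,no-map-reserved"))
```
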
961### dom0_ioports_disable (x86)
962> `= List of <hex>-<hex>`
963
964Specify a list of IO ports to be excluded from dom0 access.
965
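A minimal sketch of reading a `List of <hex>-<hex>` value follows.  The
function name is hypothetical, and rejecting a range whose start exceeds its
end is an assumption rather than documented behaviour.

```python
def parse_ioport_ranges(value):
    """Turn 'a-b,c-d' (hexadecimal port numbers) into a list of (start, end)."""
    ranges = []
    for item in value.split(","):
        start_text, _, end_text = item.partition("-")
        start, end = int(start_text, 16), int(end_text, 16)
        if start > end:
            raise ValueError(f"backwards range: {item}")
        ranges.append((start, end))
    return ranges

print(parse_ioport_ranges("3f8-3ff,2f8-2ff"))
```
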
966### dom0_max_vcpus
967
968Either:
969
970> `= <integer>`.
971
972The number of VCPUs to give to dom0.  This number of VCPUs can be more
973than the number of PCPUs on the host.  The default is the number of
974PCPUs.
975
976Or:
977
978> `= <min>-<max>` where `<min>` and `<max>` are integers.
979
980Gives dom0 a number of VCPUs equal to the number of PCPUs, but always
981at least `<min>` and no more than `<max>`.  Using `<min>` may give
982more VCPUs than PCPUs.  `<min>` or `<max>` may be omitted and the
983defaults of 1 and unlimited respectively are used instead.
984
985For example, with `dom0_max_vcpus=4-8`:
986
987>        Number of
988>     PCPUs | Dom0 VCPUs
989>      2    |  4
990>      4    |  4
991>      6    |  6
992>      8    |  8
993>     10    |  8
994
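The `<min>-<max>` form amounts to clamping the PCPU count into a range.  A
short sketch reproducing the table above (the function name is hypothetical,
and any other limits Xen may apply are ignored):

```python
def dom0_vcpus(pcpus, min_vcpus=1, max_vcpus=None):
    """Clamp the number of PCPUs into [min_vcpus, max_vcpus].

    max_vcpus=None stands for the 'unlimited' default.
    """
    vcpus = max(pcpus, min_vcpus)
    if max_vcpus is not None:
        vcpus = min(vcpus, max_vcpus)
    return vcpus

# dom0_max_vcpus=4-8 on hosts with 2, 4, 6, 8 and 10 PCPUs
for pcpus in (2, 4, 6, 8, 10):
    print(pcpus, dom0_vcpus(pcpus, 4, 8))
```
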
995### dom0_mem (ARM)
996> `= <size>`
997
998Set the amount of memory for the initial domain (dom0). It must be
999greater than zero. This parameter is required.
1000
1001### dom0_mem (x86)
1002> `= List of ( min:<sz> | max:<sz> | <sz> )`
1003
1004Set the amount of memory for the initial domain (dom0). If a size is
1005positive, it represents an absolute value.  If a size is negative, it
1006is subtracted from the total available memory.
1007
1008* `<sz>` specifies the exact amount of memory.
1009* `min:<sz>` specifies the minimum amount of memory.
1010* `max:<sz>` specifies the maximum amount of memory.
1011
1012If `<sz>` is not specified, the default is all the available memory
1013minus some reserve.  The reserve is 1/16 of the available memory or
1014128 MB (whichever is smaller).
1015
1016The amount of memory will be at least the minimum but never more than
1017the maximum (i.e., `max` overrides the `min` option).  If there isn't
1018enough memory then as much as possible is allocated.
1019
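The default, the handling of negative sizes, and the `min`/`max` rules can be
sketched as follows.  This is an illustration only, working in bytes; the
function names are hypothetical, and applying the negative-size rule to
`min:` and `max:` as well as to the plain size is an assumption.

```python
MIB = 1 << 20

def default_dom0_mem(avail):
    """All available memory minus a reserve of 1/16 or 128 MiB, whichever is smaller."""
    reserve = min(avail // 16, 128 * MIB)
    return avail - reserve

def resolve_dom0_mem(avail, sz=None, min_sz=None, max_sz=None):
    """Combine <sz>, min:<sz> and max:<sz> (bytes; None when not given)."""
    def absolute(value):
        # A negative size is subtracted from the total available memory.
        return value if value >= 0 else avail + value

    amount = default_dom0_mem(avail) if sz is None else absolute(sz)
    if min_sz is not None:
        amount = max(amount, absolute(min_sz))
    if max_sz is not None:
        amount = min(amount, absolute(max_sz))   # applied last: max overrides min
    return min(amount, avail)                    # never more than what exists
```

With 8 GiB available, `resolve_dom0_mem(8 << 30, sz=-(1 << 30))` gives 7 GiB,
and a `min` of 2 GiB combined with a `max` of 1 GiB gives 1 GiB.
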
1020`max:<sz>` also sets the maximum reservation (the maximum amount of
1021memory dom0 can balloon up to).  If this is omitted then the maximum
1022reservation is unlimited.
1023
1024For example, to set dom0's initial memory allocation to 512MB but
allow it to balloon up as far as 1GB, use `dom0_mem=512M,max:1G`.
1026
1027> `<sz>` is: `<size> | [<size>+]<frac>%`
1028> `<frac>` is an integer < 100
1029
1030* `<frac>` specifies a fraction of host memory size in percent.
1031
1032So `<sz>` being `1G+25%` on a 256 GB host would result in 65 GB.
1033
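The arithmetic behind that example can be checked with a short sketch.  It
handles only positive decimal sizes with the suffixes from "Types of
parameter", and it computes the percentage in bytes; the function names are
hypothetical and the exact rounding Xen uses is not covered here.

```python
import re

SUFFIX = {"b": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}

def parse_size(text):
    """Parse a decimal <size>; a bare number is taken as KiB."""
    match = re.fullmatch(r"(\d+)([bkmgtBKMGT]?)", text)
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    number, suffix = match.groups()
    return int(number) * SUFFIX[suffix.lower() or "k"]

def parse_sz(text, host_bytes):
    """Evaluate `<size> | [<size>+]<frac>%` against a host of host_bytes."""
    if not text.endswith("%"):
        return parse_size(text)
    base, plus, frac = text[:-1].rpartition("+")
    percent = int(frac)
    if not 0 <= percent < 100:
        raise ValueError("<frac> must be an integer below 100")
    fixed = parse_size(base) if plus else 0
    return fixed + host_bytes * percent // 100

print(parse_sz("1G+25%", 256 << 30) // (1 << 30))  # 65
```
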
1034If you use this option then it is highly recommended that you disable
1035any dom0 autoballooning feature present in your toolstack. See the
1036_xl.conf(5)_ man page or [Xen Best
1037Practices](https://wiki.xen.org/wiki/Xen_Best_Practices#Xen_dom0_dedicated_memory_and_preventing_dom0_memory_ballooning).
1038
This option has no effect if pv-shim mode is enabled.
1040
1041### dom0_nodes (x86)
1042
1043> `= List of [ <integer> | relaxed | strict ]`
1044
1045> Default: `strict`
1046
1047Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
1048and memory assigned to Dom0 will be adjusted to match the node
1049restrictions set up here. Note that the values to be specified here are
1050ACPI PXM ones, not Xen internal node numbers. `relaxed` sets up vCPU
affinities to prefer, but not be limited to, the specified node(s).
1052
1053### dom0_vcpus_pin
1054> `= <boolean>`
1055
1056> Default: `false`
1057
Pin dom0 vcpus to their respective pcpus.
1059
1060### dtuart (ARM)
1061> `= path [:options]`
1062
1063> Default: `""`
1064
1065Specify the full path in the device tree for the UART.  If the path doesn't
1066start with `/`, it is assumed to be an alias.  The options are device specific.
1067
1068### e820-mtrr-clip (x86)
1069> `= <boolean>`
1070
1071Flag that specifies if RAM should be clipped to the highest cacheable
1072MTRR.
1073
1074> Default: `true` on Intel CPUs, otherwise `false`
1075
1076### e820-verbose (x86)
1077> `= <boolean>`
1078
1079> Default: `false`
1080
1081Flag that enables verbose output when processing e820 information and
1082applying clipping.
1083
1084### edd (x86)
1085> `= off | on | skipmbr`
1086
Control retrieval of Enhanced Disk Drive (EDD) data from the BIOS during
1088boot.
1089
1090### edid (x86)
1091> `= no | force`
1092
1093Either force retrieval of monitor EDID information via VESA DDC, or
disable it (`edid=no`). This option should not normally be required
1095except for debugging purposes.
1096
1097### efi
1098    = List of [ rs=<bool>, attr=no|uc ]
1099
Controls for interacting with the system Extensible Firmware Interface.
1101
1102*   The `rs` boolean controls whether Runtime Services are used.  By default,
1103    Xen uses Runtime Services itself, and proxies certain calls on behalf of
1104    dom0.  Selecting `rs=0` prohibits all use of Runtime Services.
1105
1106*   The `attr=` string exists to specify what to do with memory regions of
1107    unknown/unrecognised cacheability.  `attr=no` is the default and will
1108    leave the memory regions unmapped, while `attr=uc` will map them as fully
1109    uncacheable.
1110
1111### ept
1112> `= List of [ ad=<bool>, pml=<bool>, exec-sp=<bool> ]`
1113
1114> Applicability: Intel
1115
1116Extended Page Tables are a feature of Intel's VT-x technology, whereby
1117hardware manages the virtualisation of HVM guest pagetables.  EPT was
1118introduced with the Nehalem architecture.
1119
1120*   The `ad` boolean controls hardware tracking of Access and Dirty bits in the
1121    EPT pagetables, and was first introduced in Broadwell Server.
1122
1123    By default, Xen will use A/D tracking when available in hardware, except
1124    on Avoton processors affected by erratum AVR41.  Explicitly choosing
1125    `ad=0` will disable the use of A/D tracking on capable hardware, whereas
1126    choosing `ad=1` will cause tracking to be used even on AVR41-affected
1127    hardware.
1128
*   The `pml` boolean controls the use of Page Modification Logging, which was
    also introduced in Broadwell Server.
1131
1132    PML is a feature whereby the processor generates a list of pages which
1133    have been dirtied.  This is necessary information for operations such as
1134    live migration, and having the processor maintain the list of dirtied
1135    pages is more efficient than traditional software implementations where
1136    all guest writes trap into Xen so the dirty bitmap can be maintained.
1137
1138    By default, Xen will use PML when it is available in hardware.  PML
1139    functionally depends on A/D tracking, so choosing `ad=0` will implicitly
1140    disable PML.  `pml=0` can be used to prevent the use of PML on otherwise
1141    capable hardware.
1142
1143*   The `exec-sp` boolean controls whether EPT superpages with execute
1144    permissions are permitted.  In general this is good for performance.
1145
    However, on processors vulnerable to CVE-2018-12207, HVM guest kernels can
1147    use executable superpages to crash the host.  By default, executable
1148    superpages are disabled on affected hardware.
1149
1150    If HVM guest kernels are trusted not to mount a DoS against the system,
    this option can be enabled to regain performance.
1152
1153    This boolean may be modified at runtime using `xl set-parameters
1154    ept=[no-]exec-sp` to switch between fast and secure.
1155
1156    *   When switching from secure to fast, preexisting HVM domains will run
1157        at their current performance until they are rebooted; new domains will
1158        run without any overhead.
1159
1160    *   When switching from fast to secure, all HVM domains will immediately
1161        suffer a performance penalty.
1162
1163    **Warning: No guarantee is made that this runtime option will be retained
1164      indefinitely, or that it will retain this exact behaviour.  It is
1165      intended as an emergency option for people who first chose fast, then
1166      change their minds to secure, and wish not to reboot.**
1167
1168### extra_guest_irqs (x86)
1169> `= [<domU number>][,<dom0 number>]`
1170
1171> Default: `32,<variable>`
1172
1173Change the number of PIRQs available for guests.  The optional first number is
1174common for all domUs, while the optional second number (preceded by a comma)
1175is for dom0.  Changing the setting for domU has no impact on dom0 and vice
1176versa.  For example to change dom0 without changing domU, use
1177`extra_guest_irqs=,512`.  The default value for Dom0 and an eventual separate
1178hardware domain is architecture dependent.  The upper limit for both values on
1179x86 is such that the resulting total number of IRQs can't be higher than 32768.
1180Note that specifying zero as domU value means zero, while for dom0 it means
1181to use the default.  Note further that the Dom0 setting has no useful meaning
1182for the PVH case; use of the option may have an adverse effect there, though.
1183
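The two optional fields can be read as in the sketch below.  The function
name is hypothetical, and `None` stands in for the architecture dependent
dom0 default, whose value is not given here.

```python
def parse_extra_guest_irqs(value, domu_default=32, dom0_default=None):
    """Parse `[<domU number>][,<dom0 number>]` into (domU, dom0)."""
    domu_text, _, dom0_text = value.partition(",")
    domu = int(domu_text) if domu_text else domu_default
    dom0 = int(dom0_text) if dom0_text else dom0_default
    if dom0 == 0:
        dom0 = dom0_default  # zero for dom0 means "use the default"
    return domu, dom0        # zero for domU really means zero

print(parse_extra_guest_irqs(",512"))
```
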
1184### ext_regions (Arm)
1185> `= <boolean>`
1186
> Default: `true`
1188
1189Flag to enable or disable support for extended regions for Dom0 and
1190Dom0less DomUs.
1191
1192Extended regions are ranges of unused address space exposed to the guest
1193as "safe to use" for special memory mappings. Disable if your board
1194device tree is incomplete.
1195
1196### flask
1197> `= permissive | enforcing | late | disabled`
1198
1199> Default: `enforcing`
1200
1201Specify how the FLASK security server should be configured.  This option is only
1202available if the hypervisor was compiled with FLASK support.  This can be
enabled by running either:

* `make -C xen config` and enabling XSM and FLASK.
* `make -C xen menuconfig` and enabling 'FLux Advanced Security Kernel
  support' and 'Xen Security Modules support'.
1206
1207* `permissive`: This is intended for development and is not suitable for use
1208  with untrusted guests.  If a policy is provided by the bootloader, it will be
1209  loaded; errors will be reported to the ring buffer but will not prevent
1210  booting.  The policy can be changed to enforcing mode using "xl setenforce".
1211* `enforcing`: This will cause the security server to enter enforcing mode prior
  to the creation of domain 0.  If a valid policy is not provided by the
1213  bootloader and no built-in policy is present, the hypervisor will not continue
1214  booting.
1215* `late`: This disables loading of the built-in security policy or the policy
1216  provided by the bootloader.  FLASK will be enabled but will not enforce access
1217  controls until a policy is loaded by a domain using "xl loadpolicy".  Once a
1218  policy is loaded, FLASK will run in enforcing mode unless "xl setenforce" has
1219  changed that setting.
1220* `disabled`: This causes the XSM framework to revert to the dummy module.  The
1221  dummy module provides the same security policy as is used when compiling the
1222  hypervisor without support for XSM.  The xsm_op hypercall can also be used to
1223  switch to this mode after boot, but there is no way to re-enable FLASK once
1224  the dummy module is loaded.
1225
1226### font
1227> `= <height>` where height is `8x8 | 8x14 | 8x16`
1228
1229Specify the font size when using the VESA console driver.
1230
1231### force-ept (Intel)
1232> `= <boolean>`
1233
1234> Default: `false`
1235
1236Allow EPT to be enabled when VMX feature `VM_ENTRY_LOAD_GUEST_PAT` is not
1237present.
1238
1239*Warning:*
1240Due to CVE-2013-2212, VMX feature `VM_ENTRY_LOAD_GUEST_PAT` is by default
1241required as a prerequisite for using EPT.  If you are not using PCI Passthrough,
1242or trust the guest administrator who would be using passthrough, then the
1243requirement can be relaxed.  This option is particularly useful for nested
1244virtualization, to allow the L1 hypervisor to use EPT even if the L0 hypervisor
1245does not provide `VM_ENTRY_LOAD_GUEST_PAT`.
1246
1247### gnttab
1248> `= List of [ max-ver:<integer>, transitive=<bool>, transfer=<bool> ]`
1249
1250> Default (Arm): `gnttab=max-ver:1`
1251> Default (x86,PV): `gnttab=max-ver:2,transitive,transfer`
1252> Default (x86,HVM): `gnttab=max-ver:2,transitive`
1253
1254Control various aspects of the grant table behaviour available to guests.
1255
* `max-ver` Select the maximum grant table version to offer to guests.  Valid
versions are 1 and 2.
* `transitive` Permit or disallow the use of transitive grants.  Note that the
use of grant table v2 without transitive grants is an ABI breakage from the
guest's point of view.
* `transfer` Permit or disallow the GNTTABOP_transfer operation of the
grant table hypercall.  Note that disallowing GNTTABOP_transfer is an ABI
breakage from the guest's point of view.  This option is only available on
hypervisors configured to support PV guests.
1265
1266The usage of gnttab v2 is not security supported on ARM platforms.
1267
1268### gnttab_max_frames
1269> `= <integer>`
1270
1271> Default: `64`
1272
1273> Can be modified at runtime
1274
1275Specify the default upper bound on the number of frames which any domain may
1276use as part of its grant table unless a different value is specified at domain
1277creation.
1278
1279Note this value is the effective upper bound for dom0.
1280
1281### gnttab_max_maptrack_frames
1282> `= <integer>`
1283
1284> Default: `1024`
1285
1286> Can be modified at runtime
1287
1288Specify the default upper bound on the number of frames which any domain may
1289use as part of its maptrack array unless a different value is specified at
1290domain creation.
1291
1292Note this value is the effective upper bound for dom0.
1293
1294### global-pages
1295    = <boolean>
1296
1297    Applicability: x86
1298    Default: true unless running virtualized on AMD or Hygon hardware
1299
1300Control whether to use global pages for PV guests, and thus the need to
1301perform TLB flushes by writing to CR4.  This is a performance trade-off.
1302
AMD SVM does not support selective trapping of CR4 writes, which means that a
global TLB flush (two CR4 writes) takes two VMExits, whose cost massively
outweighs the benefit of using global pages to begin with.  This case is easy
for Xen to spot, and is accounted for in the default setting.
1307
Other cases where this option might be a benefit are on VT-x hardware when
1309selective CR4 writes are not supported/enabled by the hypervisor, or in any
1310virtualised case using shadow paging.  These are not easy for Xen to spot, so
1311are not accounted for in the default setting.
1312
1313### guest_loglvl
1314> `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
1315
1316> Default: `guest_loglvl=none/warning`
1317
1318> Can be modified at runtime
1319
Set the logging level for Xen guests.  Any log message with equal or
greater importance will be printed.
1322
1323The optional `<rate-limited level>` option instructs which severities
1324should be rate limited.
1325
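A minimal sketch of splitting the `<level>[/<rate-limited level>]` form is
shown below.  The function name is hypothetical, and returning `None` when
the second level is omitted is only a placeholder, since the behaviour in
that case is not described here.

```python
LEVELS = ("none", "error", "warning", "info", "debug", "all")

def parse_loglvl(value):
    """Split `<level>[/<rate-limited level>]` and validate both names."""
    level, _, rate_limited = value.partition("/")
    for name in (level, rate_limited):
        if name and name not in LEVELS:
            raise ValueError(f"unknown log level: {name!r}")
    if not level:
        raise ValueError("a level is required")
    return level, rate_limited or None

print(parse_loglvl("none/warning"))
```
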
1326### hap (x86)
1327> `= <boolean>`
1328
1329> Default: `true`
1330
1331Flag to globally enable or disable support for Hardware Assisted
Paging (HAP).
1333
1334### hap_1gb (x86)
1335> `= <boolean>`
1336
1337> Default: `true`
1338
1339Flag to enable 1 GB host page table support for Hardware Assisted
1340Paging (HAP).
1341
1342### hap_2mb (x86)
1343> `= <boolean>`
1344
1345> Default: `true`
1346
1347Flag to enable 2 MB host page table support for Hardware Assisted
1348Paging (HAP).
1349
1350### hardware_dom
1351> `= <domid>`
1352
1353> Default: `0`
1354
1355Enable late hardware domain creation using the specified domain ID.  This is
1356intended to be used when domain 0 is a stub domain which builds a disaggregated
1357system including a hardware domain with the specified domain ID.  This option is
1358supported only when compiled with XSM on x86.
1359
1360### hest_disable
> `= <boolean>`
1362
1363> Default: `false`
1364
Control Xen's use of the APEI Hardware Error Source Table, should one be found.
1366
1367### highmem-start (x86)
1368> `= <size>`
1369
1370Specify the memory boundary past which memory will be treated as highmem (x86
1371debug hypervisor only).
1372
1373### hmp-unsafe (arm)
1374> `= <boolean>`
1375
> Default: `false`
1377
Say yes at your own risk if you want to enable heterogeneous computing
(such as big.LITTLE). This may result in an unstable and insecure
1380platform, unless you manually specify the cpu affinity of all domains so
1381that all vcpus are scheduled on the same class of pcpus (big or LITTLE
1382but not both). vcpu migration between big cores and LITTLE cores is not
1383supported. See docs/misc/arm/big.LITTLE.txt for more information.
1384
1385When the hmp-unsafe option is disabled (default), CPUs that are not
1386identical to the boot CPU will be parked and not used by Xen.
1387
1388### hpet
1389    = List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]
1390
1391    Applicability: x86
1392
1393Controls Xen's use of the system's High Precision Event Timer.  By default,
1394Xen will use an HPET when available and not subject to errata.  Use of the
1395HPET can be disabled by specifying `hpet=0`.
1396
1397 * The `broadcast` boolean is disabled by default, but forces Xen to keep
1398   using the broadcast for CPUs in deep C-states even when an RTC interrupt is
1399   enabled.  This then also affects raising of the RTC interrupt.
1400
1401 * The `legacy-replacement` boolean allows for control over whether Legacy
1402   Replacement mode is enabled.
1403
1404   Legacy Replacement mode is intended for hardware which does not have an
1405   8254 PIT, and allows the HPET to be configured into a compatible mode.
1406   Intel chipsets from Skylake/ApolloLake onwards can turn the PIT off for
1407   power saving reasons, and there is no platform-agnostic mechanism for
1408   discovering this.
1409
1410   By default, Xen will not change hardware configuration, unless the PIT
1411   appears to be absent, at which point Xen will try to enable Legacy
1412   Replacement mode before falling back to pre-IO-APIC interrupt routing
1413   options.
1414
1415   This behaviour can be inhibited by specifying `legacy-replacement=0`.
1416   Alternatively, this mode can be enabled unconditionally (if available) by
1417   specifying `legacy-replacement=1`.
1418
1419### hpetbroadcast (x86)
1420> `= <boolean>`
1421
1422Deprecated alternative of `hpet=broadcast`.
1423
1424### hvm_debug (x86)
1425> `= <integer>`
1426
1427The specified value is a bit mask with the individual bits having the
1428following meaning:
1429
1430>     Bit  0 - debug level 0 (unused at present)
1431>     Bit  1 - debug level 1 (Control Register logging)
1432>     Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
1433>     Bit  3 - debug level 3 (unused at present)
1434>     Bit  4 - I/O operation logging
1435>     Bit  5 - vMMU logging
1436>     Bit  6 - vLAPIC general logging
1437>     Bit  7 - vLAPIC timer logging
1438>     Bit  8 - vLAPIC interrupt logging
1439>     Bit  9 - vIOAPIC logging
1440>     Bit 10 - hypercall logging
1441>     Bit 11 - MSR operation logging
1442
1443Recognized in debug builds of the hypervisor only.
1444
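Since the value is a bit mask, it can be composed from the bit positions
listed above.  In the sketch below, the short names are hypothetical labels
chosen for illustration; only the bit numbers come from the table.

```python
# Bit positions from the table above; the keys are illustrative labels only.
HVM_DEBUG_BITS = {
    "cr": 1,
    "msr_restore": 2,
    "io": 4,
    "vmmu": 5,
    "vlapic": 6,
    "vlapic_timer": 7,
    "vlapic_interrupt": 8,
    "vioapic": 9,
    "hypercall": 10,
    "msr": 11,
}

def hvm_debug_mask(*names):
    """OR together the bits for the requested logging categories."""
    mask = 0
    for name in names:
        mask |= 1 << HVM_DEBUG_BITS[name]
    return mask

# I/O operation logging plus vLAPIC general logging
print(f"hvm_debug={hvm_debug_mask('io', 'vlapic'):#x}")  # hvm_debug=0x50
```
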
1445### hvm_fep (x86)
1446> `= <boolean>`
1447
1448> Default: `false`
1449
1450Allow use of the Forced Emulation Prefix in HVM guests, to allow emulation of
1451arbitrary instructions.
1452
1453This option is intended for development and testing purposes.
1454
1455*Warning*
As this feature opens up the instruction emulator to arbitrary
instructions from an HVM guest, do not use it on production systems. No
security support is provided when this flag is set.
1459
1460### hvm_port80 (x86)
1461> `= <boolean>`
1462
1463> Default: `true`
1464
1465Specify whether guests are to be given access to physical port 80
1466(often used for debugging purposes), to override the DMI based
1467detection of systems known to misbehave upon accesses to that port.
1468
1469### idle_latency_factor (x86)
1470> `= <integer>`
1471
1472### ioapic_ack (x86)
1473> `= old | new`
1474
1475> Default: `new` unless directed-EOI is supported
1476
1477### iommu
1478    = List of [ <bool>, verbose, debug, force, required,
1479                quarantine=<bool>|scratch-page,
1480                sharept, superpages, intremap, intpost, crash-disable,
1481                snoop, qinval, igfx, amd-iommu-perdev-intremap,
1482                dom0-{passthrough,strict} ]
1483
1484    All sub-options are boolean in nature.
1485
I/O Memory Management Units perform a function similar to the CPU MMU (hence the
1487name), but typically exist as a discrete device, integrated as part of a PCI
1488Root Complex.  The most common configuration is to have one IOMMU per package
1489(for on-die PCIe devices and directly attached PCIe lanes), and one IOMMU
1490covering the remaining I/O in the system.
1491
1492The functionality in an IOMMU commonly falls into two orthogonal categories:
1493
14941.  DMA remapping which uses a pagetable-like hierarchical structure and maps
1495    I/O Virtual Addresses (DFNs - Device Frame Numbers in Xen's terminology)
1496    to System Physical Addresses (MFNs - Machine Frame Numbers in Xen's
1497    terminology).
1498
14992.  Interrupt Remapping, which controls incoming Message Signalled Interrupt
1500    requests, including their routing to specific CPUs.
1501
1502IOMMU functionality can be used to provide a translation which the hardware
1503device driver isn't aware of (e.g. PCI Passthrough and a native driver inside
1504the guest) and/or to enforce fine-grained control over the memory and
1505interrupts which a device is attempting to access.
1506
1507By default, IOMMUs are configured for use if they are available.  An overall
1508boolean (e.g. `iommu=no`) can override this and leave the IOMMUs disabled.
1509
1510*   The `verbose` and `debug` booleans can be used to print additional
    diagnostic information.  Neither is active by default.
1512
1513*   The `force` and `required` booleans are synonymous and, when requested,
1514    will prevent Xen from booting if IOMMUs aren't discovered and enabled
1515    successfully.
1516
1517*   The `quarantine` option can be used to control Xen's behavior when
1518    de-assigning devices from guests.  The default behaviour is chosen at
1519    compile time, and is one of `CONFIG_IOMMU_QUARANTINE_{NONE,BASIC,SCRATCH_PAGE}`.
1520
1521    When a PCI device is assigned to an untrusted domain, it is possible
1522    for that domain to program the device to DMA to an arbitrary address.
1523    The IOMMU is used to protect the host from malicious DMA by making
1524    sure that the device addresses can only target memory assigned to the
1525    guest.  However, when the guest domain is torn down, assigning the
1526    device back to the hardware domain would allow any in-flight DMA to
1527    potentially target critical host data.  To avoid this, quarantining
1528    should be enabled.  Quarantining can be done in two ways: In its basic
1529    form, all in-flight DMA will simply be forced to encounter IOMMU
1530    faults.  Since there are systems where doing so can cause host lockup,
1531    an alternative form is available where accesses to memory will be directed
1532    to a scratch page. The implication here is that such accesses will go
1533    unnoticed, i.e. an admin may not become aware of the underlying problem.
1534
1535    Therefore, if this option is set to true (the default), Xen always
1536    quarantines such devices; they must be explicitly assigned back to Dom0
1537    before they can be used there again.  If set to "scratch-page", still
1538    active DMA operations will additionally be directed to a "scratch" page.  If
1539    set to false, Xen will only quarantine devices the toolstack has arranged
1540    for getting quarantined, and only in the "basic" form.
1541
1542    This option is only valid on builds supporting PCI.
1543
1544*   The `sharept` boolean controls whether the IOMMU pagetables are shared
1545    with the CPU-side HAP pagetables, or allocated separately.  Sharing
1546    reduces the memory overhead, but doesn't work in combination with CPU-side
1547    pagefault-based features, e.g. dirty VRAM tracking when a PCI device is
1548    assigned.
1549
1550    Due to implementation choices, sharing pagetables doesn't work on AMD
1551    hardware, and this option is ignored.  It is enabled by default on Intel
1552    systems.
1553
1554    This option is ignored on ARM, and the pagetables are always shared.
1555
1556*   The `superpages` boolean controls whether superpage mappings may be used
1557    in IOMMU page tables.  If using this option is necessary to fix an issue,
1558    please report a bug.
1559
1560    This option is only valid on x86.
1561
1562*   The `intremap` boolean controls the Interrupt Remapping sub-feature, and
1563    is active by default on compatible hardware.  On x86 systems, the first
1564    generation of IOMMUs only supported DMA remapping, and Interrupt Remapping
1565    appeared in the second generation.
1566
1567    This option is only valid on x86.
1568
1569*   The `intpost` boolean controls the Posted Interrupt sub-feature.  In
1570    combination with APIC acceleration (VT-x APICV, SVM AVIC), the IOMMU can
1571    be configured to deliver interrupts from assigned PCI devices directly
1572    into the guest, without trapping out into hypervisor context.
1573
1574    This option depends on `intremap`, and is disabled by default due to some
1575    corner cases in the implementation which have yet to be resolved.
1576
1577    This option is only valid on x86, and only builds of Xen with HVM support.
1578
*   The `crash-disable` boolean controls disabling IOMMU functionality (DMAR/IR/QI)
    before switching to a crash kernel.  This option is inactive by default and
    is for compatibility with older kdump kernels only.  Modern kernels copy
    all the necessary tables from the previous kernel following kexec, which
    makes the transition transparent for them with IOMMU functions still on.
1584
1585The following options are specific to Intel VT-d hardware:
1586
1587*   The `snoop` boolean controls the Snoop Control sub-feature, and is active
1588    by default on compatible hardware.
1589
1590    An incoming DMA request may specify _Snooped_ (query the CPU caches for
1591    the appropriate lines) or _Non-Snooped_ (don't query the CPU caches).
1592    _Non-Snooped_ accesses incur less latency, but behind-the-scenes
1593    hypervisor activity can invalidate the expectations of the device driver,
1594    and Snoop Control allows the hypervisor to force DMA requests to be
1595    _Snooped_ when they would otherwise not be.
1596
1597*   The `qinval` boolean controls the Queued Invalidation sub-feature, and is
1598    active by default on compatible hardware.  Queued Invalidation is a
1599    feature in second-generation IOMMUs and is a functional prerequisite for
    Interrupt Remapping.  Note that Xen disregards this setting for Intel VT-d
    version 6 and greater, as they don't support Register-Based Invalidation.
1603
1604*   The `igfx` boolean is active by default, and controls whether IOMMUs in
1605    front of solely graphics devices get enabled or not.
1606
1607    It is intended as a debugging mechanism for graphics issues, and to be
1608    similar to Linux's `intel_iommu=igfx_off` option.  If specifying `no-igfx`
1609    fixes anything, please report the problem.
1610
1611The following options are specific to AMD-Vi hardware:
1612
1613*   The `amd-iommu-perdev-intremap` boolean controls whether the interrupt
1614    remapping table is per device (the default), or a single global table for
1615    the entire system.
1616
    Using a global table is not security supported as it allows all devices to
    impersonate each other as far as interrupts are concerned (see XSA-36), but
    it is a workaround for SP5100 Erratum 28.
1620
1621**WARNING: The `dom0-passthrough` and `dom0-strict` booleans are both
1622deprecated, and superseded by _dom0-iommu={passthrough,strict}_ respectively -
1623using both the old and new command line options in combination is undefined.**
1624
1625### iommu_dev_iotlb_timeout
1626> `= <integer>`
1627
1628> Default: `1000`
1629
1630Specify the timeout of the device IOTLB invalidation in milliseconds.
1631By default, the timeout is 1000 ms. When you see error 'Queue invalidate
1632wait descriptor timed out', try increasing this value.
1633
1634### iommu_inclusive_mapping
1635> `= <boolean>`
1636
1637**WARNING: This command line option is deprecated, and superseded by
1638_dom0-iommu=map-inclusive_ - using both options in combination is undefined.**
1639
1640### irq-max-guests (x86)
1641> `= <integer>`
1642
1643> Default: `32`
1644
1645Maximum number of guests any individual IRQ could be shared between,
1646i.e. a limit on the number of guests it is possible to start each having
1647assigned a device sharing a common interrupt line.  Accepts values between
16481 and 255.
1649
1650### irq_ratelimit (x86)
1651> `= <integer>`
1652
1653### irq_vector_map (x86)
1654
1655### ivmd (x86)
1656> `= <start>[-<end>][=<bdf1>[-<bdf1'>][,<bdf2>[-<bdf2'>][,...]]][;<start>...]`
1657
1658Define IVMD-like ranges that are missing from ACPI tables along with the
1659device(s) they belong to, and use them for 1:1 mapping.  End addresses can be
1660omitted when exactly one page is meant.  The ranges are inclusive when start
1661and end are specified.  Note that only PCI segment 0 is supported at this time,
1662but it is fine to specify it explicitly.
1663
1664'start' and 'end' values are page numbers (not full physical addresses),
1665in hexadecimal format (can optionally be preceded by "0x").
1666
Omitting the optional (range of) BDF specifiers signals that the range is to
1668be applied to all devices.
1669
1670Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be
1671reserved, and devices 0:0:1a.0...0:0:1a.3 collectively require three pages
1672(0xd5d46 thru 0xd5d48) to be reserved, one usage would be:
1673
> `ivmd=d5d45=0:1d.0;0xd5d46-0xd5d48=0:1a.0-0:1a.3`
1675
Note: grub2 requires special characters, like ';', to be escaped or quoted when
multiple ranges are specified - refer to the grub2 documentation.
1678
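The syntax above can be illustrated with a minimal Python sketch that splits
an `ivmd=` value into its page ranges and BDF specifiers.  The function name
`parse_ivmd` is hypothetical, and this is not how Xen itself parses the option;
it only mirrors the rules stated above (hexadecimal page numbers, optional end
page, optional device list).

```python
def parse_ivmd(value):
    """Split an ivmd= value into (first_page, last_page, [bdf specifiers])."""
    ranges = []
    for entry in value.split(";"):
        # The page range and the optional device list are separated by '='.
        pages, _, devices = entry.partition("=")
        start, _, end = pages.partition("-")
        first = int(start, 16)  # base 16 also accepts an optional "0x" prefix
        last = int(end, 16) if end else first  # omitted end: exactly one page
        # An empty device list means the range applies to all devices.
        bdfs = devices.split(",") if devices else []
        ranges.append((first, last, bdfs))
    return ranges
```

Applied to the usage example above, the sketch yields two ranges: a single
page for `0:1d.0`, and an inclusive three-page range for `0:1a.0-0:1a.3`.
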
1679### ivrs_hpet[`<hpet>`] (AMD)
1680> `=[<seg>:]<bus>:<device>.<func>`
1681
1682Force the use of `[<seg>:]<bus>:<device>.<func>` as device ID of HPET
1683`<hpet>` instead of the one specified by the IVHD sub-tables of the IVRS
1684ACPI table.
1685
1686### ivrs_ioapic[`<ioapic>`] (AMD)
1687> `=[<seg>:]<bus>:<device>.<func>`
1688
1689Force the use of `[<seg>:]<bus>:<device>.<func>` as device ID of IO-APIC
1690`<ioapic>` instead of the one specified by the IVHD sub-tables of the IVRS
1691ACPI table.
1692
1693### lapic (x86)
1694> `= <boolean>`
1695
Force the use of the local APIC on a uniprocessor system, even
1697if left disabled by the BIOS.
1698
1699### lapic_timer_c2_ok (x86)
1700> `= <boolean>`
1701
1702### ler (x86)
1703> `= <boolean>`
1704
> Default: `false`
1706
1707This option is intended for debugging purposes only.  Enable MSR_DEBUGCTL.LBR
1708in hypervisor context to be able to dump the Last Interrupt/Exception To/From
1709record with other registers.
1710
1711### lock-depth-size
1712> `= <integer>`
1713
1714> Default: `lock-depth-size=64`
1715
1716Specifies the maximum number of nested locks tested for illegal recursions.
1717Higher nesting levels still work, but recursion testing is omitted for those
1718levels. In case an illegal recursion is detected the system will crash
1719immediately. Specifying `0` will disable all testing of illegal lock nesting.
1720
1721This option is available for hypervisors built with CONFIG_DEBUG_LOCKS only.
1722
1723### loglvl
1724> `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
1725
1726> Default: `loglvl=info`
1727
1728> Can be modified at runtime
1729
Set the logging level for Xen.  Any log message with equal or more
importance will be printed.
1732
1733The optional `<rate-limited level>` option instructs which severities
1734should be rate limited.
1735
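The threshold rule can be sketched in Python as follows.  This is only an
illustration of "equal or more importance"; the function name `is_printed` is
hypothetical, and treating `none` as "print nothing" and `all` as "print
everything" is an assumption based on the level names.  Rate limiting is not
modelled.

```python
# Severity names from the loglvl syntax, ordered from most to least important.
SEVERITIES = ["error", "warning", "info", "debug"]


def is_printed(message_level, loglvl):
    """Return True if a message of message_level passes the loglvl threshold."""
    if loglvl == "none":  # assumption: suppresses every message
        return False
    if loglvl == "all":   # assumption: lets every message through
        return True
    # A message is printed when it is at least as important as the threshold.
    return SEVERITIES.index(message_level) <= SEVERITIES.index(loglvl)
```

With `loglvl=info`, for instance, the sketch prints `error`, `warning` and
`info` messages but drops `debug` ones.
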
1736### low_crashinfo
1737> `= none | min | all`
1738
> Default: `none` if not specified at all, or `min` if **low_crashinfo** is present without qualification.
1740
1741This option is only useful for hosts with a 32bit dom0 kernel, wishing
1742to use kexec functionality in the case of a crash.  It represents
1743which data structures should be deliberately allocated in low memory,
so the crash kernel may find them.  Should be used in combination
1745with **crashinfo_maxaddr**.
1746
1747### low_mem_virq_limit
1748> `= <size>`
1749
1750> Default: `64M`
1751
1752Specify the threshold below which Xen will inform dom0 that the quantity of
1753free memory is getting low.  Specifying `0` will disable this notification.
1754
1755### maxcpus
1756> `= <integer>`
1757
1758Specify the maximum number of CPUs that should be brought up.
1759
1760This option is ignored in **pv-shim** mode.
1761
**WARNING: On Arm big.LITTLE systems, when the `hmp-unsafe` option is enabled, this command line
option does not guarantee which CPU types will be used.**
1764
1765### max_cstate (x86)
1766> `= <integer>[,<integer>]`
1767
1768Specify the deepest C-state CPUs are permitted to be placed in, and
optionally the maximum sub C-state to be used.  The latter only applies
1770to the highest permitted C-state.
1771
1772### max_gsi_irqs (x86)
1773> `= <integer>`
1774
Specifies the number of interrupts to be used for pin (IO-APIC or legacy PIC)
1776based interrupts. Any higher IRQs will be available for use via PCI MSI.
1777
1778### max_lpi_bits (arm)
1779> `= <integer>`
1780
1781Specifies the number of ARM GICv3 LPI interrupts to allocate on the host,
1782presented as the number of bits needed to encode it. This must be at least
178314 and not exceed 32, and each LPI requires one byte (configuration) and
1784one pending bit to be allocated.
1785Defaults to 20 bits (to cover at most 1048576 interrupts).
1786
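The memory cost implied by the description above can be worked out with a
short Python sketch.  The function name `lpi_table_sizes` is hypothetical, and
the calculation simply counts the whole ID space of `2^max_lpi_bits` entries;
it ignores any IDs the architecture reserves and any alignment Xen may apply.

```python
def lpi_table_sizes(max_lpi_bits=20):
    """Return (interrupt IDs, configuration bytes, pending bytes)."""
    if not 14 <= max_lpi_bits <= 32:
        raise ValueError("max_lpi_bits must be between 14 and 32")
    nr_ids = 1 << max_lpi_bits
    config_bytes = nr_ids        # one configuration byte per LPI
    pending_bytes = nr_ids // 8  # one pending bit per LPI
    return nr_ids, config_bytes, pending_bytes
```

For the default of 20 bits this gives 1048576 IDs, 1 MiB of configuration
data and 128 KiB of pending bits.
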
1787### mce (x86)
1788> `= <boolean>`
1789
1790> Default: `true`
1791
Allows the use of Machine Check Exceptions to be disabled.  Note that doing
1793so may result in silent shutdown of the system in case an event occurs
1794which would have resulted in raising a Machine Check Exception.  Silent
1795here is as far as Xen is concerned; firmware may offer to retrieve some
1796collected data.
1797
1798### mce_fb (Intel)
1799> `= <boolean>`
1800
1801> Default: `false`
1802
1803Force broadcasting of Machine Check Exceptions, suppressing the use of
1804Local MCE functionality available in newer Intel hardware.
1805
1806### mce_verbosity (x86)
1807> `= verbose`
1808
1809Specify verbose machine check output.
1810
1811### mem (x86)
1812> `= <size>`
1813
1814Specify the maximum address of physical RAM.  Any RAM beyond this
1815limit is ignored by Xen.
1816
1817### memop-max-order
1818> `= [<domU>][,[<ctldom>][,[<hwdom>][,<ptdom>]]]`
1819
1820> x86 default: `9,18,12,12`
1821> ARM default: `9,18,10,10`
1822
1823Change the maximum order permitted for allocation (or allocation-like)
1824requests issued by the various kinds of domains (in this order:
1825ordinary DomU, control domain, hardware domain, and - when supported
1826by the platform - DomU with pass-through device assigned).
1827
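As a rough illustration of what these orders mean, the following Python sketch
converts an order into a request size and applies a partially specified
option value.  It assumes 4 KiB base pages, decimal values, and that an
omitted field keeps its default (as the optional fields in the syntax
suggest); the names `max_request_bytes` and `apply_memop_max_order` are
hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages
KINDS = ("domU", "ctldom", "hwdom", "ptdom")


def max_request_bytes(order):
    """Size in bytes of an allocation of 2^order contiguous pages."""
    return PAGE_SIZE << order


def apply_memop_max_order(value, defaults=(9, 18, 12, 12)):
    """Overlay the comma separated fields of the option onto the defaults."""
    orders = list(defaults)
    for i, field in enumerate(value.split(",")):
        if field:  # assumption: an empty field leaves the default untouched
            orders[i] = int(field)
    return dict(zip(KINDS, orders))
```

Under these assumptions, the default order of 9 for an ordinary DomU
corresponds to 2 MiB, and the default order of 18 for the control domain to
1 GiB.  A value of `,20` would raise only the control domain limit.
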
1828### mmcfg (x86)
1829> `= <boolean>[,amd-fam10]`
1830
1831> Default: `1`
1832
1833Specify if the MMConfig space should be enabled.
1834
1835### mmio-relax (x86)
1836> `= <boolean> | all`
1837
1838> Default: `false`
1839
1840By default, domains may not create cached mappings to MMIO regions.
1841This option relaxes the check for Domain 0 (or when using `all`, all PV
1842domains), to permit the use of cacheable MMIO mappings.
1843
1844### msi (x86)
1845> `= <boolean>`
1846
1847> Default: `true`
1848
1849Force Xen to (not) use PCI-MSI, even if ACPI FADT says otherwise.
1850
1851### mtrr.show (x86)
1852> `= <boolean>`
1853
1854> Default: `false`
1855
1856Print boot time MTRR state.
1857
1858### mwait-idle (x86)
1859> `= <boolean>`
1860
1861> Default: `true`
1862
1863Use the MWAIT idle driver (with model specific C-state knowledge) instead
1864of the ACPI based one.
1865
1866### nmi (x86)
1867> `= ignore | dom0 | fatal`
1868
1869> Default: `fatal` for a debug build, or `dom0` for a non-debug build
1870
1871Specify what Xen should do in the event of an NMI parity or I/O error.
1872`ignore` discards the error; `dom0` causes Xen to report the error to
dom0, while `fatal` causes Xen to print diagnostics and then hang.
1874
1875### noapic (x86)
1876
1877Instruct Xen to ignore any IOAPICs that are present in the system, and
1878instead continue to use the legacy PIC. This is _not_ recommended with
1879pvops type kernels.
1880
1881Because responsibility for APIC setup is shared between Xen and the
1882domain 0 kernel this option is automatically propagated to the domain
18830 command line.
1884
1885### invpcid (x86)
1886> `= <boolean>`
1887
1888> Default: `true`
1889
1890By default, Xen will use the INVPCID instruction for TLB management if
1891it is available.  This option can be used to cause Xen to fall back to
1892older mechanisms, which are generally slower.
1893
1894### load-balance-ratelimit
1895> `= <integer>`
1896
1897The minimum interval between load balancing events on a given pcpu, in
1898microseconds.  A value of '0' will disable rate limiting.  Maximum
1899value 1 second. At the moment only credit honors this parameter.
1900Default 1ms.
1901
1902### noirqbalance (x86)
1903> `= <boolean>`
1904
1905Disable software IRQ balancing and affinity. This can be used on
1906systems such as Dell 1850/2850 that have workarounds in hardware for
1907IRQ routing issues.
1908
1909### nolapic (x86)
1910> `= <boolean>`
1911
1912> Default: `false`
1913
1914Ignore the local APIC on a uniprocessor system, even if enabled by the
1915BIOS.
1916
1917### no-real-mode (x86)
1918> `= <boolean>`
1919
1920Do not execute real-mode bootstrap code when booting Xen. This option
1921should not be used except for debugging. It will effectively disable
1922the **vga** option, which relies on real mode to set the video mode.
1923
1924### noreboot
1925> `= <boolean>`
1926
1927Do not automatically reboot after an error.  This is useful for
1928catching debug output.  Defaults to automatically reboot after 5
1929seconds.
1930
1931### nosmp (x86)
1932> `= <boolean>`
1933
1934Disable SMP support.  No secondary processors will be booted.
1935Defaults to booting secondary processors.
1936
1937This option is ignored in **pv-shim** mode.
1938
1939### nr_irqs (x86)
1940> `= <integer>`
1941
1942### numa (x86)
1943> `= on | off | fake=<integer> | noacpi`
1944
1945> Default: `on`
1946
1947### partial-emulation (arm)
1948> `= <boolean>`
1949
1950> Default: `false`
1951
1952Flag to enable or disable partial emulation of system/coprocessor registers.
1953Only effective if CONFIG_PARTIAL_EMULATION is enabled.
1954
1955**WARNING: Enabling this option might result in unwanted/non-spec compliant
1956behavior.**
1957
1958### pci
1959    = List of [ serr=<bool>, perr=<bool> ]
1960
1961    Default: Signaling left as set by firmware.
1962
1963Override the firmware settings, and explicitly enable or disable the
1964signalling of PCI System and Parity errors.
1965
1966### pci-phantom
1967> `=[<seg>:]<bus>:<device>,<stride>`
1968
1969Mark a group of PCI devices as using phantom functions without actually
1970advertising so, so the IOMMU can create translation contexts for them.
1971
1972All numbers specified must be hexadecimal ones.
1973
1974This option can be specified more than once (up to 8 times at present).
1975
1976### pci-passthrough (arm)
1977> `= <boolean>`
1978
1979> Default: `false`
1980
Flag to enable or disable support for PCI passthrough.
1982
1983### pcid (x86)
1984> `= <boolean> | xpti=<bool>`
1985
1986> Default: `xpti`
1987
1988> Can be modified at runtime (change takes effect only for domains created
1989  afterwards)
1990
1991If available, control usage of the PCID feature of the processor for
199264-bit pv-domains. PCID can be used either for no domain at all (`false`),
1993for all of them (`true`), only for those subject to XPTI (`xpti`) or for
1994those not subject to XPTI (`no-xpti`). The feature is used only in case
1995INVPCID is supported and not disabled via `invpcid=false`.
1996
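The four settings can be summarised as a small decision function.  This is a
sketch of the rules stated above for a 64-bit PV domain, not Xen's actual
code; the function and parameter names are hypothetical.

```python
def pcid_in_use(mode, domain_has_xpti, pcid_available=True, invpcid_usable=True):
    """Decide whether PCID is used for one 64-bit PV domain."""
    # PCID is only used when the CPU offers it and INVPCID is supported and
    # has not been disabled via invpcid=false.
    if not (pcid_available and invpcid_usable):
        return False
    if mode == "true":
        return True
    if mode == "false":
        return False
    if mode == "xpti":
        return domain_has_xpti
    if mode == "no-xpti":
        return not domain_has_xpti
    raise ValueError("unknown pcid mode: " + mode)
```

With the default of `xpti`, a domain subject to XPTI gets PCID and one that is
not subject to XPTI does not.
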
1997### ple_gap
1998> `= <integer>`
1999
2000### ple_window (Intel)
2001> `= <integer>`
2002
2003### preferred-cstates (x86)
> `= ( <integer> | List of ( C1 | C1E | C2 | ... ) )`
2005
2006This is a mask of C-states which are to be used preferably.  This option is
applicable only on hardware where certain C-states are exclusive of one another.
2008
2009### probe-port-aliases (x86)
2010> `= <boolean>`
2011
2012> Default: `true` outside of shim mode, `false` in shim mode
2013
2014Certain devices accessible by I/O ports may be accessible also through "alias"
2015ports (originally a result of incomplete address decoding).  When such devices
2016are solely under Xen's control, Xen disallows even Dom0 access to the "primary"
ports.  When alias probing is active and aliases are detected, "alias" ports
are treated similarly to the "primary" ones.
2019
2020### psr (Intel)
2021> `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
2022
2023> Default: `psr=cmt:0,rmid_max:255,cat:0,cos_max:255,cdp:0`
2024
Platform Shared Resource (PSR) Services.  Intel Haswell and later server
platforms offer information about the sharing of resources.

To use the PSR monitoring service for a certain domain, a Resource
Monitoring ID (RMID) is used to bind the domain to the corresponding shared
resource.  RMID is a hardware-provided layer of abstraction between software
and logical processors.

To use the PSR cache allocation service for a certain domain, a capacity
bitmask (CBM) is used to bind the domain to the corresponding shared resource.
CBM represents cache capacity and indicates the degree of overlap and isolation
between domains.  In the hypervisor, a Class of Service (COS) ID is allocated
for each unique CBM.
2038
2039The following resources are available:
2040
2041* Cache Monitoring Technology (Haswell and later).  Information regarding the
2042  L3 cache occupancy.
2043  * `cmt` instructs Xen to enable/disable Cache Monitoring Technology.
2044  * `rmid_max` indicates the max value for rmid.
* Memory Bandwidth Monitoring (Broadwell and later). Information regarding the
  total/local memory bandwidth. It uses the same options as Cache Monitoring
  Technology.
2048
2049* Cache Allocation Technology (Broadwell and later).  Information regarding
2050  the cache allocation.
2051  * `cat` instructs Xen to enable/disable Cache Allocation Technology.
2052  * `cos_max` indicates the max value for COS ID.
2053* Code and Data Prioritization Technology (Broadwell and later). Information
2054  regarding the code cache and the data cache allocation. CDP is based on CAT.
  * `cdp` instructs Xen to enable/disable Code and Data Prioritization. Note
    that `cos_max` of CDP is a little different from `cos_max` of CAT. With
    CDP, one COS corresponds to two CBMs rather than the one used with CAT.
    Because the sum of CBMs is fixed, the actual `cos_max` in use is
    automatically reduced to half when CDP is enabled.
2060
2061### pv
2062    = List of [ 32=<bool> ]
2063
2064    Applicability: x86
2065
2066Controls for aspects of PV guest support.
2067
2068*   The `32` boolean controls whether 32bit PV guests can be created.  It
2069    defaults to `true`, and is ignored when `CONFIG_PV32` is compiled out.
2070
2071    32bit PV guests are incompatible with CET Shadow Stacks.  If Xen is using
2072    shadow stacks, this option will be overridden to `false`.  Backwards
2073    compatibility can be maintained with the `pv-shim` mechanism.
2074
2075### pv-linear-pt (x86)
2076> `= <boolean>`
2077
2078> Default: `true`
2079
2080Only available if Xen is compiled with `CONFIG_PV_LINEAR_PT` support
2081enabled.
2082
2083Allow PV guests to have pagetable entries pointing to other pagetables
2084of the same level (i.e., allowing L2 PTEs to point to other L2 pages).
2085This technique is often called "linear pagetables", and is sometimes
2086used to allow operating systems a simple way to consistently map the
2087current process's pagetables into its own virtual address space.
2088
2089Linux and MiniOS don't use this technique.  NetBSD and Novell Netware
2090do; there may be other custom operating systems which do.  If you're
2091certain you don't plan on having PV guests which use this feature,
2092turning it off can reduce the attack surface.
2093
2094### pv-l1tf (x86)
2095> `= List of [ <bool>, dom0=<bool>, domu=<bool> ]`
2096
2097> Default: `false` on believed-unaffected hardware, or in pv-shim mode.
2098>          `domu`  on believed-affected hardware.
2099
2100Mitigations for L1TF / XSA-273 / CVE-2018-3620 for PV guests.
2101
2102For backwards compatibility, we may not alter an architecturally-legitimate
2103pagetable entry a PV guest chooses to write.  We can however force such a
2104guest into shadow mode so that Xen controls the PTEs which are reachable by
2105the CPU pagewalk.
2106
2107Shadowing is performed at the point where a PV guest first tries to write an
2108L1TF-vulnerable PTE.  Therefore, a PV guest kernel which has been updated with
2109its own L1TF mitigations will not trigger shadow mode if it is well behaved.
2110
2111If `CONFIG_SHADOW_PAGING` is not compiled in, this mitigation instead crashes
2112the guest when an L1TF-vulnerable PTE is written, which still allows updated,
2113well-behaved PV guests to run, despite Shadow being compiled out.
2114
2115In the pv-shim case, Shadow is expected to be compiled out, and a malicious
2116guest kernel can only leak data from the shim Xen, rather than the host Xen.
2117
2118### pv-shim (x86)
2119> `= <boolean>`
2120
2121> Default: `false`
2122
2123This option is intended for use by a toolstack, when choosing to run a PV
2124guest compatibly inside an HVM container.
2125
2126In this mode, the kernel and initrd passed as modules to the hypervisor are
2127constructed into a plain unprivileged PV domain.
2128
2129### rcu-idle-timer-period-ms
2130> `= <integer>`
2131
2132> Default: `10`
2133
2134How frequently a CPU which has gone idle, but with pending RCU callbacks,
2135should be woken up to check if the grace period has completed, and the
2136callbacks are safe to be executed. Expressed in milliseconds; maximum is
2137100, and it can't be 0.
2138
2139### reboot (x86)
2140> `= t[riple] | k[bd] | a[cpi] | p[ci] | P[ower] | e[fi] | n[o] [, [w]arm | [c]old]`
2141
2142> Default: `0`
2143
2144Specify the host reboot method.
2145
2146`warm` instructs Xen to not set the cold reboot flag.
2147
2148`cold` instructs Xen to set the cold reboot flag.
2149
2150`no` instructs Xen to not automatically reboot after panics or crashes.
2151
2152`triple` instructs Xen to reboot the host by causing a triple fault.
2153
2154`kbd` instructs Xen to reboot the host via the keyboard controller.
2155
2156`acpi` instructs Xen to reboot the host using RESET_REG in the ACPI FADT.
2157
2158`pci` instructs Xen to reboot the host using PCI reset register (port CF9).
2159
2160`Power` instructs Xen to power-cycle the host using PCI reset register (port CF9).
2161
`efi` instructs Xen to reboot using the EFI reboot call (in EFI mode by
default it will use that method first).

`xen` instructs Xen to reboot using Xen's SCHEDOP hypercall (this is the default
when running nested Xen).
2167
2168### rmrr
2169> `= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]`
2170
Define RMRR units that are missing from the ACPI tables, along with the devices
they belong to, and use them for 1:1 mapping.  End addresses can be omitted, in
which case one page will be mapped.  The ranges are inclusive when start and
end are specified.
If the segment of the first device is not specified, segment zero will be used.
If other segments are not specified, the first device's segment will be used.
If a segment is specified for a device other than the first and it does not
match the one specified for the first device, an error will be reported.
2178
2179'start' and 'end' values are page numbers (not full physical addresses),
2180in hexadecimal format (can optionally be preceded by "0x").
2181
2182Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be
2183reserved, and device 0:0:1a.0 requires three pages (0xd5d46 thru 0xd5d48)
2184to be reserved, one usage would be:
2185
> `rmrr=d5d45=0:0:1d.0;0xd5d46-0xd5d48=0:0:1a.0`
2187
Note: grub2 requires special characters, namely ';', to be escaped or quoted;
refer to the grub2 documentation if multiple ranges are specified.
2190
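Because 'start' and 'end' are page numbers rather than physical addresses, it
can help to see the conversion spelled out.  The following Python sketch
assumes 4 KiB pages; the name `page_range_to_bytes` is hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def page_range_to_bytes(first_page, last_page=None):
    """Return the inclusive physical address range covered by a page range."""
    if last_page is None:
        last_page = first_page  # an omitted end maps exactly one page
    start = first_page << PAGE_SHIFT
    end = ((last_page + 1) << PAGE_SHIFT) - 1  # last byte of the last page
    return start, end
```

Under that assumption, page `0xd5d45` from the usage example covers physical
addresses `0xd5d45000` to `0xd5d45fff`, and pages `0xd5d46` to `0xd5d48` cover
`0xd5d46000` to `0xd5d48fff`.
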
2191### ro-hpet (x86)
2192> `= <boolean>`
2193
2194> Default: `true`
2195
2196Map the HPET page as read only in Dom0. If disabled the page will be mapped
2197with read and write permissions.
2198
2199### sched
2200> `= credit | credit2 | arinc653 | rtds | null`
2201
2202> Default: `sched=credit2`
2203
2204Choose the default scheduler. Note the default scheduler is selectable via
2205Kconfig and depends on enabled schedulers. Check
2206`CONFIG_SCHED_DEFAULT` to see which scheduler is the default.
2207
2208### sched_credit2_max_cpus_runqueue
2209> `= <integer>`
2210
2211> Default: `16`
2212
2213Defines how many CPUs will be put, at most, in each Credit2 runqueue.
2214
Runqueues are still arranged according to the host topology (and following
what is indicated by the 'credit2_runqueue' parameter), but there is also a cap
on the number of CPUs that share each runqueue.
2218
2219A value that is a submultiple of the number of online CPUs is recommended,
2220as that would likely produce a perfectly balanced runqueue configuration.
2221
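Candidate values that are submultiples of the online CPU count can be listed
with a short Python sketch.  The function name `submultiples` and the upper
bound of 16 (the default) are illustrative choices only, and the sketch says
nothing about how Xen distributes CPUs given the host topology.

```python
def submultiples(online_cpus, largest=16):
    """List values up to 'largest' that divide the online CPU count evenly."""
    return [n for n in range(1, largest + 1) if online_cpus % n == 0]
```

For a host with 48 online CPUs, for example, this lists 1, 2, 3, 4, 6, 8, 12
and 16 as candidates.
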
2222### sched_credit2_migrate_resist
2223> `= <integer>`
2224
2225### sched_credit_tslice_ms
2226> `= <integer>`
2227
2228Set the timeslice of the credit1 scheduler, in milliseconds.  The
2229default is 30ms.  Reasonable values may include 10, 5, or even 1 for
2230very latency-sensitive workloads.
2231
2232### sched-gran (x86)
2233> `= cpu | core | socket`
2234
2235> Default: `sched-gran=cpu`
2236
Set the scheduling granularity. In case the granularity is larger than 1 (e.g.
`core` on an SMT-enabled system, or `socket`), multiple vcpus are assigned
statically to a "scheduling unit" which will then be subject to scheduling.
This assignment of vcpus to scheduling units is fixed.
2241
2242`cpu`: Vcpus will be scheduled individually on single cpus (e.g. a
2243hyperthread using x86/Intel terminology)
2244
2245`core`: As many vcpus as there are cpus on a physical core are scheduled
2246together on a physical core.
2247
`socket`: As many vcpus as there are cpus on a physical socket are scheduled
2249together on a physical socket.
2250
2251Note: a value other than `cpu` will result in rejecting a runtime modification
2252attempt of the "smt" setting.
2253
Note: for AMD x86 processors before Fam17 the terminology in the official data
sheets is different: a cpu is named "core" and multiple "cores" are running
in the same "compute unit". As AMD has used the same names as Intel ("thread"
and "core") from Fam17 onwards, the topology levels are named "cpu", "core"
and "socket" even on older AMD processors.
2259
2260### sched_ratelimit_us
2261> `= <integer>`
2262
2263In order to limit the rate of context switching, set the minimum
2264amount of time that a vcpu can be scheduled for before preempting it,
2265in microseconds.  The default is 1000us (1ms).  Setting this to 0
2266disables it altogether.
2267
2268### sched_smt_power_savings
2269> `= <boolean>`
2270
2271Normally Xen will try to maximize performance and cache utilization by
2272spreading out vcpus across as many different divisions as possible
(i.e., NUMA nodes, sockets, cores, threads, &c).  This often maximizes
2274throughput, but also maximizes energy usage, since it reduces the
2275depth to which a processor can sleep.
2276
2277This option inverts the logic, so that the scheduler in effect tries
2278to keep the vcpus on the smallest amount of silicon possible; i.e.,
2279first fill up sibling threads, then sibling cores, then sibling
2280sockets, &c.  This will reduce performance somewhat, particularly on
2281systems with hyperthreading enabled, but should reduce power by
2282enabling more sockets and cores to go into deeper sleep states.
2283
2284### scrub-domheap
2285> `= <boolean>`
2286
2287> Default: `false`
2288
2289Scrub domains' freed pages. This is a safety net against a (buggy) domain
2290accidentally leaking secrets by releasing pages without proper sanitization.
2291
2292### serial_tx_buffer
2293> `= <size>`
2294
2295> Default: `16kB`
2296
2297Set the serial transmit buffer size.
2298
2299### serrors (ARM)
2300> `= diverse | panic`
2301
2302> Default: `diverse`
2303
2304This parameter is provided to administrators to determine how the hypervisor
2305handles SErrors.
2306
2307* `diverse`:
2308  The hypervisor will distinguish guest SErrors from hypervisor SErrors:
2309    - The guest generated SErrors will be forwarded to the currently running
2310      guest.
2311    - The hypervisor generated SErrors will cause the whole system to crash
2312
2313* `panic`:
2314  All SErrors will cause the whole system to crash. This option should only
2315  be used if you trust all your guests and/or they don't have a gadget (e.g.
2316  device) to generate SErrors in normal run.
2317
2318### shim_mem (x86)
2319> `= List of ( min:<size> | max:<size> | <size> )`
2320
2321Set the amount of memory that xen-shim uses. Only has effect if pv-shim mode is
2322enabled. Note that this value accounts for the memory used by the shim itself
2323plus the free memory slack given to the shim for runtime allocations.
2324
2325* `min:<size>` specifies the minimum amount of memory. Ignored if greater
2326   than max.
2327* `max:<size>` specifies the maximum amount of memory.
2328* `<size>` specifies the exact amount of memory. Overrides both min and max.
2329
2330By default, the amount of free memory slack given to the shim for runtime usage
2331is 1MB.
2332
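The interaction between the three forms can be sketched in Python as below.
This is an illustration of the rules listed above, not the shim's actual
calculation: the function name `resolve_shim_mem` is hypothetical, and
`computed` stands for whatever amount the shim would otherwise arrive at,
which this document does not define.  All values use the same unit.

```python
def resolve_shim_mem(computed, min_size=None, max_size=None, exact=None):
    """Apply min:/max:/exact settings to an otherwise computed amount."""
    if exact is not None:
        return exact  # an exact size overrides both min and max
    amount = computed
    if min_size is not None:
        amount = max(amount, min_size)
    if max_size is not None:
        # Applying max last means a min greater than max is effectively ignored.
        amount = min(amount, max_size)
    return amount
```

For example, with a computed amount of 10, `min:20` and `max:15`, the sketch
settles on 15, since the minimum exceeds the maximum.
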
2333### smap (x86)
2334> `= <boolean> | hvm`
2335
2336> Default: `true` unless running in pv-shim mode on AMD or Hygon hardware
2337
Flag to enable Supervisor Mode Access Prevention.
Use `smap=hvm` to allow SMAP use by HVM guests only.

In PV shim mode on AMD or Hygon hardware, the option defaults to false due to
a significant performance impact in some cases and a generally lower security
risk.
2343
2344### smep (x86)
2345> `= <boolean> | hvm`
2346
2347> Default: `true` unless running in pv-shim mode on AMD or Hygon hardware
2348
Flag to enable Supervisor Mode Execution Protection.
Use `smep=hvm` to allow SMEP use by HVM guests only.

In PV shim mode on AMD or Hygon hardware, the option defaults to false due to
a significant performance impact in some cases and a generally lower security
risk.
2354
2355### smt (x86)
2356> `= <boolean>`
2357
> Default: `true`
2359
2360Control bring up of multiple hyper-threads per CPU core.
2361
2362### snb_igd_quirk
2363> `= <boolean> | cap | <integer>`
2364
2365A true boolean value enables legacy behavior (1s timeout), while `cap`
enforces the maximum theoretically necessary timeout of 670ms. Any number
is interpreted as a custom timeout in milliseconds. Zero or boolean
false disables the quirk workaround, which is also the default.
2369
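The mapping from option value to timeout can be sketched in Python as follows.
The function name `snb_igd_timeout_ms` is hypothetical.  The sketch assumes
that the boolean keywords are checked before numbers, so that `1` counts as
boolean true rather than a 1ms timeout; this document does not state which
interpretation takes precedence.

```python
TRUE_WORDS = {"yes", "on", "true", "enable", "1"}
FALSE_WORDS = {"no", "off", "false", "disable", "0"}


def snb_igd_timeout_ms(value):
    """Return the quirk timeout in milliseconds; 0 means the quirk is off."""
    if value in FALSE_WORDS:
        return 0     # zero or boolean false disables the workaround
    if value in TRUE_WORDS:
        return 1000  # legacy behaviour: 1s timeout
    if value == "cap":
        return 670   # maximum theoretically necessary timeout
    # Anything else is a custom timeout: decimal or 0x-prefixed hexadecimal.
    return int(value, 0)
```
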
2370### spec-ctrl (Arm)
2371> `= List of [ ssbd=force-disable|runtime|force-enable ]`
2372
2373Controls for speculative execution sidechannel mitigations.
2374
2375The option `ssbd=` is used to control the state of Speculative Store
2376Bypass Disable (SSBD) mitigation.
2377
2378* `ssbd=force-disable` will keep the mitigation permanently off. The guest
2379will not be able to control the state of the mitigation.
* `ssbd=runtime` will always turn on the mitigation when running in the
hypervisor context. The guest will be able to turn on/off the mitigation for
itself by using the firmware interface `ARCH_WORKAROUND_2`.
2383* `ssbd=force-enable` will keep the mitigation permanently on. The guest will
2384not be able to control the state of the mitigation.
2385
By default SSBD will be mitigated at runtime (i.e. `ssbd=runtime`).
2387
2388### spec-ctrl (x86)
2389> `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
2390>              {msr-sc,rsb,verw,{ibpb,bhb}-entry}=<bool>|{pv,hvm}=<bool>,
2391>              bti-thunk=retpoline|lfence|jmp,bhb-seq=short|tsx|long,
2392>              {ibrs,ibpb,ssbd,psfd,
2393>              eager-fpu,l1d-flush,branch-harden,srb-lock,
2394>              unpriv-mmio,gds-mit,div-scrub,lock-harden,
2395>              bhi-dis-s}=<bool> ]`
2396
2397Controls for speculative execution sidechannel mitigations.  By default, Xen
2398will pick the most appropriate mitigations based on compiled in support,
2399loaded microcode, and hardware details, and will virtualise appropriate
2400mitigations for guests to use.
2401
2402**WARNING: Any use of this option may interfere with heuristics.  Use with
2403extreme care.**
2404
2405An overall boolean value, `spec-ctrl=no`, can be specified to turn off all
2406mitigations, including pieces of infrastructure used to virtualise certain
2407mitigation features for guests.  This also includes settings which `xpti`,
2408`smt`, `pv-l1tf`, `tsx` control, unless the respective option(s) have been
2409specified earlier on the command line.
2410
2411Alternatively, a slightly more restricted `spec-ctrl=no-xen` can be used to
2412turn off all of Xen's mitigations, while leaving the virtualisation support
2413in place for guests to use.
2414
2415Use of a positive boolean value for either of these options is invalid.
2416
2417The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `verw=`, `ibpb-entry=` and `bhb-entry=`
options offer fine grained control over the primitives used by Xen.  These
impact Xen's ability to protect itself, and/or Xen's ability to virtualise
support for guests to use.
2421
2422* `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
2423  respectively.
2424* Each other option can be used either as a plain boolean
2425  (e.g. `spec-ctrl=rsb` to control both the PV and HVM sub-options), or with
2426  `pv=` or `hvm=` subsuboptions (e.g. `spec-ctrl=rsb=no-hvm` to disable HVM
2427  RSB only).
2428
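The sub-option forms described above can be modelled with a short Python
sketch.  It is purely illustrative and is not Xen's parser: the function name
`expand_suboption` and the returned mapping are hypothetical, and it assumes
the `pv`/`hvm` forms follow the general boolean rules (bare name, `no-`
prefix, or explicit value).

```python
# Illustrative model of the documented syntax only; not Xen's actual parser.
TRUE_WORDS = {"yes", "on", "true", "enable", "1"}
FALSE_WORDS = {"no", "off", "false", "disable", "0"}
GUEST_TYPES = ("pv", "hvm")
SUBOPTIONS = {"msr-sc", "rsb", "verw", "ibpb-entry", "bhb-entry"}


def parse_bool(word):
    """Map a documented boolean keyword to True/False."""
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean value: {word!r}")


def expand_suboption(token):
    """Expand one token (e.g. 'rsb', 'rsb=no', 'rsb=no-hvm') into
    (option name, {guest type: state}) for the guest types it affects."""
    name, sep, value = token.partition("=")
    if not sep:
        # Bare name, optionally carrying the boolean "no-" prefix.
        state = not name.startswith("no-")
        name = name if state else name[3:]
        result = {g: state for g in GUEST_TYPES}
    elif value in TRUE_WORDS or value in FALSE_WORDS:
        # Plain boolean: applies to both PV and HVM.
        result = {g: parse_bool(value) for g in GUEST_TYPES}
    else:
        # pv/hvm form: 'pv', 'no-hvm', 'pv=<bool>' (assumed spellings).
        guest, sub_sep, word = value.partition("=")
        if sub_sep:
            state = parse_bool(word)
        elif guest.startswith("no-"):
            guest, state = guest[3:], False
        else:
            state = True
        if guest not in GUEST_TYPES:
            raise ValueError(f"unknown guest type in {token!r}")
        result = {guest: state}
    if name not in SUBOPTIONS:
        raise ValueError(f"unknown sub-option in {token!r}")
    return name, result
```

Under these assumptions, `expand_suboption("rsb=no-hvm")` yields
`("rsb", {"hvm": False})`, leaving the PV setting untouched.
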
2429* `msr-sc=` offers control over Xen's support for manipulating `MSR_SPEC_CTRL`
2430  on entry and exit.  These blocks are necessary to virtualise support for
2431  guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
2432* `rsb=` offers control over whether to overwrite the Return Stack Buffer /
2433  Return Address Stack on entry to Xen and on idle.
2434* `verw=` offers control over whether to use VERW for its scrubbing side
2435  effects at appropriate privilege transitions.  The exact side effects are
2436  microarchitecture and microcode specific.  *Note: `md-clear=` is accepted as
2437  a deprecated alias.  For compatibility with development versions of XSA-297,
2438  `mds=` is also accepted on Xen 4.12 and earlier as an alias.  Consult vendor
2439  documentation in preference to here.*
2440* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
2441  Barrier) is used on entry to Xen.  This is used by default on hardware
2442  vulnerable to Branch Type Confusion, and hardware vulnerable to Speculative
2443  Return Stack Overflow if appropriate microcode has been loaded, but for
2444  performance reasons dom0 is unprotected by default.  If it is necessary to
2445  protect dom0 too, boot with `spec-ctrl=ibpb-entry`.
2446* `bhb-entry=` offers control over whether BHB-clearing (Branch History
2447  Buffer) sequences are used on entry to Xen.  This is used by default on
2448  hardware vulnerable to Branch History Injection, when the BHI_DIS_S control
2449  is not available (see `bhi-dis-s`).  The choice of scrubbing sequence can be
2450  selected using the `bhb-seq=` option.  If it is necessary to protect dom0
2451  too, boot with `spec-ctrl=bhb-entry`.
2452
2453If Xen was compiled with `CONFIG_INDIRECT_THUNK` support, `bti-thunk=` can be
2454used to select which of the thunks gets patched into the
2455`__x86_indirect_thunk_%reg` locations.  The default thunk is `retpoline`
2456(generally preferred), with the alternatives being `jmp` (a `jmp *%reg` gadget,
2457minimal overhead), and `lfence` (an `lfence; jmp *%reg` gadget).
2458
2459On all hardware, `bhb-seq=` can be used to select which of the BHB-clearing
2460sequences gets used.  This interacts with the `bhb-entry=` and `bhi-dis-s=`
2461options in order to mitigate Branch History Injection on affected hardware.
The default sequence is `short`, with `tsx` as an alternative available on
capable hardware, and `long` available on an opt-in basis.
2464
2465On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
2466`ibrs=` option can be used to force or prevent Xen using the feature itself.
2467If Xen is not using IBRS itself, functionality is still set up so IBRS can be
2468virtualised for guests.
2469
2470On hardware supporting STIBP (Single Thread Indirect Branch Predictors), the
2471`stibp=` option can be used to force or prevent Xen using the feature itself.
2472By default, Xen will use STIBP when IBRS is in use (IBRS implies STIBP), and
2473when hardware hints recommend using it as a blanket setting.
2474
2475On hardware supporting SSBD (Speculative Store Bypass Disable), the `ssbd=`
2476option can be used to force or prevent Xen using the feature itself.  The
2477feature is virtualised for guests, independently of Xen's choice of setting.
On AMD hardware, disabling Xen SSBD usage on the command line (`ssbd=0`, which
is the default value) can lead to Xen running with the guest SSBD selection,
depending on hardware support.  On the same hardware, setting `ssbd=1` will
result in SSBD always being enabled, regardless of guest choice.
2482
2483On hardware supporting PSFD (Predictive Store Forwarding Disable), the `psfd=`
2484option can be used to force or prevent Xen using the feature itself.  By
2485default, Xen will not use PSFD.  PSFD is implied by SSBD, and SSBD is off by
2486default.
2487
2488On hardware supporting BHI_DIS_S (Branch History Injection Disable
2489Supervisor), the `bhi-dis-s=` option can be used to force or prevent Xen using
2490the feature itself.  By default Xen will use BHI_DIS_S on hardware susceptible
2491to Branch History Injection.
2492
2493On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
2494option can be used to force (the default) or prevent Xen from issuing branch
2495prediction barriers on vcpu context switches.
2496
2497On all hardware, the `eager-fpu=` option can be used to force or prevent Xen
2498from using fully eager FPU context switches.  This is currently implemented as
2499a global control.  By default, Xen will choose to use fully eager context
2500switches on hardware believed to speculate past #NM exceptions.
2501
2502On hardware supporting L1D_FLUSH, the `l1d-flush=` option can be used to force
2503or prevent Xen from issuing an L1 data cache flush on each VMEntry.
2504Irrespective of Xen's setting, the feature is virtualised for HVM guests to
2505use.  By default, Xen will enable this mitigation on hardware believed to be
2506vulnerable to L1TF.
2507
2508If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_BRANCH`, the
2509`branch-harden=` boolean can be used to force or prevent Xen from using
2510speculation barriers to protect selected conditional branches.  By default,
2511Xen will enable this mitigation.
2512
2513On hardware supporting SRBDS_CTRL, the `srb-lock=` option can be used to force
or prevent Xen from protecting the Special Register Buffer from leaking stale
2515data. By default, Xen will enable this mitigation, except on parts where MDS
2516is fixed and TAA is fixed/mitigated and there are no unprivileged MMIO
2517mappings (in which case, there is believed to be no way for an attacker to
2518obtain stale data).
2519
2520The `unpriv-mmio=` boolean indicates whether the system has (or will have)
2521less than fully privileged domains granted access to MMIO devices.  By
2522default, this option is disabled.  If enabled, Xen will use the `FB_CLEAR`
2523and/or `SRBDS_CTRL` functionality available in the Intel May 2022 microcode
2524release to mitigate cross-domain leakage of data via the MMIO Stale Data
2525vulnerabilities.
2526
2527On all hardware, the `gds-mit=` option can be used to force or prevent Xen
2528from mitigating the GDS (Gather Data Sampling) vulnerability.  By default, Xen
2529will mitigate GDS on hardware believed to be vulnerable.  On hardware
2530supporting GDS_CTRL (requires the August 2023 microcode), and where firmware
has elected not to lock the configuration, Xen will use GDS_CTRL to mitigate
GDS.  Otherwise, Xen will mitigate by disabling AVX, which blocks the use of
the AVX2 Gather instructions.
2534
2535On all hardware, the `div-scrub=` option can be used to force or prevent Xen
2536from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
2537DIV-leakage on hardware believed to be vulnerable.
2538
2539If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
2540boolean can be used to force or prevent Xen from using speculation barriers to
2541protect lock critical regions.  This mitigation won't be engaged by default,
2542and needs to be explicitly enabled on the command line.
2543
2544### sync_console
2545> `= <boolean>`
2546
2547> Default: `false`
2548
2549Flag to force synchronous console output.  Useful for debugging, but
2550not suitable for production environments due to incurred overhead.
2551
2552### tboot (x86)
2553> `= 0x<phys_addr>`
2554
2555Specify the physical address of the trusted boot shared page.
2556
2557### tbuf_size
2558> `= <integer>`
2559
2560Specify the per-cpu trace buffer size in pages.
2561
2562### tdt (x86)
2563> `= <boolean>`
2564
2565> Default: `true`
2566
2567Flag to enable TSC deadline as the APIC timer mode.
2568
2569### tevt_mask
2570> `= <integer>`
2571
2572Specify a mask for Xen event tracing. This allows Xen tracing to be
2573enabled at boot. Refer to the xentrace(8) documentation for a list of
2574valid event mask values. In order to enable tracing, a buffer size (in
2575pages) must also be specified via the tbuf_size parameter.
2576
2577### tickle_one_idle_cpu
2578> `= <boolean>`
2579
2580### timer_slop
2581> `= <integer>`
2582
2583### tsc (x86)
2584> `= unstable | skewed | stable:socket`
2585
2586### tsx
2587    = <bool>
2588
2589    Applicability: x86
2590    Default: false on parts vulnerable to TAA, true otherwise
2591
2592Controls for the use of Transactional Synchronization eXtensions.
2593
2594Several microcode updates are relevant:
2595
2596 * March 2019, fixing the TSX memory ordering errata on all TSX-enabled CPUs
2597   to date.  Introduced MSR_TSX_FORCE_ABORT on SKL/SKX/KBL/WHL/CFL parts.  The
2598   errata workaround uses Performance Counter 3, so the user can select
2599   between working TSX and working perfcounters.
2600
2601 * November 2019, fixing the TSX Async Abort speculative vulnerability.
2602   Introduced MSR_TSX_CTRL on all TSX-enabled MDS_NO parts to date,
2603   CLX/WHL-R/CFL-R, with the controls becoming architectural moving forward
2604   and formally retiring HLE from the architecture.  The user can disable TSX
2605   to mitigate TAA, and elect to hide the HLE/RTM CPUID bits.  Also causes
   VERW to once-again flush the microarchitectural buffers in case a TAA
2607   mitigation is wanted along with TSX being enabled.
2608
2609 * June 2021, removing the workaround for March 2019 on client CPUs and
2610   formally de-featured TSX on SKL/KBL/WHL/CFL (Note: SKX still retains the
2611   March 2019 fix).  Introduced the ability to hide the HLE/RTM CPUID bits.
2612   PCR3 works fine, and TSX is disabled by default, but the user can re-enable
2613   TSX at their own risk, accepting that the memory order erratum is unfixed.
2614
2615 * February 2022, removing the VERW flushing workaround from November 2019 on
2616   client CPUs and formally de-featuring TSX on WHL-R/CFL-R (Note: CLX still
2617   retains the VERW flushing workaround).  TSX defaults to disabled, and is
2618   locked off when SGX is enabled in the BIOS.  When SGX is not enabled, TSX
   can be re-enabled at the user's own risk, as it reintroduces the TSX Async
2620   Abort speculative vulnerability.
2621
2622On systems with the ability to configure TSX, this boolean offers system wide
2623control of whether TSX is enabled or disabled.
2624
2625When TSX is disabled, transactions unconditionally abort.  This is compatible
2626with the TSX spec, which requires software to have a non-transactional path as
2627a fallback.  The RTM and HLE CPUID bits are hidden from VMs by default, but
2628can be re-enabled if required.  This allows VMs which previously saw RTM/HLE
2629to be migrated in, although any TSX-enabled software will run with reduced
2630performance.
2631
2632 * When TSX is locked off by firmware, `tsx=` is ignored and treated as
2633   `false`.
2634
2635 * An explicit `tsx=` choice is honoured, even if it is `true` and would
2636   result in a vulnerable system.
2637
2638 * When no explicit `tsx=` choice is given, parts vulnerable to TAA will be
2639   mitigated by disabling TSX, as this is the lowest overhead option.
2640
 * When no explicit `tsx=` option is given, parts susceptible to the memory
   ordering errata default as follows:

   SKX and SKL/KBL/WHL/CFL on pre-June 2021 microcode default to `true`, to
   enable working TSX.  Alternatively, selecting `tsx=0` will disable TSX and
   restore PCR3 to a working state.

   SKL/KBL/WHL/CFL on the June 2021 microcode or later default to `false`.
   Alternatively, selecting `tsx=1` will re-enable TSX at the user's own risk.
2651
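The selection rules above can be summarised as a small decision function.
This Python sketch is an illustration, not Xen's implementation: the function
and parameter names are hypothetical, and the relative order of the two
default checks is an assumption.

```python
from typing import Optional


def effective_tsx(explicit: Optional[bool],
                  locked_off_by_firmware: bool,
                  vulnerable_to_taa: bool,
                  defeatured_by_microcode: bool) -> bool:
    """Return whether TSX ends up enabled under the rules listed above.

    explicit: the tsx= value from the command line, or None if not given.
    defeatured_by_microcode: hypothetical flag for parts such as
        SKL/KBL/WHL/CFL running the June 2021 microcode or later.
    """
    if locked_off_by_firmware:
        return False           # tsx= is ignored and treated as false
    if explicit is not None:
        return explicit        # honoured, even if it leaves the system vulnerable
    if vulnerable_to_taa:
        return False           # lowest overhead TAA mitigation
    if defeatured_by_microcode:
        return False           # TSX is disabled by default on these parts
    return True                # e.g. SKX, or pre-June 2021 microcode
```

For example, `effective_tsx(True, True, False, False)` is `False`, because a
firmware lock overrides an explicit `tsx=1`.
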
2652### ucode
2653> `= List of [ <integer> | scan=<bool>, nmi=<bool>, allow-same=<bool> ]`
2654
2655    Applicability: x86
2656    Default: `nmi`
2657
2658Controls for CPU microcode loading. For early loading, this parameter can
2659specify how and where to find the microcode update blob. For late loading,
this parameter specifies whether the update happens within an NMI handler.
2661
2662'integer' specifies the CPU microcode update blob module index. When positive,
2663this specifies the n-th module (in the GrUB entry, zero based) to be used
for updating CPU microcode. When negative, counting starts at the end of
2665the modules in the GrUB entry (so with the blob commonly being last,
2666one could specify `ucode=-1`). Note that the value of zero is not valid
2667here (entry zero, i.e. the first module, is always the Dom0 kernel
2668image). Note further that use of this option has an unspecified effect
2669when used with xen.efi (there the concept of modules doesn't exist, and
2670the blob gets specified via the `ucode=<filename>` config file/section
2671entry; see [EFI configuration file description](efi.html)).
2672
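The module index rules above can be illustrated with a short Python sketch.
The helper name `resolve_ucode_module` is hypothetical, and rejecting an
index that falls outside the module list (or resolves to module zero) is an
assumption of the sketch rather than documented behaviour.

```python
def resolve_ucode_module(index: int, module_count: int) -> int:
    """Return the zero-based position of the microcode blob module.

    index: the integer given to ucode= (positive counts from the start,
           negative counts from the end of the module list).
    module_count: number of modules in the boot entry, including the
           Dom0 kernel at position 0.
    """
    if index == 0:
        raise ValueError("ucode=0 is not valid: module 0 is the Dom0 kernel")
    position = index if index > 0 else module_count + index
    if not 0 < position < module_count:
        raise ValueError(f"ucode={index} does not name a usable module")
    return position
```

With three modules (Dom0 kernel, initrd, microcode blob), both `ucode=2` and
`ucode=-1` resolve to position 2 in this model.
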
'scan' instructs the hypervisor to scan the multiboot images for a cpio
2674image that contains microcode. Depending on the platform the blob with the
2675microcode in the cpio name space must be:
2676  - on Intel: kernel/x86/microcode/GenuineIntel.bin
2677  - on AMD  : kernel/x86/microcode/AuthenticAMD.bin
2678When using xen.efi, the `ucode=<filename>` config file setting takes
2679precedence over `scan`.
2680
'nmi' determines whether late loading is performed in the NMI handler or just
in stop_machine context. In the NMI handler, even NMIs are blocked, which is
considered safer. The default value is `true`.
2684
2685'allow-same' alters the default acceptance policy for new microcode to permit
2686trying to reload the same version.  Many CPUs will actually reload microcode
2687of the same version, and this allows for easy testing of the late microcode
2688loading path.
2689
2690### unrestricted_guest (Intel)
2691> `= <boolean>`
2692
2693### vcpu_migration_delay
2694> `= <integer>`
2695
2696> Default: `0`
2697
2698Specify a delay, in microseconds, between migrations of a VCPU between
2699PCPUs when using the credit1 scheduler. This prevents rapid fluttering
2700of a VCPU between CPUs, and reduces the implicit overheads such as
2701cache-warming. 1ms (1000) has been measured as a good value.
2702
2703### vesa-ram
2704> `= <integer>`
2705
2706> Default: `0`
2707
Overrides the amount of video RAM, in MiB, determined to be present.
2710
2711### vga
2712> `= ( ask | current | text-80x<rows> | gfx-<width>x<height>x<depth> | mode-<mode> )[,keep]`
2713
2714`ask` causes Xen to display a menu of available modes and request the
2715user to choose one of them.
2716
2717`current` causes Xen to use the graphics adapter in its current state,
2718without further setup.
2719
2720`text-80x<rows>` instructs Xen to set up text mode.  Valid values for
2721`<rows>` are `25, 28, 30, 34, 43, 50, 80`
2722
2723`gfx-<width>x<height>x<depth>` instructs Xen to set up graphics mode
2724with the specified width, height and depth.
2725
2726`mode-<mode>` instructs Xen to use a specific mode, as shown with the
`ask` option.  (N.B. menu modes are displayed in hex, so `<mode>`
should be a hexadecimal number.)
2729
2730The optional `keep` parameter causes Xen to continue using the vga
2731console even after dom0 has been started.  The default behaviour is to
2732relinquish control to dom0.
2733
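The accepted forms of the `vga` value can be sketched as a small validator.
This Python code only mirrors the grammar and the row list given above; the
names `parse_vga` and `VALID_TEXT_ROWS` are hypothetical, and accepting an
optional `0x` prefix on `<mode>` is an assumption.

```python
import re

# Row counts copied from the list above.
VALID_TEXT_ROWS = {25, 28, 30, 34, 43, 50, 80}

_VGA_RE = re.compile(
    r"^(?:(?P<simple>ask|current)"
    r"|text-80x(?P<rows>\d+)"
    r"|gfx-(?P<width>\d+)x(?P<height>\d+)x(?P<depth>\d+)"
    r"|mode-(?P<mode>(?:0x)?[0-9a-fA-F]+))"
    r"(?P<keep>,keep)?$"
)


def parse_vga(value):
    """Split a vga= value into its documented components."""
    m = _VGA_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised vga value: {value!r}")
    keep = m.group("keep") is not None
    if m.group("simple"):
        return {"kind": m.group("simple"), "keep": keep}
    if m.group("rows"):
        rows = int(m.group("rows"))
        if rows not in VALID_TEXT_ROWS:
            raise ValueError(f"unsupported text row count: {rows}")
        return {"kind": "text", "rows": rows, "keep": keep}
    if m.group("width"):
        return {"kind": "gfx",
                "width": int(m.group("width")),
                "height": int(m.group("height")),
                "depth": int(m.group("depth")),
                "keep": keep}
    # <mode> is hexadecimal, as shown by the `ask` menu.
    return {"kind": "mode", "mode": int(m.group("mode"), 16), "keep": keep}
```

For instance, `parse_vga("gfx-1024x768x32,keep")` describes a 1024x768
graphics mode at depth 32 with the console kept after dom0 starts.
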
2734### viridian-spinlock-retry-count (x86)
2735> `= <integer>`
2736
2737> Default: `2047`
2738
2739Specify the maximum number of retries before an enlightened Windows
2740guest will notify Xen that it has failed to acquire a spinlock.
2741
2742### viridian-version (x86)
2743> `= [<major>],[<minor>],[<build>]`
2744
2745> Default: `6,0,0x1772`
2746
`<major>`, `<minor>` and `<build>` must be integers. The values will be
2748encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
2749
2750### vm-notify-window (Intel)
2751> `= <integer>`
2752
2753> Default: `0`
2754
2755Specify the value of the VM Notify window used to detect locked VMs. Set to -1
2756to disable the feature.  Value is in units of crystal clock cycles.
2757
2758Note the hardware might add a threshold to the provided value in order to make
2759it safe, and hence using 0 is fine.
2760
2761### vpid (Intel)
2762> `= <boolean>`
2763
2764> Default: `true`
2765
2766Use Virtual Processor ID support if available.  This prevents the need for TLB
2767flushes on VM entry and exit, increasing performance.
2768
2769### vpmu (x86)
2770    = List of [ <bool>, bts, ipc, arch, rtm-abort=<bool> ]
2771
2772    Applicability: x86.  Default: false
2773
2774Controls for Performance Monitoring Unit virtualisation.
2775
2776Performance monitoring facilities tend to be very hardware specific, and
2777provide access to a wealth of low level processor information.
2778
2779*   An overall boolean can be used to enable or disable vPMU support.  vPMU is
2780    disabled by default.
2781
2782    When enabled, guests have full access to all performance counter settings,
2783    including model specific functionality.  This is a superset of the
2784    functionality offered by `ipc` and/or `arch`, but a subset of the
2785    functionality offered by `bts`.
2786
2787    Xen's watchdog functionality is implemented using performance counters.
2788    As a result, use of the **watchdog** option will override and disable
2789    vPMU.
2790
2791*   The `bts` option enables performance monitoring, and permits additional
2792    access to the Branch Trace Store controls.  BTS is an Intel feature where
2793    the processor can write data into a buffer whenever a branch occurs.
2794    However, as this feature isn't virtualised, a misconfiguration by the
2795    guest can lock the entire system up.
2796
2797*   The `ipc` option allows access to the most minimal set of counters
2798    possible: instructions, cycles, and reference cycles.  These can be used
2799    to calculate instructions per cycle (IPC).
2800
2801*   The `arch` option allows access to the pre-defined architectural events.
2802
2803*   The `rtm-abort` boolean has been superseded.  Use `tsx=0` instead.
2804
2805*Warning:*
2806As the virtualisation is not 100% safe, don't use the vpmu flag on
2807production systems (see https://xenbits.xen.org/xsa/advisory-163.html)!
2808
2809### vwfi (arm)
2810> `= trap | native`
2811
2812> Default: `trap`
2813
2814WFI is the ARM instruction to "wait for interrupt". WFE is similar and
2815means "wait for event". This option, which is ARM specific, changes the
2816way guest WFI and WFE are implemented in Xen. By default, Xen traps both
2817instructions. In the case of WFI, Xen blocks the guest vcpu; in the case
of WFE, Xen yields the guest vcpu. When setting vwfi to `native`, Xen
2819doesn't trap either instruction, running them in guest context. Setting
2820vwfi to `native` reduces irq latency significantly. It can also lead to
2821suboptimal scheduling decisions, but only when the system is
2822oversubscribed (i.e., in total there are more vCPUs than pCPUs).
2823
2824### watchdog (x86)
2825> `= force | <boolean>`
2826
2827> Default: `false`
2828
2829Run an NMI watchdog on each processor.  If a processor is stuck for
2830longer than the **watchdog_timeout**, a panic occurs.  When `force` is
2831specified, in addition to running an NMI watchdog on each processor,
2832unknown NMIs will still be processed.
2833
2834### watchdog_timeout (x86)
2835> `= <integer>`
2836
2837> Default: `5`
2838
2839Set the NMI watchdog timeout in seconds.  Specifying `0` will turn off
2840the watchdog.
2841
2842### x2apic (x86)
2843> `= <boolean>`
2844
2845> Default: `true`
2846
2847Permit use of x2apic setup for SMP environments.
2848
2849### x2apic-mode (x86)
2850> `= physical | cluster | mixed`
2851
2852> Default: `physical` if **FADT** mandates physical mode, otherwise set at
2853>          build time by CONFIG_X2APIC_{PHYSICAL,LOGICAL,MIXED}.
2854
2855In the case that x2apic is in use, this option switches between modes to
2856address APICs in the system as interrupt destinations.
2857
2858### x2apic_phys (x86)
2859> `= <boolean>`
2860
2861> Default: `true` if **FADT** mandates physical mode or if interrupt remapping
2862>          is not available, `false` otherwise.
2863
2864In the case that x2apic is in use, this option switches between physical and
2865clustered mode.  The default, given no hint from the **FADT**, is cluster
2866mode.
2867
2868**WARNING: `x2apic_phys` is deprecated and superseded by `x2apic-mode`.
2869The latter takes precedence if both are set.**
2870
2871### xenheap_megabytes (arm32)
2872> `= <size>`
2873
2874> Default: `0` (1/32 of RAM)
2875
2876Amount of RAM to set aside for the Xenheap. Must be an integer multiple of 32.
2877
2878By default will use 1/32 of the RAM up to a maximum of 1GB and with a
2879minimum of 32M, subject to a suitably aligned and sized contiguous
2880region of memory being available.
2881
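The default sizing rule can be worked through with a short Python sketch.
The function name is hypothetical, "1GB" and "32M" are read here as 1 GiB and
32 MiB, and the sketch assumes a suitably aligned contiguous region is
available.

```python
MIB = 1 << 20
GIB = 1 << 30


def default_xenheap_bytes(ram_bytes: int) -> int:
    """Default Xenheap size: 1/32 of RAM, clamped to [32 MiB, 1 GiB]."""
    return max(32 * MIB, min(GIB, ram_bytes // 32))
```

Under these assumptions, a machine with 4 GiB of RAM gets a 128 MiB Xenheap,
while machines with less than 1 GiB of RAM are raised to the 32 MiB minimum.
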
2882### xpti (x86)
2883> `= List of [ default | <boolean> | dom0=<bool> | domu=<bool> ]`
2884
2885> Default: `false` on hardware known not to be vulnerable to Meltdown (e.g. AMD)
2886> Default: `true` everywhere else
2887
2888Override default selection of whether to isolate 64-bit PV guest page
2889tables.
2890
`true` activates page table isolation for all domains, even on hardware not
vulnerable to Meltdown.
2893
2894`false` deactivates page table isolation on all systems for all domains.
2895
2896`default` sets the default behaviour.
2897
2898With `dom0` and `domu` it is possible to control page table isolation
2899for dom0 or guest domains only.
2900
2901### xsave (x86)
2902> `= <boolean>`
2903
2904> Default: `true`
2905
2906Permit use of the `xsave/xrstor` instructions.
2907
2908### xsm
2909> `= dummy | flask | silo`
2910
2911> Default: selectable via Kconfig.  Depends on enabled XSM modules.
2912
2913Specify which XSM module should be enabled.  This option is only available if
2914the hypervisor was compiled with `CONFIG_XSM` enabled.
2915
2916* `dummy`: this is the default choice.  Basic restriction for common deployment
2917  (the dummy module) will be applied.  It's also used when XSM is compiled out.
* `flask`: this is the policy based access control.  To choose this, the
  separate Kconfig option must also be enabled.
* `silo`: this will deny any unmediated communication channels between
  unprivileged VMs.  To choose this, the separate Kconfig option must also
  be enabled.
2923