.. SPDX-License-Identifier: GPL-2.0

======================
BIOS/EFI Configuration
======================

BIOS and EFI are largely responsible for configuring static information about
devices (or potential future devices) such that Linux can build the appropriate
logical representations of these devices.

At a high level, this is what occurs during this phase of configuration.

* The bootloader starts the BIOS/EFI.

* BIOS/EFI performs an early device probe to determine static configuration.

* BIOS/EFI creates ACPI Tables that describe static config for the OS.

* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc.).

* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.

Much of what this section is concerned with is ACPI Table production and
static memory map configuration. More detail on these tables can be found
at :doc:`ACPI Tables <acpi>`.

.. note::
   Platform Vendors should read carefully, as this section has recommendations
   on physical memory region size and alignment, memory holes, HDM interleave,
   and what Linux expects of HDM decoders trying to work with these features.

UEFI Settings
=============
If your platform supports it, the :code:`uefisettings` command can be used to
read/write EFI settings. Changes will be reflected on the next reboot. Kexec
is not a sufficient reboot.

One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
When this is enabled, this bit tells Linux to defer management of a memory
region to a driver (in this case, the CXL driver). Otherwise, the memory is
treated as "normal memory", and is exposed to the page allocator during
:code:`__init`.
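
One way to see the effect of this bit on a running system is to look for
deferred ranges in :code:`/proc/iomem`. The sketch below assumes that ranges
marked EFI_MEMORY_SP are reported under the resource name "Soft Reserved";
that name and the sample text are assumptions for illustration and are not
taken from this document. The helper name is hypothetical.

```python
import re

# Matches /proc/iomem lines such as "100000000-1bfffffff : Soft Reserved".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def soft_reserved_ranges(iomem_text):
    """Return (start, end) pairs for every resource named "Soft Reserved"."""
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "Soft Reserved":
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges


# Hypothetical sample text; real addresses are only visible to root.
SAMPLE = """\
00001000-0009ffff : System RAM
100000000-1bfffffff : Soft Reserved
  100000000-1bfffffff : CXL Window 0
"""

if __name__ == "__main__":
    for start, end in soft_reserved_ranges(SAMPLE):
        print(f"{start:#x}-{end:#x}")
```

On a live system, the same function could be fed the contents of
:code:`/proc/iomem` read as root.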

uefisettings examples
---------------------

:code:`uefisettings identify` ::

        uefisettings identify

        bios_vendor: xxx
        bios_version: xxx
        bios_release: xxx
        bios_date: xxx
        product_name: xxx
        product_family: xxx
        product_version: xxx

On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the :code:`CXL
Memory Attribute` field.  This may be called something else on your platform.

:code:`uefisettings get "CXL Memory Attribute"` ::

        selector: xxx
        ...
        question: Question {
            name: "CXL Memory Attribute",
            answer: "Enabled",
            ...
        }

Physical Memory Map
===================

Physical Address Region Alignment
---------------------------------

As of Linux v6.14, the hotplug memory system requires memory regions to be
uniform in size and alignment.  While the CXL specification allows for memory
regions as small as 256MB, the supported memory block size and alignment for
hotplugged memory is architecture-defined.

A Linux memory block may be as small as 128MB, and block sizes increase in
powers of two.

* On ARM, the default block size and alignment is either 128MB or 256MB.

* On x86, the default block size is 256MB, and increases to 2GB as the
  capacity of the system increases up to 64GB.

For best support across versions, platform vendors should place CXL memory at
a 2GB aligned base address, and region sizes should be 2GB aligned.  This also
helps prevent the creation of thousands of memory devices (one per block).
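
The recommendation above can be checked with simple arithmetic. The following
is a minimal sketch, not kernel code; the function names are hypothetical and
the 2GB default only mirrors the recommendation in this section.

```python
GIB = 1 << 30
MIB = 1 << 20
RECOMMENDED_ALIGN = 2 * GIB  # the 2GB recommendation from the text above


def check_region(base, size, alignment=RECOMMENDED_ALIGN):
    """Return a list of human-readable alignment problems (empty if none)."""
    problems = []
    if base % alignment:
        problems.append(f"base {base:#x} is not {alignment:#x}-aligned")
    if size == 0 or size % alignment:
        problems.append(f"size {size:#x} is not a non-zero multiple of {alignment:#x}")
    return problems


def memory_block_count(size, block_size):
    """Number of memory blocks (one memory device each) needed for size bytes."""
    return -(-size // block_size)  # ceiling division


if __name__ == "__main__":
    print(check_region(0x100000000, 4 * GIB))  # aligned base and size
    print(check_region(0x1C0000000, 1 * GIB))  # misaligned base and size
    print(memory_block_count(1 << 40, 128 * MIB), "blocks at 128MB")
    print(memory_block_count(1 << 40, 2 * GIB), "blocks at 2GB")
```

For 1TB of capacity, this works out to 8192 memory blocks at a 128MB block
size versus 512 at 2GB, which illustrates why larger alignment keeps the
number of memory devices manageable.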

Memory Holes
------------

Holes in the memory map are tricky.  Consider a 4GB device located at base
address 0x100000000, but with the following memory map ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

There are two issues to consider:

* decoder programming, and
* memory block alignment.

If your architecture requires 2GB uniform size and aligned memory blocks, the
only capacity Linux is capable of mapping (as of v6.14) would be the capacity
from :code:`0x100000000-0x180000000`.  The remaining capacity will be stranded,
as it is not of 2GB aligned length.

Assuming your architecture and memory configuration allows 1GB memory blocks,
this memory map is supported and this should be presented as multiple CFMWS
in the CEDT that describe each side of the memory hole separately - along with
matching decoders.
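
The arithmetic behind the 2GB and 1GB cases above can be sketched as follows.
This is a simplified model that only rounds each chunk inward to whole,
aligned blocks; it is not how the kernel computes this, and the function name
is hypothetical.

```python
GIB = 1 << 30


def usable_span(start, end_inclusive, block_size):
    """Return (start, end_exclusive) of the whole aligned blocks, or None."""
    first = -(-start // block_size) * block_size             # round start up
    last = ((end_inclusive + 1) // block_size) * block_size  # round end down
    return (first, last) if last > first else None


# The two CXL chunks on either side of the memory hole shown above.
CHUNKS = [(0x100000000, 0x1BFFFFFFF), (0x200000000, 0x23FFFFFFF)]

if __name__ == "__main__":
    for block_size in (2 * GIB, 1 * GIB):
        for start, end in CHUNKS:
            span = usable_span(start, end, block_size)
            text = "stranded" if span is None else f"{span[0]:#x}-{span[1]:#x}"
            print(f"{block_size // GIB}GB blocks, chunk at {start:#x}: {text}")
```

With 2GB blocks, only :code:`0x100000000-0x180000000` survives and the 1GB
chunk above the hole is stranded entirely; with 1GB blocks, both chunks are
fully usable.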

Multiple decoders can (and should) be used to manage such a memory hole (see
below), but each chunk of a memory hole should be aligned to a reasonable block
size (larger alignment is always better).  If you intend to have memory holes
in the memory map, expect to use one decoder per contiguous chunk of host
physical memory.

As of v6.14, Linux does not provide support for memory hotplug of multiple
physical memory regions separated by a memory hole described by a single
HDM decoder.


Decoder Programming
===================
If BIOS/EFI intends to program the decoders to be statically configured,
there are a few things to consider to avoid major pitfalls that will
prevent Linux compatibility.  Some of these recommendations are not
required "per the specification", but Linux makes no guarantees of support
otherwise.


Translation Point
-----------------
Per the specification, the only decoders which **TRANSLATE** Host Physical
Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
All other decoders in the fabric are intended to route accesses without
translating the addresses.

This is heavily implied by the specification, see: ::

  CXL Specification 3.1
  8.2.4.20: CXL HDM Decoder Capability Structure
  - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
  - Implementation Note: Device Decoder Logic

Given this, Linux makes a strong assumption that decoders between CPU and
endpoint will all be programmed with address ranges that are subsets of
their parent decoder.
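
The subset assumption can be expressed as a simple containment check along the
decoder chain. The sketch below is only an illustration: the :code:`Decoder`
class, its fields, and the example addresses are hypothetical and do not
correspond to the driver's data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decoder:
    name: str
    base: int                            # host physical address of the range
    size: int                            # length of the range in bytes
    parent: Optional["Decoder"] = None   # None for a root decoder

    @property
    def end(self):
        """Exclusive end address of the programmed range."""
        return self.base + self.size


def within_parent(decoder):
    """True if the decoder's range is a subset of its parent's range."""
    parent = decoder.parent
    return parent is None or (parent.base <= decoder.base and decoder.end <= parent.end)


def chain_is_consistent(leaf):
    """Walk from an endpoint decoder up to the root, checking every hop."""
    node = leaf
    while node is not None:
        if not within_parent(node):
            return False
        node = node.parent
    return True


if __name__ == "__main__":
    root = Decoder("root-decoder-0", 0x100000000, 0xC0000000)
    hb = Decoder("HB-decoder-0", 0x100000000, 0xC0000000, root)
    ep = Decoder("ep-decoder-0", 0x100000000, 0xC0000000, hb)
    print(chain_is_consistent(ep))
```

A host bridge decoder programmed with a range outside its root decoder, as
would happen if translation were performed above the endpoint, fails this
check.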

Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
"hand off" responsibility between domains, some early adopting platforms
attempted to do translation at the originating memory controller or host
bridge.  This configuration requires a platform specific extension to the
driver and is not officially endorsed - despite being supported.

It is *highly recommended* **NOT** to do this; otherwise, you are on your own
to implement driver support for your platform.

Interleave and Configuration Flexibility
----------------------------------------
If providing cross-host-bridge interleave, a CFMWS entry in the :doc:`CEDT
<acpi/cedt>` must be presented with target host-bridges for the interleaved
device sets (there may be multiple behind each host bridge).

If providing intra-host-bridge interleaving, only one CFMWS entry in the CEDT
is required for that host bridge - if it covers the entire capacity of the
devices behind the host bridge.

If intending to provide users flexibility in programming decoders beyond the
root, you may want to provide multiple CFMWS entries in the CEDT intended for
different purposes.  For example, you may want to consider adding:

1) A CFMWS entry to cover all interleavable host bridges.
2) A CFMWS entry to cover all devices on a single host bridge.
3) A CFMWS entry to cover each device.

A platform may choose to add all of these, or change the mode based on a BIOS
setting.  For each CFMWS entry, Linux expects the :doc:`SRAT <acpi/srat>` to
describe the corresponding memory regions, so that it can determine the number
of NUMA nodes it should reserve during early boot / init.

As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
a matching SRAT entry does not exist; however, this is not guaranteed in the
future and such a configuration should be avoided.
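
A platform validation script could flag CFMWS windows that lack an SRAT
description. The sketch below assumes that "matching" means the window lies
entirely inside a single SRAT memory range; that definition, the tuple layout,
and the function names are assumptions made for illustration, not the kernel's
actual matching logic.

```python
def covered_by_srat(window, srat_ranges):
    """True if the (base, size) window lies entirely inside one SRAT range."""
    base, size = window
    return any(
        s_base <= base and base + size <= s_base + s_size
        for s_base, s_size in srat_ranges
    )


def windows_missing_srat(cfmws_windows, srat_ranges):
    """Return the CFMWS windows that no SRAT range fully describes."""
    return [w for w in cfmws_windows if not covered_by_srat(w, srat_ranges)]


if __name__ == "__main__":
    # Hypothetical tables: two CFMWS windows, but only one SRAT range.
    cfmws = [(0x100000000, 0xC0000000), (0x200000000, 0x40000000)]
    srat = [(0x100000000, 0xC0000000)]
    for base, size in windows_missing_srat(cfmws, srat):
        print(f"CFMWS at {base:#x} (size {size:#x}) has no SRAT entry")
```

In this example the second window would be reported, which is the kind of
configuration the text above recommends avoiding.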

Memory Holes
------------
If your platform includes memory holes interspersed between your CXL memory, it
is recommended to utilize multiple decoders to cover these regions of memory,
rather than try to program the decoders to accept the entire range and expect
Linux to manage the overlap.

For example, consider the Memory Hole described above ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

Assuming this is provided by a single device attached directly to a host bridge,
Linux would expect the following decoder programming ::

     -----------------------   -----------------------
     | root-decoder-0      |   | root-decoder-1      |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------
                |                         |
     -----------------------   -----------------------
     | HB-decoder-0        |   | HB-decoder-1        |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------
                |                         |
     -----------------------   -----------------------
     | ep-decoder-0        |   | ep-decoder-1        |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------

This pairs with a CEDT configuration containing two CFMWS entries that describe
the above root decoders.

Linux makes no guarantee of support for strange memory hole situations.

Multi-Media Devices
-------------------
The CFMWS field of the CEDT has special restriction bits which describe whether
the described memory region allows volatile or persistent memory (or both). If
the platform intends to support either:

1) A device with multiple media types, or
2) Using a persistent memory device as normal memory

then it may wish to create multiple CEDT CFMWS entries to describe the same
memory, with the intent of allowing the end user flexibility in how that memory
is configured. Linux does not presently have strong requirements in this area.
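
As a rough illustration of how such restriction bits might be decoded, consider
the sketch below. The bit positions are an assumption based on the ACPICA
:code:`ACPI_CEDT_CFMWS_RESTRICT_VOLATILE` and
:code:`ACPI_CEDT_CFMWS_RESTRICT_PMEM` definitions and should be checked against
the CXL specification; the class and function names are hypothetical.

```python
from enum import IntFlag


class WindowRestriction(IntFlag):
    # Assumed bit positions; verify against the CXL specification.
    VOLATILE = 1 << 2
    PMEM = 1 << 3


def allowed_media(restrictions):
    """List the media types a CFMWS window permits, per its restriction bits."""
    allowed = []
    if restrictions & WindowRestriction.VOLATILE:
        allowed.append("volatile")
    if restrictions & WindowRestriction.PMEM:
        allowed.append("persistent")
    return allowed


if __name__ == "__main__":
    # Two hypothetical windows offered for the same device capacity.
    print(allowed_media(WindowRestriction.VOLATILE))
    print(allowed_media(WindowRestriction.PMEM))
```

Offering one volatile-only window and one persistent-only window is one way a
platform could leave the choice of configuration to the end user.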