.. SPDX-License-Identifier: GPL-2.0

==========================
The Linux Microcode Loader
==========================

:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Borislav Petkov <bp@suse.de>
          - Ashok Raj <ashok.raj@intel.com>

The kernel has an x86 microcode loading facility which provides
microcode loading methods in the OS. Potential use cases are
updating the microcode on platforms beyond the OEM End-Of-Life support,
and updating the microcode on long-running systems without rebooting.

The loader supports three loading methods:

Early load microcode
====================

The kernel can update microcode very early during boot. Loading
microcode early can fix CPU issues before they are observed during
kernel boot time.

The microcode is stored in an initrd file. During boot, the microcode is
read from the initrd and loaded into the CPU cores.

The format of the combined initrd image is microcode in (uncompressed)
cpio format followed by the (possibly compressed) initrd image. The
loader parses the combined initrd image during boot.

The microcode files in cpio name space are:

on Intel:
  kernel/x86/microcode/GenuineIntel.bin
on AMD  :
  kernel/x86/microcode/AuthenticAMD.bin

During BSP (BootStrap Processor) boot (pre-SMP), the kernel
scans the microcode file in the initrd. If microcode matching the
CPU is found, it is applied on the BSP and later on all APs
(Application Processors).

The loader also saves the matching microcode for the CPU in memory.
Thus, the cached microcode patch is applied when CPUs resume from a
sleep state.

Here's a crude example of how to prepare an initrd with microcode (this
is normally done automatically by the distribution when recreating the
initrd, so you don't really have to do it yourself. It is documented
here for future reference only).
::

  #!/bin/bash

  if [ -z "$1" ]; then
      echo "You need to supply an initrd file"
      exit 1
  fi

  # Resolve to an absolute path, since the script changes directory below.
  INITRD="$(realpath "$1")"

  DSTDIR=kernel/x86/microcode
  TMPDIR=/tmp/initrd

  rm -rf $TMPDIR

  mkdir $TMPDIR
  cd $TMPDIR || exit 1
  mkdir -p $DSTDIR

  if [ -d /lib/firmware/amd-ucode ]; then
          cat /lib/firmware/amd-ucode/microcode_amd*.bin > $DSTDIR/AuthenticAMD.bin
  fi

  if [ -d /lib/firmware/intel-ucode ]; then
          cat /lib/firmware/intel-ucode/* > $DSTDIR/GenuineIntel.bin
  fi

  # Pack the microcode into an uncompressed cpio and prepend it to the initrd.
  find . | cpio -o -H newc >../ucode.cpio
  cd ..
  mv "$INITRD" "$INITRD.orig"
  cat ucode.cpio "$INITRD.orig" > "$INITRD"

  rm -rf $TMPDIR


The system needs to have the microcode packages installed into
/lib/firmware, or you need to fix up the paths above if yours are
somewhere else and/or you've downloaded them directly from the processor
vendor's site.

Late loading
============

You simply install the microcode packages your distro supplies and
run::

  # echo 1 > /sys/devices/system/cpu/microcode/reload

as root.

The loading mechanism looks for microcode blobs in
/lib/firmware/{intel-ucode,amd-ucode}. The default distro installation
packages already put them there.

Since kernel 5.19, late loading is not enabled by default.

The /dev/cpu/microcode method has been removed in 5.19.

Why is late loading dangerous?
==============================

Synchronizing all CPUs
----------------------

The microcode engine which receives the microcode update is shared
between the two logical threads in an SMT system. Therefore, when
the update is executed on one SMT thread of the core, the sibling
"automatically" gets the update.

Since the microcode can "simulate" MSRs too, while the microcode update
is in progress, those simulated MSRs transiently cease to exist.
This can result in unpredictable behavior if the SMT sibling thread
happens to be in the middle of an access to such an MSR. The usual
observation is that such MSR accesses cause #GPs to be raised to signal
that the former are not present.

The disappearing MSRs are just one common issue which is being observed.
Any other instruction that's being patched and gets concurrently
executed by the other SMT sibling can also result in similar,
unpredictable behavior.

To eliminate this case, a stop_machine()-based CPU synchronization was
introduced as a way to guarantee that all logical CPUs will not execute
any code but just wait in a spin loop, polling an atomic variable.

While this took care of device or external interrupts and IPIs,
including LVT ones such as CMCI, it cannot address other special
interrupts that can't be shut off. Those are Machine Check (#MC), System
Management (#SMI) and Non-Maskable interrupts (#NMI).

Machine Checks
--------------

Machine Checks (#MC) are non-maskable. There are two kinds of MCEs:
fatal unrecoverable MCEs and recoverable MCEs. While unrecoverable
errors are fatal, recoverable errors which happen in kernel context
are also treated as fatal by the kernel.

On certain Intel machines, MCEs are also broadcast to all threads in a
system. If one thread is in the middle of executing WRMSR, an MCE will be
taken at the end of the flow. Either way, the other threads will wait for
the thread performing the wrmsr(0x79) to rendezvous in the MCE handler,
and the system will eventually shut down if any of the threads fails to
check in to the MCE rendezvous.

To be paranoid and get predictable behavior, the OS can choose to set
MCG_STATUS.MCIP. Since there can be at most one MCE in progress in a
system, if an MCE was signaled, the above condition will promote to a
system reset automatically. The OS can turn off MCIP at the end of the
update for that core.

System Management Interrupt
---------------------------

SMIs are also broadcast to all CPUs in the platform. Microcode update
requests exclusive access to the core before writing to MSR 0x79. So if
it does happen that one thread is in the WRMSR flow and the second gets
an SMI, that thread will be stopped at the first instruction in the SMI
handler.

Since the secondary thread is stopped at the first instruction in SMI,
there is very little chance that it would be in the middle of executing
an instruction being patched. Plus, the OS has no way to stop SMIs from
happening.

Non-Maskable Interrupts
-----------------------

When thread0 of a core is doing the microcode update, if thread1 is
pulled into NMI, that can cause unpredictable behavior due to the
reasons above.

The OS can choose a variety of methods to avoid running into this
situation.


Is the microcode suitable for late loading?
-------------------------------------------

Late loading is done when the system is fully operational and running
real workloads. Late loading behavior depends on what the base patch on
the CPU is before upgrading to the new patch.

This is true for Intel CPUs.

Consider, for example, a CPU which has patch level 1 and an update to
patch level 3.

Between patch1 and patch3, patch2 might have deprecated a software-visible
feature.

This is unacceptable if software is even potentially using that feature.
For instance, if MSR_X is no longer available after an update,
accessing that MSR will cause a #GP fault.

Basically there is no way to declare a new microcode update suitable
for late-loading. This is another one of the problems that caused late
loading to not be enabled by default.
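
Because the outcome depends on the base patch level, it can help to
record the currently loaded revision before attempting a late load. The
following is only a minimal sketch, assuming an x86 system where
/proc/cpuinfo reports a ``microcode`` field for each logical CPU. The
function name ``ucode_revisions`` and its optional file argument are
made up for this illustration.

```shell
#!/bin/bash

# Hypothetical helper: print the distinct microcode revisions found in a
# cpuinfo-style file (defaults to /proc/cpuinfo).
ucode_revisions() {
    local cpuinfo="${1:-/proc/cpuinfo}"

    # Matching lines look like "microcode<TAB>: 0x2f"; print the value
    # after the colon and collapse identical revisions into one line.
    awk -F': *' '/^microcode[ \t]*:/ { print $2 }' "$cpuinfo" | sort -u
}
```

Calling ``ucode_revisions`` before and after a reload shows whether the
revision changed. More than one line of output would mean that the
logical CPUs do not all report the same revision.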

Builtin microcode
=================

The loader also supports loading of builtin microcode supplied through
the regular builtin firmware method CONFIG_EXTRA_FIRMWARE. Only 64-bit is
currently supported.

Here's an example::

  CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09 amd-ucode/microcode_amd_fam15h.bin"
  CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

This basically means that you have the following tree structure locally::

  /lib/firmware/
  |-- amd-ucode
  ...
  |   |-- microcode_amd_fam15h.bin
  ...
  |-- intel-ucode
  ...
  |   |-- 06-3a-09
  ...

so that the build system can find those files and integrate them into
the final kernel image. The early loader finds them and applies them.

Needless to say, this method is not the most flexible one because it
requires rebuilding the kernel each time updated microcode from the CPU
vendor is available.
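
Before starting a build, it can be useful to confirm that every blob
named in CONFIG_EXTRA_FIRMWARE exists below CONFIG_EXTRA_FIRMWARE_DIR.
The following is only a small sketch. The helper name
``check_extra_firmware`` and its two arguments are invented for this
illustration and are not part of the kernel build system.

```shell
#!/bin/bash

# Hypothetical helper:
#   check_extra_firmware <firmware-dir> "<space-separated file list>"
# Prints every missing file to stderr and returns non-zero if any is absent.
check_extra_firmware() {
    local dir="$1" list="$2" f missing=0

    # $list is intentionally left unquoted so that it splits on whitespace,
    # mirroring the space-separated CONFIG_EXTRA_FIRMWARE value.
    for f in $list; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done

    return "$missing"
}

# Example invocation, using the values from the configuration above:
#   check_extra_firmware /lib/firmware \
#       "intel-ucode/06-3a-09 amd-ucode/microcode_amd_fam15h.bin"
```

A non-zero return value means that at least one of the listed files
could not be found under the given directory.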