# LK Memory Management Overview

# Abstract

This article discusses the memory management system of the LK OS.

First, we look at the initial memory mapping and the boot allocator, both of which work before the VMM exists, and see how LK uses them to set up the virtual memory system.

Then, the article explains the data structures LK uses to manage physical pages and virtual addresses.

At the end, we show how LK uses these structures for page mapping.

# Initial mapping

Upon reset, LK sets up the initial mapping according to the global variable mmu_initial_mappings. For ARM, the related code can be found at [arch/arm/arm/start.S#L137](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/arch/arm/arm/start.S#L137).

The layout of mmu_initial_mappings depends on the platform. Take qemu-virt-arm as an example: its table is defined in [platform/qemu-virt-arm/platform.c](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/platform/qemu-virt-arm/platform.c#L36). The following figure shows the layout.

Figure: Initial Memory Mapping

In LK, the term 'paddr' refers to a physical memory address. The virtual address that the initial mapping assigns to a 'paddr' is called a 'kvaddr'. A one-to-one mapping always exists between 'paddr' and 'kvaddr'. The function 'paddr_to_kvaddr' carries out this translation.

```c
// kernel/vm/vm.c
void *paddr_to_kvaddr(paddr_t pa) {
    // Walk the initial mapping table; an entry with size 0 terminates it
    struct mmu_initial_mapping *map = mmu_initial_mappings;
    while (map->size > 0) {
        // Skip temporary mappings and check whether pa falls inside this entry
        if (!(map->flags & MMU_INITIAL_MAPPING_TEMPORARY) &&
                pa >= map->phys &&
                pa <= map->phys + map->size - 1) {
            // Same offset from the entry's base, on the virtual side
            return (void *)(map->virt + (pa - map->phys));
        }
        map++;
    }
    return NULL;
}
```

LK also uses the term 'vaddr'. This stands for a virtual address that is mapped by the virtual memory manager (VMM) proper. A 'vaddr' is flexible: it can be in kernel space or in user space, and it can be remapped to different physical memory as needed. A 'kvaddr' is always a 'vaddr', but not every 'vaddr' is a 'kvaddr'. We'll go into more detail on this as we delve deeper into memory management.

# Boot allocator

QEMU loads the LK kernel at the beginning of RAM.

After the initial mapping, but before memory management is set up, LK uses a boot allocator to allocate memory from RAM. This allocator is simple and has two pointers, "boot_alloc_start" and "boot_alloc_end". Both initially point to the end of the LK kernel image. When more memory is needed, the allocator returns the (8-byte aligned) value of "boot_alloc_end" and moves it forward. The value of "boot_alloc_start" stays the same.

Figure: LK kernel and boot allocator

The related code can be found at [kernel/vm/bootalloc.c](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/kernel/vm/bootalloc.c#L27).

```c
// kernel/vm/bootalloc.c
extern int _end;
uintptr_t boot_alloc_start = (uintptr_t) &_end;
uintptr_t boot_alloc_end = (uintptr_t) &_end;

void *boot_alloc_mem(size_t len) {
    ...
    // Align the returned pointer, then bump boot_alloc_end past the new block
    ptr = ALIGN(boot_alloc_end, 8);
    boot_alloc_end = (ptr + ALIGN(len, 8));
    ...
    return (void *)ptr;
}
```

"_end" is a linker script symbol that marks the end of the kernel image. The linker script is [arch/arm/system-onesegment.ld](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/arch/arm/system-onesegment.ld#L111).

```c
ENTRY(_start)
SECTIONS
{
    . = %KERNEL_BASE% + %KERNEL_LOAD_OFFSET%;
    _start = .;
    ...
    _end = .;
}
```

"KERNEL_BASE" is 0x8000_0000. "KERNEL_LOAD_OFFSET" is not defined by the platform, so it defaults to 0.

```c
// arch/arm/rules.mk
...
KERNEL_BASE ?= 0x80000000
KERNEL_LOAD_OFFSET ?= 0
...
```

# Physical Memory (pmm_arena)

With the initial mapping and the boot allocator in place, qemu-virt-arm's platform code sets up the structures for managing physical memory.

LK uses a structure named 'pmm_arena' to describe a contiguous chunk of physical memory. In our case, qemu-virt-arm uses a single 'pmm_arena' to describe the whole RAM block. This structure includes basic information such as the size, the paddr (called base), and the kvaddr of the physical memory (see the Initial mapping section for what these terms mean). All 'pmm_arena' structures are linked into a global list whose head is 'arena_list', although on qemu-virt-arm the list holds only one 'pmm_arena'.

In LK, memory is managed at page granularity, where each page is 4KB. The 'pmm_arena' structure includes a pointer to a 'vm_page' array. Each element in the array corresponds to one page in the 'pmm_arena'.

Once the 'pmm_arena' is initialized, all 'vm_page' items are linked to the 'free_list' in the 'pmm_arena'. This means the Virtual Memory Manager (VMM) hasn't allocated these pages yet. We'll go deeper into the VMM and this field later on.

Figure: pmm_arena

The function 'pmm_add_arena' initializes the 'pmm_arena' structure. qemu-virt-arm calls it during early platform initialization.

```c
// platform/qemu-virt-arm/platform.c
void platform_early_init(void) {
    ...
    /* add the main memory arena */
    pmm_add_arena(&arena);
    ...
}
```

There is one thing to note here. After the 'pmm_arena' is set up, all the pages are linked to the 'free_list'. However, not all the pages in RAM are really free. For example, LK's code is located at the start of RAM, and the 'pmm_arena' bookkeeping itself also occupies some RAM (allocated by the boot allocator). To tackle this issue, LK adjusts the 'free_list' after finishing the early VMM setup.

The code that adjusts the 'free_list' is located in the 'vm_init_preheap()' function.

```c
// kernel/vm/vm.c
static void vm_init_preheap(uint level) {
    vmm_init_preheap(); // <-- Init VMM

    // Remove the RAM occupied by the LK image from free_list
    mark_pages_in_use((vaddr_t)&_start, ((uintptr_t)&_end - (uintptr_t)&_start));

    // Remove the RAM allocated via the boot allocator from free_list
    if (boot_alloc_end != boot_alloc_start) {
        mark_pages_in_use(boot_alloc_start, boot_alloc_end - boot_alloc_start);
    }
}
```

# Virtual Address (aspace)

In LK, as in many systems, a process has two separate virtual address spaces. One is the kernel address space, which is shared by all processes. Processes use the kernel address space when handling system calls. The other is the user address space, which isn't shared: each process has its own. All of the memory management structures live in the kernel address space. The initial mapping we talked about earlier is also in the kernel address space.

Figure: Address Space

It's worth noting that, unlike some other kernels, the kernel space in LK doesn't contain the user space mapping. This means kernel code can't access user space memory directly.

The ranges for both the kernel and user address spaces are defined at:

```c
// arch/arm/rules.mk
...
KERNEL_ASPACE_BASE=0x40000000
KERNEL_ASPACE_SIZE=0xc0000000

// kernel/include/kernel/vm.h
...
#ifndef USER_ASPACE_BASE
#define USER_ASPACE_BASE ((vaddr_t)0x01000000UL)
#endif
#ifndef USER_ASPACE_SIZE
#define USER_ASPACE_SIZE ((vaddr_t)KERNEL_ASPACE_BASE - USER_ASPACE_BASE - 0x01000000UL)
#endif
```

LK uses a structure known as 'vmm_aspace' to represent an address space. There's a globally shared 'vmm_aspace' that describes the kernel space, and each process has its own private 'vmm_aspace' for the user space.
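
To make these ranges concrete, the arithmetic behind the macros can be checked with a small host-side sketch. The constants are copied from the excerpts above; the helpers `range_last` and `range_contains` are hypothetical names introduced only for this illustration and are not part of LK.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from arch/arm/rules.mk and kernel/include/kernel/vm.h above. */
#define KERNEL_ASPACE_BASE 0x40000000UL
#define KERNEL_ASPACE_SIZE 0xc0000000UL
#define USER_ASPACE_BASE   0x01000000UL
#define USER_ASPACE_SIZE   (KERNEL_ASPACE_BASE - USER_ASPACE_BASE - 0x01000000UL)

/* Hypothetical helper: last valid address of a 32-bit range.
 * Computed as base + (size - 1) so a range ending at 0xFFFF_FFFF does not overflow. */
static uint32_t range_last(uint32_t base, uint32_t size) {
    assert(size > 0);
    return base + (size - 1);
}

/* Hypothetical helper: returns 1 if addr lies inside [base, base + size). */
static int range_contains(uint32_t base, uint32_t size, uint32_t addr) {
    return addr >= base && addr - base < size;
}
```

With these values, `USER_ASPACE_SIZE` evaluates to 0x3E00_0000, so `range_last(USER_ASPACE_BASE, USER_ASPACE_SIZE)` is 0x3EFF_FFFF and `range_last(KERNEL_ASPACE_BASE, KERNEL_ASPACE_SIZE)` is 0xFFFF_FFFF. The macros therefore leave a 16 MB gap below the user range and another 16 MB gap between the user and kernel ranges.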

The 'vmm_aspace' structure contains the base of the virtual address region (in our case, 0x0100_0000 for user space and 0x4000_0000 for kernel space), the size of the region, and most importantly, the 'paddr' and 'kvaddr' of the ARM page table associated with this 'aspace'. When a user space application is running, the page table in its private 'aspace' is used. When a system call is processed, the page table in the shared kernel 'aspace' is used.

An 'aspace' contains a region_list. The list tracks which parts of the virtual address space are reserved or already in use. Each used or reserved virtual address region is represented by a 'vmm_region' structure, which holds the region's base address and size. The region_list is sorted by base address. These addresses are 'vaddr's (recall from the Initial mapping section that every 'kvaddr' is also a 'vaddr').

There's one more thing worth mentioning. The kernel aspace spans from 0x4000_0000 to 0xFFFF_FFFF. This range contains the virtual layout created during the initial mapping (0x8000_0000 to 0xC000_0000 for RAM, and 0xC000_0000 to 0xFFFF_FFFF for peripherals). So LK needs to tell the kernel aspace that these virtual ranges are already in use. This is accomplished by calling the function vmm_reserve_space after the kernel aspace is initialized.

```c
// kernel/vm/vm.c
static void vm_init_postheap(uint level) {
    vmm_init(); // <-- Init VMM (including kernel aspace)

    /* create vmm regions to cover what is already there from the initial mapping table */
    struct mmu_initial_mapping *map = mmu_initial_mappings;
    while (map->size > 0) {
        if (!(map->flags & MMU_INITIAL_MAPPING_TEMPORARY)) {
            vmm_reserve_space(vmm_get_kernel_aspace(), map->name, map->size, map->virt);
        }
        map++;
    }
}
```

In the function vmm_reserve_space, LK creates a vmm_region marked as reserved and inserts it into region_list, so later allocations will not hand out these ranges. The pages are already mapped by the initial mapping, so the function reads their existing MMU flags from the page table (pointed to by tt_virt / tt_phys) and records them in the new region.

# Physical to virtual mapping

Up to this point, we've discussed how LK manages physical memory (via the 'pmm_arena' structures) and how it manages virtual addresses (through the 'vmm_aspace' structures). However, these concepts only become useful once we can map physical memory to a virtual address.

To allocate a chunk of memory that is accessible through a virtual address, the first step is to find enough unallocated physical pages. This is done by walking the pmm_arena's free_list and unlinking vm_pages from it.

Figure: Unlinking vm_page from free_list

Note that even though the 'vm_page's are unlinked from the 'pmm_arena's 'free_list', the 'pmm_arena' doesn't lose its reference to them: the 'vm_page's are part of the 'page_array', and the 'pmm_arena' still holds the pointer to 'page_array'.

In the picture above, the allocated 'vm_page's are consecutive. This isn't a requirement; it's just shown this way for simplicity. In practice, the 'vm_page's may be scattered around or even come from different 'pmm_arena's (although qemu-virt-arm only has one pmm_arena).

The pmm_alloc_pages function searches the free_list and unlinks the vm_pages from it.

```c
// kernel/vm/pmm.c
size_t pmm_alloc_pages(uint count, struct list_node *list) {
    ...
    /* walk the arenas in order, allocating as many pages as we can from each */
    pmm_arena_t *a;
    list_for_every_entry(&arena_list, a, pmm_arena_t, node) {
        while (allocated < count && a->free_count > 0) {
            // Unlink the first free page from this arena
            vm_page_t *page = list_remove_head_type(&a->free_list, vm_page_t, node);
            ...
            // Hand the page to the caller's list
            list_add_tail(list, &page->node);
            allocated++;
        }
    }
    ...
    return allocated;
}
```

Now that the physical pages are ready, the next step is to allocate a virtual address region to map them. As mentioned in the "Virtual Address (aspace)" section, a process's virtual addresses are managed by a vmm_aspace, and an allocated virtual address region is a vmm_region within that vmm_aspace. Allocating a virtual address region involves searching the vmm_aspace's region_list for a large enough gap, creating a new vmm_region describing the gap, and inserting the vmm_region into the region_list.

Figure: New vmm_region

Please note that the vmm_regions are sorted by their base vaddr. In the illustration, the new vmm_region is depicted as the first element for clarity. In practice, the new vmm_region will not always be the first element.

The process of finding a gap, creating a vmm_region, and inserting it into the region_list is handled by the alloc_region function.

```c
// kernel/vm/vmm.c

static vmm_region_t *alloc_region(vmm_aspace_t *aspace, const char *name, size_t size,
                                  vaddr_t vaddr, uint8_t align_pow2,
                                  uint vmm_flags, uint region_flags, uint arch_mmu_flags) {
    // Create the vmm_region
    vmm_region_t *r = alloc_region_struct(name, vaddr, size, region_flags, arch_mmu_flags);
    ...
    // Find a gap in vmm_aspace
    struct list_node *before = NULL;
    vaddr = alloc_spot(aspace, size, align_pow2, arch_mmu_flags, &before);
    ...

    // Insert the vmm_region into vmm_aspace's region_list
    r->base = (vaddr_t)vaddr;
    list_add_after(before, &r->node);
    ...
    return r;
}
```

We now have the physical vm_pages and the vmm_region. The final step is to map the vm_pages into the vmm_region. To achieve this, we must update the page table. The vmm_aspace contains a pointer to the page table (as described in the "Virtual Address (aspace)" section above). In addition to updating the page table, LK also links the vm_pages to the vmm_region's page_list.

To handle all of these tasks, including allocating physical pages, setting up the virtual address region, and performing the actual mapping, LK uses the vmm_alloc function.

```c
// kernel/vm/vmm.c
status_t vmm_alloc(vmm_aspace_t *aspace, const char *name, size_t size, void **ptr,
                   uint8_t align_pow2, uint vmm_flags, uint arch_mmu_flags) {
    ...
    // Allocate physical pages
    struct list_node page_list;
    list_initialize(&page_list);
    size_t count = pmm_alloc_pages(size / PAGE_SIZE, &page_list);
    ...
    // Allocate virtual address region
    vmm_region_t *r = alloc_region(aspace, name, size, vaddr, align_pow2, vmm_flags,
                                   VMM_REGION_FLAG_PHYSICAL, arch_mmu_flags);
    ...
    // 1. Update page table
    // 2. Link vm_page to vmm_region.page_list
    vm_page_t *p;
    vaddr_t va = r->base;
    while ((p = list_remove_head_type(&page_list, vm_page_t, node))) {
        paddr_t pa = vm_page_to_paddr(p);
        // Update page table: map one page at va to pa
        err = arch_mmu_map(&aspace->arch_aspace, va, pa, 1, arch_mmu_flags);
        // Link vm_page to vmm_region.page_list
        list_add_tail(&r->page_list, &p->node);
        va += PAGE_SIZE;
    }
    ...
}
```

After the mapping is complete, the resulting layout looks as follows:

Figure: Physical Memory Mapping
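
The three steps above (take pages off the free list, find a free virtual range, record the mapping) can be sketched as a small host-side toy model. This is not LK code: every `toy_` name is hypothetical, the "page table" is a flat array rather than a real ARM translation table, and gap finding scans that array instead of a sorted region list. It also checks the free page count up front instead of unlinking pages first, which keeps the failure path short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NPAGES    8u          /* physical pages in the toy arena */
#define TOY_VPAGES    16u         /* virtual pages in the toy address space */
#define TOY_UNMAPPED  UINT32_MAX  /* marker for "no physical page here" */

struct toy_page { struct toy_page *next; };   /* stand-in for vm_page */

struct toy_arena {                            /* stand-in for pmm_arena */
    uint32_t base;                            /* paddr of page 0 */
    struct toy_page pages[TOY_NPAGES];        /* page array, one entry per page */
    struct toy_page *free_list;
};

struct toy_aspace {                           /* stand-in for vmm_aspace */
    uint32_t base;                            /* vaddr of virtual page 0 */
    uint32_t pt[TOY_VPAGES];                  /* paddr per virtual page */
};

static void toy_arena_init(struct toy_arena *a, uint32_t base) {
    a->base = base;
    a->free_list = NULL;
    /* Push in reverse order so the free list starts at page 0. */
    for (size_t i = TOY_NPAGES; i-- > 0;) {
        a->pages[i].next = a->free_list;
        a->free_list = &a->pages[i];
    }
}

static void toy_aspace_init(struct toy_aspace *as, uint32_t base) {
    as->base = base;
    for (size_t i = 0; i < TOY_VPAGES; i++) as->pt[i] = TOY_UNMAPPED;
}

/* A page's paddr follows from its index in the page array. */
static uint32_t toy_page_to_paddr(const struct toy_arena *a, const struct toy_page *p) {
    return a->base + (uint32_t)(p - a->pages) * TOY_PAGE_SIZE;
}

/* First-fit search for `count` consecutive unmapped virtual pages.
 * Returns the index of the first page in the gap, or -1 if none exists. */
static int toy_find_gap(const struct toy_aspace *as, size_t count) {
    size_t run = 0;
    for (size_t i = 0; i < TOY_VPAGES; i++) {
        run = (as->pt[i] == TOY_UNMAPPED) ? run + 1 : 0;
        if (run == count) return (int)(i + 1 - count);
    }
    return -1;
}

/* Maps `count` free physical pages at a free virtual range.
 * Returns 0 and stores the base vaddr in *out_va, or -1 on failure. */
static int toy_alloc(struct toy_arena *a, struct toy_aspace *as, size_t count,
                     uint32_t *out_va) {
    size_t nfree = 0;
    for (struct toy_page *p = a->free_list; p; p = p->next) nfree++;
    if (count == 0 || nfree < count) return -1;   /* not enough physical pages */

    int first = toy_find_gap(as, count);
    if (first < 0) return -1;                     /* no virtual gap large enough */

    for (size_t i = 0; i < count; i++) {
        struct toy_page *p = a->free_list;        /* unlink from the free list */
        a->free_list = p->next;
        p->next = NULL;
        as->pt[(size_t)first + i] = toy_page_to_paddr(a, p);   /* "map" the page */
    }
    *out_va = as->base + (uint32_t)first * TOY_PAGE_SIZE;
    return 0;
}

/* Translates a vaddr to a paddr, or returns TOY_UNMAPPED. */
static uint32_t toy_translate(const struct toy_aspace *as, uint32_t va) {
    if (va < as->base) return TOY_UNMAPPED;
    uint32_t idx = (va - as->base) / TOY_PAGE_SIZE;
    if (idx >= TOY_VPAGES || as->pt[idx] == TOY_UNMAPPED) return TOY_UNMAPPED;
    return as->pt[idx] + (va - as->base) % TOY_PAGE_SIZE;
}
```

As a usage example, initialize the arena at an arbitrary physical base such as 0x4000_0000 and the address space at 0x0100_0000, then call `toy_alloc(&arena, &as, 3, &va)`. The call unlinks the first three pages and maps them at the start of the address space, so `toy_translate(&as, va)` returns the arena base. As in the article's figures, the physical pages happen to be consecutive here only because the free list is still in its initial order.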