# LK Memory Management Overview

# Abstract

This article discusses the memory management system of the LK OS.

First, we look at the initial memory mapping and the allocator that is available before the VMM exists, and see how LK uses them to set up the virtual memory system.

Then, the article explains the data structures LK uses to manage physical pages and virtual addresses.

At the end, we show how LK uses these structures for page mapping.

# Initial mapping

Upon reset, LK sets up the initial mapping according to the global variable mmu_initial_mappings. For ARM, the related code can be found at [arch/arm/arm/start.S#L137](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/arch/arm/arm/start.S#L137)

The layout of mmu_initial_mappings depends on the platform. Take qemu-virt-arm as an example: its table is defined in [platform/qemu-virt-arm/platform.c](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/platform/qemu-virt-arm/platform.c#L36). The following figure shows the layout.

![Initial Memory Mapping](vmm_overview/Initial_Memory_Mapping.png)

Initial Memory Mapping

In LK, the term 'paddr' refers to a physical memory address. The virtual address that the initial mapping assigns to it is referred to as 'kvaddr'. For every physical address covered by the initial mapping, a one-to-one mapping exists between 'paddr' and 'kvaddr'. The function 'paddr_to_kvaddr' carries out this translation.

```c
// kernel/vm/vm.c
void *paddr_to_kvaddr(paddr_t pa) {
    struct mmu_initial_mapping *map = mmu_initial_mappings;
    while (map->size > 0) {
        if (!(map->flags & MMU_INITIAL_MAPPING_TEMPORARY) &&
                pa >= map->phys &&
                pa <= map->phys + map->size - 1) {
            return (void *)(map->virt + (pa - map->phys));
        }
        map++;
    }
    return NULL;
}
```
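
The translation above is a linear offset within whichever table entry covers the address. The following is a minimal, self-contained sketch of that arithmetic, not LK code: `struct demo_mapping`, `demo_mappings`, and `demo_paddr_to_kvaddr` are hypothetical names, and the single table entry (1 GiB of RAM at physical 0x4000_0000 mapped to virtual 0x8000_0000) is an assumed layout chosen only for illustration. The sketch returns 0 instead of NULL for an uncovered address.

```c
#include <stdint.h>

/* Hypothetical stand-in for LK's struct mmu_initial_mapping (fields reduced). */
struct demo_mapping {
    uint64_t phys;  /* physical base of the range */
    uint64_t virt;  /* kernel virtual base of the range */
    uint64_t size;  /* length in bytes; 0 terminates the table */
};

/* Assumed example layout: 1 GiB of RAM at 0x4000_0000 mapped at 0x8000_0000. */
static const struct demo_mapping demo_mappings[] = {
    { 0x40000000u, 0x80000000u, 0x40000000u },
    { 0, 0, 0 },
};

/* Returns the kernel virtual address for pa, or 0 if pa is not covered. */
static uint64_t demo_paddr_to_kvaddr(uint64_t pa) {
    for (const struct demo_mapping *m = demo_mappings; m->size > 0; m++) {
        /* pa - m->phys is the offset into the range; it must be below size */
        if (pa >= m->phys && pa - m->phys < m->size) {
            return m->virt + (pa - m->phys);
        }
    }
    return 0;
}
```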

LK also uses the term 'vaddr'. This stands for a virtual address that is mapped by the virtual memory manager (VMM) proper. A 'vaddr' is very flexible: the address can be in either the kernel space or the user space, and it can be remapped to different physical memory as needed. A 'kvaddr' is always a 'vaddr', but not every 'vaddr' is a 'kvaddr'. We'll go into more detail on this as we delve deeper into memory management.

# Boot allocator

QEMU loads the LK kernel at the beginning of the RAM.

After the initial mapping, but before memory management is set up, LK uses a boot allocator to allocate memory from the RAM. This allocator is simple and has two pointers, "boot_alloc_start" and "boot_alloc_end". Initially, both point to the end of the LK kernel image. When more memory is needed, the allocator returns the (aligned) value of "boot_alloc_end" and moves it forward. The value of "boot_alloc_start" stays the same.

![LK kernel && Boot allocator](vmm_overview/LK_Kernel_And_Boot_Allocator.png)

LK kernel && Boot allocator

The related code can be found at [kernel/vm/bootalloc.c](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/kernel/vm/bootalloc.c#L27)

```c
extern int _end;
uintptr_t boot_alloc_start = (uintptr_t) &_end;
uintptr_t boot_alloc_end = (uintptr_t) &_end;
void *boot_alloc_mem(size_t len) {
    ...
    ptr = ALIGN(boot_alloc_end, 8);
    boot_alloc_end = (ptr + ALIGN(len, 8));
    ...
    return (void *)ptr;
}
```
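
To see the bump-pointer behaviour in isolation, here is a small sketch that can be compiled on a host machine. It is an illustration under stated assumptions rather than LK's implementation: the names `demo_ram`, `demo_boot_alloc_init`, and `demo_boot_alloc_mem` are hypothetical, a static buffer stands in for the RAM that follows the kernel image, and the out-of-memory check is an addition of this sketch. As in the original, nothing is ever freed, so the allocator only needs the two pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((uintptr_t)(a) - 1))

/* Hypothetical backing buffer standing in for the RAM after the kernel image. */
static uint64_t demo_ram[32];  /* 256 bytes */

static uintptr_t demo_alloc_start;
static uintptr_t demo_alloc_end;

/* Both pointers start at the same place, like &_end in LK. */
static void demo_boot_alloc_init(void) {
    demo_alloc_start = (uintptr_t)demo_ram;
    demo_alloc_end = (uintptr_t)demo_ram;
}

/* Returns len bytes, or NULL when the buffer is exhausted. There is no free. */
static void *demo_boot_alloc_mem(size_t len) {
    uintptr_t ptr = DEMO_ALIGN(demo_alloc_end, 8);
    uintptr_t new_end = ptr + DEMO_ALIGN(len, 8);
    if (new_end > (uintptr_t)demo_ram + sizeof(demo_ram)) {
        return NULL;
    }
    demo_alloc_end = new_end;  /* only the end pointer ever moves */
    return (void *)ptr;
}
```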

"_end" is a linker script symbol that marks the end of the kernel image. The linker script is [arch/arm/system-onesegment.ld](https://github.com/littlekernel/lk/blob/7102838b49ef19bda9301968f04634bacd6d6d7a/arch/arm/system-onesegment.ld#L111)

```c
ENTRY(_start)
SECTIONS
{
    . = %KERNEL_BASE% + %KERNEL_LOAD_OFFSET%;
    _start = .;
    ...
    _end = .;
}
```

"KERNEL_BASE" is 0x8000_0000. "KERNEL_LOAD_OFFSET" is not defined for this platform, so it defaults to 0.

```c
// arch/arm/rules.mk
...
KERNEL_BASE ?= 0x80000000
KERNEL_LOAD_OFFSET ?= 0
...
```

# Physical Memory (pmm_arena)

With the initial mapping and the boot allocator in place, qemu-virt-arm's platform code sets up the structures for managing physical memory.

LK uses a structure named 'pmm_arena' to describe a contiguous chunk of physical memory. This structure includes basic information such as the size, the paddr (called base), and the kvaddr of the physical memory (see the Initial mapping section for what these terms mean). All 'pmm_arena' structures are linked into a global list whose head is 'arena_list'. In our case, qemu-virt-arm uses only one 'pmm_arena' to describe the whole RAM block, so the list has a single entry.

In LK, memory is managed at page granularity, where each page is 4KB. The 'pmm_arena' structure includes a pointer to a 'vm_page' array. Each element in the array corresponds to one page in the 'pmm_arena'.

Once the 'pmm_arena' is initialized, all 'vm_page' items are linked to the 'free_list' in the 'pmm_arena'. This means the Virtual Memory Manager (VMM) hasn't allocated these pages yet. We'll go deeper into the VMM and this list later on.

![pmm_arena](vmm_overview/pmm_arena.png)

pmm_arena

The function 'pmm_add_arena' initializes the 'pmm_arena' structure. qemu-virt-arm calls it from platform_early_init.

```c
// platform/qemu-virt-arm/platform.c
void platform_early_init(void) {
    ...
    /* add the main memory arena */
    pmm_add_arena(&arena);
    ...
}
```
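
The relationship between an arena, its page array, and its free list can be sketched as follows. This is a reduced model and not LK's actual definitions: `struct demo_page`, `struct demo_arena`, and the helper functions are hypothetical, the arena holds only 8 pages, and a singly linked list replaces LK's `list_node`. The point it illustrates is that a page's index in the array determines its physical address, so no address needs to be stored per page.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_PAGE_COUNT 8u

/* Hypothetical, reduced versions of vm_page and pmm_arena. */
struct demo_page {
    struct demo_page *next_free;  /* link used while the page is on the free list */
};

struct demo_arena {
    uint64_t base;                 /* paddr of the first page */
    size_t size;                   /* bytes covered by the arena */
    struct demo_page *page_array;  /* one entry per page */
    struct demo_page *free_list;   /* head of the free list */
    size_t free_count;
};

static struct demo_page demo_pages[DEMO_PAGE_COUNT];

/* Mirrors the idea of pmm_add_arena: every page starts out on the free list. */
static void demo_arena_init(struct demo_arena *a, uint64_t base) {
    a->base = base;
    a->size = (size_t)DEMO_PAGE_COUNT * DEMO_PAGE_SIZE;
    a->page_array = demo_pages;
    a->free_list = NULL;
    a->free_count = 0;
    /* push in reverse so the list ends up in ascending address order */
    for (size_t i = DEMO_PAGE_COUNT; i-- > 0;) {
        demo_pages[i].next_free = a->free_list;
        a->free_list = &demo_pages[i];
        a->free_count++;
    }
}

/* The array index of a page determines its physical address. */
static uint64_t demo_page_to_paddr(const struct demo_arena *a,
                                   const struct demo_page *p) {
    return a->base + (uint64_t)(p - a->page_array) * DEMO_PAGE_SIZE;
}

/* Inverse lookup; returns NULL when pa lies outside the arena. */
static struct demo_page *demo_paddr_to_page(const struct demo_arena *a,
                                            uint64_t pa) {
    if (pa < a->base || pa - a->base >= a->size) {
        return NULL;
    }
    return &a->page_array[(pa - a->base) / DEMO_PAGE_SIZE];
}
```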

There is one thing to note here. After the 'pmm_arena' is set up, all the pages are linked to the 'free_list'. However, not all the pages in the RAM are really free. For example, LK's code is located at the start of the RAM, and the bookkeeping for the 'pmm_arena' itself (allocated by the boot allocator) also occupies some RAM. To handle this, LK adjusts the 'free_list' once the VMM has been set up.

The code that adjusts the 'free_list' is located in the 'vm_init_preheap()' function.

```c
static void vm_init_preheap(uint level) {
    vmm_init_preheap();  // <-- Init VMM

    // Remove LK code's RAM from free_list
    mark_pages_in_use((vaddr_t)&_start, ((uintptr_t)&_end - (uintptr_t)&_start));

    // Remove the RAM allocated via boot allocator from free_list
    if (boot_alloc_end != boot_alloc_start) {
        mark_pages_in_use(boot_alloc_start, boot_alloc_end - boot_alloc_start);
    }
}
```
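
Because pages are the unit of bookkeeping, a byte range such as `[_start, _end)` has to be widened to whole pages before it can be taken off the free list. The sketch below shows one way to do that arithmetic, assuming 4KB pages; `demo_first_page` and `demo_page_span` are hypothetical helpers written for this article and are not taken from `mark_pages_in_use`. The start is rounded down and the end is rounded up, so a range that touches a page only partially still claims the whole page.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical helper: index of the page that contains addr (rounds down). */
static uint64_t demo_first_page(uint64_t addr) {
    return addr / DEMO_PAGE_SIZE;
}

/* Hypothetical helper: number of pages touched by [addr, addr + len). */
static uint64_t demo_page_span(uint64_t addr, uint64_t len) {
    if (len == 0) {
        return 0;
    }
    uint64_t first = addr / DEMO_PAGE_SIZE;
    uint64_t last = (addr + len - 1) / DEMO_PAGE_SIZE;  /* page holding the last byte */
    return last - first + 1;
}
```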

# Virtual Address (aspace)

In LK, as in many systems, a process sees two separate virtual address spaces. One is the kernel address space, which is shared by all processes and is used when handling system calls. The other is the user address space, which is not shared: each process has its own. All of the memory management structures live in the kernel address space. The initial mapping we talked about earlier is also in the kernel address space.

![Address Space](vmm_overview/address_space.png)

Address Space

It's worth noting that, unlike some other kernels, the kernel space in LK doesn't contain the user space mappings. This means kernel code can't access user space memory directly.

The ranges for both the kernel and user address spaces are defined at:

```c
// arch/arm/rules.mk
...
KERNEL_ASPACE_BASE=0x40000000
KERNEL_ASPACE_SIZE=0xc0000000

// kernel/include/kernel/vm.h
...
#ifndef USER_ASPACE_BASE
#define USER_ASPACE_BASE ((vaddr_t)0x01000000UL)
#endif
#ifndef USER_ASPACE_SIZE
#define USER_ASPACE_SIZE ((vaddr_t)KERNEL_ASPACE_BASE - USER_ASPACE_BASE - 0x01000000UL)
#endif
```
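
Plugging in the numbers, the user space size works out to 0x4000_0000 - 0x0100_0000 - 0x0100_0000 = 0x3E00_0000, so user addresses run from 0x0100_0000 to 0x3EFF_FFFF, with a 16 MiB hole below user space and another between user space and the kernel. The following sketch checks that arithmetic. The `DEMO_` macros copy the values quoted above, while `demo_is_kernel_address` and `demo_is_user_address` are hypothetical helpers for this illustration; they compare offsets rather than computing base + size, because 0x4000_0000 + 0xC000_0000 would wrap around in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the definitions quoted in the article. */
#define DEMO_KERNEL_ASPACE_BASE 0x40000000u
#define DEMO_KERNEL_ASPACE_SIZE 0xc0000000u
#define DEMO_USER_ASPACE_BASE   0x01000000u
#define DEMO_USER_ASPACE_SIZE \
    (DEMO_KERNEL_ASPACE_BASE - DEMO_USER_ASPACE_BASE - 0x01000000u)

/* Hypothetical helper: true if va lies inside the kernel address space. */
static bool demo_is_kernel_address(uint32_t va) {
    return va >= DEMO_KERNEL_ASPACE_BASE &&
           va - DEMO_KERNEL_ASPACE_BASE < DEMO_KERNEL_ASPACE_SIZE;
}

/* Hypothetical helper: true if va lies inside the user address space. */
static bool demo_is_user_address(uint32_t va) {
    return va >= DEMO_USER_ASPACE_BASE &&
           va - DEMO_USER_ASPACE_BASE < DEMO_USER_ASPACE_SIZE;
}
```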

LK uses a structure known as 'vmm_aspace' to represent an address space. There's a globally shared 'vmm_aspace' that describes the kernel space, and each process has its own private 'vmm_aspace' for the user space.

The 'vmm_aspace' structure contains the base of the virtual address range (in our case, 0x0100_0000 for user space and 0x4000_0000 for kernel space), the size of the range, and most importantly, the 'paddr' and 'kvaddr' of the ARM page table associated with this 'aspace'. When a user space application is running, the page table in its private 'aspace' is used. When a system call is processed, the page table in the shared kernel 'aspace' is used.

An 'aspace' contains a region_list. The list tracks which parts of the virtual address space are reserved or already in use. Each used or reserved virtual address region is represented by a 'vmm_region' structure, which holds the region's base address and size. The 'vmm_region' entries in the list are ordered by base address. These addresses are called 'vaddr' (recall from the Initial mapping section that every 'kvaddr' is also a 'vaddr').

![vmm_aspace](vmm_overview/vmm_aspace.png)

There's one more thing worth mentioning. The kernel aspace spans from 0x4000_0000 to 0xFFFF_FFFF. This range contains the virtual address layout created during the initial mapping (0x8000_0000 to 0xC000_0000 for RAM, and 0xC000_0000 to 0xFFFF_FFFF for peripherals). So LK needs to tell the kernel aspace that these virtual address ranges are already in use. This is accomplished by calling the function vmm_reserve_space after the kernel aspace is initialized.

```c
// kernel/vm/vm.c
static void vm_init_postheap(uint level) {
    vmm_init();   // <-- Init VMM (including kernel aspace)

    /* create vmm regions to cover what is already there from the initial mapping table */
    struct mmu_initial_mapping *map = mmu_initial_mappings;
    while (map->size > 0) {
        if (!(map->flags & MMU_INITIAL_MAPPING_TEMPORARY)) {
            vmm_reserve_space(vmm_get_kernel_aspace(), map->name, map->size, map->virt);
        }
        map++;
    }
}
```
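
Reserving a range in a list that is sorted by base address amounts to finding the insertion point and rejecting overlaps. The sketch below models only that list bookkeeping and leaves out everything related to page tables. It is not the body of vmm_reserve_space: `struct demo_region`, `demo_region_last`, and `demo_reserve` are hypothetical, and a singly linked list is used instead of LK's `list_node`. Regions are compared by their inclusive last address so that a region ending at 0xFFFF_FFFF does not overflow 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced vmm_region: a node in a list sorted by base. */
struct demo_region {
    uint32_t base;
    uint32_t size;  /* must be non-zero */
    struct demo_region *next;
};

/* Inclusive last address of a region; avoids overflow at 0xFFFF_FFFF. */
static uint32_t demo_region_last(const struct demo_region *r) {
    return r->base + (r->size - 1);
}

/*
 * Inserts r into the sorted list at *head.
 * Returns 0 on success, -1 if r overlaps an existing region.
 */
static int demo_reserve(struct demo_region **head, struct demo_region *r) {
    struct demo_region **link = head;
    /* skip regions that end before r starts */
    while (*link != NULL && demo_region_last(*link) < r->base) {
        link = &(*link)->next;
    }
    /* the next region, if any, must start after r ends */
    if (*link != NULL && (*link)->base <= demo_region_last(r)) {
        return -1;
    }
    r->next = *link;
    *link = r;
    return 0;
}
```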

In the function vmm_reserve_space, LK creates a vmm_region and inserts it into the region_list. More importantly, the function also updates the page table (pointed to by tt_virt / tt_phys) to tell the processor that these pages are reserved.

# Physical to virtual mapping

Up to this point, we've discussed how LK manages physical memory (via the 'pmm_arena' structures) and how it manages virtual addresses (through the 'vmm_aspace' structures). However, these concepts become truly useful only when we can map physical memory to a virtual address.

To allocate a chunk of memory that is accessible through a virtual address space, the first step is to find enough unallocated physical pages. This is done by searching the pmm_arena's free_list and unlinking vm_page entries from it.

![unlink vm_page from free_list](vmm_overview/find_free_vm_page.png)

unlink vm_page from free_list

Note that even though the 'vm_page's are unlinked from the 'pmm_arena's 'free_list', the 'pmm_arena' doesn't lose its reference to them, because the 'vm_page's are part of the 'page_array' and the 'pmm_arena' still holds the pointer to 'page_array'.

In the picture above, the allocated 'vm_page's are consecutive. This isn't a requirement; it's just shown this way for simplicity. In practice, the 'vm_page's may be scattered around or even come from different 'pmm_arena's (although qemu-virt-arm only has one pmm_arena).

The pmm_alloc_pages function searches the free_list and unlinks the vm_page entries from it.

```c
// kernel/vm/pmm.c
size_t pmm_alloc_pages(uint count, struct list_node *list) {
    ...
    /* walk the arenas in order, allocating as many pages as we can from each */
    pmm_arena_t *a;
    list_for_every_entry(&arena_list, a, pmm_arena_t, node) {
        while (allocated < count && a->free_count > 0) {
            vm_page_t *page = list_remove_head_type(&a->free_list, vm_page_t, node);
            ...
            list_add_tail(list, &page->node);
            allocated++;
        }
    }
    ...
    return allocated;
}
```
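
The same walk can be modelled with plain singly linked lists, which makes it easy to try on a host machine. In this sketch `struct demo_page`, `struct demo_arena`, and `demo_alloc_pages` are hypothetical names, and pages are pushed onto the front of the result list, whereas the LK code above appends to the tail. What carries over is the contract: the function drains arenas in order and returns how many pages it actually obtained, which may be fewer than requested, so the caller has to compare the result against `count`.

```c
#include <stddef.h>

/* Hypothetical, reduced page and arena types using singly linked lists. */
struct demo_page {
    struct demo_page *next;
};

struct demo_arena {
    struct demo_page *free_list;
    size_t free_count;
    struct demo_arena *next;  /* next arena in the global list */
};

/*
 * Moves up to count pages from the arenas' free lists onto *out.
 * Returns how many pages were actually taken.
 */
static size_t demo_alloc_pages(struct demo_arena *arenas, size_t count,
                               struct demo_page **out) {
    size_t allocated = 0;
    for (struct demo_arena *a = arenas; a != NULL; a = a->next) {
        while (allocated < count && a->free_count > 0) {
            struct demo_page *p = a->free_list;  /* pop the head */
            a->free_list = p->next;
            a->free_count--;
            p->next = *out;                      /* push onto the result list */
            *out = p;
            allocated++;
        }
    }
    return allocated;
}
```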

Now that the physical pages are ready, the next step is to allocate a virtual address region to map them. As mentioned in the "Virtual Address (aspace)" section earlier, a process's virtual addresses are managed by a vmm_aspace. An allocated virtual address region is a vmm_region within the vmm_aspace. Allocating a virtual address region involves searching through the vmm_aspace's region list, finding a large enough gap, creating a new vmm_region describing the gap, and inserting that vmm_region into the list.

![new vmm_region](vmm_overview/new_vmm_region.png)

new vmm_region

Note that the vmm_region entries are sorted by their base vaddr. In the illustration, the new vmm_region is depicted as the first element for clarity; in practice, it might not be the first element.

Finding a gap, creating a vmm_region, and inserting it into the region list are all handled by the alloc_region function.

```c
// kernel/vm/vmm.c

static vmm_region_t *alloc_region(vmm_aspace_t *aspace, const char *name, size_t size,
                                  vaddr_t vaddr, uint8_t align_pow2,
                                  uint vmm_flags, uint region_flags, uint arch_mmu_flags) {
    // create vmm_region
    vmm_region_t *r = alloc_region_struct(name, vaddr, size, region_flags, arch_mmu_flags);
    ...
        // Find a gap in vmm_aspace
        struct list_node *before = NULL;
        vaddr = alloc_spot(aspace, size, align_pow2, arch_mmu_flags, &before);
        ...

        // Insert vmm_region into vmm_aspace.regions
        r->base = (vaddr_t)vaddr;
        list_add_after(before, &r->node);
    }  // closes a block opened in the elided code above

    return r;
}
```
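
The gap search can be pictured as a first-fit scan over the sorted region list. The sketch below is a simplified illustration, not LK's alloc_spot: `struct demo_region` and `demo_alloc_spot` are hypothetical, alignment and MMU flags are ignored, and addresses are held in 64-bit integers so that base + size cannot overflow. Like the original, it reports both the chosen address and the list node after which the new region should be inserted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced region node; the list is sorted by base. */
struct demo_region {
    uint64_t base;
    uint64_t size;
    struct demo_region *next;
};

/*
 * First-fit search inside [as_base, as_base + as_size).
 * On success stores the chosen address in *out and the region to insert
 * after in *before (NULL means "insert at the head"), then returns 0.
 * Returns -1 when no gap is large enough. Alignment is ignored here.
 */
static int demo_alloc_spot(const struct demo_region *head,
                           uint64_t as_base, uint64_t as_size, uint64_t size,
                           uint64_t *out, const struct demo_region **before) {
    uint64_t cursor = as_base;  /* lowest address not yet ruled out */
    const struct demo_region *prev = NULL;
    for (const struct demo_region *r = head; r != NULL; r = r->next) {
        if (r->base - cursor >= size) {
            break;  /* the gap in front of r is big enough */
        }
        cursor = r->base + r->size;  /* continue after this region */
        prev = r;
    }
    /* covers both the "gap before r" case and the tail of the address space */
    if (as_base + as_size - cursor < size) {
        return -1;
    }
    *out = cursor;
    *before = prev;
    return 0;
}
```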

We now have the physical vm_pages and the vmm_region. The final step is to map the vm_pages into the vmm_region. To achieve this, we must update the page table. The vmm_aspace contains a pointer to the page table (as described in the "Virtual Address (aspace)" section above). In addition to updating the page table, LK also links the vm_pages to the vmm_region's page_list.

To handle all of these tasks, including allocating physical pages, setting up the virtual address region, and performing the actual mapping, LK uses the vmm_alloc function.

```c
// kernel/vm/vmm.c
status_t vmm_alloc(vmm_aspace_t *aspace, const char *name, size_t size, void **ptr,
                   uint8_t align_pow2, uint vmm_flags, uint arch_mmu_flags) {
    ...
    // Allocate physical pages
    struct list_node page_list;
    list_initialize(&page_list);
    size_t count = pmm_alloc_pages(size / PAGE_SIZE, &page_list);
    ...
    // Allocate virtual address region
    vmm_region_t *r = alloc_region(aspace, name, size, vaddr, align_pow2, vmm_flags,
                                   VMM_REGION_FLAG_PHYSICAL, arch_mmu_flags);
    ...
    // 1. Link vm_page to vmm_region.page_list
    // 2. Update page table
    vm_page_t *p;
    vaddr_t va = r->base;
    while ((p = list_remove_head_type(&page_list, vm_page_t, node))) {
        paddr_t pa = vm_page_to_paddr(p);
        // Update page table
        err = arch_mmu_map(&aspace->arch_aspace, va, pa, 1, arch_mmu_flags);
        // Link vm_page to vmm_region.page_list
        list_add_tail(&r->page_list, &p->node);
        va += PAGE_SIZE;
    }
    ...
}
```
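
The mapping loop pairs each physical page with the next virtual page, which is why the physical pages do not have to be contiguous. A toy model makes this concrete. Everything in the sketch below is hypothetical and far simpler than a real ARM translation table: `struct demo_page_table` is a flat array with one slot per virtual page, an entry value of 0 means "unmapped" (so physical address 0 cannot be represented), and `demo_map_pages` plays the role that arch_mmu_map plays in the loop above. `demo_pages_for` shows the round-up that is needed when a size is not a multiple of the page size.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_TABLE_ENTRIES 16u

/* Hypothetical single-level "page table": slot i maps virtual page i. */
struct demo_page_table {
    uint64_t va_base;                    /* vaddr covered by slot 0 */
    uint64_t entry[DEMO_TABLE_ENTRIES];  /* paddr of the mapped page, 0 = unmapped */
};

/* Number of pages needed to hold size bytes (rounds up). */
static size_t demo_pages_for(size_t size) {
    return (size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/*
 * Maps count physical pages, which need not be contiguous, to consecutive
 * virtual pages starting at va. Returns 0 on success, -1 on a bad range
 * or if a slot is already mapped.
 */
static int demo_map_pages(struct demo_page_table *pt, uint64_t va,
                          const uint64_t *paddrs, size_t count) {
    if (va < pt->va_base || (va - pt->va_base) % DEMO_PAGE_SIZE != 0) {
        return -1;
    }
    size_t first = (size_t)((va - pt->va_base) / DEMO_PAGE_SIZE);
    if (first > DEMO_TABLE_ENTRIES || count > DEMO_TABLE_ENTRIES - first) {
        return -1;
    }
    /* check first, so a failure leaves the table untouched */
    for (size_t i = 0; i < count; i++) {
        if (pt->entry[first + i] != 0) {
            return -1;
        }
    }
    for (size_t i = 0; i < count; i++) {
        pt->entry[first + i] = paddrs[i];
    }
    return 0;
}
```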

After the mapping is complete, the resulting structure looks as follows:

![Physical Memory Mapping](vmm_overview/physical_memory_mapping.png)

Physical Memory Mapping
