Zircon Scheduling
=================

Background
==========

The primary responsibility of any scheduler is to share the limited
resource of processor time between all threads that wish to use it. In a
general purpose operating system, it tries to do so in a fair way,
ensuring that all threads are allowed to make some progress.

Our scheduler is an evolution of LK’s scheduler. As such it
started as a minimal scheduler implementation and was extended to meet
our needs as the project grew.

Design
======

#### Overview

In essence there is a scheduler running on each logical CPU in the machine.
These schedulers run independently and use IPIs (Inter-Processor
Interrupts) to coordinate. However, each CPU is responsible for
scheduling the threads that are running on it. See [*CPU
Assignment*](#cpu-assignment-and-migration) below for how we decide
which CPU a thread is on, and how/when it migrates.

Each CPU has its own set of priority queues, one for each priority level
in the system (currently 32). Note that these are FIFO queues, not the data
structure known as a priority queue. Each queue holds an ordered list of
runnable threads awaiting execution. When it is time for a new thread to run,
the scheduler looks at the highest-numbered queue that contains a thread,
pops the head off of that queue, and runs that thread. See
[*Priority Management*](#priority-management) below for more details
about how it decides which thread should be in which queue. If there are no
threads in the queues to run, it will instead run the idle thread; see [*Realtime
and Idle Threads*](#realtime-and-idle-threads) below for more details.

Each thread is assigned the same timeslice size (THREAD_INITIAL_TIME_SLICE)
when it is picked to start running. If it uses its whole timeslice it will be
reinserted at the end of the appropriate priority queue.
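
The per-CPU queue structure described above can be sketched as follows. This is
a minimal illustration, not Zircon’s actual implementation; the `RunQueues`
type and its method names are invented for this example:

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <optional>

constexpr int kNumPriorities = 32;  // priority levels 0 (lowest) .. 31 (highest)

struct Thread {
  int id;
};

// One FIFO queue per priority level; the highest-numbered non-empty
// queue is always serviced first.
struct RunQueues {
  std::array<std::deque<Thread>, kNumPriorities> queues;

  // A thread that used its whole timeslice goes to the back of its queue;
  // one with timeslice remaining goes to the front so it resumes quickly.
  void Insert(int priority, Thread t, bool at_head = false) {
    if (at_head) {
      queues[priority].push_front(t);
    } else {
      queues[priority].push_back(t);
    }
  }

  // Pop the head of the highest-numbered non-empty queue; an empty result
  // means the CPU should run its idle thread instead.
  std::optional<Thread> PickNext() {
    for (int p = kNumPriorities - 1; p >= 0; --p) {
      if (!queues[p].empty()) {
        Thread t = queues[p].front();
        queues[p].pop_front();
        return t;
      }
    }
    return std::nullopt;
  }
};
```

Because each CPU owns its own set of queues, no cross-CPU locking is implied
here; in the real kernel, CPUs coordinate via IPIs as described above.
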
However, if it has
some of its timeslice remaining from a previous run it will be inserted at the
head of the priority queue so it can resume as quickly as possible.
When it is picked back up again it will only run for the remainder of its
previous timeslice.

When the scheduler selects a new thread from the priority queue it sets
the CPU's preemption timer for either a full timeslice or the remainder of the
previous timeslice. When that timer fires, the scheduler will stop execution of
that thread, add it to the appropriate queue, select another thread, and start
over again.

If a thread blocks waiting for a shared resource, it is taken out of
its priority queue and placed in a wait queue for the shared resource.
When it is unblocked it will be reinserted in the appropriate priority
queue of an eligible CPU ([*CPU
Assignment*](#cpu-assignment-and-migration)), and if it had timeslice
remaining it will be added to the front of the queue for expedited handling.

#### Priority Management

There are three factors used to determine the effective
priority of a thread, the effective priority being what is used to
determine which queue it will be in.

The first factor is the base priority, which is simply the thread’s
requested priority. There are currently 32 levels, with 0 being the
lowest and 31 the highest.

The second factor is the priority boost. This is a value bounded to
\[-MAX_PRIORITY_ADJ, MAX_PRIORITY_ADJ\] used to offset the base priority. It
is modified in the following cases:

- When a thread is unblocked, after waiting on a shared resource or
  sleeping, it is given a one point boost.

- When a thread yields (volunteers to give up control), or volunteers
  to reschedule, its boost is decremented by one but is capped at 0
  (won’t go negative).

- When a thread is preempted and has used up its entire timeslice, its
  boost is decremented by one and is able to go negative.

The third factor is its inherited priority. If the thread is in control
of a shared resource and it is blocking another thread of a higher
priority, then it is given a temporary boost up to that thread’s priority
to allow it to finish quickly and let the higher priority thread
resume.

The effective priority of the thread is either the inherited priority,
if it has one, or the base priority plus its boost. When this priority
changes, due to any of the factors changing, the scheduler will move the
thread to a new priority queue and reschedule the CPU, allowing it to take
control if it is now the highest priority task, or to relinquish control if
it is no longer highest.

The intent of this system is to ensure that interactive threads are
serviced quickly. These are usually the threads that interact directly
with the user and cause user-perceivable latency. These threads usually
do little work and spend most of their time blocked awaiting another
user event. So they get the priority boost from unblocking, while
background threads that do most of the processing receive the priority
penalty for using their entire timeslice.

#### CPU Assignment and Migration

Threads are able to request which CPUs they wish to run on using a
CPU affinity mask, a 32-bit mask where 0b001 is CPU 1, 0b100 is CPU 3,
and 0b101 is either CPU 1 or CPU 3. This mask is usually respected, but
if the CPUs it requests are all inactive the thread will be assigned to
another CPU. Notably, if a thread is “pinned” to a CPU, that is, its mask
contains only one CPU, and that CPU becomes inactive, the thread will sit
unserviced until that CPU becomes active again. See [*CPU
Activation*](#cpu-activation) below for details.

When selecting a CPU for a thread the scheduler will choose, in order:

1. The CPU doing the selection, if it is **idle** and in the affinity mask.

2. The CPU the thread last ran on, if it is **idle** and in the affinity mask.

3. Any **idle** CPU in the affinity mask.

4. The CPU the thread last ran on, if it is active and in the affinity mask.

5. The CPU doing the selection, if it is the only one in the affinity mask or
   all CPUs in the mask are inactive.

6. Any active CPU in the affinity mask.

If the thread is running on a CPU not in its affinity mask (due to case
5 above) the scheduler will try to rectify this every time the thread is
preempted, yields, or voluntarily reschedules. Also, if the thread
changes its affinity mask the scheduler may migrate it.

Every time a thread comes back from waiting on a shared resource or
sleeping and needs to be assigned a priority queue, the scheduler will
re-evaluate its CPU choice for the thread, using the above logic, and
may move it.

#### CPU Activation

When a CPU is being deactivated, that is, shut down and removed from the
system, the scheduler will transition all running threads onto other
CPUs. The only exception is threads that are “pinned”, that is, they only
have the deactivating CPU in their affinity mask; these threads are put
back into the run queue, where they will sit unserviced until the CPU is
reactivated.

When a CPU is reactivated it will service the waiting pinned threads, and
threads that are running on non-affinity CPUs should be migrated back
fairly quickly by their CPUs’ schedulers due to the above rules. There is
no active rebalancing of threads to the newly awakened CPU, but as it
should be idle more often, it should see some migration due to the logic
laid out above in [*CPU Assignment and
Migration*](#cpu-assignment-and-migration).

#### Realtime and Idle Threads

These are special threads that are treated a little differently.

The idle thread runs when no other threads are runnable. There is one on
each CPU and it lives outside of the priority queues, but effectively in
a priority queue of -1. It is used to track idle time and can be used by
platform implementations for a low power wait mode.

Realtime threads (marked with `THREAD_FLAG_REAL_TIME`) are allowed to run without
preemption and will run until they block, yield, or manually reschedule.
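
The priority arithmetic described in *Priority Management* above can be
illustrated with a short sketch. The names here, and the concrete value chosen
for `MAX_PRIORITY_ADJ`, are assumptions for illustration only, not Zircon’s
actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

constexpr int kNumPriorities = 32;  // priority levels 0..31
constexpr int kMaxPriorityAdj = 4;  // stand-in value for MAX_PRIORITY_ADJ

struct Thread {
  int base_priority = 16;                 // requested priority, 0..31
  int priority_boost = 0;                 // kept in [-kMaxPriorityAdj, kMaxPriorityAdj]
  std::optional<int> inherited_priority;  // set while blocking a higher-priority thread
};

// Effective priority: the inherited priority wins if present; otherwise the
// base priority plus the boost, clamped to a valid queue index.
int EffectivePriority(const Thread& t) {
  if (t.inherited_priority) {
    return *t.inherited_priority;
  }
  return std::clamp(t.base_priority + t.priority_boost, 0, kNumPriorities - 1);
}

// The three boost adjustments described in the text.
void OnUnblock(Thread& t) {  // +1 boost after waiting or sleeping
  t.priority_boost = std::min(t.priority_boost + 1, kMaxPriorityAdj);
}
void OnYield(Thread& t) {  // -1 boost, but never below zero
  t.priority_boost = std::max(t.priority_boost - 1, 0);
}
void OnTimesliceExhausted(Thread& t) {  // -1 boost, may go negative
  t.priority_boost = std::max(t.priority_boost - 1, -kMaxPriorityAdj);
}
```

Note that while it is in effect, the inherited priority completely overrides
base plus boost, which is what lets a low-priority holder of a shared resource
run ahead of mid-priority threads until the higher-priority waiter can resume.
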