.. SPDX-License-Identifier: BSD-3-Clause
.. SPDX-FileCopyrightText: Copyright TF-RMM Contributors.

.. _locking_rmm:

RMM Locking Guidelines
======================

This document outlines the locking requirements, discusses the implementation
and provides guidelines for a deadlock-free |RMM| implementation. The document
is based upon the |RMM| Alpha-05 specification and is expected to change as
the implementation proceeds.

.. _locking_intro:

Introduction
------------

In order to meet the requirement for the |RMM| to be small, simple to reason
about, and to co-exist with contemporary hypervisors which are already
designed to manage system memory, the |RMM| does not include a memory
allocator. It instead relies on an untrusted caller providing granules of
memory used to hold both the metadata needed to manage realms and the code and
data of the realms themselves.

To maintain confidentiality and integrity of these granules, the |RMM|
implements memory access controls by maintaining awareness of the state of
each granule (aka Granule State, see :ref:`locking_impl`) and enforcing rules
on how memory granules can transition from one state to another and how a
granule can be used depending on its state. For example, all granules that can
be accessed by software outside the |PAR| of a realm are in a specific state,
and a granule that holds metadata for a realm is in another specific state
that prevents it from being used as data in a realm and accidentally corrupted
by a realm, which could lead to internal failure in the |RMM|.

Due to the complex nature of the operations supported by the |RMM|, for
example when managing page tables for realms, the |RMM| must be able to hold
locks on multiple objects at the same time. It is well known that holding
multiple locks at the same time can easily lead to deadlocking the system, as
illustrated, for example, by the dining philosophers problem [EWD310]_. In
traditional operating systems software such issues are avoided by defining a
partial order on all system objects and always acquiring a lower-ordered
object before a higher-ordered object. This solution was shown to be correct
by Dijkstra [EWD625]_. Such an order is typically obtained by assigning an
arbitrary order based upon certain attributes of the objects, for example by
using the memory address of the object.

Unfortunately, software such as the |RMM| cannot use these methods directly
because the |RMM| receives an opaque pointer from the untrusted caller and it
cannot know, before locking the object, whether it is indeed in the expected
state. Furthermore, MMU page tables are hierarchical data structures, and
operations on the page tables typically must be able to locate a leaf node in
the hierarchy based on a single value (a virtual address) and therefore must
walk the page tables in their hierarchical order. This implies an order
between objects in the same Granule State which is not known by a process
executing in the |RMM| before it holds at least one lock on an object in the
page table hierarchy. An obvious solution to these problems would be to use a
single global lock for the |RMM|, but that would serialize all operations
across all shared data structures in the system and severely impact
performance.
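As an illustration of the address-ordering technique mentioned above, the
following minimal sketch acquires two locks in increasing memory address order
so that all callers follow the same global order. The `spinlock_t` type and
the `spinlock_acquire()`/`spinlock_release()` primitives anticipate the
definitions given in :ref:`locking_impl`; the helper `lock_pair_by_address()`
is hypothetical and shown for illustration only.

.. code-block:: C

    #include <stdint.h>

    typedef struct {
        unsigned int val;
    } spinlock_t;

    void spinlock_acquire(spinlock_t *l);
    void spinlock_release(spinlock_t *l);

    /*
     * Hypothetical helper: acquire two locks in increasing memory
     * address order. Because every caller acquires the pair in the
     * same global order, two callers contending for the same pair of
     * locks cannot deadlock against each other.
     */
    static void lock_pair_by_address(spinlock_t *a, spinlock_t *b)
    {
        if ((uintptr_t)a < (uintptr_t)b) {
            spinlock_acquire(a);
            spinlock_acquire(b);
        } else {
            spinlock_acquire(b);
            spinlock_acquire(a);
        }
    }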
.. _locking_reqs:

Requirements
------------

To address the synchronization needs of the |RMM| described above, we must
employ locking and lock-free mechanisms which satisfy a number of properties.
These are discussed below:

Critical Section
****************

A critical section can be defined as a section of code within a process that
requires access to shared resources and that must not be executed while
another process is in a corresponding section of code [WS2001]_.

Further, access to shared resources without appropriate synchronization can
lead to **race conditions**, which can be defined as a situation in which
multiple threads or processes read and write a shared item and the final
result depends on the relative timing of their execution [WS2001]_.

In the context of the |RMM|, an access to a shared resource can be considered
as a list of operations/instructions in program order that either read from or
write to a shared memory location (e.g. the granule data structure or the
memory granule described by the granule data structure, see
:ref:`locking_impl`). It is also understood that this list of operations does
not execute indefinitely, but eventually terminates.

We can now define our desired properties as follows:

Mutual Exclusion
****************

Mutual exclusion can be defined as the requirement that when one process is in
a critical section that accesses shared resources, no other process may be in
a critical section that accesses any of those shared resources [WS2001]_.

The following example illustrates how an implementation might enforce mutual
exclusion of critical sections using a lock on a valid granule data structure
`struct granule *a`:

.. code-block:: C

    struct granule *a;
    bool r;

    r = try_lock(a);
    if (!r) {
        return -ERROR;
    }
    critical_section(a);
    unlock(a);
    other_work();

Note that a process may either fail to perform the `lock` operation on object
`a` and return an error, or successfully acquire the lock, execute
`critical_section()` followed by `unlock()`, and then continue to make forward
progress in the `other_work()` function.

Deadlock Avoidance
******************

A deadlock can be defined as a situation in which two or more processes are
unable to proceed because each is waiting for one of the others to do
something [WS2001]_.

In other words, one or more processes are trying to enter their critical
sections but none of them makes forward progress.

We can then define the deadlock avoidance property as the inverse scenario:
when one or more processes are trying to enter their critical sections, at
least one of them makes forward progress.

A deadlock is a fatal event if it occurs in supervisory software such as the
|RMM|. It must be avoided, as it can render the system vulnerable to exploits
and/or unresponsive, which may lead to data loss, interrupted service and
eventually economic loss.
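To make the failure mode concrete, the following schematic sketch shows two
PEs acquiring the same pair of locks in opposite order, using a hypothetical
blocking `lock()` primitive (unlike `try_lock()` above, it spins until the
lock becomes available):

.. code-block:: C

    struct granule *a;
    struct granule *b;

    void pe0_work(void)
    {
        lock(a);    /* PE0 now holds a              */
        lock(b);    /* blocks while PE1 holds b ... */
        /* ... */
        unlock(b);
        unlock(a);
    }

    void pe1_work(void)
    {
        lock(b);    /* PE1 now holds b              */
        lock(a);    /* ... blocks while PE0 holds a */
        /* ... */
        unlock(a);
        unlock(b);
    }

If PE0 holds `a` while PE1 holds `b`, each PE waits forever for the lock held
by the other and neither makes forward progress. Imposing a single global
acquisition order on the pair, or failing gracefully with `try_lock()`, breaks
this circular wait.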
Starvation Avoidance
********************

Starvation can be defined as a situation in which a runnable process is
overlooked indefinitely by the scheduler; although it is able to proceed, it
is never chosen [WS2001]_.

Starvation avoidance can then be defined as: all processes that are trying to
enter their critical sections eventually make forward progress.

Starvation must be avoided, because if one or more processes do not make
forward progress, the PE on which the process runs will not perform useful
work and will be lost to the user, resulting in issues similar to those of a
deadlocked system.

Nested Critical Sections
************************

A critical section for an object may be nested within the critical section
for another object for the same process. In other words, a process may enter
more than one critical section at the same time.

For example, if the |RMM| needs to copy data from one granule to another
granule, and must be sure that both granules can only be modified by the
process itself, it may be implemented in the following way:

.. code-block:: C

    struct granule *a;
    struct granule *b;
    bool r;

    r = try_lock(a);
    if (!r) {
        return -ERROR;
    }

    /* critical section for granule a -- ENTER */

    r = try_lock(b);
    if (r) {
        /* critical section for granule b -- ENTER */
        b->foo = a->foo;
        /* critical section for granule b -- EXIT */
        unlock(b);
    }

    /* critical section for granule a -- EXIT */
    unlock(a);

.. _locking_impl:

Implementation
--------------

The |RMM| maintains granule states by defining a data structure for each
memory granule in the system. Conceptually, the data structure contains the
following fields:

* Granule State
* Lock
* Reference Count

The Lock field provides mutual exclusion between processes executing in
critical sections which access the shared granule data structure, as well as
the shared metadata that may be stored in the memory granule itself when the
granule is in one of the |RD|, |REC|, or Table states. Both the data structure
describing the memory granule and the contents of the memory granule itself
can be accessed by multiple PEs concurrently, and we therefore require some
concurrency protocol to avoid corruption of shared data structures. An
alternative to using a lock providing mutual exclusion would be to design all
operations that access shared data structures as lock-free algorithms, but due
to the complexity of the data structures and of the operation of the |RMM| we
consider this too difficult to accomplish in practice.

The Reference Count field is used to keep track of references between
granules. For example, an |RD| describes a realm, and a |REC| describes an
execution context within that realm, and therefore an |RD| must always exist
when a |REC| exists. To prevent the |RMM| from destroying an |RD| while a
|REC| still exists, the |RMM| holds a reference count on the |RD| for each
|REC| associated with the same realm, and only when all the RECs in a realm
have been destroyed and the reference count on an |RD| drops to zero can the
|RD| be destroyed and the granule be repurposed for other use.
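A conceptual sketch of such a per-granule descriptor is shown below. This is
an illustration of the three fields only, not the actual |RMM| definition; the
`spinlock_t` type is described later in this section, and the enumeration
lists only an illustrative subset of the granule states described below.

.. code-block:: C

    /* Illustrative subset of the granule states described below. */
    enum granule_state {
        GRANULE_STATE_NS,
        GRANULE_STATE_DELEGATED,
        GRANULE_STATE_RD,
        GRANULE_STATE_REC,
        GRANULE_STATE_RTT,
        GRANULE_STATE_DATA
    };

    typedef struct {
        unsigned int val;
    } spinlock_t;               /* see the Locking section below */

    /*
     * Conceptual per-granule descriptor; one instance exists for
     * each memory granule in the system.
     */
    struct granule {
        spinlock_t lock;          /* mutual exclusion               */
        enum granule_state state; /* current Granule State          */
        unsigned long refcount;   /* references from other granules */
    };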
Based on the above, we now describe the Granule State field and the current
locking/refcount implementation:

* **UnDelegated:** These are granules for which the |RMM| does not prevent the
  |PAS| of the granule from being changed by another agent to any value. In
  this state, granule content access is not protected by granule::lock, as it
  is always subject to reads and writes from Non-Realm worlds.

* **Delegated:** These are granules with memory only accessible by the |RMM|.
  The granule content is protected by granule::lock. No reference counts are
  held on this granule state.

* **Realm Descriptor (RD):** These are granules containing metadata describing
  a realm, and only accessible by the |RMM|. Granule content access is
  protected by granule::lock. A reference count is also held on this granule
  for each associated |REC| granule.

* **Realm Execution Context (REC):** These are granules containing metadata
  describing a virtual PE running in a realm, and are only accessible by the
  |RMM|. The execution content access is not protected by granule::lock,
  because we cannot enter a realm while holding the lock. Further, the
  following rules apply with respect to the granule's reference counts:

  - A reference count is held on this granule when a |REC| is running.

  - As a |REC| cannot run on two PEs at the same time, the maximum value
    of the reference count is one.

  - When the |REC| is entered, the reference count is incremented
    (set to 1) atomically while granule::lock is held.

  - When the |REC| exits, the reference count is released (set to 0)
    atomically with store-release semantics without granule::lock being
    held.

  - The |RMM| can access the granule's content on the entry and exit paths
    from the |REC| while the reference is held.

* **Translation Table:** These are granules containing metadata describing
  virtual to physical address translation for the realm, accessible by the
  |RMM| and the hardware Memory Management Unit (MMU). Granule content access
  is protected by granule::lock, but hardware translation table walks may read
  the RTT at any point in time. Multiple granules in this state can only be
  locked at the same time if they are part of the same tree, and only in
  topological order from root to leaf. The topological order of concatenated
  root level RTTs is from the lowest address to the highest address. The
  complete internal locking order for RTT granules is:
  RD -> [RTT] -> ... -> RTT. A reference count is held on this granule for
  each entry in the RTT that refers to a granule:

  - Table s2tte.

  - Valid s2tte.

  - Valid_NS s2tte.

  - Assigned s2tte.

* **Data:** These are granules containing realm data, accessible by the |RMM|
  and by the realm to which they belong. Granule content access is not
  protected by granule::lock, as it is always subject to reads and writes from
  within a realm. A granule in this state is always referenced from exactly
  one entry in an RTT granule, which must be locked before locking this
  granule. Only a single DATA granule can be locked at a time on a given PE.
  The complete internal locking order for DATA granules is:
  RD -> RTT -> RTT -> ... -> DATA. No reference counts are held on this
  granule type.


Locking
*******

The |RMM| uses spinlocks, combined with the object state, for its locking
implementation. The lock provides the exclusive acquire semantics known from
trivial spinlock implementations, but in addition allows verification of
whether the locked object is in an expected state.

The data structure for the spinlock can be described in C as follows:

.. code-block:: C

    typedef struct {
        unsigned int val;
    } spinlock_t;

This data structure can be embedded in any object that requires
synchronization of access, such as the `struct granule` described above.
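As a point of reference, a minimal acquire/release implementation might look
as follows. This sketch uses the GCC/Clang `__atomic` builtins and is an
assumption for illustration; the actual |RMM| implementation may differ, for
example by using architecture-specific instructions.

.. code-block:: C

    #include <stdbool.h>

    void spinlock_acquire(spinlock_t *l)
    {
        unsigned int free;

        /*
         * Spin until the lock variable transitions from 0 (free) to
         * 1 (taken). ACQUIRE ordering on the successful exchange
         * prevents accesses inside the critical section from being
         * reordered before the lock is taken.
         */
        do {
            free = 0U;
        } while (!__atomic_compare_exchange_n(&l->val, &free, 1U,
                                              false,
                                              __ATOMIC_ACQUIRE,
                                              __ATOMIC_RELAXED));
    }

    void spinlock_release(spinlock_t *l)
    {
        /*
         * RELEASE ordering makes all accesses inside the critical
         * section visible before the lock is observed as free.
         */
        __atomic_store_n(&l->val, 0U, __ATOMIC_RELEASE);
    }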
The following operations are defined on spinlocks:

.. code-block:: C
    :caption: **Typical spinlock operations**

    /*
     * Locks a spinlock with acquire memory ordering semantics. If the
     * lock is unavailable, goes into a tight loop (spins) and
     * repeatedly checks the lock variable atomically until it becomes
     * available.
     */
    void spinlock_acquire(spinlock_t *l);

    /*
     * Unlocks a spinlock with release memory ordering semantics. Must
     * only be called if the calling PE already holds the lock.
     */
    void spinlock_release(spinlock_t *l);


The above functions should not be used directly for locking and unlocking
granules; instead, the following should be used:

.. code-block:: C
    :caption: **Granule locking operations**

    /*
     * Acquires a lock (or spins until the lock is available), then
     * checks if the granule is in the `expected_state`. If the
     * `expected_state` is matched, then returns `true`. Otherwise,
     * releases the lock and returns `false`.
     */
    bool granule_lock_on_state_match(struct granule *g,
                                     enum granule_state expected_state);

    /*
     * Used when we're certain of the state of an object (e.g. because
     * we hold a reference to it) or when locking objects whose
     * reference is obtained from another object, after that object is
     * locked.
     */
    void granule_lock(struct granule *g,
                      enum granule_state expected_state);

    /*
     * Obtains a pointer to a locked granule at `addr` if `addr` is a
     * valid granule physical address and the state of the granule at
     * `addr` is `expected_state`.
     */
    struct granule *find_lock_granule(unsigned long addr,
                                      enum granule_state expected_state);

    /* Find two granules and lock them in order of their address. */
    return_code_t find_lock_two_granules(unsigned long addr1,
                                         enum granule_state expected_state1,
                                         struct granule **g1,
                                         unsigned long addr2,
                                         enum granule_state expected_state2,
                                         struct granule **g2);

    /*
     * Obtains a pointer to a locked granule at `addr` which is unused
     * (refcount = 0), if `addr` is a valid granule physical address
     * and the state of the granule at `addr` is `expected_state`.
     */
    struct granule *find_lock_unused_granule(unsigned long addr,
                                             enum granule_state expected_state);

.. code-block:: C
    :caption: **Granule unlocking operations**

    /*
     * Releases a spinlock held on a granule. Must only be called if
     * the calling PE already holds the lock.
     */
    void granule_unlock(struct granule *g);

    /*
     * Sets the state and releases a spinlock held on a granule. Must
     * only be called if the calling PE already holds the lock.
     */
    void granule_unlock_transition(struct granule *g,
                                   enum granule_state new_state);
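To show how these operations compose, the following hypothetical and heavily
simplified handler locks a Delegated granule, initializes it, and publishes
the new state on unlock. The function itself, the error codes and the
assumption that `find_lock_granule()` returns NULL on failure are all
illustrative; this is not an actual |RMM| command handler.

.. code-block:: C

    /*
     * Hypothetical, simplified example: repurpose a Delegated granule
     * as a Realm Descriptor. Not an actual RMM command handler.
     */
    unsigned long example_create_rd(unsigned long rd_addr)
    {
        struct granule *g_rd;

        /* Fails if rd_addr is invalid or the state does not match. */
        g_rd = find_lock_granule(rd_addr, GRANULE_STATE_DELEGATED);
        if (g_rd == NULL) {
            return RMI_ERROR_INPUT;    /* illustrative error code */
        }

        /* Critical section: initialize the RD metadata here. */

        /* Set the new state and release the lock. */
        granule_unlock_transition(g_rd, GRANULE_STATE_RD);

        return RMI_SUCCESS;            /* illustrative success code */
    }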
Reference Counting
******************

The reference count is implemented using the **refcount** variable within the
granule structure to keep track of references between granules. For example,
the refcount is used to prevent changes to the attributes of a parent granule
which is referenced by child granules, i.e. a parent with a refcount not equal
to zero.

Race conditions on the refcount variable are avoided either by locking the
granule before accessing the variable or by lock-free mechanisms, such as
single-copy atomic operations combined with the Arm weakly ordered
ACQUIRE/RELEASE/RELAXED memory semantics, to synchronize shared resources.

The following operations are defined on refcount:

.. code-block:: C
    :caption: **Read a refcount value**

    /*
     * Single-copy atomic read of the refcount variable with RELAXED
     * memory ordering semantics. Use this function if lock-free access
     * to the refcount is required with relaxed memory ordering
     * constraints applied at that point.
     */
    unsigned long granule_refcount_read_relaxed(struct granule *g);

    /*
     * Single-copy atomic read of the refcount variable with ACQUIRE
     * memory ordering semantics. Use this function if lock-free access
     * to the refcount is required with acquire memory ordering
     * constraints applied at that point.
     */
    unsigned long granule_refcount_read_acquire(struct granule *g);

.. code-block:: C
    :caption: **Increment a refcount value**

    /*
     * Increments the granule refcount. Must be called with the granule
     * lock held.
     */
    void __granule_get(struct granule *g);

    /*
     * Increments the granule refcount by `val`. Must be called with
     * the granule lock held.
     */
    void __granule_refcount_inc(struct granule *g, unsigned long val);

    /* Atomically increments the reference counter of the granule. */
    void atomic_granule_get(struct granule *g);


.. code-block:: C
    :caption: **Decrement a refcount value**

    /*
     * Decrements the granule refcount. Must be called with the granule
     * lock held.
     */
    void __granule_put(struct granule *g);

    /*
     * Decrements the granule refcount by `val`. Asserts if the
     * refcount would become negative. Must be called with the granule
     * lock held.
     */
    void __granule_refcount_dec(struct granule *g, unsigned long val);

    /* Atomically decrements the reference counter of the granule. */
    void atomic_granule_put(struct granule *g);

    /*
     * Atomically decrements the reference counter of the granule.
     * Stores to memory with RELEASE semantics.
     */
    void atomic_granule_put_release(struct granule *g);

.. code-block:: C
    :caption: **Directly access refcount value**

    /*
     * Directly reads/writes the refcount variable. Must be called with
     * the granule lock held.
     */
    granule->refcount;

.. _locking_guidelines:

Guidelines
----------

In order to meet the :ref:`locking_reqs` discussed above, this section
stipulates some locking and lock-free algorithm implementation guidelines for
developers.

Mutual Exclusion
****************

The spinlock, acquire/release and atomic operations provide trivial mutual
exclusion implementations for the |RMM|. However, the following general
guidelines should be taken into consideration:

  - Appropriate deadlock avoidance techniques should be incorporated when
    using multiple locks.

  - Lock-free access to shared resources should be atomic.

  - Memory ordering constraints should be used prudently to avoid
    performance degradation. For example, on an unlocked granule (e.g. a
    REC), if there are associated memory operations prior to a refcount
    update, then the update should be done with release semantics. However,
    if there are no associated memory accesses to the granule prior to the
    refcount update, then release semantics are not required. This is
    illustrated by the sketch below.
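The sketch below illustrates the release-semantics guideline using the REC
exit path described in :ref:`locking_impl`. The refcount helper is the one
declared earlier; `struct rec`, `save_realm_state()` and the surrounding
function are hypothetical and shown for illustration only.

.. code-block:: C

    /*
     * Hypothetical REC exit path. The RMM writes the realm's register
     * state into the REC granule and then drops the "REC is running"
     * reference without holding granule::lock.
     */
    void example_rec_exit(struct granule *g_rec, struct rec *rec)
    {
        /* Memory accesses associated with the REC granule. */
        save_realm_state(rec);    /* hypothetical helper */

        /*
         * RELEASE semantics ensure the state saved above is visible
         * to any PE that subsequently observes refcount == 0, e.g. a
         * PE destroying the REC.
         */
        atomic_granule_put_release(g_rec);
    }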
Deadlock Avoidance
******************

Deadlock avoidance is provided by defining a partial order on all objects in
the system, where the locking operation eventually fails if the caller tries
to acquire a lock on an object in a different state than expected. This means
that no two processes will be expected to acquire locks in a different order
than the defined partial order, and we can rely on the same reasoning for
deadlock avoidance as shown by Dijkstra [EWD625]_.

To establish this partial order, the objects referenced by the |RMM| can be
classified into two categories:

#. **External**: A granule state belongs to the `external` class iff *any*
   parameter in *any* RMI command is an address of a granule which is expected
   to be in that state. The following granule states are `external`:

   - GRANULE_STATE_NS
   - GRANULE_STATE_DELEGATED
   - GRANULE_STATE_RD
   - GRANULE_STATE_REC
   - DEV_GRANULE_STATE_NS
   - DEV_GRANULE_STATE_DELEGATED

#. **Internal**: A granule state belongs to the `internal` class iff it is not
   `external`. These are objects which are referenced from another object
   after that object is locked. Each `internal` object should be referenced
   from exactly one place. The following granule states are `internal`:

   - GRANULE_STATE_RTT
   - GRANULE_STATE_DATA
   - DEV_GRANULE_STATE_MAPPED

We now state the locking guidelines for the |RMM| as follows (a sketch
illustrating the address-ordering rules is shown after the list):

#. Granules expected to be in an `external` state must be locked before
   locking any granules in an `internal` state.

#. Granules expected to be in an `external` state must be locked in order of
   their physical address, starting with the lowest address.

#. Memory granules expected to be in an `external` state must be locked before
   locking any device memory granules in an `external` state.

#. Once a granule expected to be in an `external` state has been locked, its
   state must be checked against the expected state. If these do not match,
   the granule must be unlocked and no further granules may be locked within
   the currently-executing RMM command.

#. Granules in an `internal` state must be locked in order of state:

   - `RTT`
   - `DATA`

#. Granules in the same `internal` state must be locked in the
   :ref:`locking_impl` defined order for that specific state.

#. A granule's state can be changed iff the granule is locked and the
   reference count is zero.
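The following sketch shows how two `external` granules might be locked in
accordance with rules 2 and 4, in the spirit of `find_lock_two_granules()`.
It assumes, for illustration, that `find_lock_granule()` returns NULL on
failure; it is not the actual |RMM| implementation.

.. code-block:: C

    /*
     * Lock two granules expected to be in external states, always
     * taking the lower physical address first (rule 2). Assumes
     * find_lock_granule() returns NULL on failure.
     */
    static bool lock_two_externals(unsigned long addr1,
                                   enum granule_state state1,
                                   struct granule **g1,
                                   unsigned long addr2,
                                   enum granule_state state2,
                                   struct granule **g2)
    {
        bool first_is_lower = (addr1 < addr2);
        unsigned long lo_addr = first_is_lower ? addr1 : addr2;
        unsigned long hi_addr = first_is_lower ? addr2 : addr1;
        enum granule_state lo_state = first_is_lower ? state1 : state2;
        enum granule_state hi_state = first_is_lower ? state2 : state1;
        struct granule *lo, *hi;

        lo = find_lock_granule(lo_addr, lo_state);
        if (lo == NULL) {
            return false;
        }

        hi = find_lock_granule(hi_addr, hi_state);
        if (hi == NULL) {
            /* Rule 4: unlock and lock no further granules. */
            granule_unlock(lo);
            return false;
        }

        *g1 = first_is_lower ? lo : hi;
        *g2 = first_is_lower ? hi : lo;
        return true;
    }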
Starvation Avoidance
********************

Currently, the lock-free implementation of RMI.REC.Enter provides Starvation
Avoidance in the |RMM|. For the locking implementation, however, Starvation
Avoidance is yet to be accomplished. It can be added by a ticket or MCS style
locking implementation [MCS]_.

Nested Critical Sections
************************

Spinlocks provide support for nested critical sections. Processes can acquire
multiple spinlocks at the same time, as long as the locking order is not
violated.

References
----------

.. [EWD310] Dijkstra, E.W. Hierarchical ordering of sequential processes.
   EWD 310.

.. [EWD625] Dijkstra, E.W. Two starvation free solutions to a general
   exclusion problem. EWD 625.

.. [MCS] Mellor-Crummey, J.M. and Scott, M.L. Algorithms for scalable
   synchronization on shared-memory multiprocessors. ACM Transactions on
   Computer Systems, Volume 9, Issue 1, February 1991.

.. [WS2001] Stallings, W. (2001). Operating systems: Internals and design
   principles. Upper Saddle River, N.J.: Prentice Hall.