# Runtime Lock Validation in Zircon

## Introduction

Zircon integrates a runtime lock validator to diagnose inconsistent lock
ordering that could lead to deadlocks. This document discusses how the
validator is integrated, how to enable and tune the validator at build time,
and what output the validator produces.

The theory of operation for the validator itself can be found in the
[design document](lockdep-design.md).

## Enabling the Lock Validator

Lock validation is disabled by default. **When disabled, the lock
instrumentation is transparent, acting as a zero-overhead wrapper for the
underlying locking primitives**.

The validator is enabled at compile time by setting the make variable
`ENABLE_LOCK_DEP` to true. As of this writing, the logic for this variable is
handled by [make/engine.mk](../make/engine.mk).

You can set this variable in your `local.mk` like this:

```makefile
# local.mk
ENABLE_LOCK_DEP := true
```

When the lock validator is enabled, a set of global lock-free, wait-free data
structures is generated to track the relationships between the instrumented
locks. The acquire and release operations of the locks are augmented to update
these data structures.

## Lock Instrumentation

The current incarnation of the runtime lock validator requires manually
instrumenting each lock in the kernel with a wrapper type. The wrapper type
provides the context the validator needs to properly identify the lock. It
also lets the validator generate a global tracking structure for locks with the
same context or role.

The kernel defines utility macros for this purpose in `kernel/spinlock.h` and
`kernel/mutex.h`.

### Member Locks

A type with a lock member like this:

```C++
#include <kernel/mutex.h>

class MyType {
public:
    // ...
private:
    mutable fbl::Mutex lock_;
    // ...
};
```

can be instrumented like this:

```C++
#include <kernel/mutex.h>

class MyType {
public:
    // ...
private:
    mutable DECLARE_MUTEX(MyType) lock_;
    // ...
};
```

Note that the containing type is passed to the macro
`DECLARE_MUTEX(containing_type)`. This type provides the context the validator
needs to distinguish locks that are members of `MyType` from locks that are
members of other types.

The macro `DECLARE_SPINLOCK(containing_type)` provides similar support for
instrumenting `SpinLock` members.

For those who are curious, the macro in the example above expands to this type
expression: `::lockdep::LockDep<containing_type, fbl::Mutex, __LINE__>`. This
expression results in a unique instantiation of the `lockdep::LockDep<>` type.
The instantiation is unique both across different containing types and within a
containing type that has more than one mutex, because each declaration is on a
different line.

### Global Locks

Global locks are instrumented using a singleton-type pattern. The kernel defines
utility macros for this purpose in `kernel/mutex.h` and `kernel/spinlock.h`.

In Zircon, global locks are typically defined either at global/namespace scope
or within another type as a static member.

example.h:
```C++
#include <kernel/mutex.h>

extern fbl::Mutex a_global_lock;

class MyType {
public:
    // ...
private:
    static fbl::Mutex all_objects_lock_;
};
```

example.cpp:
```C++
#include "example.h"

fbl::Mutex a_global_lock;

fbl::Mutex MyType::all_objects_lock_;
```

The instrumentation simplifies declaring global locks. The macros declare
singleton types that may be used in either scope and that handle ODR-use
automatically.

example.h:
```C++
#include <kernel/mutex.h>

DECLARE_SINGLETON_MUTEX(AGlobalLock);

class MyType {
public:
    // ...
private:
    DECLARE_SINGLETON_MUTEX(AllObjectsLock);
};
```

These macro invocations declare new singleton types, `AGlobalLock` and
`MyType::AllObjectsLock`, respectively.
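
The template machinery behind these macros is not shown in this document. The standalone C++17 sketch below is a rough illustration of how a supporting template type can own a singleton's storage. It is not the kernel's implementation, and it performs no validation. `LockDepSketch`, `SingletonLockDepSketch` and `DECLARE_SINGLETON_MUTEX_SKETCH` are hypothetical names. `std::mutex` stands in for `fbl::Mutex`, and `std::lock_guard` stands in for `Guard`.

```cpp
#include <mutex>
#include <type_traits>

// Hypothetical stand-in for an instrumented lock. Class and Index exist only
// to make each instantiation a distinct type; std::mutex replaces fbl::Mutex.
template <typename Class, typename LockType, int Index>
struct LockDepSketch {
    LockType lock;
};

// Hypothetical singleton base. instance_ is a static data member of a class
// template, so its definition below may sit in a header included by many
// translation units, with no per-lock definition in a .cpp file.
template <typename Name, typename LockType>
struct SingletonLockDepSketch {
    using Instrumented = LockDepSketch<Name, LockType, 0>;
    static Instrumented* Get() { return &instance_; }

private:
    static Instrumented instance_;
};

template <typename Name, typename LockType>
typename SingletonLockDepSketch<Name, LockType>::Instrumented
    SingletonLockDepSketch<Name, LockType>::instance_;

// Hypothetical macro: the declared type passes itself as the Name argument,
// so every singleton gets a distinct instrumented type and its own storage.
#define DECLARE_SINGLETON_MUTEX_SKETCH(name) \
    struct name : SingletonLockDepSketch<name, std::mutex> {}

DECLARE_SINGLETON_MUTEX_SKETCH(AGlobalLock);

class MyType {
public:
    static void AddObject() {
        std::lock_guard<std::mutex> guard{AllObjectsLock::Get()->lock};
        ++object_count_;
    }

    static int object_count() {
        std::lock_guard<std::mutex> guard{AllObjectsLock::Get()->lock};
        return object_count_;
    }

private:
    DECLARE_SINGLETON_MUTEX_SKETCH(AllObjectsLock);
    static_assert(!std::is_same<AllObjectsLock::Instrumented,
                                AGlobalLock::Instrumented>::value,
                  "each singleton lock has its own type");

    static inline int object_count_ = 0;  // Guarded by AllObjectsLock.
};
```

In this sketch, the out-of-class definition of `instance_` can live in the header next to the declarations. That is one way to avoid a separate storage definition for each global lock. Whether the kernel's templates use exactly this technique is an assumption.
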

The singleton types declared by these macros have a static `Get()` method that
returns the underlying global lock with all of the necessary instrumentation.
Note that there is no need to separately define storage for the locks; this is
handled automatically by the supporting template types.

The macro `DECLARE_SINGLETON_SPINLOCK(name)` provides similar support for
declaring a global `SpinLock`.

### Lock Guards

Instrumented locks are acquired and released using the scoped capability types
`Guard` and `GuardMultiple`. In the kernel, these types are defined in
`kernel/lockdep.h`.

The operation of `Guard` for simple mutexes is similar to `AutoLock`:

```C++
#include <kernel/mutex.h>

class MyType {
public:
    // ...

    int GetData() const {
        Guard<fbl::Mutex> guard{&lock_};
        return data_;
    }

    int DoSomething() {
        Guard<fbl::Mutex> guard{&lock_};
        int data_copy = data_;
        guard.Release();  // Drop the lock before doing the unlocked work.

        return DoWorkUnlocked(data_copy);
    }

private:
    int DoWorkUnlocked(int data);  // Does not require lock_.

    mutable DECLARE_MUTEX(MyType) lock_;
    int data_{0} TA_GUARDED(lock_);
};
```

`SpinLock` types require an additional template argument to `Guard`. This
argument selects one of a few possible options for acquiring the lock:
`IrqSave`, `NoIrqSave` and `TryLockNoIrqSave`. Omitting one of these type tags
results in a compile-time error.

```C++
#include <kernel/spinlock.h>

class MyType {
public:
    // ...

    int GetData() const {
        Guard<SpinLock, IrqSave> guard{&lock_};
        return data_;
    }

    void DoSomethingInIrqContext() {
        Guard<SpinLock, NoIrqSave> guard{&lock_};
        // ...
    }

    bool TryToDoSomethingInIrqContext() {
        if (Guard<SpinLock, TryLockNoIrqSave> guard{&lock_}) {
            // ...
            return true;
        }
        return false;
    }

private:
    mutable DECLARE_SPINLOCK(MyType) lock_;
    int data_{0} TA_GUARDED(lock_);
};
```

Instrumented global locks work similarly:

```C++
#include <kernel/mutex.h>
#include <fbl/intrusive_double_list.h>

class MyType : public fbl::DoublyLinkedListable<MyType> {
public:
    // ...

    void AddToList(MyType* object) {
        Guard<fbl::Mutex> guard{AllObjectsLock::Get()};
        all_objects_list_.push_back(*object);
    }

private:
    DECLARE_SINGLETON_MUTEX(AllObjectsLock);
    fbl::DoublyLinkedList<MyType> all_objects_list_ TA_GUARDED(AllObjectsLock::Get());
};
```

Note that instrumented locks do not have manual `Acquire()` and `Release()`
methods; using a `Guard` is the only way to acquire the locks. There are two
important reasons for this:

1. Manual acquire/release operations are more error-prone than a guard plus an
   explicit release where necessary.
2. When lock validation is enabled, the guard provides the storage that the
   validator uses to account for actively held locks. This approach keeps
   validator state in temporary storage on the stack, only for as long as the
   lock is held, which matches the use patterns of guard objects. Without this
   approach the tracking data would have to be stored in one of two places.
   The first is each lock instance, which increases memory use even when locks
   are not held. The second is heap-allocated memory. Neither of these
   alternatives is desirable.

In rare circumstances, the underlying lock may be accessed using the `lock()`
accessor of the instrumented lock. This should be done with care. Manipulating
the underlying lock directly may leave the state of the lock inconsistent with
the state of the lock validator. At best this may cause a lock order warning to
be missed, and at worst it may lead to a deadlock.
**You have been warned!**

## Clang Static Analysis and Instrumented Locks

The lock instrumentation is designed to interoperate with Clang static lock
analysis. In general usage, an instrumented lock may be used as a "mutex"
capability and specified in any of the static lock annotations.

There are two special cases that need some extra attention:

1. Returning pointers or references to capabilities.
2. Unlocking a guard passed by reference.

### Pointers and References to Capabilities

When returning a lock by pointer or reference, it may be convenient or
necessary to use a uniform type. Recall that each instrumented lock is wrapped
in a type that captures three things: the containing type, the underlying lock
type and the line number (`::lockdep::LockDep<Class, LockType, Index>`). This
disambiguates locks belonging to different types, but it makes it difficult to
return a lock from a uniform (virtual) interface, such as the kernel's
`Dispatcher::get_lock()`.

Fortunately there is a straightforward solution. Every instrumented lock is also
a subclass of `::lockdep::Lock<LockType>` (or simply `Lock<LockType>` in the
kernel). This type depends only on the underlying `LockType`, not on the context
in which the instrumented lock is declared. That makes it a convenient pointer
or reference type for referring to an instrumented lock more generically. This
type may be used in type annotations as well.

The following illustrates the pattern, which is similar to that employed by the
kernel `Dispatcher` types.

```C++
#include <kernel/mutex.h>

struct LockableInterface {
    virtual ~LockableInterface() {}
    virtual Lock<fbl::Mutex>* get_lock() = 0;
    virtual void DoSomethingLocked() TA_REQ(get_lock()) = 0;
};

class A : public LockableInterface {
public:
    Lock<fbl::Mutex>* get_lock() override { return &lock_; }
    void DoSomethingLocked() override {
        data_++;
    }
    void DoSomething() {
        Guard<fbl::Mutex> guard{get_lock()};
        DoSomethingLocked();
        // ...
    }
private:
    mutable DECLARE_MUTEX(A) lock_;
    int data_ TA_GUARDED(get_lock());
};

class B : public LockableInterface {
public:
    Lock<fbl::Mutex>* get_lock() override { return &lock_; }
    void DoSomethingLocked() override {
        // ...
    }
    void DoSomething() {
        Guard<fbl::Mutex> guard{get_lock()};
        DoSomethingLocked();
        // ...
    }
private:
    mutable DECLARE_MUTEX(B) lock_;
    char data_[32] TA_GUARDED(get_lock());
};
```

Note that the type of `A::lock_` is
`::lockdep::LockDep<A, fbl::Mutex, __LINE__>` and the type of `B::lock_` is
`::lockdep::LockDep<B, fbl::Mutex, __LINE__>`. However, both of these types are
subclasses of `Lock<fbl::Mutex>`, so we can treat them uniformly as this type in
pointer and reference expressions.

While this is very convenient, Clang static analysis has a limitation here. It
cannot tell that `LockableInterface::get_lock()` is equivalent to `A::lock_` or
`B::lock_`, even in their local contexts. For this reason it is necessary to use
`get_lock()` in all of the lock annotations.

### Unlocking a Guard Passed by Reference

In very rare circumstances, it is useful to release a `Guard` instance held in
a function from a callee of that function.

**TODO(eieio): Complete documentation of this feature.**

## Lock Validation Errors

The lock validator detects and reports two broad classes of violations:

1. Pair-wise violations reported at the point of acquisition.
2. Multi-lock cycles reported asynchronously by a dedicated loop detection
   thread.

### Violations Reported at Acquisition

When a violation is detected at the point of lock acquisition, the validator
produces a message like the following in the kernel log:

```
[00000.817] 04704.04716> ZIRCON KERNEL PANIC
[00000.817] 04704.04716> Lock validation failed for thread 0xffffff800a5ffa98 pid 4704 tid 4716 (thermd:initial-thread):
[00000.817] 04704.04716> Reason: Out Of Order
[00000.817] 04704.04716> Bad lock: name=lockdep::LockClass<SoloDispatcher<PortDispatcher>, fbl::Mutex, 362, (lockdep::LockFlags)0> order=0
[00000.817] 04704.04716> Conflict: name=lockdep::LockClass<VmObject, fbl::Mutex, 249, (lockdep::LockFlags)0> order=0
[00000.817] 04704.04716> caller=0xffffffff00190837 frame=0xffffff98717f0970
[00000.817] 04704.04716> BUILDID git-ce892d1b03c1a56799fb604d1d6303bb7b16e75a
[00000.817] 04704.04716> dso: id=3ebe31f2ce250453f1210662d6f9d16e2595b9b8 base=0xffffffff00100000 name=zircon.elf
[00000.817] 04704.04716> bt#00: 0xffffffff00190837
[00000.817] 04704.04716> bt#01: 0xffffffff00163883
[00000.817] 04704.04716> bt#02: 0xffffffff00165e58
[00000.817] 04704.04716> bt#03: 0xffffffff0022a1d8
[00000.817] 04704.04716> bt#04: 0xffffffff0022b759
[00000.817] 04704.04716> bt#05: 0xffffffff00229eca
[00000.817] 04704.04716> bt#06: 0xffffffff0022b759
[00000.817] 04704.04716> bt#07: 0xffffffff00222787
[00000.817] 04704.04716> bt#08: 0xffffffff00211f0c
[00000.817] 04704.04716> bt#09: 0xffffffff0021d9d3
[00000.817] 04704.04716> bt#10: 0xffffffff0019a059
[00000.817] 04704.04716> bt#11: 0xffffffff0019ab8b
[00000.817] 04704.04716> bt#12: 0xffffffff001a9448
[00000.817] 04704.04716> bt#13: 0xffffffff0013f503
[00000.817] 04704.04716> bt#14: 0xffffffff001af8be
[00000.817] 04704.04716> bt#15: 0xffffffff001995e3
[00000.817] 04704.04716> bt#16: end
[00000.817] 04704.04716>
```

This is reported as a panic because `fx symbolize` requires that wording to
recognize the kernel stack trace. Even so, the error is informational and
non-fatal. The line after the panic banner identifies the thread and process
where the lock violation occurred. The next line gives the type of violation.
The two lines after that identify which locks were found to be inconsistent
with previous observations. "Bad lock" is the lock that is about to be
acquired. "Conflict" is a lock already held by the current context; it is the
point of inconsistency with the lock about to be acquired. All of the lines
that follow are part of the stack trace leading up to the bad lock.

### Multi-Lock Cycles

Circular dependencies between three or more locks are detected by a dedicated
loop detection thread. This detection happens in a separate context from the
lock operations that caused the cycle, so a stack trace is not provided.

Reports from the loop detection thread look like this:

```
[00002.000] 00000.00000> ZIRCON KERNEL OOPS
[00002.000] 00000.00000> Circular lock dependency detected:
[00002.000] 00000.00000> lockdep::LockClass<VmObject, fbl::Mutex, 249, (lockdep::LockFlags)0>
[00002.000] 00000.00000> lockdep::LockClass<VmAspace, fbl::Mutex, 198, (lockdep::LockFlags)0>
[00002.000] 00000.00000> lockdep::LockClass<SoloDispatcher<VmObjectDispatcher>, fbl::Mutex, 362, (lockdep::LockFlags)0>
[00002.000] 00000.00000> lockdep::LockClass<SoloDispatcher<PortDispatcher>, fbl::Mutex, 362, (lockdep::LockFlags)0>
```

Each of the locks involved in the cycle is reported in a group. Frequently, a
single thread holds only two of the circularly dependent locks at any given
time, which makes manual detection difficult or impossible.
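
The compile-time sketch below illustrates why pair-wise checks alone can miss such cycles. It is a simplified model, not the validator's algorithm or data structures. Lock classes are small integers, and observed orderings are stored in a boolean adjacency matrix. `IsOutOfOrder`, `Replay` and `HasCycle` are hypothetical names. Three threads take A then B, B then C, and C then A; none of these acquisitions triggers the pair-wise check. A depth-first search over the recorded orderings still finds the cycle. Two threads that take A and B in opposite orders are caught at acquisition time.

```cpp
#include <array>
#include <cstddef>

// Hypothetical lock classes, modeled as small integers.
constexpr int kA = 0;
constexpr int kB = 1;
constexpr int kC = 2;
constexpr int kNumClasses = 3;

// graph[x][y] == true records that class y was acquired while x was held.
using Graph = std::array<std::array<bool, kNumClasses>, kNumClasses>;

struct Acquisition {
    int held;  // Lock class already held by the acquiring thread.
    int next;  // Lock class being acquired.
};

struct Result {
    Graph graph;
    bool pairwise_violation;
};

// Pair-wise check at acquisition time: taking `next` while holding `held` is
// out of order if the opposite ordering was recorded earlier.
constexpr bool IsOutOfOrder(const Graph& graph, int held, int next) {
    return graph[next][held];
}

// Replays acquisitions (from any number of threads), checking each against
// the orderings seen so far and then recording it.
template <std::size_t N>
constexpr Result Replay(const std::array<Acquisition, N>& acquisitions) {
    Result result{};
    for (const Acquisition& a : acquisitions) {
        if (IsOutOfOrder(result.graph, a.held, a.next)) {
            result.pairwise_violation = true;
        }
        result.graph[a.held][a.next] = true;
    }
    return result;
}

// Depth-first search for a path from `node` back to `start`.
constexpr bool ReachesStart(const Graph& graph, int start, int node,
                            std::array<bool, kNumClasses>& visited) {
    for (int next = 0; next < kNumClasses; ++next) {
        if (!graph[node][next]) continue;
        if (next == start) return true;
        if (!visited[next]) {
            visited[next] = true;
            if (ReachesStart(graph, start, next, visited)) return true;
        }
    }
    return false;
}

// Loop detection pass: does any lock class lie on a cycle?
constexpr bool HasCycle(const Graph& graph) {
    for (int start = 0; start < kNumClasses; ++start) {
        std::array<bool, kNumClasses> visited{};
        if (ReachesStart(graph, start, start, visited)) return true;
    }
    return false;
}

// Three threads, each holding one lock while taking the next.
constexpr std::array<Acquisition, 3> kThreeLockCycle{{{kA, kB}, {kB, kC}, {kC, kA}}};
constexpr Result kCycleResult = Replay(kThreeLockCycle);

// Two threads taking the same pair of locks in opposite orders.
constexpr std::array<Acquisition, 2> kInversion{{{kA, kB}, {kB, kA}}};
constexpr Result kInversionResult = Replay(kInversion);

static_assert(!kCycleResult.pairwise_violation, "no single acquisition is inverted");
static_assert(HasCycle(kCycleResult.graph), "yet the orderings form a cycle");
```

The sketch is `constexpr` throughout so that the `static_assert`s act as its output. As described above, the real loop detection instead runs on a dedicated thread over data gathered from live lock operations.
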

Even when no single thread holds all of the locks in a cycle, the potential for
deadlock between three or more threads is real. It should be addressed for
long-term system stability.

## Kernel Commands

When the lock validator is enabled, the following kernel commands are available:

* `k lockdep dump` - dumps the dependency graph and connected sets (loops) for
  all instrumented locks.
* `k lockdep loop` - triggers a loop detection pass and reports any loops found
  to the kernel log.