# Runtime Lock Validation in Zircon

## Introduction

Zircon integrates a runtime lock validator to diagnose inconsistent lock
ordering that could lead to deadlocks. This document discusses how the
validator is integrated, how to enable and tune the validator at build time,
and what output the validator produces.

The theory of operation for the validator itself can be found in the
[design document](lockdep-design.md).

## Enabling the Lock Validator

Lock validation is disabled by default. **When disabled, the lock
instrumentation is transparent, acting as a zero-overhead wrapper for the
underlying locking primitives**.

The validator is enabled at compile time by setting the make variable
`ENABLE_LOCK_DEP` to true. As of this writing, the logic for this variable is
handled by [make/engine.mk](../make/engine.mk).

You can set this variable in your `local.mk` like this:

```makefile
# local.mk
ENABLE_LOCK_DEP := true
```

When the lock validator is enabled, a set of global lock-free, wait-free data
structures is generated to track the relationships between the instrumented
locks; the acquire/release operations of the locks are augmented to update
these data structures.

## Lock Instrumentation

The current incarnation of the runtime lock validator requires manually
instrumenting each lock in the kernel with a wrapper type. The wrapper type
provides the context the validator needs to properly identify the lock and
generate a global tracking structure for locks with the same context or role.

The kernel defines utility macros for this purpose in `kernel/spinlock.h` and
`kernel/mutex.h`.

### Member Locks

A type with a lock member like this:

```C++
#include <kernel/mutex.h>

class MyType {
public:
	// ...
private:
	mutable fbl::Mutex lock_;
	// ...
};
```

May be instrumented like this:

```C++
#include <kernel/mutex.h>

class MyType {
public:
	// ...
private:
	mutable DECLARE_MUTEX(MyType) lock_;
	// ...
};
```

Note that the containing type is passed to the macro
`DECLARE_MUTEX(containing_type)`. This type provides the context the validator
needs to distinguish locks that are members of `MyType` from locks that are
members of other types.

The macro `DECLARE_SPINLOCK(containing_type)` provides similar support for
instrumenting `SpinLock` members.

For those who are curious, the macro in the example above expands to this type
expression: `::lockdep::LockDep<containing_type, fbl::Mutex, __LINE__>`. This
expression results in a unique instantiation of the `lockdep::LockDep<>` type,
both across different containing types, and within a containing type where
there is more than one mutex.

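The effect of including `__LINE__` in the type can be reproduced outside the
kernel. The following is a minimal sketch, not the real implementation:
`ToyMutex`, `ToyLockDep`, and `TOY_DECLARE_MUTEX` are hypothetical stand-ins
for `fbl::Mutex`, `lockdep::LockDep<>`, and `DECLARE_MUTEX`, and only the shape
of the template arguments follows the expansion described above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the underlying lock type.
struct ToyMutex {};

// Hypothetical stand-in for the wrapper: the containing type and an index
// select the instantiation, but the wrapper only stores the lock itself.
template <typename Containing, typename LockType, std::size_t Index>
struct ToyLockDep {
	LockType lock;
};

// Hypothetical macro with the same shape as the expansion described above.
// __LINE__ is the line on which the macro is invoked.
#define TOY_DECLARE_MUTEX(containing_type) \
	ToyLockDep<containing_type, ToyMutex, __LINE__>

struct Foo {
	TOY_DECLARE_MUTEX(Foo) first_lock;
	TOY_DECLARE_MUTEX(Foo) second_lock;
};

struct Bar {
	TOY_DECLARE_MUTEX(Bar) lock;
};

// Two locks in the same type sit on different lines, so their types differ.
static_assert(!std::is_same<decltype(Foo::first_lock),
                            decltype(Foo::second_lock)>::value,
              "locks on different lines have distinct wrapper types");

// Locks in different containing types differ as well.
static_assert(!std::is_same<decltype(Foo::first_lock),
                            decltype(Bar::lock)>::value,
              "locks in different types have distinct wrapper types");

// In this sketch the wrapper adds no storage on top of the lock.
static_assert(sizeof(Foo::first_lock) == sizeof(ToyMutex),
              "the toy wrapper is the same size as the lock it wraps");
```

One consequence of keying on the line number is that two instrumented locks of
the same containing type must not be declared on the same source line.
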
### Global Locks

Global locks are instrumented using a singleton-type pattern. The kernel defines
utility macros for this purpose in `kernel/mutex.h` and `kernel/spinlock.h`.

In Zircon, global locks are typically defined either at global/namespace scope
or within another type as a static member.

example.h:
```C++
#include <kernel/mutex.h>

extern fbl::Mutex a_global_lock;

class MyType {
public:
	// ...
private:
	static fbl::Mutex all_objects_lock_;
};
```

example.cpp:
```C++
#include "example.h"

fbl::Mutex a_global_lock;

fbl::Mutex MyType::all_objects_lock_;
```

The instrumentation simplifies this by declaring singleton types that may be
used in either scope; ODR-use is handled automatically.

example.h:
```C++
#include <kernel/mutex.h>

DECLARE_SINGLETON_MUTEX(AGlobalLock);

class MyType {
public:
	// ...
private:
	DECLARE_SINGLETON_MUTEX(AllObjectsLock);
};
```

These macro invocations declare new singleton types, `AGlobalLock` and
`MyType::AllObjectsLock` respectively. These types have a static `Get()` method
that returns the underlying global lock with all of the necessary
instrumentation. Note that there is no need to separately define storage for the
locks; this is handled automatically by the supporting template types.

The macro `DECLARE_SINGLETON_SPINLOCK(name)` provides similar support for
declaring a global `SpinLock`.

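To illustrate why no separate storage definition is needed, here is a minimal
sketch of one way a singleton lock type can be built. It is an assumption about
the general technique, not the kernel's implementation: `ToySingletonLock` and
`TOY_DECLARE_SINGLETON_MUTEX` are hypothetical names, and `std::mutex` stands in
for `fbl::Mutex`. The sketch relies on the rule that a static data member of a
class template may be defined in a header.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: each distinct Tag gets its own lock instance.
template <typename Tag, typename LockType>
class ToySingletonLock {
public:
	static LockType* Get() { return &lock_; }

private:
	static LockType lock_;
};

// Static members of class templates may be defined in a header, so no
// per-lock definition in a .cpp file is required.
template <typename Tag, typename LockType>
LockType ToySingletonLock<Tag, LockType>::lock_;

// Hypothetical macro in the spirit of DECLARE_SINGLETON_MUTEX(name): it
// declares a new type that is its own tag.
#define TOY_DECLARE_SINGLETON_MUTEX(name) \
	struct name : ToySingletonLock<name, std::mutex> {}

TOY_DECLARE_SINGLETON_MUTEX(AGlobalLock);

class MyType {
public:
	TOY_DECLARE_SINGLETON_MUTEX(AllObjectsLock);
};

// Example use: both singletons can be locked without any out-of-line storage.
void LockBoth() {
	assert(AGlobalLock::Get() != MyType::AllObjectsLock::Get());
	std::lock_guard<std::mutex> a{*AGlobalLock::Get()};
	std::lock_guard<std::mutex> b{*MyType::AllObjectsLock::Get()};
}
```

The same macro works at namespace scope and inside a class because it only
declares a type; the lock object itself lives in the template's static member.
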
### Lock Guards

Instrumented locks are acquired and released using the scoped capability types
`Guard` and `GuardMultiple`. In the kernel these types are defined in
`kernel/lockdep.h`.

The operation of `Guard` for simple mutexes is similar to `AutoLock`:

```C++
#include <kernel/mutex.h>

class MyType {
public:
	// ...

	int GetData() const {
		Guard<fbl::Mutex> guard{&lock_};
		return data_;
	}

	int DoSomething() {
		Guard<fbl::Mutex> guard{&lock_};
		int data_copy = data_;
		guard.Release();

		return DoWorkUnlocked(data_copy);
	}

private:
	mutable DECLARE_MUTEX(MyType) lock_;
	int data_ TA_GUARDED(lock_){0};
};
```

`SpinLock` types require an additional template argument to `Guard` to select
one of a few possible options when acquiring the lock: `IrqSave`, `NoIrqSave`,
and `TryLockNoIrqSave`. Omitting one of these type tags results in a
compile-time error.

```C++
#include <kernel/spinlock.h>

class MyType {
public:
	// ...

	int GetData() const {
		Guard<SpinLock, IrqSave> guard{&lock_};
		return data_;
	}

	void DoSomethingInIrqContext() {
		Guard<SpinLock, NoIrqSave> guard{&lock_};
		// ...
	}

	bool TryToDoSomethingInIrqContext() {
		if (Guard<SpinLock, TryLockNoIrqSave> guard{&lock_}) {
			// ...
			return true;
		}
		return false;
	}

private:
	mutable DECLARE_SPINLOCK(MyType) lock_;
	int data_ TA_GUARDED(lock_){0};
};
```

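The `TryLockNoIrqSave` form relies on the guard being testable in an `if`
condition. The following is a minimal sketch of that idea, assuming nothing
about the real `Guard` beyond what the example shows: `ToySpinLock` and
`ToyTryGuard` are hypothetical types built on `std::atomic_flag`, and interrupt
handling is left out entirely.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spin lock with a non-blocking acquire.
class ToySpinLock {
public:
	// Returns true if the lock was free and is now held by the caller.
	bool TryAcquire() { return !flag_.test_and_set(std::memory_order_acquire); }
	void Release() { flag_.clear(std::memory_order_release); }

private:
	std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical guard: attempts the acquire once and reports the result
// through an explicit conversion to bool.
class ToyTryGuard {
public:
	explicit ToyTryGuard(ToySpinLock* lock)
		: lock_(lock->TryAcquire() ? lock : nullptr) {}
	~ToyTryGuard() {
		if (lock_ != nullptr) {
			lock_->Release();
		}
	}
	ToyTryGuard(const ToyTryGuard&) = delete;
	ToyTryGuard& operator=(const ToyTryGuard&) = delete;

	// True only if the constructor acquired the lock.
	explicit operator bool() const { return lock_ != nullptr; }

private:
	ToySpinLock* lock_;
};

// Increments the counter only if the lock could be taken without waiting.
bool TryIncrement(ToySpinLock* lock, int* counter) {
	assert(lock != nullptr && counter != nullptr);
	if (ToyTryGuard guard{lock}) {
		++*counter;
		return true;
	}
	return false;
}
```

Declaring the guard inside the `if` condition keeps the lock held for exactly
the body of the `if`, and the failed path never touches the guarded data.
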
Instrumented global locks work similarly:

```C++
#include <kernel/mutex.h>
#include <fbl/intrusive_double_list.h>

class MyType : public fbl::DoublyLinkedListable<MyType> {
public:
	// ...

	void AddToList(MyType* object) {
		Guard<fbl::Mutex> guard{AllObjectsLock::Get()};
		all_objects_list_.push_back(*object);
	}

private:
	DECLARE_SINGLETON_MUTEX(AllObjectsLock);
	fbl::DoublyLinkedList<MyType> all_objects_list_ TA_GUARDED(AllObjectsLock::Get());
};
```

Note that instrumented locks do not have manual `Acquire()` and `Release()`
methods; using a `Guard` is the only way to acquire the locks directly. There
are two important reasons for this:

1. Manual acquire/release operations are more error prone than using a guard,
   with a manual release only when necessary.
2. When lock validation is enabled the guard provides the storage that the
   validator uses to account for actively held locks. This approach permits
   temporary storage of validator state on the stack only for the duration the
   lock is held, which corresponds with the use patterns of guard objects.
   Without this approach the tracking data would either have to be stored with
   each lock instance, increasing memory use even when locks are not held, or
   stored in heap allocated memory. Neither of these alternatives is desirable.

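The second point can be sketched as follows. This is an illustration under
stated assumptions rather than the validator's actual data layout:
`HeldLockEntry`, `g_held_locks`, `ToyGuard`, and `HeldLockCount` are
hypothetical names, and `std::mutex` stands in for the kernel lock types. The
bookkeeping record is a member of the guard, so it exists only while the guard
is on the stack.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical bookkeeping record, stored inside the guard.
struct HeldLockEntry {
	const void* lock_id;
	HeldLockEntry* next;
};

// Hypothetical per-thread list of the locks this thread currently holds.
thread_local HeldLockEntry* g_held_locks = nullptr;

class ToyGuard {
public:
	explicit ToyGuard(std::mutex* lock) : lock_(lock), entry_{lock, nullptr} {
		assert(lock != nullptr);
		lock_->lock();
		// Link the stack-resident entry into the thread's held list.
		entry_.next = g_held_locks;
		g_held_locks = &entry_;
	}
	~ToyGuard() { Release(); }
	ToyGuard(const ToyGuard&) = delete;
	ToyGuard& operator=(const ToyGuard&) = delete;

	// Early release; calling it again (or destroying the guard) is a no-op.
	void Release() {
		if (lock_ == nullptr) {
			return;
		}
		// Unlink this guard's entry, wherever it is in the list.
		for (HeldLockEntry** p = &g_held_locks; *p != nullptr; p = &(*p)->next) {
			if (*p == &entry_) {
				*p = entry_.next;
				break;
			}
		}
		lock_->unlock();
		lock_ = nullptr;
	}

private:
	std::mutex* lock_;
	HeldLockEntry entry_;
};

// Counts the locks held by the calling thread by walking its list.
int HeldLockCount() {
	int count = 0;
	for (HeldLockEntry* e = g_held_locks; e != nullptr; e = e->next) {
		++count;
	}
	return count;
}
```

In this sketch a lock that is not held costs no tracking memory at all, and no
heap allocation happens on the acquire path.
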
In rare circumstances the underlying lock may be accessed using the `lock()`
accessor of the instrumented lock. This should be done with care, as
manipulating the underlying lock directly may result in inconsistency between
the state of the lock and the state of the lock validator; at best this may
lead to a missed lock order warning and at worst to a deadlock. **You have been
warned!**

## Clang Static Analysis and Instrumented Locks

The lock instrumentation is designed to interoperate with Clang static lock
analysis. In general usage, an instrumented lock may be used as a "mutex"
capability and specified in any of the static lock annotations.

There are two special cases that need some extra attention:

1. Returning pointers or references to capabilities.
2. Unlocking a guard passed by reference.

### Pointers and References to Capabilities

When returning a lock by pointer or reference it may be convenient or necessary
to use a uniform type. Recall from earlier that instrumented locks are wrapped
in a type that captures the containing type, the underlying lock type, and the
line number, which together disambiguate one lock from another
(`::lockdep::LockDep<Class, Locktype, Index>`). This can lead to difficulty when
returning a lock from a uniform (virtual) interface (e.g. kernel
`Dispatcher::get_lock()`).

Fortunately there is a straightforward solution: every instrumented lock is also
a subclass of `::lockdep::Lock<LockType>` (or simply `Lock<LockType>` in the
kernel). This type only depends on the underlying `LockType`, not the context in
which the instrumented lock is declared, making it convenient to use as a
pointer or reference type to refer to an instrumented lock more generically.
This type may be used in type annotations as well.

The following illustrates the pattern, which is similar to that employed by the
kernel `Dispatcher` types.

```C++
#include <kernel/mutex.h>

struct LockableInterface {
	virtual ~LockableInterface() {}
	virtual Lock<fbl::Mutex>* get_lock() = 0;
	virtual void DoSomethingLocked() TA_REQ(get_lock()) = 0;
};

class A : public LockableInterface {
public:
	Lock<fbl::Mutex>* get_lock() override { return &lock_; }
	void DoSomethingLocked() override {
		data_++;
	}
	void DoSomething() {
		Guard<fbl::Mutex> guard{get_lock()};
		DoSomethingLocked();
		// ...
	}
private:
	mutable DECLARE_MUTEX(A) lock_;
	int data_ TA_GUARDED(get_lock());
};

class B : public LockableInterface {
public:
	Lock<fbl::Mutex>* get_lock() override { return &lock_; }
	void DoSomethingLocked() override {
		// ...
	}
	void DoSomething() {
		Guard<fbl::Mutex> guard{get_lock()};
		DoSomethingLocked();
		// ...
	}
private:
	mutable DECLARE_MUTEX(B) lock_;
	char data_[32] TA_GUARDED(get_lock());
};
```

Note that the type of `A::lock_` is
`::lockdep::LockDep<A, fbl::Mutex, __LINE__>` and the type of `B::lock_` is
`::lockdep::LockDep<B, fbl::Mutex, __LINE__>`. However, both of these types are
subclasses of `Lock<fbl::Mutex>`, so we can treat them uniformly as this type in
pointer and reference expressions.

While this is very convenient, a limitation in Clang static analysis prevents it
from understanding that `LockableInterface::get_lock()` is equivalent to
`A::lock_` or `B::lock_`, even in their local contexts. For this reason it is
necessary to use `get_lock()` in all of the lock annotations.

### Unlocking a Guard Passed by Reference

In very rare circumstances it is useful to release a `Guard` instance held in
a function from a callee of the function.

**TODO(eieio): Complete documentation of this feature.**

## Lock Validation Errors

The lock validator detects and reports two broad classes of violations:

1. Pair-wise violations reported at the point of acquisition.
2. Multi-lock cycles reported asynchronously by a dedicated loop detection
   thread.

### Violations Reported at Acquisition

When a violation is detected at the point of lock acquisition the validator
produces a message like the following in the kernel log:

```
[00000.817] 04704.04716> ZIRCON KERNEL PANIC
[00000.817] 04704.04716> Lock validation failed for thread 0xffffff800a5ffa98 pid 4704 tid 4716 (thermd:initial-thread):
[00000.817] 04704.04716> Reason: Out Of Order
[00000.817] 04704.04716> Bad lock: name=lockdep::LockClass<SoloDispatcher<PortDispatcher>, fbl::Mutex, 362, (lockdep::LockFlags)0> order=0
[00000.817] 04704.04716> Conflict: name=lockdep::LockClass<VmObject, fbl::Mutex, 249, (lockdep::LockFlags)0> order=0
[00000.817] 04704.04716> caller=0xffffffff00190837 frame=0xffffff98717f0970
[00000.817] 04704.04716> BUILDID git-ce892d1b03c1a56799fb604d1d6303bb7b16e75a
[00000.817] 04704.04716> dso: id=3ebe31f2ce250453f1210662d6f9d16e2595b9b8 base=0xffffffff00100000 name=zircon.elf
[00000.817] 04704.04716> bt#00: 0xffffffff00190837
[00000.817] 04704.04716> bt#01: 0xffffffff00163883
[00000.817] 04704.04716> bt#02: 0xffffffff00165e58
[00000.817] 04704.04716> bt#03: 0xffffffff0022a1d8
[00000.817] 04704.04716> bt#04: 0xffffffff0022b759
[00000.817] 04704.04716> bt#05: 0xffffffff00229eca
[00000.817] 04704.04716> bt#06: 0xffffffff0022b759
[00000.817] 04704.04716> bt#07: 0xffffffff00222787
[00000.817] 04704.04716> bt#08: 0xffffffff00211f0c
[00000.817] 04704.04716> bt#09: 0xffffffff0021d9d3
[00000.817] 04704.04716> bt#10: 0xffffffff0019a059
[00000.817] 04704.04716> bt#11: 0xffffffff0019ab8b
[00000.817] 04704.04716> bt#12: 0xffffffff001a9448
[00000.817] 04704.04716> bt#13: 0xffffffff0013f503
[00000.817] 04704.04716> bt#14: 0xffffffff001af8be
[00000.817] 04704.04716> bt#15: 0xffffffff001995e3
[00000.817] 04704.04716> bt#16: end
[00000.817] 04704.04716>
```

Although this is reported as a panic (required wording for `fx symbolize` to
recognize the kernel stack trace) the error is informational and non-fatal. The
line following the panic banner identifies the thread and process where the
kernel lock violation occurred. The next line identifies the type of violation.
The next two lines identify which locks were found to be inconsistent with
previous observations; the "Bad lock" is the lock that is about to be acquired,
while "Conflict" is a lock that is already held by the current context and is
the point of inconsistency with the lock that is about to be acquired. All of
the lines following this are part of the stack trace leading up to the bad lock.

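The "Bad lock" and "Conflict" roles can be illustrated with a deliberately
simplified pair-wise check. This is a sketch of the general idea only, not the
validator's algorithm (see the design document for that): `ToyOrderTracker`,
`LockClassId`, and `kNoConflict` are hypothetical names, lock classes are plain
integers, and observed orderings are kept in an ordinary `std::set` rather than
in lock-free structures.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical identifier for a lock class.
using LockClassId = int;
constexpr LockClassId kNoConflict = -1;

class ToyOrderTracker {
public:
	// Call before acquiring `next` while the classes in `held` are held.
	// Returns the held class that conflicts with `next` (the "Conflict"
	// lock, with `next` as the "Bad lock"), or kNoConflict.
	LockClassId CheckAcquire(const std::vector<LockClassId>& held,
	                         LockClassId next) {
		assert(next != kNoConflict);
		for (LockClassId h : held) {
			// Previously `h` was acquired while `next` was held, so taking
			// `next` while holding `h` reverses the observed order.
			if (observed_.count(std::make_pair(next, h)) != 0) {
				return h;
			}
		}
		// Consistent so far: remember that each held class precedes `next`.
		for (LockClassId h : held) {
			observed_.emplace(h, next);
		}
		return kNoConflict;
	}

private:
	// Each pair (first, second) means: second was acquired while first was held.
	std::set<std::pair<LockClassId, LockClassId>> observed_;
};
```

With this sketch, a thread that takes class 1 and then class 2 establishes the
order; a later attempt to take class 1 while holding class 2 returns 2 as the
conflicting class.
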
### Multi-Lock Cycles

Circular dependencies between three or more locks are detected with a dedicated
loop detection thread. Because this detection happens in a separate context from
the lock operations that caused the cycle, a stack trace is not provided.

Reports from the loop detection thread look like this:

```
[00002.000] 00000.00000> ZIRCON KERNEL OOPS
[00002.000] 00000.00000> Circular lock dependency detected:
[00002.000] 00000.00000>   lockdep::LockClass<VmObject, fbl::Mutex, 249, (lockdep::LockFlags)0>
[00002.000] 00000.00000>   lockdep::LockClass<VmAspace, fbl::Mutex, 198, (lockdep::LockFlags)0>
[00002.000] 00000.00000>   lockdep::LockClass<SoloDispatcher<VmObjectDispatcher>, fbl::Mutex, 362, (lockdep::LockFlags)0>
[00002.000] 00000.00000>   lockdep::LockClass<SoloDispatcher<PortDispatcher>, fbl::Mutex, 362, (lockdep::LockFlags)0>
```

The locks involved in the cycle are reported as a group. Frequently only two of
the circularly-dependent locks are acquired by a single thread at any given
time, making manual detection difficult or impossible. However, the potential
for deadlock between three or more threads is real and should be addressed for
long-term system stability.

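The reason such cycles escape pair-wise checks is that no single acquisition
reverses an ordering seen before; the loop only appears when the orderings are
chained together. The sketch below shows this with a plain depth-first search
over a dependency graph. It is an assumption chosen for brevity, not the loop
detection thread's actual algorithm, and `ToyDependencyGraph` and `LockClassId`
are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical identifier for a lock class.
using LockClassId = int;

class ToyDependencyGraph {
public:
	// Records that `after` was acquired while `before` was held.
	void AddEdge(LockClassId before, LockClassId after) {
		assert(before != after);
		edges_[before].insert(after);
		edges_[after];  // Make sure the target node exists.
	}

	// Returns true if some chain of recorded orderings leads back to its start.
	bool HasCycle() const {
		std::map<LockClassId, Color> color;
		for (const auto& node : edges_) {
			if (color[node.first] == Color::kUnvisited &&
			    Visit(node.first, color)) {
				return true;
			}
		}
		return false;
	}

private:
	enum class Color { kUnvisited, kInProgress, kDone };

	// Depth-first search; reaching a node that is still in progress means
	// the current path loops back on itself.
	bool Visit(LockClassId id, std::map<LockClassId, Color>& color) const {
		color[id] = Color::kInProgress;
		for (LockClassId next : edges_.at(id)) {
			if (color[next] == Color::kInProgress) {
				return true;
			}
			if (color[next] == Color::kUnvisited && Visit(next, color)) {
				return true;
			}
		}
		color[id] = Color::kDone;
		return false;
	}

	std::map<LockClassId, std::set<LockClassId>> edges_;
};
```

For example, three threads that acquire classes 1 then 2, 2 then 3, and 3 then 1
never hold a reversed pair individually, yet together they form a cycle that
this search reports.
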
## Kernel Commands

When the lock validator is enabled the following kernel commands are available:

* `k lockdep dump` - dumps the dependency graph and connected sets (loops) for
  all instrumented locks.
* `k lockdep loop` - triggers a loop detection pass and reports any loops found
  to the kernel log.
