# Exception handling

## Introduction

Exception handling support in Zircon was inspired by similar support in Mach.

Exceptions are mainly used for debugging. Outside of debugging
one generally uses ["signals"](signals.md).
Signals are the core Zircon mechanism for observing state changes on
kernel Objects (a Channel becoming readable, a Process terminating,
an Event becoming signaled, etc).
See [Signals](#signals) below.

The reader is assumed to have a basic understanding of exceptions such as
segmentation faults, as well as of POSIX signals.
This document does not explain what a segfault is, nor what "exception
handling" is at a high level (though it certainly can if there is a need).

## The basics

Exceptions are handled from userspace by binding a Zircon Port to the
Exception Port of the desired object: thread, process, or job.
This is done with the
[**task_bind_exception_port**() system call](syscalls/task_bind_exception_port.md).

Example:

```cpp
  zx_handle_t eport;
  auto status = zx_port_create(0, &eport);
  // ... check status ...
  uint32_t options = 0;
  // The key is anything that is useful to the code handling the exception.
  uint64_t child_key = 0;
  // Assume |child| is a process handle.
  status = zx_task_bind_exception_port(child, eport, child_key, options);
  // ... check status ...
```

When an exception occurs, a report is sent to the port,
after which the receiver must reply with either "exception handled"
or "exception not handled".
The thread stays paused until then, or until the port is unbound,
either explicitly or by the port being closed (say because the handler
process exited). If the port is unbound, for whatever reason, the
exception is processed as if the reply was "exception not handled".

Here is a simple exception handling loop.
Its main components are the call to the
[**port_wait**() system call](syscalls/port_wait.md),
which waits for an exception (or anything else of interest) to happen,
and the call to the
[**task_resume_from_exception**() system call](syscalls/task_resume_from_exception.md),
which indicates that the handler has finished processing the exception.

```cpp
  while (true) {
    zx_port_packet_t packet;
    // zx_port_wait() fills in the packet through a pointer.
    auto status = zx_port_wait(eport, ZX_TIME_INFINITE, &packet);
    // ... check status ...
    if (packet.key != child_key) {
      // ... do something else, depending on what else the port is used for ...
      continue;
    }
    if (!ZX_PKT_IS_EXCEPTION(packet.type)) {
      // ... probably a signal, process it ...
      continue;
    }
    zx_koid_t packet_tid = packet.exception.tid;
    zx_handle_t thread;
    status = zx_object_get_child(child, packet_tid, ZX_RIGHT_SAME_RIGHTS,
                                 &thread);
    // ... check status ...
    // process_exception() stands for the handler's own logic.
    bool handled = process_exception(child, thread, &packet);
    uint32_t resume_flags = 0;
    if (!handled)
      resume_flags |= ZX_RESUME_TRY_NEXT;
    status = zx_task_resume_from_exception(thread, eport, resume_flags);
    // ... check status ...
    status = zx_handle_close(thread);
    assert(status == ZX_OK);
  }
```

To unbind an exception port, pass **ZX_HANDLE_INVALID** for the
exception port:

```cpp
  uint32_t options = 0;
  status = zx_task_bind_exception_port(child, ZX_HANDLE_INVALID,
                                       child_key, options);
  // ... check status ...
```

## Exception processing details

When a thread gets an exception it is paused while the kernel processes
the exception. The kernel looks for bound exception ports in a specific order
and, if it finds one, sends an "exception report" to the bound port.

Exception reports are messages sent through the port with a specific format
defined by the port message protocol. The packet contents are defined by
the *zx_packet_exception_t* type defined in
[`<zircon/syscalls/port.h>`](../system/public/zircon/syscalls/port.h).

The exception handler is expected to read the message, decide how it
wants to process the exception, and then resume the thread that got the
exception with the
[**task_resume_from_exception**() system call](syscalls/task_resume_from_exception.md).

Resuming the thread can be done in either of two ways:

- Resume execution of the thread as if the exception has been resolved.
If the thread gets another exception then exception processing begins
anew. An example of when one would do this is when resuming after a
debugger breakpoint.

```cpp
  auto status = zx_task_resume_from_exception(thread, eport, 0);
  // ... check status ...
```

- Resume exception processing, marking the exception as "unhandled" by the
current handler, thus giving the next exception port in the search order a
chance to process the exception. An example of when one would do this is
when the exception is not one the handler intends to process.

```cpp
  auto status = zx_task_resume_from_exception(thread, eport,
      ZX_RESUME_TRY_NEXT);
  // ... check status ...
```

If there are no remaining exception ports to try, the kernel terminates
the process, as if *zx_task_kill(process)* was called.
The return code of a process terminated by an exception is an
unspecified non-zero value.
The return code can be obtained with *zx_object_get_info(ZX_INFO_PROCESS)*.
Example:

```cpp
    zx_info_process_t info;
    auto status = zx_object_get_info(process, ZX_INFO_PROCESS, &info,
                                     sizeof(info), nullptr, nullptr);
    // ... check status ...
    int return_code = info.return_code;
```

Resuming the thread requires a handle to the thread, which the handler
may not yet have. The handle is obtained with the
[**object_get_child**() system call](syscalls/object_get_child.md).
The pid and tid necessary to look up the thread are contained in the
exception report. See the simple exception handling loop above.

## Types of exceptions

At a high level there are two types of exceptions: architectural and
synthetic.
Architectural exceptions are things like a segmentation fault (e.g.,
dereferencing the NULL pointer) or executing an undefined instruction.
Synthetic exceptions are things like thread start and exit notifications.
Synthetic exceptions are further distinguished as being debugger-specific
or not.

We use the term "general exceptions" to describe non-debugger-specific
exceptions, and we use the term "debugger-specific exceptions" to describe
exceptions that are only sent to debuggers.

Exception types are enumerated in the *zx_excp_type_t* enum defined
in [`<zircon/syscalls/exception.h>`](../system/public/zircon/syscalls/exception.h).

## Exception ports

Exception ports are where exception packets are sent.
A Zircon port is bound to the exception port of a task object
(thread, process, job) and then exception packets are sent to that
port in the manner described below.

Zircon supports the following general exception ports:

- *Thread*
- *Process*
- *Job*

Zircon also supports the following debugger-specific exception ports:

- *Process Debugger*
- *Job Debugger*

There is only one of each kind of these per associated object.
Note that processes and jobs have two distinct exception ports:
the general one and a debugger-specific one.

To bind to the debugger exception port, pass
**ZX_EXCEPTION_PORT_DEBUGGER** in *options* when binding an
exception port to the process or job.

## Exception delivery

### Debugger-only exceptions

Debugger-only exceptions are sent to a single potential handler,
a debugger, and only if one is present.

The job debugger exception port receives the following synthetic
exception:

- **ZX_EXCP_PROCESS_STARTING**

The process debugger exception port receives the following synthetic
exceptions:

- **ZX_EXCP_THREAD_STARTING**
- **ZX_EXCP_THREAD_EXITING**

Note that there is no **ZX_EXCP_PROCESS_EXITING** exception.
Also note that the process debugger exception port receives
all general exceptions as well: we want the debugger to be notified if,
for example, a thread being debugged segfaults.

### General exceptions

General exceptions comprise all architectural exceptions and all
synthetic exceptions not listed above as debugger-specific,
e.g., **ZX_EXCP_POLICY_ERROR**.

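The classification described so far can be summarized in a few lines of code. The following is a minimal sketch in plain C++, not kernel code. `ExcpType`, `DebuggerPort`, and the helper functions are hypothetical names introduced for illustration; the enum lists only the exception types named in this document plus two placeholder architectural ones. The real definitions are in `<zircon/syscalls/exception.h>`.

```cpp
#include <cassert>

// Illustrative stand-ins, not the real zx_excp_type_t values.
enum class ExcpType {
  kSegfault,              // placeholder for an architectural exception
  kUndefinedInstruction,  // placeholder for an architectural exception
  kPolicyError,           // models ZX_EXCP_POLICY_ERROR
  kThreadStarting,        // models ZX_EXCP_THREAD_STARTING
  kThreadExiting,         // models ZX_EXCP_THREAD_EXITING
  kProcessStarting,       // models ZX_EXCP_PROCESS_STARTING
};

enum class DebuggerPort { kProcessDebugger, kJobDebugger };

// Architectural exceptions: segfaults, undefined instructions, and the
// like. Everything else in this sketch is synthetic.
bool IsArchitectural(ExcpType t) {
  return t == ExcpType::kSegfault || t == ExcpType::kUndefinedInstruction;
}

// Debugger-specific exceptions are only ever sent to a debugger port.
bool IsDebuggerSpecific(ExcpType t) {
  return t == ExcpType::kThreadStarting || t == ExcpType::kThreadExiting ||
         t == ExcpType::kProcessStarting;
}

// "General" means every exception that is not debugger-specific.
bool IsGeneral(ExcpType t) { return !IsDebuggerSpecific(t); }

// Which debugger port receives a debugger-specific exception.
DebuggerPort DebuggerPortFor(ExcpType t) {
  assert(IsDebuggerSpecific(t));  // precondition of this sketch
  return t == ExcpType::kProcessStarting ? DebuggerPort::kJobDebugger
                                         : DebuggerPort::kProcessDebugger;
}
```
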
General exceptions are sent to exception ports in the following order:

- *Process Debugger* - The process debugger exception port is for
things like zxdb and gdb.

- *Thread* - This is for exception ports bound directly to the thread.

- *Process* - This is for exception ports bound directly to the process.

- *Job* - This is for exception ports bound to the process's job. Note that
jobs have a hierarchy. First the process's job is searched. If it has a bound
exception port then the exception is delivered to that port. If it does not
have a bound exception port, or if the handler returns **ZX_RESUME_TRY_NEXT**,
then that job's parent job is searched, and so on right up to the root job.

If no exception port handles the exception then the kernel finishes
exception processing by killing the process.

Notes:

- The search order differs from that of Mach. In Zircon the
debugger exception port is tried first, before all other ports.
This is useful for at least a few reasons:

    - Allows "fix and continue" debugging. E.g., if a thread gets a segfault,
      the debugger user can fix the segfault and resume the thread before the
      thread even knows it got a segfault.
    - Makes debugger breakpoints easier to reason about.

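The delivery order for general exceptions can be modelled in a few lines. This is a sketch in plain C++ under simplifying assumptions, not the kernel's implementation. `Handler`, `Job`, `Target`, `Deliver()`, and `ExampleTrace()` are hypothetical names. A bound port is represented by a callback that returns `true` for "exception handled" and `false` for "not handled"; an empty callback represents a port with nothing bound.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// true means "exception handled"; false means "not handled" (the handler
// resumed with ZX_RESUME_TRY_NEXT). Empty means nothing is bound.
using Handler = std::function<bool()>;

struct Job {
  Handler port;                 // general exception port of this job
  const Job* parent = nullptr;  // nullptr marks the root job
};

struct Target {
  Handler process_debugger_port;
  Handler thread_port;
  Handler process_port;
  const Job* job = nullptr;     // the job the process belongs to
};

enum class Outcome { kThreadResumed, kProcessKilled };

// Walks the ports in delivery order and records in |trace| each port
// that received a report.
Outcome Deliver(const Target& target, std::vector<std::string>* trace) {
  assert(trace != nullptr);
  auto handled_by = [trace](const Handler& port, const char* name) {
    if (!port) return false;  // nothing bound here, keep searching
    trace->push_back(name);
    return port();
  };
  if (handled_by(target.process_debugger_port, "process debugger") ||
      handled_by(target.thread_port, "thread") ||
      handled_by(target.process_port, "process")) {
    return Outcome::kThreadResumed;
  }
  // Search the process's job, then each ancestor up to the root job.
  for (const Job* job = target.job; job != nullptr; job = job->parent) {
    if (handled_by(job->port, "job")) return Outcome::kThreadResumed;
  }
  return Outcome::kProcessKilled;  // no port handled the exception
}

// Example: the thread's handler declines, the process's own job has no
// port bound, and the parent job's handler accepts.
std::vector<std::string> ExampleTrace() {
  Job root;
  root.port = [] { return true; };
  Job leaf;
  leaf.parent = &root;
  Target target;
  target.thread_port = [] { return false; };
  target.job = &leaf;
  std::vector<std::string> trace;
  Deliver(target, &trace);
  return trace;
}
```
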
## Interaction with thread suspension

Exceptions and thread suspensions are treated separately.
In other words, a thread can be both in an exception and suspended.
This can happen if the thread is suspended while waiting for a response
from an exception handler. The thread stays paused until it has been
resumed from both the exception and the suspension. That takes two
calls, one for the exception:

```cpp
  auto status = zx_task_resume_from_exception(thread, eport, 0);
  // ... check status ...
```

and one for the suspension:

```cpp
  // suspend_token was obtained by an earlier call to zx_task_suspend().
  auto status = zx_handle_close(suspend_token);
  // ... check status ...
```

The order does not matter.

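One way to picture this is as two independent flags that must both be cleared before the thread runs again. The sketch below is a toy model in plain C++, not how the kernel tracks thread state; `ThreadModel` and `OrderDoesNotMatter()` are hypothetical names, and the two resume methods merely stand in for the system calls described in this section.

```cpp
#include <cassert>

// Toy model of a thread that can be stopped for two independent reasons.
struct ThreadModel {
  bool in_exception = false;  // waiting for an exception handler's reply
  bool suspended = false;     // a suspend token is outstanding

  void RaiseException() { in_exception = true; }
  void Suspend() { suspended = true; }

  // Stands in for zx_task_resume_from_exception().
  void ResumeFromException() {
    assert(in_exception);
    in_exception = false;
  }

  // Stands in for zx_handle_close(suspend_token).
  void CloseSuspendToken() {
    assert(suspended);
    suspended = false;
  }

  bool CanRun() const { return !in_exception && !suspended; }
};

// Returns true if, for both possible orders, the thread stays paused
// after the first resume and can run after the second.
bool OrderDoesNotMatter() {
  ThreadModel a;
  a.RaiseException();
  a.Suspend();
  a.ResumeFromException();
  const bool a_paused_midway = !a.CanRun();
  a.CloseSuspendToken();

  ThreadModel b;
  b.RaiseException();
  b.Suspend();
  b.CloseSuspendToken();
  const bool b_paused_midway = !b.CanRun();
  b.ResumeFromException();

  return a_paused_midway && b_paused_midway && a.CanRun() && b.CanRun();
}
```
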
## Signals

Signals are the core Zircon mechanism for observing state changes on
kernel Objects (a Channel becoming readable, a Process terminating,
an Event becoming signaled, etc). See ["signals"](signals.md).

Unlike exceptions, signals do not require a response from an exception handler.
On the other hand, signals are sent to whoever is waiting on the thread's
handle, instead of being sent to the exception port that could be
bound to the thread's process.
This is usually not a problem for exception handlers because they
keep track of thread handles anyway. For example, they need the thread handle
to resume the thread after an exception.

It does, however, mean that an exception handler must wait on the
port *and* every thread handle that it wishes to monitor.
Fortunately, the handler can go back to waiting on just the port by using the
[**object_wait_async**() system call](syscalls/object_wait_async.md)
to have signals regarding each thread sent to the port.
In other words, there is still just one system call involved to wait
for something interesting to happen.

```cpp
  uint64_t key = some_key_denoting_the_thread;
  bool is_suspended = thread_is_suspended(thread);
  zx_signals_t signals = ZX_THREAD_TERMINATED;
  if (is_suspended)
    signals |= ZX_THREAD_RUNNING;
  else
    signals |= ZX_THREAD_SUSPENDED;
  uint32_t options = ZX_WAIT_ASYNC_ONCE;
  auto status = zx_object_wait_async(thread, eport, key, signals, options);
  // ... check status ...
```

When the thread gets any of the specified signals a **ZX_PKT_TYPE_SIGNAL_ONE**
packet will be sent to the port. After processing the signal the above
call to **zx_object_wait_async**() must be made again; that is the nature
of **ZX_WAIT_ASYNC_ONCE**.

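The need to re-register after every packet can be illustrated with a toy model. This is a sketch in plain C++, not the kernel's logic: `OneShotWait` and both helper functions are hypothetical, and the model deliberately ignores details such as signals that are already asserted when the wait is registered.

```cpp
#include <cassert>

// Toy model of a one-shot asynchronous wait.
struct OneShotWait {
  bool armed = false;
  int packets_queued = 0;

  // Stands in for zx_object_wait_async(..., ZX_WAIT_ASYNC_ONCE).
  void Arm() {
    assert(!armed);  // this sketch treats double registration as a bug
    armed = true;
  }

  // Called when one of the requested signals becomes asserted.
  void OnSignal() {
    if (!armed) return;  // nothing is registered: no packet is queued
    ++packets_queued;
    armed = false;       // the registration is consumed by the packet
  }
};

// Registers once and then lets three signal events go by.
int PacketsWithoutRearming() {
  OneShotWait wait;
  wait.Arm();
  wait.OnSignal();
  wait.OnSignal();
  wait.OnSignal();
  return wait.packets_queued;
}

// Registers again before each of the three signal events.
int PacketsWithRearming() {
  OneShotWait wait;
  for (int i = 0; i < 3; ++i) {
    wait.Arm();
    wait.OnSignal();
  }
  return wait.packets_queued;
}
```
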
*Note:* There is both an exception and a signal for thread termination.
The **ZX_EXCP_THREAD_EXITING** exception is sent first. When the thread
is finally terminated the **ZX_THREAD_TERMINATED** signal is sent.

The following signals are relevant to exception handlers:

- **ZX_THREAD_TERMINATED**
- **ZX_THREAD_SUSPENDED**
- **ZX_THREAD_RUNNING**

When a thread is started **ZX_THREAD_RUNNING** is asserted.
When it is suspended **ZX_THREAD_RUNNING** is deasserted, and
**ZX_THREAD_SUSPENDED** is asserted. When the thread is resumed
**ZX_THREAD_SUSPENDED** is deasserted and **ZX_THREAD_RUNNING** is
asserted. When a thread terminates both **ZX_THREAD_RUNNING** and
**ZX_THREAD_SUSPENDED** are deasserted and **ZX_THREAD_TERMINATED**
is asserted. However, signals are OR'd into the state maintained by
the port, so you may see any combination of requested signals
when **zx_port_wait**() returns.

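The transitions above, and the way requested signals accumulate before the handler reads the packet, can be sketched as follows. This is a simplified model in plain C++ written for illustration. The bit values, `ThreadSignals`, `PendingPacket`, and `ObservedBeforeRead()` are all hypothetical; real code uses the `ZX_THREAD_*` constants and reads the observed signals from the port packet.

```cpp
#include <cassert>
#include <cstdint>

// Bit values are made up for this sketch; real code uses ZX_THREAD_RUNNING,
// ZX_THREAD_SUSPENDED, and ZX_THREAD_TERMINATED.
constexpr uint32_t kRunning = 1u << 0;
constexpr uint32_t kSuspended = 1u << 1;
constexpr uint32_t kTerminated = 1u << 2;

// Tracks which signals are currently asserted on a thread.
struct ThreadSignals {
  uint32_t asserted = 0;

  void Start() { asserted = kRunning; }
  void Suspend() {
    assert(asserted & kRunning);
    asserted = (asserted & ~kRunning) | kSuspended;
  }
  void Resume() {
    assert(asserted & kSuspended);
    asserted = (asserted & ~kSuspended) | kRunning;
  }
  void Terminate() { asserted = kTerminated; }
};

// Models the port-side state for one registered wait: every requested
// signal seen before the packet is read gets OR'd in.
struct PendingPacket {
  explicit PendingPacket(uint32_t requested_signals)
      : requested(requested_signals) {}

  void Note(const ThreadSignals& thread) {
    observed |= thread.asserted & requested;
  }

  uint32_t requested;
  uint32_t observed = 0;
};

// A thread is suspended, resumed, and then terminates before the handler
// gets around to reading the packet.
uint32_t ObservedBeforeRead() {
  ThreadSignals thread;
  thread.Start();
  PendingPacket packet(kSuspended | kTerminated);
  thread.Suspend();
  packet.Note(thread);
  thread.Resume();
  packet.Note(thread);
  thread.Terminate();
  packet.Note(thread);
  return packet.observed;
}
```
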
## Comparison with POSIX (and Linux)

This table shows equivalent terms, types, and function calls between
Zircon and POSIX/Linux for exceptions and the kinds of things exception
handlers generally do.

```
Zircon                       POSIX/Linux
------                       -----------
Exception/Signal             Signal
ZX_EXCP_*                    SIG*
task_bind_exception_port()   ptrace(ATTACH,DETACH)
task_suspend()               kill(SIGSTOP),ptrace(KILL(SIGSTOP))
handle_close(suspend_token)  kill(SIGCONT),ptrace(CONT)
task_resume_from_exception   kill(SIGCONT),ptrace(CONT)
N/A                          kill(everything_other_than_SIGKILL)
task_kill()                  kill(SIGKILL)
TBD                          signal()/sigaction()
port_wait()                  wait*()
various                      W*() macros from sys/wait.h
zx_packet_exception_t        siginfo_t
zx_exception_context_t       siginfo_t
thread_read_state            ptrace(GETREGS,GETREGSET)
thread_write_state           ptrace(SETREGS,SETREGSET)
process_read_memory          ptrace(PEEKTEXT)
process_write_memory         ptrace(POKETEXT)
```

Zircon does not have asynchronous signals like SIGINT, SIGQUIT, SIGTERM,
SIGUSR1, SIGUSR2, and so on.

Another significant difference from POSIX is that the exception handler
is always run on a separate thread.

## Example programs

There are three good example programs in the Zircon tree to use to
further one's understanding of exceptions and signals in Zircon.

- `system/core/svchost/crashsvc`

`crash-svc` is the crash service thread hosted in `svchost`. It
delegates the processing of the crash to either `ulib/inspector` in a
standalone Zircon build or to an upper-layer FIDL service if the build
contains garnet.

- `system/utest/exception`

The basic exception handling testcase.

- `system/utest/debugger`

Testcase for the rest of the system calls a debugger would use, beyond
those exercised by `system/utest/exception`.
There are tests for segfault recovery, reading/writing thread registers,
reading/writing process memory, as well as various other tests.

## Todo

There are a few outstanding issues:

- signal()/sigaction() replacement

In POSIX one is able to specify handlers for particular signals,
whereas in Zircon there is currently just the exception port,
and the handler is expected to understand all possible exceptions.
This is tracked as ZX-560.

- W*() macros from sys/wait.h

When a process exits because of an exception, no information is provided
on which exception the process got (e.g., segfault). At present only a
non-specific non-zero exit code is returned.
This is tracked as ZX-1974.

- more selectiveness in which exceptions to see

In addition to ZX-560, it would be nice to be able to specify to the kernel
when binding the exception port that one is only interested in
seeing a particular subset of exceptions.
This is tracked as ZX-990.

- ability to say exception ports unbind quietly when closed

The default behavior when a port is unbound implicitly due to
the port being closed is to resume exception processing, i.e.,
giving the next exception port in the search order a try.
In debugging sessions it is useful to change the default behavior
and have the port unbound "quietly", in other words leave things as
is, with the thread still waiting for an exception response.
This is because debuggers can crash, and obliterating an active debugging
session is counterproductive.
This is tracked as ZX-988.

- rights for binding exception ports and getting debuggable thread handles

In Zircon rights can, in general, only be taken away; they can't be added.
However, one doesn't want to have "debuggability" as a default right:
debuggers are privileged processes. Thus we need a way to obtain handles
with sufficient rights for debugging.
This is tracked as ZX-509, ZX-911, and ZX-923.

- no way to obtain currently bound port or to chain handlers

Currently, there's no way to get the currently bound exception port.
One possible use case is debugging (e.g., to see what's going on
in the system).
Another possible use case is to allow chaining exception handlers, though for
the case of in-process chaining it's likely better to use a
signal()/sigaction() replacement (see ZX-560).
This is tracked as ZX-1216.
