# Exception handling

## Introduction

Exception handling support in Zircon was inspired by similar support in Mach.

Exceptions are mainly used for debugging. Outside of debugging one generally uses ["signals"](signals.md). Signals are the core Zircon mechanism for observing state changes on kernel Objects (a Channel becoming readable, a Process terminating, an Event becoming signaled, etc). See [Signals](#signals) below.

The reader is assumed to have a basic understanding of exceptions such as segmentation faults, as well as Posix signals. This document does not explain what a segfault is, nor what "exception handling" is at a high level (though it certainly can if there is a need).

## The basics

Exceptions are handled from userspace by binding a Zircon Port to the Exception Port of the desired object: thread, process, or job. This is done with the [**task_bind_exception_port**() system call](syscalls/task_bind_exception_port.md).

Example:

```cpp
  zx_handle_t eport;
  auto status = zx_port_create(0, &eport);
  // ... check status ...
  uint32_t options = 0;
  // The key is anything that is useful to the code handling the exception.
  uint64_t child_key = 0;
  // Assume |child| is a process handle.
  status = zx_task_bind_exception_port(child, eport, child_key, options);
  // ... check status ...
```

When an exception occurs a report is sent to the port, after which the receiver must reply with either "exception handled" or "exception not handled". The thread stays paused until then, or until the port is unbound, either explicitly or by the port being closed (say because the handler process exited). If the port is unbound, for whatever reason, the exception is processed as if the reply was "exception not handled".

Here is a simple exception handling loop.
The main components of it are the call to the [**port_wait**() system call](syscalls/port_wait.md) to wait for an exception, or anything else that's interesting, to happen, and the call to the [**task_resume_from_exception**() system call](syscalls/task_resume_from_exception.md) to indicate the handler is finished processing the exception.

```cpp
  while (true) {
    zx_port_packet_t packet;
    // zx_port_wait() fills in the packet through a pointer.
    auto status = zx_port_wait(eport, ZX_TIME_INFINITE, &packet);
    // ... check status ...
    if (packet.key != child_key) {
      // ... do something else, depending on what else the port is used for ...
      continue;
    }
    if (!ZX_PKT_IS_EXCEPTION(packet.type)) {
      // ... probably a signal, process it ...
      continue;
    }
    zx_koid_t packet_tid = packet.exception.tid;
    zx_handle_t thread;
    status = zx_object_get_child(child, packet_tid, ZX_RIGHT_SAME_RIGHTS,
                                 &thread);
    // ... check status ...
    bool handled = process_exception(child, thread, &packet);
    uint32_t resume_flags = 0;
    if (!handled)
      resume_flags |= ZX_RESUME_TRY_NEXT;
    status = zx_task_resume_from_exception(thread, eport, resume_flags);
    // ... check status ...
    status = zx_handle_close(thread);
    assert(status == ZX_OK);
  }
```

To unbind an exception port, pass **ZX_HANDLE_INVALID** for the exception port:

```cpp
  uint32_t options = 0;
  status = zx_task_bind_exception_port(child, ZX_HANDLE_INVALID,
                                       key, options);
  // ... check status ...
```

## Exception processing details

When a thread gets an exception it is paused while the kernel processes the exception. The kernel looks for bound exception ports in a specific order and, if it finds one, an "exception report" is sent to the bound port.

Exception reports are messages sent through the port with a specific format defined by the port message protocol. The packet contents are defined by the *zx_packet_exception_t* type defined in [`<zircon/syscalls/port.h>`](../system/public/zircon/syscalls/port.h).

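
The search over bound exception ports described at the start of this section can be modeled as a plain loop. This is only a sketch and uses no Zircon headers: `ModelPort`, `Reply`, and `FindHandlingPort` are hypothetical names invented for this illustration, and the real kernel logic is more involved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model types for illustration; these are not Zircon APIs.
enum class Reply { kHandled, kNotHandled };

struct ModelPort {
  std::string name;  // label for the port, e.g. "thread" or "process"
  bool bound;        // true if some handler has bound a port here
  Reply reply;       // what that handler would answer if it got the report
};

// Walks |search_order| front to back. Unbound ports are skipped, and a
// "not handled" reply moves on to the next port. Returns the index of the
// first port whose handler replies "handled", or -1 if no port does.
int FindHandlingPort(const std::vector<ModelPort>& search_order) {
  for (std::size_t i = 0; i < search_order.size(); ++i) {
    const ModelPort& port = search_order[i];
    if (!port.bound) continue;
    if (port.reply == Reply::kHandled) return static_cast<int>(i);
  }
  return -1;
}
```

In this model a port that is unbound while its handler holds the exception behaves the same as one that replies "not handled". A result of -1 corresponds to the case where no exception port handles the exception.
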
The exception handler is expected to read the message, decide how it wants to process the exception, and then resume the thread that got the exception with the [**task_resume_from_exception**() system call](syscalls/task_resume_from_exception.md).

Resuming the thread can be done in either of two ways:

- Resume execution of the thread as if the exception has been resolved. If the thread gets another exception then exception processing begins anew. An example of when one would do this is when resuming after a debugger breakpoint.

  ```cpp
  auto status = zx_task_resume_from_exception(thread, eport, 0);
  // ... check status ...
  ```

- Resume exception processing, marking the exception as "unhandled" by the current handler, thus giving the next exception port in the search order a chance to process the exception. An example of when one would do this is when the exception is not one the handler intends to process.

  ```cpp
  auto status = zx_task_resume_from_exception(thread, eport,
                                              ZX_RESUME_TRY_NEXT);
  // ... check status ...
  ```

If there are no remaining exception ports to try, the kernel terminates the process, as if *zx_task_kill(process)* was called. The return code of a process terminated by an exception is an unspecified non-zero value. The return code can be obtained with *zx_object_get_info(ZX_INFO_PROCESS)*.

Example:

```cpp
  zx_info_process_t info;
  auto status = zx_object_get_info(process, ZX_INFO_PROCESS, &info,
                                   sizeof(info), nullptr, nullptr);
  // ... check status ...
  int return_code = info.return_code;
```

Resuming the thread requires a handle of the thread, which the handler may not yet have. The handle is obtained with the [**object_get_child**() system call](syscalls/object_get_child.md). The pid and tid necessary to look up the thread are contained in the exception report. See the trivial exception handler example above.

## Types of exceptions

At a high level there are two types of exceptions: architectural and synthetic.
Architectural exceptions are things like a segmentation fault (e.g., dereferencing the NULL pointer) or executing an undefined instruction. Synthetic exceptions are things like thread start and exit notifications.

Synthetic exceptions are further distinguished as being debugger-specific or not. We use the term "general exceptions" to describe non-debugger-specific exceptions, and we use the term "debugger-specific exceptions" to describe exceptions that are only sent to debuggers.

Exception types are enumerated in the *zx_excp_type_t* enum defined in [`<zircon/syscalls/exception.h>`](../system/public/zircon/syscalls/exception.h).

## Exception ports

Exception ports are where exception packets get sent. A Zircon port is bound to the exception port of a task object (thread, process, job) and then exception packets are sent to that port in a manner described below.

Zircon supports the following general exception ports:

- *Thread*
- *Process*
- *Job*

Zircon also supports the following debugger-specific exception ports:

- *Process Debugger*
- *Job Debugger*

There is only one of each kind per associated object. Note that processes and jobs have two distinct exception ports: the general one and a debugger-specific one.

To bind to the debugger exception port pass **ZX_EXCEPTION_PORT_DEBUGGER** in *options* when binding an exception port to the process or job.

## Exception delivery

### Debugger only exceptions

Debugger-only exceptions are sent to only one potential handler, if it is present: a debugger.

The job debugger exception port receives the following synthetic exception:

- **ZX_EXCP_PROCESS_STARTING**

The process debugger exception port receives the following synthetic exceptions:

- **ZX_EXCP_THREAD_STARTING**
- **ZX_EXCP_THREAD_EXITING**

Note that there is no **ZX_EXCP_PROCESS_EXITING** exception.

Also note that the process debugger exception port receives all general exceptions as well: we want the debugger to be notified if, for example, a thread being debugged segfaults.
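
The routing rules above can be summarized in a few lines of code. This is a sketch under stated assumptions: `ModelExcp`, `DebuggerPort`, `IsDebuggerOnly`, and `DebuggerPortFor` are hypothetical names made up for this illustration, and the enum values are stand-ins rather than the real *zx_excp_type_t* values.

```cpp
// Hypothetical stand-ins for the exception types named above; the real
// values live in zx_excp_type_t and are not used here.
enum class ModelExcp {
  kProcessStarting,
  kThreadStarting,
  kThreadExiting,
  kGeneral,  // any architectural or other non-debugger-specific exception
};

enum class DebuggerPort { kJobDebugger, kProcessDebugger };

// True for the synthetic exceptions that are sent only to a debugger port.
bool IsDebuggerOnly(ModelExcp type) {
  return type != ModelExcp::kGeneral;
}

// Returns which debugger exception port is offered the given exception.
DebuggerPort DebuggerPortFor(ModelExcp type) {
  if (type == ModelExcp::kProcessStarting) return DebuggerPort::kJobDebugger;
  // Thread start/exit and all general exceptions go to the process debugger.
  return DebuggerPort::kProcessDebugger;
}
```

Note that a general exception is not debugger-only in this model, yet it is still offered to the process debugger port, which matches the last note above.
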
### General exceptions

Exceptions that are not debugger-specific are all architectural exceptions and all synthetic exceptions not previously listed as debugger-specific, e.g., **ZX_EXCP_POLICY_ERROR**.

General exceptions are sent to exception ports in the following order:

- *Process Debugger* - The process debugger exception port is for things like zxdb and gdb.

- *Thread* - This is for exception ports bound directly to the thread.

- *Process* - This is for exception ports bound directly to the process.

- *Job* - This is for exception ports bound to the process's job. Note that jobs have a hierarchy. First the process's job is searched. If it has a bound exception port then the exception is delivered to that port. If it does not have a bound exception port, or if the handler resumes the thread with **ZX_RESUME_TRY_NEXT**, then that job's parent job is searched, and so on right up to the root job.

If no exception port handles the exception then the kernel finishes exception processing by killing the process.

Notes:

- The search order is different from that of Mach. In Zircon the debugger exception port is tried first, before all other ports. This is useful for at least a few reasons:

  - Allows "fix and continue" debugging. E.g., if a thread gets a segfault, the debugger user can fix the segfault and resume the thread before the thread even knows it got a segfault.
  - Makes debugger breakpoints easier to reason about.

## Interaction with thread suspension

Exceptions and thread suspensions are treated separately. In other words, a thread can be both in an exception and be suspended. This can happen if the thread is suspended while waiting for a response from an exception handler. The thread stays paused until it is resumed from both, so two resumes are needed. One for the exception:

```cpp
  auto status = zx_task_resume_from_exception(thread, eport, 0);
  // ... check status ...
```

and one for the suspension:

```cpp
  // suspend_token was obtained by an earlier call to zx_task_suspend().
  auto status = zx_handle_close(suspend_token);
  // ... check status ...
```

The order does not matter.

## Signals

Signals are the core Zircon mechanism for observing state changes on kernel Objects (a Channel becoming readable, a Process terminating, an Event becoming signaled, etc). See ["signals"](signals.md).

Unlike exceptions, signals do not require a response from an exception handler. On the other hand, signals are sent to whoever is waiting on the thread's handle, instead of being sent to the exception port that could be bound to the thread's process. This is generally not a problem for exception handlers because they generally keep track of thread handles anyway. For example, they need the thread handle to resume the thread after an exception.

It does, however, mean that an exception handler must wait on the port *and* every thread handle that it wishes to monitor. Fortunately, one can keep waiting on just the port by using the [**object_wait_async**() system call](syscalls/object_wait_async.md) to have signals regarding each thread sent to the port. In other words, there is still just one system call involved to wait for something interesting to happen.

```cpp
  uint64_t key = some_key_denoting_the_thread;
  bool is_suspended = thread_is_suspended(thread);
  zx_signals_t signals = ZX_THREAD_TERMINATED;
  if (is_suspended)
    signals |= ZX_THREAD_RUNNING;
  else
    signals |= ZX_THREAD_SUSPENDED;
  uint32_t options = ZX_WAIT_ASYNC_ONCE;
  auto status = zx_object_wait_async(thread, eport, key, signals, options);
  // ... check status ...
```

When the thread gets any of the specified signals a **ZX_PKT_TYPE_SIGNAL_ONE** packet will be sent to the port. After processing the signal the above call to **zx_object_wait_async**() must be made again; that is the nature of **ZX_WAIT_ASYNC_ONCE**.

*Note:* There is both an exception and a signal for thread termination. The **ZX_EXCP_THREAD_EXITING** exception is sent first.
When the thread is finally terminated the **ZX_THREAD_TERMINATED** signal is sent.

The following signals are relevant to exception handlers:

- **ZX_THREAD_TERMINATED**
- **ZX_THREAD_SUSPENDED**
- **ZX_THREAD_RUNNING**

When a thread is started **ZX_THREAD_RUNNING** is asserted. When it is suspended **ZX_THREAD_RUNNING** is deasserted, and **ZX_THREAD_SUSPENDED** is asserted. When the thread is resumed **ZX_THREAD_SUSPENDED** is deasserted and **ZX_THREAD_RUNNING** is asserted. When a thread terminates both **ZX_THREAD_RUNNING** and **ZX_THREAD_SUSPENDED** are deasserted and **ZX_THREAD_TERMINATED** is asserted. However, signals are OR'd into the state maintained by the port, so you may see any combination of requested signals when **zx_port_wait**() returns.

## Comparison with Posix (and Linux)

This table shows equivalent terms, types, and function calls between Zircon and Posix/Linux for exceptions and the kinds of things exception handlers generally do.

```
Zircon                       Posix/Linux
------                       -----------
Exception/Signal             Signal
ZX_EXCP_*                    SIG*
task_bind_exception_port()   ptrace(ATTACH,DETACH)
task_suspend()               kill(SIGSTOP),ptrace(KILL(SIGSTOP))
handle_close(suspend_token)  kill(SIGCONT),ptrace(CONT)
task_resume_from_exception   kill(SIGCONT),ptrace(CONT)
N/A                          kill(everything_other_than_SIGKILL)
task_kill()                  kill(SIGKILL)
TBD                          signal()/sigaction()
port_wait()                  wait*()
various                      W*() macros from sys/wait.h
zx_packet_exception_t        siginfo_t
zx_exception_context_t       siginfo_t
thread_read_state            ptrace(GETREGS,GETREGSET)
thread_write_state           ptrace(SETREGS,SETREGSET)
process_read_memory          ptrace(PEEKTEXT)
process_write_memory         ptrace(POKETEXT)
```

Zircon does not have asynchronous signals like SIGINT, SIGQUIT, SIGTERM, SIGUSR1, SIGUSR2, and so on.

Another significant difference from Posix is that the exception handler is always run on a separate thread.
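
For readers less familiar with the Posix column, the `wait*()` and `W*()` rows of the table refer to code along the following lines. This is a plain Posix sketch that uses no Zircon calls; `DescribeChildFate` is a hypothetical helper written only for this illustration.

```cpp
#include <cerrno>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that either kills itself with |sig| (if non-zero) or exits
// with |exit_code|, then reports what the parent learns from waitpid().
std::string DescribeChildFate(int sig, int exit_code) {
  pid_t pid = fork();
  if (pid < 0) return "fork failed";
  if (pid == 0) {
    // Child: terminate by signal if asked to, otherwise exit normally.
    if (sig != 0) raise(sig);
    _exit(exit_code);
  }
  int status = 0;
  // Parent: wait for the child, retrying if interrupted by a signal.
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) return "waitpid failed";
  }
  // The W*() macros decode how the child ended.
  if (WIFSIGNALED(status))
    return "killed by signal " + std::to_string(WTERMSIG(status));
  if (WIFEXITED(status))
    return "exited with code " + std::to_string(WEXITSTATUS(status));
  return "other";
}
```

The point of the comparison is that a Posix parent can tell which signal ended the child, whereas a Zircon process terminated by an exception currently reports only an unspecified non-zero return code.
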
## Example programs

There are three good example programs in the Zircon tree to use to further one's understanding of exceptions and signals in Zircon.

- `system/core/svchost/crashsvc`

  `crash-svc` is the crash service thread hosted in `svchost`. It delegates the processing of the crash to either `ulib/inspector` in a standalone Zircon build or to an upper layer FIDL service if the build contains garnet.

- `system/utest/exception`

  The basic exception handling testcase.

- `system/utest/debugger`

  Testcase for the rest of the system calls a debugger would use, beyond those exercised by system/utest/exception. There are tests for segfault recovery, reading/writing thread registers, reading/writing process memory, as well as various other tests.

## Todo

There are a few outstanding issues:

- signal()/sigaction() replacement

  In Posix one is able to specify handlers for particular signals, whereas in Zircon there is currently just the exception port, and the handler is expected to understand all possible exceptions. This is tracked as ZX-560.

- W*() macros from sys/wait.h

  When a process exits because of an exception, no information is provided on which exception the process got (e.g., segfault). At present only a non-specific non-zero exit code is returned. This is tracked as ZX-1974.

- more selectiveness in which exceptions to see

  In addition to ZX-560, it would be nice to be able to specify to the kernel when binding the exception port that one is only interested in seeing a particular subset of exceptions. This is tracked as ZX-990.

- ability to say exception ports unbind quietly when closed

  The default behavior when a port is unbound implicitly due to the port being closed is to resume exception processing, i.e., to give the next exception port in the search order a try. In debugging sessions it is useful to change the default behavior and have the port unbound "quietly", in other words leave things as is, with the thread still waiting for an exception response.
  This is because debuggers can crash, and obliterating an active debugging session is counterproductive. This is tracked as ZX-988.

- rights for binding exception ports and getting debuggable thread handles

  In Zircon rights can, in general, only be taken away; they can't be added. However, one doesn't want to make "debuggability" a default right: debuggers are privileged processes. Thus we need a way to obtain handles with sufficient rights for debugging. This is tracked as ZX-509, ZX-911, and ZX-923.

- no way to obtain currently bound port or to chain handlers

  Currently, there's no way to get the currently bound exception port. One possible use-case is debugging (e.g., to see what's going on in the system). Another possible use-case is to allow chaining exception handlers, though for the case of in-process chaining it's likely better to use a signal()/sigaction() replacement (see ZX-560). This is tracked as ZX-1216.