# Exception handling

## Introduction

Exception handling support in Zircon was inspired by similar support in Mach.

Exceptions are mainly used for debugging. Outside of debugging
one generally uses ["signals"](signals.md).
Signals are the core Zircon mechanism for observing state changes on
kernel Objects (a Channel becoming readable, a Process terminating,
an Event becoming signaled, etc).
See [Signals](#signals) below.

The reader is assumed to have a basic understanding of what exceptions
such as segmentation faults are, and of Posix signals.
This document does not explain what a segfault is, nor what "exception
handling" is at a high level (though it certainly can if there is a need).

## The basics

Exceptions are handled from userspace by binding a Zircon Port to the
Exception Port of the desired object: thread, process, or job.
This is done with the
[**task_bind_exception_port**() system call](syscalls/task_bind_exception_port.md).

Example:

```cpp
  zx_handle_t eport;
  auto status = zx_port_create(0, &eport);
  // ... check status ...
  uint32_t options = 0;
  // The key is anything that is useful to the code handling the exception.
  uint64_t child_key = 0;
  // Assume |child| is a process handle.
  status = zx_task_bind_exception_port(child, eport, child_key, options);
  // ... check status ...
```

When an exception occurs, a report is sent to the port,
after which the receiver must reply with either "exception handled"
or "exception not handled".
The thread stays paused until then, or until the port is unbound,
either explicitly or by the port being closed (say because the handler
process exited). If the port is unbound, for whatever reason, the
exception is processed as if the reply was "exception not handled".

Here is a simple exception handling loop.
The main components of it are the call to the
[**port_wait**() system call](syscalls/port_wait.md)
to wait for an exception, or anything else that's interesting, to happen,
and the call to the
[**task_resume_from_exception**() system call](syscalls/task_resume_from_exception.md)
to indicate the handler is finished processing the exception.

```cpp
  while (true) {
    zx_port_packet_t packet;
    auto status = zx_port_wait(eport, ZX_TIME_INFINITE, &packet);
    // ... check status ...
    if (packet.key != child_key) {
      // ... do something else, depending on what else the port is used for ...
      continue;
    }
    if (!ZX_PKT_IS_EXCEPTION(packet.type)) {
      // ... probably a signal, process it ...
      continue;
    }
    zx_koid_t packet_tid = packet.exception.tid;
    zx_handle_t thread;
    status = zx_object_get_child(child, packet_tid, ZX_RIGHT_SAME_RIGHTS,
                                 &thread);
    // ... check status ...
    // process_exception() stands for the handler's own logic; it returns
    // true if the exception was handled.
    bool handled = process_exception(child, thread, &packet);
    uint32_t resume_flags = 0;
    if (!handled)
      resume_flags |= ZX_RESUME_TRY_NEXT;
    status = zx_task_resume_from_exception(thread, eport, resume_flags);
    // ... check status ...
    status = zx_handle_close(thread);
    assert(status == ZX_OK);
  }
```

To unbind an exception port, pass **ZX_HANDLE_INVALID** for the
exception port:

```cpp
  uint32_t options = 0;
  status = zx_task_bind_exception_port(child, ZX_HANDLE_INVALID,
                                       child_key, options);
  // ... check status ...
```

## Exception processing details

When a thread gets an exception it is paused while the kernel processes
the exception. The kernel looks for bound exception ports in a specific order
and, if it finds one, sends an "exception report" to the bound port.

Exception reports are messages sent through the port with a specific format
defined by the port message protocol.
The packet contents are defined by
the *zx_packet_exception_t* type defined in
[`<zircon/syscalls/port.h>`](../system/public/zircon/syscalls/port.h).

The exception handler is expected to read the message, decide how it
wants to process the exception, and then resume the thread that got the
exception with the
[**task_resume_from_exception**() system call](syscalls/task_resume_from_exception.md).

Resuming the thread can be done in either of two ways:

- Resume execution of the thread as if the exception had been resolved.
If the thread gets another exception then exception processing begins
anew. An example of when one would do this is when resuming after a
debugger breakpoint.

```cpp
  auto status = zx_task_resume_from_exception(thread, eport, 0);
  // ... check status ...
```

- Resume exception processing, marking the exception as "unhandled" by the
current handler, thus giving the next exception port in the search order a
chance to process the exception. An example of when one would do this is
when the exception is not one the handler intends to process.

```cpp
  auto status = zx_task_resume_from_exception(thread, eport,
                                              ZX_RESUME_TRY_NEXT);
  // ... check status ...
```

If there are no remaining exception ports to try, the kernel terminates
the process, as if *zx_task_kill(process)* was called.
The return code of a process terminated by an exception is an
unspecified non-zero value.
The return code can be obtained with *zx_object_get_info(ZX_INFO_PROCESS)*.
Example:

```cpp
  zx_info_process_t info;
  auto status = zx_object_get_info(process, ZX_INFO_PROCESS, &info,
                                   sizeof(info), nullptr, nullptr);
  // ... check status ...
  int return_code = info.return_code;
```

Resuming the thread requires a handle of the thread, which the handler
may not yet have.
The handle is obtained with the
[**object_get_child**() system call](syscalls/object_get_child.md).
The pid and tid necessary to look up the thread are contained in the
exception report. See the above trivial exception handler example.

## Types of exceptions

At a high level there are two types of exceptions: architectural and
synthetic.
Architectural exceptions are things like a segmentation fault (e.g.,
dereferencing the NULL pointer) or executing an undefined instruction.
Synthetic exceptions are things like thread start and exit notifications.
Synthetic exceptions are further distinguished as being debugger-specific
or not.

We use the term "general exceptions" to describe non-debugger-specific
exceptions, and we use the term "debugger-specific exceptions" to describe
exceptions that are only sent to debuggers.

Exception types are enumerated in the *zx_excp_type_t* enum defined
in [`<zircon/syscalls/exception.h>`](../system/public/zircon/syscalls/exception.h).

## Exception ports

Exception ports are where exception packets are sent.
A Zircon port is bound to the exception port of a task object
(thread, process, job) and then exception packets are sent to that
port in a manner described below.

Zircon supports the following general exception ports:

- *Thread*
- *Process*
- *Job*

Zircon also supports the following debugger-specific exception ports:

- *Process Debugger*
- *Job Debugger*

There is only one of each kind of these per associated object.
Note that processes and jobs have two distinct exception ports:
the general one and a debugger-specific one.

To bind to the debugger exception port pass
**ZX_EXCEPTION_PORT_DEBUGGER** in *options* when binding an
exception port to the process or job.
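
The port layout described above can be summarized as a small lookup. The following is a minimal sketch rather than Zircon code: `TaskKind`, `PortKind`, and `HasExceptionPort` are hypothetical names introduced here only to illustrate which task and port combinations the two lists describe.

```cpp
// Illustrative model only; these types are not part of the Zircon API.
enum class TaskKind { kThread, kProcess, kJob };
enum class PortKind { kGeneral, kDebugger };

// Returns true if a task of kind |task| has an exception port of kind |port|.
// Every task kind has a general port; only processes and jobs also have a
// debugger-specific port.
bool HasExceptionPort(TaskKind task, PortKind port) {
  if (port == PortKind::kGeneral)
    return true;
  return task == TaskKind::kProcess || task == TaskKind::kJob;
}
```

In terms of the real call, `PortKind::kDebugger` corresponds to passing **ZX_EXCEPTION_PORT_DEBUGGER** in *options* to **zx_task_bind_exception_port**(), while `PortKind::kGeneral` corresponds to passing 0 as in the earlier binding example.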

## Exception delivery

### Debugger only exceptions

Debugger-only exceptions are sent to only one potential handler,
a debugger, if one is present.

The job debugger exception port receives the following synthetic
exception:

- **ZX_EXCP_PROCESS_STARTING**

The process debugger exception port receives the following synthetic
exceptions:

- **ZX_EXCP_THREAD_STARTING**
- **ZX_EXCP_THREAD_EXITING**

Note that there is no **ZX_EXCP_PROCESS_EXITING** exception.
Also note that the process debugger exception port receives
all general exceptions as well: we want the debugger to be notified if, for
example, a thread being debugged segfaults.

### General exceptions

Exceptions that are not debugger specific are all architectural
exceptions and all synthetic exceptions not previously listed as
debugger-specific, e.g., **ZX_EXCP_POLICY_ERROR**.

General exceptions are sent to exception ports in the following order:

- *Process Debugger* - The process debugger exception port is for
things like zxdb and gdb.

- *Thread* - This is for exception ports bound directly to the thread.

- *Process* - This is for exception ports bound directly to the process.

- *Job* - This is for exception ports bound to the process's job. Note that
jobs have a hierarchy. First the process's job is searched. If it has a bound
exception port then the exception is delivered to that port. If it does not
have a bound exception port, or if the handler returns **ZX_RESUME_TRY_NEXT**,
then that job's parent job is searched, and so on right up to the root job.

If no exception port handles the exception then the kernel finishes
exception processing by killing the process.

Notes:

- The search order differs from that of Mach. In Zircon the
debugger exception port is tried first, before all other ports.
This is useful for at least a few reasons:

  - Allows "fix and continue" debugging. E.g., if a thread gets a segfault,
    the debugger user can fix the segfault and resume the thread before the
    thread even knows it got a segfault.
  - Makes debugger breakpoints easier to reason about.

## Interaction with thread suspension

Exceptions and thread suspensions are treated separately.
In other words, a thread can be both in an exception and suspended.
This can happen if the thread is suspended while waiting for a response
from an exception handler. The thread stays paused until it is resumed
for both the exception and the suspension. That takes two calls,
one for the exception:

```cpp
  auto status = zx_task_resume_from_exception(thread, eport, 0);
  // ... check status ...
```

and one for the suspension:

```cpp
  // suspend_token was obtained by an earlier call to zx_task_suspend().
  auto status = zx_handle_close(suspend_token);
  // ... check status ...
```

The order does not matter.

## Signals

Signals are the core Zircon mechanism for observing state changes on
kernel Objects (a Channel becoming readable, a Process terminating,
an Event becoming signaled, etc). See ["signals"](signals.md).

Unlike exceptions, signals do not require a response from an exception handler.
On the other hand, signals are sent to whoever is waiting on the thread's
handle, instead of being sent to the exception port that could be
bound to the thread's process.
This is generally not a problem for exception handlers because they generally
keep track of thread handles anyway. For example, they need the thread handle
to resume the thread after an exception.

It does, however, mean that an exception handler must wait on the
port *and* every thread handle that it wishes to monitor.
Fortunately, one can go back to waiting on just the port by using the
[**object_wait_async**() system call](syscalls/object_wait_async.md)
to have signals regarding each thread sent to the port.
In other words, there is still just one system call involved to wait
for something interesting to happen.

```cpp
  uint64_t key = some_key_denoting_the_thread;
  bool is_suspended = thread_is_suspended(thread);
  zx_signals_t signals = ZX_THREAD_TERMINATED;
  if (is_suspended)
    signals |= ZX_THREAD_RUNNING;
  else
    signals |= ZX_THREAD_SUSPENDED;
  uint32_t options = ZX_WAIT_ASYNC_ONCE;
  auto status = zx_object_wait_async(thread, eport, key, signals, options);
  // ... check status ...
```

When the thread gets any of the specified signals a **ZX_PKT_TYPE_SIGNAL_ONE**
packet will be sent to the port. After processing the signal the above
call to **zx_object_wait_async**() must be done again; that is the nature
of **ZX_WAIT_ASYNC_ONCE**.

*Note:* There is both an exception and a signal for thread termination.
The **ZX_EXCP_THREAD_EXITING** exception is sent first. When the thread
is finally terminated the **ZX_THREAD_TERMINATED** signal is sent.

The following signals are relevant to exception handlers:

- **ZX_THREAD_TERMINATED**
- **ZX_THREAD_SUSPENDED**
- **ZX_THREAD_RUNNING**

When a thread is started **ZX_THREAD_RUNNING** is asserted.
When it is suspended **ZX_THREAD_RUNNING** is deasserted, and
**ZX_THREAD_SUSPENDED** is asserted. When the thread is resumed
**ZX_THREAD_SUSPENDED** is deasserted and **ZX_THREAD_RUNNING** is
asserted. When a thread terminates both **ZX_THREAD_RUNNING** and
**ZX_THREAD_SUSPENDED** are deasserted and **ZX_THREAD_TERMINATED**
is asserted. However, signals are OR'd into the state maintained by
the port, so you may see any combination of requested signals
when **zx_port_wait**() returns.
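
Because the observed signals are OR'd together, a handler cannot assume that exactly one bit is set. The sketch below shows one possible way to reduce an observed signal set to a single event; it is an illustration under stated assumptions, not code from the Zircon tree. `ThreadSignalMasks`, `ThreadEvent`, and `ClassifyThreadSignals` are hypothetical names, and the masks are passed in as parameters so that no kernel constant values are hard-coded. The policy of letting termination take priority is likewise an assumption of this sketch.

```cpp
#include <cstdint>

// Bit masks for the three thread signals. On Fuchsia these would be filled in
// from ZX_THREAD_TERMINATED, ZX_THREAD_SUSPENDED and ZX_THREAD_RUNNING.
struct ThreadSignalMasks {
  uint32_t terminated;
  uint32_t suspended;
  uint32_t running;
};

enum class ThreadEvent { kNone, kTerminated, kSuspended, kRunning, kAmbiguous };

// Hypothetical helper: maps an OR'd set of observed signals to one event.
// Termination wins over everything else. If both the suspended and running
// bits are present, the thread changed state more than once, so the caller
// should query the thread's current state instead of trusting either bit.
ThreadEvent ClassifyThreadSignals(uint32_t observed,
                                  const ThreadSignalMasks& masks) {
  if (observed & masks.terminated)
    return ThreadEvent::kTerminated;
  const bool suspended = (observed & masks.suspended) != 0;
  const bool running = (observed & masks.running) != 0;
  if (suspended && running)
    return ThreadEvent::kAmbiguous;
  if (suspended)
    return ThreadEvent::kSuspended;
  if (running)
    return ThreadEvent::kRunning;
  return ThreadEvent::kNone;
}
```

A handler would call this with the observed signal set carried by the signal packet, and then re-arm the wait with **zx_object_wait_async**() as described above unless the thread has terminated.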

## Comparison with Posix (and Linux)

This table shows equivalent terms, types, and function calls between
Zircon and Posix/Linux for exceptions and the kinds of things exception
handlers generally do.

```
Zircon                       Posix/Linux
------                       -----------
Exception/Signal             Signal
ZX_EXCP_*                    SIG*
task_bind_exception_port()   ptrace(ATTACH,DETACH)
task_suspend()               kill(SIGSTOP),ptrace(KILL(SIGSTOP))
handle_close(suspend_token)  kill(SIGCONT),ptrace(CONT)
task_resume_from_exception   kill(SIGCONT),ptrace(CONT)
N/A                          kill(everything_other_than_SIGKILL)
task_kill()                  kill(SIGKILL)
TBD                          signal()/sigaction()
port_wait()                  wait*()
various                      W*() macros from sys/wait.h
zx_packet_exception_t        siginfo_t
zx_exception_context_t       siginfo_t
thread_read_state            ptrace(GETREGS,GETREGSET)
thread_write_state           ptrace(SETREGS,SETREGSET)
process_read_memory          ptrace(PEEKTEXT)
process_write_memory         ptrace(POKETEXT)
```

Zircon does not have asynchronous signals like SIGINT, SIGQUIT, SIGTERM,
SIGUSR1, SIGUSR2, and so on.

Another significant difference from Posix is that the exception handler
is always run on a separate thread.

## Example programs

There are three good example programs in the Zircon tree to use to
further one's understanding of exceptions and signals in Zircon.

- `system/core/svchost/crashsvc`

`crash-svc` is the crash service thread hosted in `svchost`. It
delegates the processing of the crash to either `ulib/inspector` in a
standalone Zircon build or to an upper-layer FIDL service if the build
contains garnet.

- `system/utest/exception`

The basic exception handling testcase.

- `system/utest/debugger`

Testcase for the rest of the system calls a debugger would use, beyond
those exercised by system/utest/exception.
There are tests for segfault recovery, reading/writing thread registers,
reading/writing process memory, as well as various other tests.

## Todo

There are a few outstanding issues:

- signal()/sigaction() replacement

In Posix one is able to specify handlers for particular signals,
whereas in Zircon there is currently just the exception port,
and the handler is expected to understand all possible exceptions.
This is tracked as ZX-560.

- W*() macros from sys/wait.h

When a process exits because of an exception, no information is provided
on which exception the process got (e.g., segfault). At present only a
non-specific non-zero exit code is returned.
This is tracked as ZX-1974.

- more selectiveness in which exceptions to see

In addition to ZX-560, it would be nice to be able to specify to the kernel
when binding the exception port that one is only interested in
seeing a particular subset of exceptions.
This is tracked as ZX-990.

- ability to say exception ports unbind quietly when closed

The default behavior when a port is unbound implicitly due to
the port being closed is to resume exception processing, i.e.,
give the next exception port in the search order a try.
In debugging sessions it is useful to change the default behavior
and have the port unbound "quietly", in other words leave things as
is, with the thread still waiting for an exception response.
This is because debuggers can crash, and obliterating an active debugging
session is counterproductive.
This is tracked as ZX-988.

- rights for binding exception ports and getting debuggable thread handles

In Zircon rights can, in general, only be taken away; they can't be added.
However, one doesn't want to make "debuggability" a default right:
debuggers are privileged processes. Thus we need a way to obtain handles
with sufficient rights for debugging.
This is tracked as ZX-509, ZX-911, and ZX-923.

- no way to obtain currently bound port or to chain handlers

Currently, there's no way to get the currently bound exception port.
One possible use case is debugging (e.g., to see what's going on
in the system).
Another possible use case is to allow chaining exception handlers, though for
the case of in-process chaining it's likely better to use a
signal()/sigaction() replacement (see ZX-560).
This is tracked as ZX-1216.