# Zircon program loading and dynamic linking

In Zircon, the kernel is not directly involved in normal program loading.
(The one necessary exception is bootstrapping the userspace environment at
system startup; see [`userboot`](userboot.md).)  Instead, the kernel merely
provides the building blocks
([VMO](objects/vm_object.md), [process](objects/process.md),
[VMAR](objects/vm_address_region.md), [thread](objects/thread.md)) from
which userspace program loading is built.

[TOC]

## ELF and the system ABI

The standard Zircon userspace environment uses
the [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)
format for machine-code executable files, and provides a dynamic linker and
C/C++ execution environment that are based on ELF.  Zircon processes can
use [system calls](syscalls.md) only via the [vDSO](vdso.md), which is
provided by the kernel in ELF format and uses the C/C++ calling conventions
common to ELF-based systems for the machine.  Userspace code (given the
appropriate capabilities) can use the [system call](syscalls.md) building
blocks directly to create processes and load programs into them without
using ELF.  But Zircon's standard ABI for machine code uses ELF as
described here.

## Background: traditional ELF program loading

[ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) was
introduced with Unix System V Release 4 and became the common standard
executable file format for most Unix-like systems, today including Linux and
all the BSD variants as well as Solaris and many others.  In these systems,
the kernel integrates program loading with filesystem access via the POSIX
`execve` API.  There are some variations in how they load ELF programs, but
most follow a pattern close to this:

 1. The kernel loads the file by name, and checks whether it's ELF or some
    other kind of file that the system supports.  This is where `#!` script
    handling is done, as well as non-ELF format support when present.
 2. The kernel maps the ELF image according to its `PT_LOAD` program
    headers.  For an `ET_EXEC` file, this places the program's segments at
    fixed addresses in memory specified in `p_vaddr`.  For an `ET_DYN`
    file, the system chooses the base address where the program's first
    `PT_LOAD` gets loaded, and following segments are placed according to
    their `p_vaddr` relative to the first segment's `p_vaddr`.  Usually the
    base address is chosen randomly (ASLR).
 3. If there was a `PT_INTERP` program header, its contents (a range of
    bytes in the ELF file given by `p_offset` and `p_filesz`) are looked up
    as a file name to find another ELF file called the *ELF interpreter*.
    This must be an `ET_DYN` file; the kernel loads it in the same way as it
    loaded the executable, but always at a location of its own choosing.
    The interpreter program is usually the ELF dynamic linker with a name
    like `/lib/ld.so.1` or `/lib/ld-linux.so.2`, but the kernel loads
    whatever file is named.
 4. The kernel sets up the stack and registers for the initial thread, and
    starts the thread running with the PC at the chosen entry point address.

     * The entry point is the `e_entry` value from the ELF file header,
       adjusted by base address.  When there was a `PT_INTERP`, the entry
       point is that of the interpreter rather than the main executable.
     * There is an assembly-level protocol of register and stack contents
       that the kernel sets up for the program to receive its argument and
       environment strings and an *auxiliary vector* of useful values.  When
       there was a `PT_INTERP`, these include the base address, entry point,
       and program header table address from the main executable's ELF
       headers.  This information allows the dynamic linker to find the main
       executable's ELF dynamic linking metadata in memory and do its work.
       When dynamic linking startup is complete, the dynamic linker jumps to
       the main executable's entry point address.

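The address arithmetic in steps 2 and 4 can be sketched in portable C.  This
is only an illustration, not code from any loader: the
`example_load_segment_t` type, the `example_` functions, and the 4096-byte
page size are all assumptions made for the sketch, and the segments are
assumed to be listed in ascending `p_vaddr` order.  The idea is that an
`ET_DYN` image has a single *load bias* (chosen base minus the page-aligned
`p_vaddr` of the first segment), and every link-time address, including
`e_entry`, is adjusted by that same bias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical, trimmed-down view of a PT_LOAD program header. */
typedef struct {
    uint64_t p_vaddr; /* link-time virtual address of the segment */
    uint64_t p_memsz; /* size of the segment in memory */
} example_load_segment_t;

/* Round an address down to the start of its page. */
uint64_t example_page_floor(uint64_t addr) {
    return addr & ~(uint64_t)(EXAMPLE_PAGE_SIZE - 1);
}

/* Round an address up to the next page boundary. */
uint64_t example_page_ceil(uint64_t addr) {
    return example_page_floor(addr + EXAMPLE_PAGE_SIZE - 1);
}

/* Load bias: where the first segment's page lands minus where it was linked. */
uint64_t example_load_bias(const example_load_segment_t* segs, size_t count,
                           uint64_t base) {
    assert(count > 0);
    assert(base == example_page_floor(base)); /* base must be page-aligned */
    return base - example_page_floor(segs[0].p_vaddr);
}

/* Any link-time address (a segment's p_vaddr, or e_entry) plus the bias. */
uint64_t example_runtime_addr(uint64_t vaddr, uint64_t bias) {
    return vaddr + bias;
}

/* Bytes spanned from the first page of the first segment to the last page
 * of the last segment. */
uint64_t example_image_span(const example_load_segment_t* segs, size_t count) {
    assert(count > 0);
    uint64_t start = example_page_floor(segs[0].p_vaddr);
    uint64_t end = example_page_ceil(segs[count - 1].p_vaddr +
                                     segs[count - 1].p_memsz);
    return end - start;
}
```
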
Zircon program loading is inspired by this tradition, but does it somewhat
differently.  A key reason for the traditional pattern of loading the
executable before loading the dynamic linker is that the dynamic linker's
randomly-chosen base address must not intersect with the fixed addresses
used by an `ET_EXEC` executable file.  Zircon does not support
fixed-address program loading (ELF `ET_EXEC` files) at all, only
position-independent executables or *PIE*s, which are ELF `ET_DYN` files.

## The **launchpad** library

The main implementation of program loading resides in
the [`launchpad` library](../system/ulib/launchpad/).  It has a C API
in
[`<launchpad/launchpad.h>`](../system/ulib/launchpad/include/launchpad/launchpad.h) but
is not formally documented.  The `launchpad` API is not described here.  Its
treatment of executable files and process startup forms the Zircon system
ABI for program loading.
The [lowest userspace layers of the system](userboot.md) implement the same
protocols.  It's anticipated that in the future most process launching in
the system will be done by a system service that uses `launchpad` in its
implementation, rather than by direct use of the library.

Filesystems are not part of the lower layers of the Zircon API.  Instead,
program loading is based on [VMOs](objects/vm_object.md) and on IPC
protocols used via [channels](objects/channel.md).

A program loading request starts with:

 * a handle to a VMO containing the executable file (`ZX_RIGHT_READ` and
   `ZX_RIGHT_EXECUTE` rights are required)
 * a list of argument strings (to become `argv[]` in a C/C++ program)
 * a list of environment strings (to become `environ[]` in a C/C++ program)
 * a list of initial [handles](handles.md), each with
   a [*handle info entry*](#handle-info-entry)

Three types of file are handled:

{#hashbang}
* a script file starting with `#!`

  The first line of the file starts with `#!` and must be no more than 127
  characters long.  The first non-whitespace word following `#!` is the
  *script interpreter name*.  If there's anything after that, it all
  together becomes the *script interpreter argument*.

   * The script interpreter name is prepended to the original argument
     list (to become `argv[0]`).
   * If there was a script interpreter argument, it's inserted between the
     interpreter name and the original argument list (to become `argv[1]`,
     with the original `argv[0]` becoming `argv[2]`).
   * The program loader looks up the script interpreter name via
     the [loader service](#the-loader-service) to get a new VMO.
   * Program loading restarts on that script interpreter VMO with the
     modified argument list but everything else the same.  The VMO handle
     for the original executable is just closed; the script interpreter only
     gets the original `argv[0]` string to work with, not the original VMO.
     There is a maximum nesting limit (currently 5) constraining how many
     such restarts will be allowed before program loading just fails.

* an ELF `ET_DYN` file with no `PT_INTERP`

  * The system chooses a random base address for the first `PT_LOAD` segment
    and then maps in each `PT_LOAD` segment relative to that base address.
    This is done by creating a [VMAR](objects/vm_address_region.md) covering
    the whole range from the first page of the first segment to the last
    page of the last segment.
  * A VMO is created and mapped at another random address to hold the stack
    for the initial thread.  If there was a `PT_GNU_STACK` program header
    with a nonzero `p_memsz`, that determines the size of the stack (rounded
    up to whole pages).  Otherwise, a reasonable default stack size is used.
  * The [vDSO](vdso.md) is mapped into the process
    (another VMO containing an ELF image), also at a random base address.
  * A new thread is created in the process with [**thread_create**()](syscalls/thread_create.md).
  * A new [channel](objects/channel.md) is created, called the *bootstrap
    channel*.  The program loader writes into this channel a message
    in [the `processargs` protocol](#the-processargs-protocol) format. This
    *bootstrap message* includes the argument and environment strings and
    the initial handles from the original request.  That list is augmented
    with handles for:

     * the new [process](objects/process.md) itself
     * its root [VMAR](objects/vm_address_region.md)
     * its initial [thread](objects/thread.md)
     * the VMAR covering where the executable was loaded
     * the VMO just created for the stack
     * optionally, a default [job](objects/job.md) so the new
       process itself can create more processes
     * optionally, the vDSO VMO so the new process can let the processes
       it creates make system calls themselves

    The program loader then closes its end of the channel.
   * The initial thread is launched with
     the [**process_start**() system call](syscalls/process_start.md):

      * `entry` sets the new thread's PC to `e_entry` from the executable's
        ELF header, adjusted by base address.
      * `stack` sets the new thread's stack pointer to the top of the
        stack mapping.
      * `arg1` transfers the handle to the *bootstrap channel* into the
        first argument register in the C ABI.
      * `arg2` passes the base address of the vDSO into the second argument
        register in the C ABI.

     Thus, the program entry point can be written as a C function:
     ```c
     noreturn void _start(zx_handle_t bootstrap_channel, uintptr_t vdso_base);
     ```

{#PT_INTERP}
* an ELF `ET_DYN` file with a `PT_INTERP`

  In this case, the program loader does not directly use the VMO containing
  the ELF executable after reading its `PT_INTERP` header.  Instead, it
  uses the `PT_INTERP` contents as the name of an *ELF interpreter*.  This
  name is used in a request to the [loader service](#the-loader-service) to
  get a new VMO containing the ELF interpreter, which is another ELF
  `ET_DYN` file.  Then that VMO is loaded instead of the main executable's
  VMO.  Startup is as described above, with these differences:

   * An extra message
     in [the `processargs` protocol](#the-processargs-protocol) is written
     to the *bootstrap channel*, preceding the main bootstrap message.  The
     ELF interpreter is expected to consume this *loader bootstrap message*
     itself so that it can do its work, but then leave the second bootstrap
     message in the channel and hand off the bootstrap channel handle to
     the main program's entry point.  The *loader bootstrap message*
     includes only the necessary handles added by the program loader, not
     the full set that go into the main *bootstrap message*, plus these:

      * the original VMO handle for the main ELF executable
      * a channel handle to the [loader service](#the-loader-service)

     These allow the ELF interpreter to do its own loading of the
     executable from the VMO and to use the loader service to get
     additional VMOs for shared libraries to load.  The message also
     includes the argument and environment strings, which lets the ELF
     interpreter use `argv[0]` in its log messages, and check for
     environment variables like `LD_DEBUG`.

   * `PT_GNU_STACK` program headers are ignored.  Instead, the program
     loader chooses a minimal stack size that is just large enough to
     contain the *loader bootstrap message* plus some breathing room for
     the ELF interpreter's startup code to use as call frames.  This
     "breathing room" size is `PTHREAD_STACK_MIN` in the source, and is
     tuned such that with a small bootstrap message size the whole stack is
     only a single page, but a careful dynamic linker implementation has
     enough space to work in.  The dynamic linker is expected to read the
     main executable's `PT_GNU_STACK` and switch to a stack of reasonable
     size for normal use before it jumps to the main executable's entry
     point.

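The `#!` script handling described earlier in this list comes down to
parsing one line and rewriting the argument list.  The following is a
minimal sketch of those two steps, not the `launchpad` implementation.  The
`example_` names are hypothetical, and two details are assumptions: the
127-character limit is taken to cover the whole first line including `#!`,
and whitespace between the interpreter name and its argument is skipped.
The loader service lookup and the nesting limit are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_MAX_LINE 127 /* first-line limit stated in the text */

typedef struct {
    char name[EXAMPLE_MAX_LINE + 1]; /* script interpreter name */
    char arg[EXAMPLE_MAX_LINE + 1];  /* script interpreter argument, "" if none */
} example_hashbang_t;

static bool example_is_blank(char c) { return c == ' ' || c == '\t'; }

/* Parses the first line of a file image.  Returns false if the file is not
 * a well-formed `#!` script under the assumptions above. */
bool example_parse_hashbang(const char* data, size_t size,
                            example_hashbang_t* out) {
    assert(out != NULL);
    if (size < 2 || data[0] != '#' || data[1] != '!')
        return false;
    const char* newline = memchr(data, '\n', size);
    size_t len = newline ? (size_t)(newline - data) : size;
    if (len > EXAMPLE_MAX_LINE)
        return false;

    size_t i = 2;
    while (i < len && example_is_blank(data[i])) i++;
    size_t name_start = i;
    while (i < len && !example_is_blank(data[i])) i++;
    size_t name_len = i - name_start;
    if (name_len == 0)
        return false; /* "#!" with no interpreter name */
    memcpy(out->name, data + name_start, name_len);
    out->name[name_len] = '\0';

    /* Everything after the name, taken together, is the single argument. */
    while (i < len && example_is_blank(data[i])) i++;
    size_t arg_len = len - i;
    memcpy(out->arg, data + i, arg_len);
    out->arg[arg_len] = '\0';
    return true;
}

/* Builds the new argument list: name, optional argument, then the original
 * arguments.  out_argv needs room for argc + 2 entries.  Returns the count. */
size_t example_rewrite_argv(const example_hashbang_t* hb,
                            const char* const* argv, size_t argc,
                            const char** out_argv) {
    size_t n = 0;
    out_argv[n++] = hb->name;
    if (hb->arg[0] != '\0')
        out_argv[n++] = hb->arg;
    for (size_t i = 0; i < argc; i++)
        out_argv[n++] = argv[i];
    return n;
}
```

With a first line of `#! /boot/bin/sh -x -e` (an illustrative path) and an
original argument list of `myscript one`, this sketch produces the four
arguments `/boot/bin/sh`, `-x -e`, `myscript`, `one`.
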
*** aside

The program loader chooses three randomly-placed chunks of the new
process's address space before the program (or dynamic linker) gets
control: the vDSO, the stack, and the dynamic linker itself.  To make it
possible for the program's own startup to control its address space more
fully, the program loader currently ensures that these random placements
are always somewhere in the **upper half of the address space**.  This is
for the convenience of sanitizer runtimes, which need to reserve some lower
fraction of the address space.  This behavior will change in the future so
there is some way to support the sanitizer cases but other processes will
get fully random placement to maximize the benefits of ASLR.

***

## The **processargs** protocol

[`<zircon/processargs.h>`](../system/public/zircon/processargs.h) defines
the protocol for the *bootstrap message* sent on the *bootstrap channel* by
the program loader.  When a process starts up, it has a handle to this
bootstrap channel and it has access to [system calls](syscalls.md) via
the [vDSO](vdso.md).  The process has only this one handle and so it can
see only global system information and its own memory until it gets more
information and handles via the bootstrap channel.

The `processargs` protocol is a one-way protocol for messages sent on the
bootstrap channel.  The new process is never expected to write back onto
the channel.  The program loader usually sends its messages and then closes
its end of the channel before the new process has even started.  These
messages must communicate everything a new process will ever need, but the
code that receives and decodes messages in this format must run in a very
constrained environment.  Heap allocation is impossible and nontrivial
library facilities may not be available.

See the [header file](../system/public/zircon/processargs.h) for full
details of the message format.  It's anticipated that this ad hoc protocol
will be replaced with a formal IDL-based protocol eventually, but the
format will be kept simple enough to be decoded by simple hand-written
code.

A bootstrap message conveys:

 * a list of initial [handles](handles.md)
 * a 32-bit *handle info entry* corresponding to each handle
 * a list of name strings that a *handle info entry* can refer to
 * a list of argument strings (to become `argv[]` in a C/C++ program)
 * a list of environment strings (to become `environ[]` in a C/C++ program)

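Because the receiving code cannot allocate from a heap, string lists like
these are typically decoded in place.  The sketch below shows one way that
could look, under an assumption that is not taken from the header: each list
is stored as `count` consecutive NUL-terminated strings beginning at a byte
`offset` within the message.  The function name and parameters are
hypothetical; the header file remains the authority on the real layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills out[0..count) with pointers into msg, one per string, without
 * copying or allocating.  Returns false if the strings run past the end of
 * the message.  The caller supplies the out array (for example, on the
 * stack). */
bool example_unpack_strings(const char* msg, size_t msg_size,
                            uint32_t offset, uint32_t count,
                            const char** out) {
    assert(msg != NULL && out != NULL);
    size_t pos = offset;
    for (uint32_t i = 0; i < count; i++) {
        if (pos >= msg_size)
            return false; /* fewer strings than advertised */
        const char* nul = memchr(msg + pos, '\0', msg_size - pos);
        if (nul == NULL)
            return false; /* last string is not terminated */
        out[i] = msg + pos;
        pos = (size_t)(nul - msg) + 1;
    }
    return true;
}
```

The pointers stay valid only as long as the message buffer does, so startup
code using this approach would keep the buffer alive for as long as
`argv[]` and `environ[]` are in use.
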
{#handle-info-entry}
The handles serve many purposes, indicated by the *handle info entry* type:

 * essential handles for the process to make [system calls](syscalls.md):
   [process](objects/process.md), [VMAR](objects/vm_address_region.md),
   [thread](objects/thread.md), [job](objects/job.md)
 * [channel](objects/channel.md) to the [loader service](#the-loader-service)
 * [vDSO](vdso.md) [VMO](objects/vm_object.md)
 * filesystem-related handles: current directory, file descriptors, name
   space bindings (these encode an index into the list of name strings)
 * special handles for system processes:
   [resource](objects/resource.md), [VMO](objects/vm_object.md)
 * other types used for higher-layer or private protocol purposes

Most of these are just passed through by the program loader,
which does not need to know what they're for.

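A 32-bit *handle info entry* has to carry both a type and, for some types,
a small argument such as a file descriptor number or a name-string index.
The sketch below illustrates one plausible packing in the style of the
`PA_HND(type, arg)` notation: an 8-bit type in the low bits and a 16-bit
argument in the high bits.  That bit layout, the `EXAMPLE_` macros, and the
type codes are assumptions made for illustration; check them against
`<zircon/processargs.h>` before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: type in bits 0-7, argument in bits 16-31. */
#define EXAMPLE_HND(type, arg) \
    (((uint32_t)(type) & 0xFFu) | (((uint32_t)(arg) & 0xFFFFu) << 16))
#define EXAMPLE_HND_TYPE(info) ((uint32_t)(info) & 0xFFu)
#define EXAMPLE_HND_ARG(info) (((uint32_t)(info) >> 16) & 0xFFFFu)

/* Hypothetical type codes, for illustration only. */
enum {
    EXAMPLE_TYPE_PROCESS = 0x01,
    EXAMPLE_TYPE_LOADER_SVC = 0x10,
    EXAMPLE_TYPE_FD = 0x30,
};

/* Returns the index of the first handle whose info entry equals wanted, or
 * -1 if there is none.  The handle at the same index in the parallel handle
 * array is the one being looked for. */
int example_find_handle(const uint32_t* info, uint32_t count,
                        uint32_t wanted) {
    assert(info != NULL || count == 0);
    for (uint32_t i = 0; i < count; i++) {
        if (info[i] == wanted)
            return (int)i;
    }
    return -1;
}
```

Startup code in this style would scan the info array once, pulling out the
entries it understands (for example the loader service channel) and passing
the rest along untouched.
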
## The **loader service**

In dynamic linking systems, an executable file refers to and uses at
runtime additional files containing shared libraries and plugins.  The
dynamic linker is loaded as an [*ELF interpreter*](#PT_INTERP) and is
responsible for getting access to all these additional files to complete
dynamic linking before the main program's entry point gets control.

All of Zircon's standard userspace uses dynamic linking, down to the very
first process loaded by [`userboot`](userboot.md).  Device drivers and
filesystems are implemented by userspace programs loaded this way.  So
program loading cannot be defined in terms of higher-layer abstractions
such as a filesystem paradigm,
as
[traditional systems have done](#background_traditional-elf-program-loading).
Instead, program loading is based only on [VMOs](objects/vm_object.md) and
a simple [channel](objects/channel.md)-based protocol.

This *loader service* protocol is how a dynamic linker acquires VMOs
representing the additional files it needs to load as shared libraries.

This is a simple RPC protocol, defined in
[`<zircon/processargs.h>`](../system/public/zircon/processargs.h).
As with [the `processargs` protocol](#the-processargs-protocol),
it's anticipated that this ad hoc protocol will be replaced with a formal
IDL-based protocol eventually, but the format will be kept simple enough to
be decoded by simple hand-written code.  The code sending loader service
requests and receiving their replies during dynamic linker startup may
not have access to nontrivial library facilities.

An ELF interpreter receives a channel handle for its loader service in its
`processargs` bootstrap message, identified by the *handle info entry*
`PA_HND(PA_LDSVC_LOADER, 0)`.  All requests are synchronous RPCs made
with [**channel_call**()](syscalls/channel_call.md).  Both requests and
replies start with the `zx_loader_svc_msg_t` header; some contain
additional data; some contain a VMO handle.  Request opcodes are:

 * `LOADER_SVC_OP_LOAD_SCRIPT_INTERP`: *string* -> *VMO handle*

   The program loader sends the *script interpreter name* from
   a [`#!` script](#hashbang) and gets back a VMO to execute in place of
   the script.

 * `LOADER_SVC_OP_LOAD_OBJECT`: *string* -> *VMO handle*

   The dynamic linker sends the name of an *object* (shared library or
   plugin) and gets back a VMO handle containing the file.

 * `LOADER_SVC_OP_CONFIG`: *string* -> `reply ignored`

   The dynamic linker sends a string identifying its *load configuration*.
   This is intended to affect how later `LOADER_SVC_OP_LOAD_OBJECT`
   requests decide what particular implementation file to supply for a
   given name.

 * `LOADER_SVC_OP_DEBUG_PRINT`: *string* -> `reply ignored`

   This is a simple ad hoc logging facility intended for debugging the
   dynamic linker and early program startup issues.  It's convenient
   because the early startup code is using the loader service but doesn't
   have access to many other handles or complex facilities yet.  This will
   be replaced in the future with some simple-to-use logging facility that
   does not go through the loader service.

 * `LOADER_SVC_OP_LOAD_DEBUG_CONFIG`: *string* -> *VMO handle*

   **This is intended to be a developer-oriented feature and might not
   ordinarily be available in production runs.**

   The program runtime sends a string naming a *debug configuration* of
   some kind and gets back a VMO to read configuration data from.  The
   sanitizer runtimes use this to allow large options text to be stored in
   a file rather than passed directly in environment strings.

 * `LOADER_SVC_OP_PUBLISH_DATA_SINK`: *string*, *VMO handle* -> `reply ignored`

   **This is intended to be a developer-oriented feature and might not
   ordinarily be available in production runs.**

   The program runtime sends a string naming a *data sink* and transfers
   the sole handle to a VMO it wants published there.  The *data sink*
   string identifies a type of data, and the VMO's object name can
   specifically identify the data set in this VMO.  The client must
   transfer the only handle to the VMO (which prevents the VMO being
   resized without the receiver's knowledge), but it might still have the
   VMO mapped in and continue to write data to it.  Code instrumentation
   runtimes use this to deliver large binary trace results.

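Each of the *string* requests above has the same shape: a fixed header
followed by string data.  The sketch below shows how such a request could be
assembled in a caller-supplied buffer without heap allocation.  It is an
illustration under stated assumptions, not the real wire format: the
`example_ldsvc_header_t` fields and the opcode value are hypothetical
stand-ins for `zx_loader_svc_msg_t` and the `LOADER_SVC_OP_*` constants,
whose actual definitions are in the header file.  Sending the buffer with
**channel_call**() is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the zx_loader_svc_msg_t header. */
typedef struct {
    uint32_t txid;      /* transaction id, assumed to be filled in on send */
    uint32_t opcode;    /* which request this is */
    int32_t arg;        /* status or other small value in replies */
    uint32_t reserved0;
    uint32_t reserved1;
} example_ldsvc_header_t;

enum { EXAMPLE_OP_LOAD_OBJECT = 1 }; /* placeholder value, not the real one */

/* Writes a header followed by the NUL-terminated name into buf.  Returns
 * the number of bytes to send, or 0 if the request does not fit. */
size_t example_build_request(void* buf, size_t buf_size, uint32_t opcode,
                             const char* name) {
    assert(buf != NULL && name != NULL);
    size_t name_len = strlen(name) + 1; /* include the terminator */
    if (buf_size < sizeof(example_ldsvc_header_t) ||
        name_len > buf_size - sizeof(example_ldsvc_header_t))
        return 0;

    example_ldsvc_header_t hdr = {0};
    hdr.opcode = opcode;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy((char*)buf + sizeof(hdr), name, name_len);
    return sizeof(hdr) + name_len;
}
```

A dynamic linker following this pattern would build the request for each
`DT_NEEDED` name in a small stack buffer, send it, and read the VMO handle
out of the reply.
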
## Zircon's standard ELF dynamic linker

The ELF conventions described above and
the [`processargs`](#the-processargs-protocol)
and [loader service](#the-loader-service) protocols are the permanent
system ABI for program loading.  Programs can use any implementation of a
machine code executable that meets the basic ELF format conventions.  The
implementation can use the [vDSO](vdso.md) [system call](syscalls.md)
ABI, the `processargs` data, and the loader service facilities as it sees
fit.  The exact details of what handles and data they will receive via
these protocols depend on the higher-layer program environment.  Zircon's
system processes use an ELF interpreter that implements basic ELF dynamic
linking, and a simple implementation of the loader service.

Zircon's standard C library and dynamic linker have
a [unified implementation](../third_party/ulib/musl/) originally derived
from [`musl`](http://www.musl-libc.org/).  It's identified by the
`PT_INTERP` string `ld.so.1`.  It uses the `DT_NEEDED` strings naming
shared libraries as [loader service](#the-loader-service) *object* names.

The simple loader service maps requests into filesystem access:

 * *script interpreter* and *debug configuration* names must start with `/`
   and are used as absolute file names.
 * *data sink* names become subdirectories in `/tmp`, and each VMO
   published becomes a file in that subdirectory with the VMO's object name.
 * *object* names are searched for as files in system `lib/` directories.
 * *load configuration* strings are taken as a subdirectory name,
   optionally preceded by a `!` character.  Subdirectories by that name in
   system `lib/` directories are searched before `lib/` itself.
   If there was a `!` prefix, *only* those subdirectories are searched.
   For example, sanitizer runtimes use `asan` because that instrumentation
   is compatible with uninstrumented library code, but `!dfsan` because
   that instrumentation requires that all code in the process be
   instrumented.

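The *load configuration* rule can be made concrete with a short sketch.
This is not the loader service's code; the function name, the fixed path
buffer size, and the use of a single relative `lib/` root are assumptions
for illustration.  Given a configuration string and an *object* name, it
lists the candidate files in the order the rule above implies: the
configuration subdirectory first, then plain `lib/` unless the `!` prefix
made the configuration exclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_PATH_MAX 128 /* arbitrary limit for this sketch */

/* Fills paths[] with candidate files, in search order, for one `lib/`
 * directory.  config may be NULL or "" (no configuration), "name", or
 * "!name".  Returns the number of candidates written (0 to 2); 0 also
 * signals a path that would not fit. */
size_t example_candidates(const char* config, const char* object,
                          char paths[2][EXAMPLE_PATH_MAX]) {
    assert(object != NULL);
    size_t n = 0;
    bool exclusive = false;
    if (config != NULL && config[0] == '!') {
        exclusive = true; /* search only the configuration subdirectory */
        config++;
    }
    if (config != NULL && strlen(config) > 0) {
        int len = snprintf(paths[n], EXAMPLE_PATH_MAX, "lib/%s/%s",
                           config, object);
        if (len < 0 || len >= EXAMPLE_PATH_MAX)
            return 0;
        n++;
    }
    if (!exclusive) {
        int len = snprintf(paths[n], EXAMPLE_PATH_MAX, "lib/%s", object);
        if (len < 0 || len >= EXAMPLE_PATH_MAX)
            return 0;
        n++;
    }
    return n;
}
```
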
A version of the standard runtime instrumented with
LLVM [AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html)
is identified by the `PT_INTERP` string `asan/ld.so.1`.  This version sends
the *load configuration* string `asan` before loading shared libraries.
When [SanitizerCoverage](https://clang.llvm.org/docs/SanitizerCoverage.html)
is enabled, it publishes a VMO to the *data sink* name `sancov` and uses a
VMO name including the process KOID.
