1# Fuchsia Tracing System Design 2 3This document describes a mechanism for collecting diagnostic trace information 4from running applications on the Fuchsia operating system. 5 6## Overview 7 8The purpose of Fuchsia tracing is to provide a means to collect, aggregate, 9and visualize diagnostic tracing information from Fuchsia user space 10processes and from the Zircon kernel. 11 12## Design Goals 13 14- Lightweight Instrumentation 15 - Enabling tracing should not significantly affect the performance of running 16 applications. Trace providers should not need to acquire locks, make 17 syscalls, or perform dynamic memory allocation required between the time 18 when tracing is activated and when it is disabled. 19- Compact Memory Footprint 20 - Trace records are stored compactly in memory so that buffers can remain 21 small but hold many events. 22- Crash-proof 23 - It is possible to collect partial traces even if trace providers 24 terminate (normally or abnormally) during trace collection. 25- Flexible and Universal 26 - Can trace code written in any language given a suitable implementation of 27 the tracing library. 28 - Trace points can be manually inserted by the developer or generated 29 dynamically by tools. 30- General 31 - The trace format defines general purpose record types which support a 32 wide range of data collection needs. 33 - Trace data can be transformed into other formats for visualization using 34 tools such as Catapult or TraceViz. 35- Extensible 36 - New record types can be added in the future without breaking existing tools. 37- Robust 38 - Enabling tracing does not compromise the integrity of running components 39 or expose them to manipulation by tracing clients. 40 41## Moving Parts 42 43### Trace Manager 44 45The trace manager is a system service which coordinates registration of 46trace providers. It ensures that tracing proceeds in an orderly manner 47and isolates components which offer trace providers from trace clients. 48 49The trace manager implements two FIDL interfaces: 50 51- `TraceController`: Provides trace clients with the ability to enumerate 52 trace providers and collect trace data. 53- `TraceRegistry`: Provides trace providers with the ability to register 54 themselves at runtime so that they can be discovered by the tracing system. 55 56TODO: The `TraceRegistry` should be replaced by a `Namespace` based approach 57to publish trace providers from components. 58 59### Trace Providers 60 61Components which can be traced or offer tracing information to the system 62implement the `TraceProvider` FIDL interface and register it with the 63`TraceRegistry`. Once registered, they will receive messages whenever 64tracing is started or stopped and will have the opportunity to provide 65trace data encoded in the [Fuchsia Trace Format](trace_format.md). 66 67#### Kernel Trace Provider 68 69The `ktrace_provider` program ingests kernel trace events and publishes 70trace records. This allows kernel trace data to be captured and visualized 71together with userspace trace data. 72 73### Trace Client 74 75The `trace` program offers command-line access to tracing functionality 76for developers. It also supports converting Fuchsia trace archives into 77other formats, such as Catapult JSON records which can be visualized 78using Catapult (aka. chrome:://tracing). 79 80Trace information can also be collected programmatically by using the 81`TraceController` FIDL interface directly. 82 83## Libraries 84 85### libtrace: The C and C++ Trace Event Library 86 87Provides macros and inline functions for instrumenting C and C++ programs 88with trace points for capturing trace data during trace execution. 89 90See `<trace/event.h>`. 91 92#### C++ Example 93 94This example records trace events marking the beginning and end of the 95execution of the "DoSomething" function together with its parameters. 96 97```c++ 98#include <trace/event.h> 99 100void DoSomething(int a, std::string b) { 101 TRACE_DURATION("example", "DoSomething", "a", a, "b", b); 102 103 // Do something 104} 105``` 106 107#### C Example 108 109This example records trace events marking the beginning and end of the 110execution of the "DoSomething" function together with its parameters. 111 112Unlike in C++, it is necessary to specify the type of each trace argument. 113In C++ such annotations are supported but are optional since the compiler 114can infer the type itself. 115 116```c 117#include <trace/event.h> 118 119void DoSomething(int a, const char* b) { 120 TRACE_DURATION("example", "DoSomething", "a", TA_INT32(a), "b", TA_STRING(b)); 121 122 // Do something 123} 124``` 125 126#### Suppressing Tracing Within a Compilation Unit 127 128To completely suppress tracing within a compilation unit, define the NTRACE 129macro prior to including the trace headers. This causes the macros to 130behave as if tracing is always disabled so they will not produce trace 131records and they will have zero runtime overhead. 132 133```c 134#define NTRACE 135#include <trace/event.h> 136 137void DoSomething(void) { 138 // This will never produce trace records because the NTRACE macro was 139 // defined above. 140 TRACE_DURATION("example", "DoSomething"); 141} 142``` 143 144### libtrace-provider: Trace Provider Library 145 146This library provides C and C++ functions to register a process's trace 147engine with the Fuchsia tracing system. For tracing to work in your process, 148you must initialize the trace provider at some point during its execution 149(or implement your own trace handler to register the trace engine some 150other way). 151 152The trace provider requires an asynchronous dispatcher to operate. 153 154#### C++ Example 155 156```c++ 157#include <lib/async-loop/cpp/loop.h> 158#include <trace-provider/provider.h> 159 160int main(int argc, char** argv) { 161 // Create a message loop. 162 async::Loop loop(&kAsyncLoopConfigNoAttachToThread); 163 164 // Start a thread for the loop to run on. 165 // We could instead use async_loop_run() to run on the current thread. 166 zx_status_t status = loop.StartThread(); 167 if (status != ZX_OK) exit(1); 168 169 // Create the trace provider. 170 trace::TraceProvider trace_provider(loop.dispatcher()); 171 172 // Do something... 173 174 // The loop and trace provider will shut down once the scope exits. 175 return 0; 176} 177``` 178 179#### C Example 180 181```c 182#include <lib/async-loop/cpp/loop.h> 183#include <trace-provider/provider.h> 184 185int main(int argc, char** argv) { 186 zx_status_t status; 187 async_loop_t* loop; 188 trace_provider_t* trace_provider; 189 190 // Create a message loop. 191 status = async_loop_create(&kAsyncLoopConfigNoAttachToThread, &loop); 192 if (status != ZX_OK) exit(1); 193 194 // Start a thread for the loop to run on. 195 // We could instead use async_loop_run() to run on the current thread. 196 status = async_loop_start_thread(loop, "loop", NULL); 197 if (status != ZX_OK) exit(1); 198 199 // Create the trace provider. 200 async_dispatcher_t* dispatcher = async_loop_get_dispatcher(loop); 201 trace_provider = trace_provider_create(dispatcher); 202 if (!trace_provider) exit(1); 203 204 // Do something... 205 206 // Tear down. 207 trace_provider_destroy(trace_provider); 208 async_loop_shutdown(loop); 209 return 0; 210} 211``` 212 213### libtrace-reader: Trace Reader Library 214 215Provides C++ types and functions for reading trace archives. 216 217See `<trace-reader/reader.h>`. 218 219## Transport Protocol 220 221When the developer initiates tracing, the trace manager asks all relevant 222trace providers to start tracing and provides each one with a trace buffer 223VMO into which they should write their trace records. 224 225While a trace is running, the trace manager continues watching for newly 226registered trace providers and activates them if needed. 227 228What happens when a trace provider's trace buffer becomes full while a trace 229is running depends on the buffering mode. 230See [Buffering Modes](#Buffering Modes) below. 231 232When tracing finishes, the trace manager asks all of the active trace providers 233to stop tracing then waits a short time for them to acknowledge that they 234have finished writing out their trace events. 235 236The trace manager then reads and validates trace data written into the trace 237buffer VMOs by trace providers and creates a trace archive. The trace manager 238can often recover partial data even when trace providers terminate abnormally 239as long as they managed to store some data into their trace buffers. 240Note that in streaming mode the trace manager only needs to save the 241currently active rolling buffer. 242See [Buffering Modes](#Buffering Modes) below. 243 244The trace manager delivers the resulting trace archive to its client through 245a socket. This data is guaranteed to be well-formed according to the 246Fuchsia trace format (but it may be nonsensical if trace providers 247deliberately emit garbage data). 248 249These are some important invariants of the transport protocol: 250- There are no synchronization points between the trace manager and trace 251 providers other than starting or stopping collection. 252- Trace providers (components being traced) only ever write to trace buffers; 253 they never read from them. 254- The trace manager only ever reads from trace buffers; it never writes to them. 255- Trace clients never see the original trace buffers; they receive trace 256 archives over a socket from the trace manager. This protects trace providers 257 from manipulation by trace clients. 258 259## Buffering Modes 260 261There are three buffering modes: oneshot, circular, and streaming. 262They specify different behaviors when the trace buffer fills. 263 264Note that in all cases trace provider behavior is independent of each other. 265Other trace providers can continue to record trace events into their own 266buffers as usual until the trace stops, even as one provider's buffer fills. 267This may result in a partially incomplete trace. 268 269### Oneshot 270 271If the buffer becomes full then that trace provider will stop recording events. 272 273### Circular 274 275The trace buffer is effectively divided into three pieces: the "durable" buffer 276and two "rolling" buffers. The durable buffer is for records important enough 277that we don't want to risk dropping them. These include records for thread and 278string references. 279 280Tracing begins by writing to the first rolling buffer. Once one rolling buffer 281fills tracing continues by writing to the other one. 282 283If the durable buffer fills then tracing for the provider stops. Tracing in 284other providers continues as usual. 285 286### Streaming 287 288The trace buffer is effectively divided into three pieces: the "durable" buffer 289and two "rolling" buffers. The durable buffer is for records important enough 290that we don't want to risk dropping them. These include records for thread and 291string references. 292 293Tracing begins by writing to the first rolling buffer. Once one rolling buffer 294fills tracing continues by writing to the other one, if it is available, and 295notifying the trace manager that the buffer is full. If the other rolling 296buffer is not available, then records are dropped until it becomes available. 297The other rolling buffer is unavailable between the point when it filled and 298when the manager reports back that the buffer's contents have been saved. 299 300Whether records get dropped depends on the rate at which records are created 301vs the rate at which the trace manager can save the buffers. This can result 302in a partially incomplete trace, but is less important than perturbing program 303performance by waiting for a buffer to be saved. 304 305If the durable buffer fills then tracing for the provider stops. Tracing in 306other providers continues as usual. 307 308## Trace Manager/Provider FIFO Protocol 309 310Notification of trace provider startup and shutdown is done via a FIFO, 311the handle of which is passed from the trace manager to each trace provider 312as part of the initial "start tracing" request. The form of each message is 313defined in `<trace-provider/provider.h>`. Packets are fixed size with the 314following format: 315 316```cpp 317typedef struct trace_provider_packet { 318 // One of TRACE_PROVIDER_*. 319 uint16_t request; 320 321 // For alignment and future concerns, must be zero. 322 uint16_t reserved; 323 324 // Optional data for the request. 325 // The contents depend on the request. 326 // If unused they must be passed as zero. 327 uint32_t data32; 328 uint64_t data64; 329} trace_provider_packet_t; 330``` 331 332### FIFO Packets 333 334The following packets are defined: 335 336**TRACE_PROVIDER_STARTED** 337 338Sent from trace providers to the trace manager. 339Notify the trace manager that the provider has received the "start tracing" 340request and is starting to collect trace data. 341The `data32` field of the packet contains the version number of the FIFO 342protocol that the provider is using. The value is specified by 343**TRACE_PROVIDER_FIFO_PROTOCOL_VERSION** in `<trace-provider/provider.h>`. 344If the trace manager sees a protocol it doesn't understand it will close 345its side of the FIFO and ignore all trace data from the provider. 346 347**TRACE_PROVIDER_SAVE_BUFFER** 348 349Sent from trace providers to the trace manager in streaming mode. 350Notify the trace manager that a buffer is full and needs saving. 351This request is only used in streaming mode. 352The `data32` field contains the "wrap count" which is the number of times 353writing has switched from one buffer to the next. The buffer that needs saving 354is `(data32 & 1)`. 355The `data64` field contains the offset of the end of data written to the 356"durable" buffer. 357 358Only one buffer save request may be sent at a time. The next one cannot be 359sent until **TRACE_PROVIDER_BUFFER_SAVED** is received acknowledging the 360previous request. 361 362**TRACE_PROVIDER_BUFFER_SAVED** 363 364Sent from the trace manager to trace providers in streaming mode. 365Notify the trace provider that the requested buffer has been saved. 366The `data32` and `data64` fields must have the same values from the 367originating **TRACE_PROVIDER_SAVE_BUFFER** request. 368