# Audio Driver Streaming Interface

This document describes the audio streaming interface exposed by audio drivers
in Zircon.  It is meant to serve as a reference for both users and
driver-authors, and to unambiguously define the interface contract which drivers
must implement and users must follow.

## Overview

Audio streams are device nodes published by driver services intended to be used
by applications in order to capture and/or render audio on a Zircon device.
Each stream in the system (input or output) represents a stream of digital audio
information which may be either received or transmitted by the device.  Streams
are dynamic and may be created or destroyed by the system at any time.  Which
streams exist at any given point in time, and what controls their lifecycles,
are considered to be issues of audio policy and codec management and are not
discussed in this document.  Additionally, the information present in audio
output streams is exclusive to the application owner of the stream.  Mixing of
audio is _not_ a service provided by the audio stream interface.

> TODO: extend this interface to support the concept of low-latency hardware
> mixers.

### Basic Vocabulary

Term | Definition
-----|-----------
Sample | A representation of the sound rendered by a single speaker, or captured by a single microphone, at a single instant in time.
LPCM | Linear pulse code modulation.  The specific representation of audio samples present in all Zircon uncompressed audio streams.  LPCM audio samples are representations of the amplitude of the audio signal at an instant in time where the numeric values of the encoded audio are linearly distributed across the amplitude levels of the rendering or capture device.  This is in contrast to A-law and μ-law encodings which have non-linear mappings from numeric value to amplitude level.
Channel | Within an audio stream, the subset of information which will be rendered by a single speaker, or which was captured by a single microphone in a stream.
Frame | A set of audio samples for every channel of an audio stream captured/rendered at a single instant in time.
Frame Rate | a.k.a. "Sample Rate".  The rate (in Hz) at which audio frames are produced or consumed.  Common sample rates include 44.1 KHz, 48 KHz, 96 KHz, and so on.
Client or User or Application | These terms are used interchangeably in this document.  They refer to modules that use these interfaces to communicate with an audio driver/device.

> TODO: do we need to extend this interface to support non-linear audio sample
> encodings?  This may be important for telephony oriented microphones which
> deliver μ-law encoded samples.

### Basic Operation

Communication with an audio stream device is performed using messages sent over
a [channel](../objects/channel.md).  Applications open the device node for a
stream and obtain a channel by issuing an ioctl.  After obtaining the channel,
the device node may be closed.  All subsequent communication with the stream
will occur using channels.

The "stream" channel will then be used for most command and control tasks,
including:
 * Capability interrogation
 * Format negotiation
 * Hardware gain control
 * Determining outboard latency
 * Plug detection notification
 * Access control capability detection and signalling
 * Policy level stream purpose indication/Stream Association (TBD)

> TODO: Should plug/unplug detection be done by sending notifications over the
> stream channel (as it is today), or by publishing/unpublishing the device
> nodes (and closing all channels in the case of unpublished devices)?

In order to actually send or receive audio information on the stream, the
specific format to be used must first be set.  The response to a successful
SetFormat operation will contain a new "ring-buffer" channel.  The ring-buffer
channel may be used to request a shared buffer from the stream (delivered in the
form of a [VMO](../objects/vm_object.md)) which may be mapped into the address
space of the application and used to send or receive audio data as appropriate.
Generally, the operations conducted over the ring buffer channel include...
 * Requesting a shared buffer
 * Starting and Stopping stream playback/capture
 * Receiving notifications of playback/capture progress
 * Receiving notifications of error conditions such as HW FIFO under/overflow,
   bus transaction failure, etc...
 * Receiving clock recovery information in the case that the audio output clock
   is based on a different oscillator than the oscillator which backs
   [ZX_CLOCK_MONOTONIC](../syscalls/clock_get.md)

## Operational Details

### Protocol definition

In order to use the C API definitions of the
[audio](../../system/public/zircon/device/audio.h) protocol, applications and
drivers simply say
```C
#include <device/audio.h>
```

### Device nodes

Audio stream device nodes **must** be published by drivers using the protocol
preprocessor symbol given in the table below.  This will cause stream device
nodes to be published in the locations given in the table.  Applications can
monitor these directories in order to discover new streams as they are published
by the drivers.

Stream Type | Protocol | Location
------------|----------|---------
Input | `ZX_PROTOCOL_AUDIO_INPUT` | /dev/class/audio-input
Output | `ZX_PROTOCOL_AUDIO_OUTPUT` | /dev/class/audio-output

### Establishing the stream channel

After opening the device node, client applications may obtain a stream channel
for subsequent communication using the `AUDIO_IOCTL_GET_CHANNEL` ioctl.  For
example...
```C
zx_handle_t OpenStream(const char* dev_node_path) {
    zx_handle_t ret = ZX_HANDLE_INVALID;
    int fd = open(dev_node_path, O_RDONLY);

    if (fd < 0) {
        printf("Failed to open \"%s\" (res %d)\n", dev_node_path, fd);
        return ret;
    }

    // fdio_ioctl returns the number of bytes copied out on success, or a
    // negative error code on failure.
    ssize_t res = fdio_ioctl(fd, AUDIO_IOCTL_GET_CHANNEL,
                             nullptr, 0,
                             &ret, sizeof(ret));
    close(fd);

    if (res != sizeof(ret)) {
        printf("Failed to obtain channel (res %zd)\n", res);
        return ZX_HANDLE_INVALID;
    }

    return ret;
}
```

### Client side termination of the stream channel

Clients **may** terminate the connection to the stream at any time simply by
calling [zx_handle_close(...)](../syscalls/handle_close.md) on the stream
channel.  Drivers **must** close any active ring-buffer channels established
using this stream channel and **must** make every attempt to gracefully quiesce
any on-going streaming operations in the process.

### Sending and receiving messages on the stream and ring-buffer channels

All of the messages and message payloads which may be sent or received over
stream and ring buffer channels are defined in the
[audio](../../system/public/zircon/device/audio.h) protocol header.  Messages
may be sent to the driver using the
[zx_channel_write(...)](../syscalls/channel_write.md) syscall.  If a response is
expected, it may be read using the
[zx_channel_read(...)](../syscalls/channel_read.md) syscall.  Best practice,
however, is to register for signals on your
[channel(s)](../objects/channel.md) with a [port](../objects/port.md) using the
[zx_object_wait_async(...)](../syscalls/object_wait_async.md) syscall, and use
the [zx_port_wait(...)](../syscalls/port_wait.md) syscall to determine when your
set of channels have messages (either expected responses or asynchronous
notifications) to be read.

All messages either sent or received over stream and ring buffer channels are
prefaced with an `audio_cmd_hdr_t` structure which contains a 32-bit
transaction ID and an `audio_cmd_t` enumeration value indicating the
specific command being requested by the application, the specific command being
responded to by the driver, or the asynchronous notification being delivered by
the driver to the application.

When sending a command to the driver, applications **must** place a transaction
ID in the header's `transaction_id` field which is not equal to
`AUDIO_INVALID_TRANSACTION_ID`.  If a response to a command needs to be sent by
the driver to the application, the driver **must** use the transaction ID and
`audio_cmd_t` values sent by the client during the request.  When sending an
asynchronous notification to the application, the driver **must** use
`AUDIO_INVALID_TRANSACTION_ID` as the transaction ID for the message.
Transaction IDs may be used by clients for whatever purpose they desire;
however, if the IDs are kept unique across all transactions in-flight, the
[zx_channel_call(...)](../syscalls/channel_call.md) syscall may be used to
implement a simple synchronous calling interface.

### Validation requirements

All drivers **must** validate requests and enforce the protocol described above.
In case of any violation, drivers **should** immediately quiesce their hardware
and **must** close the channel, terminating any operations which happen to be in
flight at the time.  Additionally, they **may** log a message to a central
logging service to assist application developers in debugging the cause of
the protocol violation.  Examples of protocol violation include...
 * Using `AUDIO_INVALID_TRANSACTION_ID` as the value of
   `message.hdr.transaction_id`
 * Using a value not present in the `audio_cmd_t` enumeration as the value of
   `message.hdr.cmd`
 * Supplying a payload whose size does not match the size of the request
   payload for a given command.

## Format Negotiation

### Sample Formats

Sample formats are described using the `audio_sample_format_t` type.  It is a
bitfield style enumeration which either describes the numeric encoding of the
uncompressed LPCM audio samples as they reside in memory, or indicates that the
audio stream consists of a compressed bitstream instead of uncompressed LPCM
samples.  Refer to the [audio](../../system/public/zircon/device/audio.h)
protocol header for exact symbol definitions.

Notes
 * With the exception of FORMAT_BITSTREAM, samples are always assumed to
   use linear PCM encoding.  BITSTREAM is used for transporting compressed
   audio encodings (such as AC3, DTS, and so on) over a digital interconnect
   to a decoder device somewhere outside of the system.
 * By default, multi-byte sample formats are assumed to use host-endianness.
   If the INVERT_ENDIAN flag is set on the format, the format uses the
   opposite of host endianness.  e.g. A 16 bit little endian PCM audio format
   would have the INVERT_ENDIAN flag set on it when used on a big endian
   host.  The INVERT_ENDIAN flag has no effect on COMPRESSED, 8BIT or FLOAT
   encodings.
 * The 32BIT_FLOAT encoding uses specifically the IEEE 754 floating point
   representation.
 * By default, non-floating point PCM encodings are assumed to be expressed
   using 2's complement signed integers.  e.g. the bit values for a 16 bit PCM
   sample format would range from [0x8000, 0x7FFF] with 0x0000 representing
   zero speaker deflection.  If the UNSIGNED flag is set on the format, the
   bit values would range from [0x0000, 0xFFFF] with 0x8000 representing zero
   deflection.
 * When used to set formats, exactly one non-flag bit **must** be set.
 * When used to describe supported formats, any number of non-flag bits **may**
   be set.  Flags (when present) apply to all of the relevant non-flag bits in
   the bitfield.  e.g. If a stream supports COMPRESSED, 16BIT and 32BIT_FLOAT,
   and the UNSIGNED bit is set, it applies only to the 16BIT format.
 * When encoding a smaller sample size in a larger container (e.g. 20 or 24 bit
   in 32), the most significant bits of the 32 bit container are used while
   the least significant bits should be zero.  e.g. a 20 bit sample would be
   mapped onto bits [12, 31] of the 32 bit container.

> TODO: can we make the claim that the LSBs will be ignored, or do we have to
> require that they be zero?

> TODO: describe what 20-bit packed audio looks like in memory.  Does it need to
> have an even number of channels in the overall format?  Should we strike it
> from this list if we cannot find a piece of hardware which demands this format
> in memory?

### Enumeration of supported formats

In order to determine the formats supported by a given audio stream,
applications send an `AUDIO_STREAM_CMD_GET_FORMATS` message over the stream
channel.  No additional parameters are required.  Drivers **must** respond to
this request using one or more `audio_stream_cmd_get_formats_resp_t` messages,
even if only to report that there are no formats currently supported.

### Range structures

Drivers indicate support for formats by sending messages containing zero or more
`audio_stream_format_range_t` structures.  Each structure contains fields which
describe...
 * A bitmask of supported sample formats.
 * A minimum and maximum number of channels.
 * A set of frame rates.

A single range structure indicates support for each of the combinations of the
three different sets of values (sample formats, channel counts, and frame
rates).  For example, if a range structure indicated support for...
 * 16 bit signed LPCM samples
 * 48000 and 44100 Hz frame rates
 * 1 and 2 channels

Then the fully expanded set of supported formats indicated by the range
structure would be...
 * Stereo 16-bit 48 KHz audio
 * Stereo 16-bit 44.1 KHz audio
 * Mono 16-bit 48 KHz audio
 * Mono 16-bit 44.1 KHz audio

See the Sample Formats section (above) for a description of how sample formats
are encoded in the `sample_formats` member of a range structure.

Supported channel counts are indicated using a pair of min/max channels fields
which indicate an inclusive range of channel counts which apply to this range.
For example, a min/max channels range of [1, 4] would indicate that this audio
stream supports 1, 2, 3 or 4 channels.  A range of [2, 2] would indicate that
this audio stream supports only stereo audio.

Supported frame rates are signalled similarly to channel counts using a pair of
min/max frames per second fields along with a flags field.  While the min/max
values provide an inclusive range of frame rates, the flags determine how to
interpret this range.  Currently defined flags include...

Flag | Definition
-----|-----------
`ASF_RANGE_FLAG_FPS_CONTINUOUS` | The frame rate range is continuous.  All frame rates in the range [min, max] are valid.
`ASF_RANGE_FLAG_FPS_48000_FAMILY` | The frame rate range includes the members of the 48 KHz family which exist in the range [min, max]
`ASF_RANGE_FLAG_FPS_44100_FAMILY` | The frame rate range includes the members of the 44.1 KHz family which exist in the range [min, max]

So, conceptually, the valid frame rates are the union of the sets produced by
applying each of the flags which are set to the inclusive [min, max] range.  For
example, if both the 48 KHz and 44.1 KHz family flags were set, and the range
given was [16000, 47999], then the supported frame rates for this range would be
 * 16000 Hz
 * 22050 Hz
 * 32000 Hz
 * 44100 Hz

The official members of the 48 KHz and 44.1 KHz families are

Family | Frame Rates
-------|------------
`ASF_RANGE_FLAG_FPS_48000_FAMILY` | 8000 16000 32000 48000 96000 192000 384000 768000
`ASF_RANGE_FLAG_FPS_44100_FAMILY` | 11025 22050 44100 88200 176400

Drivers **must** set at least one of the flags, or else the set of supported
frame rates is empty and there is no reason to transmit the range structure.
Also note that the set of valid frame rates is the union of the frame rates
produced by applying each of the set flags.  This implies that there is never
any good reason to set `ASF_RANGE_FLAG_FPS_CONTINUOUS` in conjunction with any
of the other flags.  While it is technically legal to do so, drivers **should**
avoid this behavior.

### Transporting range structures

Range structures are transmitted from drivers to applications using the
`audio_stream_cmd_get_formats_resp_t` message.  Because of the large number of
formats which may be supported by a stream, drivers may need to send multiple
messages in order to enumerate all available modes.  Messages include the
following fields.
 * A standard `audio_cmd_hdr_t` header.  **All** messages involved in the
   response to an application request **must** use the transaction ID of the
   original request, and **must** set the cmd field of the header to
   `AUDIO_STREAM_CMD_GET_FORMATS`.
 * A `format_range_count` field.  This indicates the total number of format
   range structures which will be sent in this response to the application.
   This number **must** be present in **all** messages involved in the response,
   and **must not** change from message to message.
 * A `first_format_range_ndx` field indicating the zero-based index of the first
   format range being specified in this particular message.  See below for
   details.
 * An array of `audio_stream_format_range_t` structures which is at most
   `AUDIO_STREAM_CMD_GET_FORMATS_MAX_RANGES_PER_RESPONSE` elements long.

Drivers **must**
 * Always transmit all of the available audio format ranges.
 * Always transmit the available audio format ranges in ascending index order.
 * Always pack as many ranges as possible in the fixed size message structure.
 * Never overlap index regions or leave gaps.

Given these requirements, if the maximum number of ranges per response were 15,
and a driver needed to send 35 ranges in response to an application's request,
then 3 messages in total would be needed, and the `format_range_count` and
`first_format_range_ndx` fields for each message would be as follows.

Msg # | `format_range_count` | `first_format_range_ndx`
------|----------------------|-------------------------
1 | 35 | 0
2 | 35 | 15
3 | 35 | 30

`first_format_range_ndx` **must** never be greater than `format_range_count`;
however, `format_range_count` **may** be zero if an audio stream currently
supports no formats.  The total number of `audio_stream_format_range_t`
structures in an `audio_stream_cmd_get_formats_resp_t` message is given by the
formula

```C
valid_ranges = MIN(AUDIO_STREAM_CMD_GET_FORMATS_MAX_RANGES_PER_RESPONSE,
                   msg.format_range_count - msg.first_format_range_ndx);
```

Drivers **may** choose to always send an entire
`audio_stream_cmd_get_formats_resp_t` message, or to send a truncated message
which ends after the last valid range structure in the `format_ranges` array.
Applications **must** be prepared to receive up to
`sizeof(audio_stream_cmd_get_formats_resp_t)` bytes for each message, but also
accept messages as short as
`offsetof(audio_stream_cmd_get_formats_resp_t, format_ranges)`.

> TODO: how do devices signal a change of supported formats (e.g., HDMI hot-plug
> event)?  Are such devices required to simply remove and republish the device?

> TODO: define how to enumerate supported compressed bitstream formats.

### Setting the desired stream format

In order to select a stream format, applications send an
`AUDIO_STREAM_CMD_SET_FORMAT` message over the stream channel.  In the message,
for uncompressed audio streams, the application specifies
 * The frame rate of the stream in Hz using the `frames_per_second` field.
 * The number of channels packed into each frame using the `channels` field.
 * The format of the samples in the frame using the `sample_format` field (see
   Sample Formats, above).

Success or failure, drivers **must** respond to a request to set format using an
`audio_stream_cmd_set_format_resp_t`.

In the case of success, drivers **must** set the `result` field of the response
to `ZX_OK` and **must** return a new ring buffer channel over which streaming
operations will be conducted.  If a previous ring buffer channel had been
established and was still active, the driver **must** close this channel and
make every attempt to gracefully quiesce any on-going streaming operations in
the process.

In the case of failure, drivers **must** indicate the cause of failure using the
`result` field of the message and **must not** simply close the stream channel
as is done for a generic protocol violation.  Additionally, they **may** choose
to preserve a pre-existing ring-buffer channel, or to simply close such a
channel as is mandated for a successful operation.

> TODO: specify how compressed bitstream formats will be set

## Hardware Gain Control

### Hardware gain control capability reporting

In order to determine a stream's gain control capabilities, applications send an
`AUDIO_STREAM_CMD_GET_GAIN` message over the stream channel.  No parameters
need to be supplied with this message.  All stream drivers **must** respond to
this message, regardless of whether or not the stream hardware is capable of any
gain control.  All gain values are expressed using 32 bit floating point numbers
expressed in dB.

Drivers respond to this message with values which indicate the current gain
settings of the stream, as well as the stream's gain control capabilities.
Current gain settings are expressed using a bool/float tuple indicating if the
stream is currently muted or not along with the current dB gain of the stream.
Gain capabilities consist of a bool and 3 floats.  The bool indicates whether or
not the stream can be muted.  The floats give the minimum and maximum gain
settings, along with the `gain step size`.  The `gain step size` indicates the
smallest increment with which the gain can be controlled, counting from the
minimum gain value.

For example, an amplifier which has 5 gain settings spaced 7.5 dB apart, with a
maximum gain of 0 dB, would indicate a range of (-30.0, 0.0) and a step size of
7.5.  Amplifiers capable of functionally continuous gain control **may** encode
their gain step size as 0.0.

Regardless of mute capabilities, drivers for fixed gain streams **must** report
their min/max gain as (0.0, 0.0).  The gain step size is meaningless in this
situation, but drivers **should** report their step size as 0.0.

### Setting hardware gain control levels

In order to change a stream's current gain settings, applications send an
`AUDIO_STREAM_CMD_SET_GAIN` message over the stream channel.  Two parameters
are supplied with this message: a set of flags which control the request, and a
float indicating the dB gain which should be applied to the stream.

Three valid flags are currently defined.
 * `AUDIO_SGF_MUTE_VALID`.  Set when the application wishes to set the
   muted/un-muted state of the stream.  Clear if the application wishes to
   preserve the current muted/un-muted state.
 * `AUDIO_SGF_GAIN_VALID`.  Set when the application wishes to set the
   dB gain state of the stream.  Clear if the application wishes to
   preserve the current gain state.
 * `AUDIO_SGF_MUTE`.  Indicates the application's desired mute/un-mute state
   for the stream.  Significant only if `AUDIO_SGF_MUTE_VALID` is also set.

Drivers **must** fail the request with a `ZX_ERR_INVALID_ARGS` result if the
application's request is incompatible with the stream's capabilities.
Incompatible requests include...
 * The requested gain is less than the minimum supported gain for the stream.
 * The requested gain is more than the maximum supported gain for the stream.
 * Mute was requested, but the stream does not support an explicit mute.

Presuming that the request is valid, drivers **should** round the request to the
nearest supported gain step size.  For example, if a stream can control its
gain on the range from -60.0 to 0.0 dB in 0.5 dB steps, a request to set the
gain to -33.3 dB will result in a gain of -33.5 being applied.  A request for a
gain of -33.2 dB will result in a gain of -33.0 being applied.

Applications **may** choose not to receive an acknowledgement of a SET_GAIN
command by setting the `AUDIO_FLAG_NO_ACK` flag on their command.  No response
message will be sent to the application, regardless of the success or failure of
the command.  If an acknowledgement was requested by the application, drivers
respond with a message indicating the success or failure of the operation as
well as the current gain/mute status of the system (regardless of whether the
request was a success).

## Determining external latency

The external latency of an audio stream is defined as the amount of time it
takes outbound audio to travel from the system's interconnect to the speakers
themselves, or inbound audio to travel from the microphone to the system's
interconnect.  For example, if an external codec connected to the system using a
TDM interconnect introduced a 4 frame delay between reception of a TDM frame and
rendering of the frame at the speakers themselves, the external delay of this
audio path would be 4 audio frames.

External delay is reported in the `external_delay_nsec` field of a successful
`AUDIO_STREAM_CMD_SET_FORMAT` response as a non-negative number of nanoseconds.
Drivers **should** make their best attempt to accurately report the total of all
of the sources of delay the driver knows about.  Information about this delay
can frequently be found in codec data sheets, dynamically reported as properties
of codecs using protocols such as Intel HDA or the USB Audio specifications, or
reported by downstream devices using mechanisms such as EDID when using HDMI or
DisplayPort interconnects.

## Plug Detection

In addition to streams being published/unpublished in response to being
connected or disconnected to/from their bus, streams may have the ability to be
plugged or unplugged at any given point in time.  For example, a set of USB
headphones may publish a new output stream when connected to USB, but choose to
be "hardwired" from a plug detection standpoint.  A different USB audio adapter
with a standard 3.5mm phono jack might publish an output stream when connected
via USB, but choose to change its plugged/unplugged state as the user plugs and
unplugs an analog device via the 3.5mm jack.

The ability to query the currently plugged or unplugged state of a stream, and
to register for asynchronous notifications of plug state changes (if supported),
is handled via plug detection messages.

### AUDIO_STREAM_CMD_PLUG_DETECT

In order to determine a stream's plug detection capabilities and current plug
state, and to enable or disable asynchronous plug detection notifications,
applications send an `AUDIO_STREAM_CMD_PLUG_DETECT` command over the stream
channel.  Drivers respond with a set of `audio_pd_notify_flags_t`, along with a
timestamp referenced from `ZX_CLOCK_MONOTONIC` indicating the last time the plug
state changed.

Three valid plug-detect notification flags (PDNF) are currently defined:
 * `AUDIO_PDNF_HARDWIRED` Set when the stream hardware is considered to be
   "hardwired".  In other words, the stream is considered to be connected as
   long as the device is published.  Examples include a set of built-in
   speakers, a pair of USB headphones, or a pluggable audio device with no plug
   detection functionality.
 * `AUDIO_PDNF_CAN_NOTIFY` Set when the stream hardware is capable of
   asynchronously detecting that a device's plug state has changed, then sending
   a notification message if requested by the client.
 * `AUDIO_PDNF_PLUGGED` Set when the stream hardware considers the stream to be
   currently in the "plugged-in" state.

Drivers for "hardwired" streams **must not** set the `CAN_NOTIFY` flag, and
**must** set the `PLUGGED` flag.  In addition, the plug state time of the
response to the `PLUG_DETECT` message **should** always be set to the time at
which the stream device was published by the driver.

Applications **may** choose not to receive an acknowledgement of a `PLUG_DETECT`
command by setting the `AUDIO_FLAG_NO_ACK` flag on their command.  No response
message will be sent to the application, regardless of the success or failure of
the command.  The most common use for this would be when an application wanted
to disable asynchronous plug state detection messages and was not actually
interested in the current plugged/unplugged state of the stream.

### AUDIO_STREAM_PLUG_DETECT_NOTIFY

Applications may request that streams send them asynchronous notifications of
plug state changes, using the flags field of the `AUDIO_STREAM_CMD_PLUG_DETECT`
command.

Two valid flags are currently defined:
 * `AUDIO_PDF_ENABLE_NOTIFICATIONS` Set by clients in order to request that the
   stream proactively generate `AUDIO_STREAM_PLUG_DETECT_NOTIFY` messages when
   its plug state changes, if the stream has this capability.
 * `AUDIO_PDF_DISABLE_NOTIFICATIONS` Set by clients in order to request that NO
   subsequent `AUDIO_STREAM_PLUG_DETECT_NOTIFY` messages should be sent,
   regardless of the stream's ability to generate them.

In order to request the current plug state without altering the current
notification behavior, clients simply set neither `ENABLE` nor `DISABLE` --
passing either 0, or the value `AUDIO_PDF_NONE`.  Clients **should not** set
both flags at the same time.  If they do, drivers **must** interpret this to
mean that the final state of the system should be _disabled_.

Clients which request asynchronous notifications of plug state changes
**should** always check the `CAN_NOTIFY` flag in the driver response.  Streams
may be capable of plug detection (i.e. if `HARDWIRED` is not set), yet be
incapable of detecting plug state changes asynchronously.  Clients may still
learn of plug state changes, but only by periodically polling the state with
`PLUG_DETECT` commands.  Drivers for streams which do not set the `CAN_NOTIFY`
flag are free to ignore enable/disable notification requests from applications,
and **must never** send an `AUDIO_STREAM_PLUG_DETECT_NOTIFY` message.  Note
that even such a driver must always respond to an `AUDIO_STREAM_CMD_PLUG_DETECT`
message.

## Access control capability detection and signaling

> TODO: specify how this works.  In particular, specify how drivers indicate
> to applications support for various digital access control mechanisms such as
> S/PDIF control words and HDCP.

## Stream purpose and association

> TODO: specify how drivers can indicate the general "purpose" of an audio
> stream in the system (if known), as well as its relationship to other streams
> (if known).  For example, an embedded target like a phone or a tablet needs to
> indicate which output stream is the built-in speaker vs. which is the headset
> jack output.  In addition, it needs to make clear which input stream is the
> microphone associated with the headset output vs. the builtin speaker.
577
## Ring-Buffer Channels

### Overview

Generally speaking, clients use the ring-buffer channel to establish a shared
memory buffer, and then start/stop playback/capture of audio stream data.  Once
started, stream consumption/production is assumed to proceed at the nominal rate
from the point in time given in a successful response to the start command,
allowing clients to operate without the need to receive any periodic
notifications about consumption/production position from the ring buffer itself.
Note that the ring-buffer will almost certainly have some form of FIFO buffer
between the memory bus and the audio hardware which causes it to either
read ahead in the stream (in the case of playback), or potentially hold onto
data (in the case of capture).  In the case of open-loop operation, it is
important for clients to query the size of this buffer before beginning
operation so they know how far ahead/behind the stream's nominal inferred
read/write position they need to stay in order to prevent audio glitching.

Also note that because of the shared buffer nature of the system, and the fact
that drivers are likely to be DMA-ing directly from this buffer to hardware, it
is important for clients running on architectures which are not automatically
cache coherent to be sure that they have properly written back their cache after
writing playback data to the buffer, or invalidated their cache before reading
captured data.

### Determining the FIFO depth

Applications determine a stream's FIFO depth using the
`AUDIO_RB_CMD_GET_FIFO_DEPTH` command.  Drivers **must** return their FIFO
depth, expressed in bytes, in the `fifo_depth` field of the response.  In order
to ensure proper playback or capture of audio, applications and drivers must be
careful to respect this value.  That is to say, drivers must not read beyond
the nominal playback position of the stream plus this number of bytes when
playing audio stream data, and applications must stay this number of bytes
behind the nominal capture point of the stream when capturing audio stream data.
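
The open-loop arithmetic implied by this contract can be sketched as follows.
The helper names are illustrative, not part of the driver API:

```c
#include <stdint.h>

/* For playback, the driver may read up to fifo_depth bytes ahead of the
 * nominal playback position, so a client must have valid data written at
 * least that far ahead of the nominal position at all times. */
uint64_t min_safe_write_target(uint64_t nominal_playback_pos, uint32_t fifo_depth) {
    return nominal_playback_pos + fifo_depth;
}

/* For capture, the driver may still be holding up to fifo_depth bytes in its
 * FIFO, so a client must not read past (nominal capture position - fifo_depth). */
uint64_t max_safe_read_pos(uint64_t nominal_capture_pos, uint32_t fifo_depth) {
    return (nominal_capture_pos > fifo_depth) ? nominal_capture_pos - fifo_depth : 0;
}
```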

Once the format of a stream is set and a ring-buffer channel has been opened,
the driver **must not** change this value.  From an application's point of view,
it is a constant property of the ring-buffer channel.

### Obtaining a shared buffer

Once an application has successfully set the format of a stream, it will receive
a new [channel](../objects/channel.md) representing its connection to the
stream's ring-buffer.  In order to send or receive audio, the application must
first establish a shared memory buffer.  This is done by sending an
`AUDIO_RB_CMD_GET_BUFFER` request over the ring-buffer channel.  This may only
be done while the ring-buffer is stopped.  Applications **must** specify two
parameters when requesting a ring buffer.

#### `min_ring_buffer_frames`
The minimum number of frames of audio the client needs to have allocated for
the ring buffer.  Drivers may need to make this buffer larger in order to meet
hardware requirements.  Clients **must** use the returned VMO's size (in bytes)
to determine the actual size of the ring buffer, and may not assume that the
size of the buffer (as determined by the driver) is exactly the size they
requested.  Drivers **must** ensure that the size of the ring buffer is an
integral number of audio frames.
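
A client-side sketch of deriving the true ring size from the returned VMO.  The
helper name and the `frame_size` parameter (bytes per sample times channel
count) are illustrative; the VMO size itself would come from
`zx_vmo_get_size()`:

```c
#include <stdbool.h>
#include <stdint.h>

/* After GET_BUFFER succeeds, derive the actual ring size in frames from the
 * VMO's size in bytes -- never from the number of frames requested.  Returns
 * false if the buffer is not an integral number of frames, which would
 * indicate a non-conforming driver. */
bool compute_ring_frames(uint64_t vmo_size_bytes, uint32_t frame_size,
                         uint64_t* out_frames) {
    if (frame_size == 0 || (vmo_size_bytes % frame_size) != 0) {
        return false;
    }
    *out_frames = vmo_size_bytes / frame_size;
    return true;
}
```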

> TODO: Is it reasonable to require that drivers produce buffers which are an
> integral number of audio frames in length?  It certainly makes the audio
> client's life easier (client code never needs to split or re-assemble a frame
> before processing), but it might make it difficult for some audio hardware to
> meet its requirements without making the buffer significantly larger than the
> client asked for.

#### `notifications_per_ring`
The number of position update notifications (`audio_rb_position_notify_t`) the
client would like the driver to send per cycle through the ring buffer.  Drivers
should attempt to space the notifications as uniformly throughout the ring as
their hardware design allows, but clients may not rely on perfectly uniform
spacing of the update notifications.  Clients are not required to request any
notifications at all and may choose to run using only start time and FIFO depth
information to determine the driver's playout or capture position.
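
As a rough sketch (the helper name is illustrative), the nominal spacing a
client might expect between notifications is simply the ring's play-through
time divided by the notification count:

```c
#include <stdint.h>

/* Nominal period between position notifications, in nanoseconds: the ring
 * holds ring_frames frames consumed at frame_rate frames/second, split across
 * notifications_per_ring updates.  Actual spacing is hardware-dependent and,
 * per the contract above, need not be perfectly uniform. */
uint64_t nominal_notification_period_ns(uint64_t ring_frames, uint32_t frame_rate,
                                        uint32_t notifications_per_ring) {
    if (frame_rate == 0 || notifications_per_ring == 0) {
        return 0;
    }
    return (ring_frames * 1000000000ull) /
           ((uint64_t)frame_rate * notifications_per_ring);
}
```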

Success or failure, drivers **must** respond to a `GET_BUFFER` request using an
`audio_rb_cmd_get_buffer_resp_t` message.  If the driver fails the request
because a buffer has already been established and the ring-buffer has already
been started, it **must not** stop the ring-buffer or discard the existing
shared memory.  If the application requests a new buffer after having already
established a buffer while the ring buffer is stopped, it **must** consider the
existing buffer to be invalid.  Success or failure, the old buffer is now gone.

Upon success, the driver **must** return a handle to a
[VMO](../objects/vm_object.md) with permissions which allow applications to map
the VMO into their address space using [zx_vmar_map](../syscalls/vmar_map.md),
and to read/write data in the buffer in the case of playback, or simply read the
data in the buffer in the case of capture.  Additionally, the driver **must**
report the actual number of frames of audio it will use in the buffer via the
`num_ring_buffer_frames` field of the `audio_rb_cmd_get_buffer_resp_t` message.
This number **may** be larger than the `min_ring_buffer_frames` request from the
client, but **must not** be smaller than this number, nor larger than the
size (when converted to bytes) of the VMO as reported by
[zx_vmo_get_size()](../syscalls/vmo_get_size.md).
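
A sketch of the client-side sanity checks these rules imply (the helper name is
illustrative; `vmo_size` would come from `zx_vmo_get_size()`):

```c
#include <stdbool.h>
#include <stdint.h>

/* Validate a successful GET_BUFFER response: the driver's reported frame
 * count may exceed what was requested, but must not be smaller, and must fit
 * within the returned VMO when converted to bytes. */
bool valid_get_buffer_resp(uint32_t min_ring_buffer_frames,
                           uint32_t num_ring_buffer_frames,
                           uint64_t vmo_size, uint32_t frame_size) {
    if (num_ring_buffer_frames < min_ring_buffer_frames) {
        return false;  /* smaller than requested */
    }
    if ((uint64_t)num_ring_buffer_frames * frame_size > vmo_size) {
        return false;  /* ring would not fit in the VMO */
    }
    return true;
}
```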

### Starting and Stopping the ring-buffer

Clients may request that a ring-buffer start or stop using the
`AUDIO_RB_CMD_START` and `AUDIO_RB_CMD_STOP` commands.  Success or failure,
drivers **must** send a response to these requests.  Attempting to start a
stream which is already started **must** be considered a failure.  Attempting to
stop a stream which is already stopped **should** be considered a success.
Ring-buffers cannot be stopped or started until after a shared buffer has
been established using the `GET_BUFFER` operation.

Upon successfully starting a stream, drivers **must** provide their best
estimate of the time at which their hardware began to transmit or capture the
stream in the `start_time` field of the response.  This timestamp **must** be
taken from the clock exposed via the
[ZX_CLOCK_MONOTONIC](../syscalls/clock_get.md) syscall.  Along with the FIFO
depth property of the ring buffer, this timestamp allows applications to send or
receive stream data without the need for periodic position updates from the
driver.  Along with the outboard latency estimate provided by the stream
channel, this timestamp allows applications to synchronize presentation of audio
information across multiple streams, or even multiple devices (provided that an
external time synchronization protocol is used to synchronize the
[ZX_CLOCK_MONOTONIC](../syscalls/clock_get.md) timelines across the cohort of
synchronized devices).
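
The open-loop position estimate this enables can be sketched as follows
(illustrative helper, assuming a fixed nominal frame rate; `now_ns` would come
from the `ZX_CLOCK_MONOTONIC` clock):

```c
#include <stdint.h>

/* Given the start_time from a successful START response and the current
 * ZX_CLOCK_MONOTONIC reading, estimate the nominal stream position in frames.
 * A real client would combine this with the FIFO depth, as described above,
 * to decide how far ahead (playback) or behind (capture) to operate. */
uint64_t nominal_position_frames(int64_t now_ns, int64_t start_time_ns,
                                 uint32_t frame_rate) {
    if (now_ns <= start_time_ns) {
        return 0;  /* stream has not nominally started yet */
    }
    return ((uint64_t)(now_ns - start_time_ns) * frame_rate) / 1000000000ull;
}
```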

> TODO: Redefine `start_time` to allow it to be an arbitrary 'audio stream
> clock' instead of the `ZX_CLOCK_MONOTONIC` clock.  If the stream clock is made
> to count in audio frames since start, then this `start_time` can be replaced
> with the terms for a segment of a piecewise linear transformation which can be
> subsequently updated via notifications sent by the driver in the case that the
> audio hardware clock is rooted in a different oscillator from the system's
> tick counter.  Clients can then use this transformation either to control the
> rate of consumption of input streams, or to determine where to sample in the
> input stream to effect clock correction.

Upon successfully starting a stream, drivers **must** guarantee that no position
notifications will be sent before the start response has been enqueued into the
ring-buffer channel.

Upon successfully stopping a stream, drivers **must** guarantee that no position
notifications will be enqueued into the ring-buffer channel after the stop
response has been enqueued.

### Position notifications

If requested by the application during the `GET_BUFFER` operation, the driver
will periodically send updates to the application informing it of its current
production or consumption position in the buffer.  This position is expressed in
bytes in the `ring_buffer_pos` field of the `audio_rb_position_notify_t`
message.  These messages will only ever be sent while the ring-buffer is
started.  Note that these position notifications indicate where in the buffer
the driver has consumed or produced data, *not* where the nominal playback or
capture position is.  Their arrival is not guaranteed to be perfectly uniform
and they should not be used in an attempt to effect clock recovery.  If an
application discovers that a driver has consumed past the point in the ring
buffer where it has written data for playback, the audio presentation has
certainly glitched.  Applications should increase their clock lead time and be
certain to stay ahead of this point in the stream in the future.  Likewise,
applications which are capturing audio data should make no attempt to read
beyond the point in the ring buffer indicated by the most recent position
notification sent by the driver.
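
A wrap-aware sketch of the margin a playback client might track between its own
write pointer and the driver's reported position (the helper name is
illustrative):

```c
#include <stdint.h>

/* In a playback ring, the client's margin is how far its write pointer sits
 * ahead of the driver's reported ring_buffer_pos, modulo the ring size.  If
 * this margin ever jumps toward the full ring size, the driver has most
 * likely consumed past the written data and the presentation has glitched;
 * the client should then increase its clock lead time. */
uint64_t playback_margin_bytes(uint64_t write_pos, uint64_t ring_buffer_pos,
                               uint64_t ring_size) {
    return (write_pos + ring_size - (ring_buffer_pos % ring_size)) % ring_size;
}
```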

Driver playback/capture positions *always* begin at byte 0 in the ring buffer
immediately following a successful start command.  When they reach the size of
the VMO (as determined by [zx_vmo_get_size(...)](../syscalls/vmo_get_size.md))
they wrap back to zero.  Drivers are not required to consume or produce data in
integral numbers of audio frames.  Clients whose notion of stream position
depends on position notifications should take care to request that a sufficient
number of notifications per ring be sent (a minimum of 2) and that they are
processed quickly enough that aliasing does not occur.

### Error notifications

> TODO: define these and what the behavior of drivers should be in case they
> occur.

### Unexpected application termination

If the other side of a ring buffer control channel is closed for any reason,
drivers **must** immediately close the control channel and shut down the ring
buffer such that playback ring buffers begin to produce silence.  While drivers
are encouraged to do so in a way which produces a graceful transition to
silence, the requirement is that the audio stream go silent instead of looping.
Once the transition to silence is complete, the resources associated with
playback may be released and reused by the driver.

This way, if an application terminates unexpectedly, the kernel will close the
application's channels and cause audio playback to stop instead of continuing to
loop.

### Clock recovery

> TODO: define a way that clock recovery information can be sent to clients in
> the case that the audio output oscillator is not derived from the
> `ZX_CLOCK_MONOTONIC` oscillator.  In addition, if the oscillator is slew-able
> in hardware, provide the ability to discover this capability and control the
> slew rate.  Given the fact that this oscillator is likely to be shared by
> multiple streams, it might be best to return some form of system-wide clock
> identifier and provide the ability to obtain a channel on which clock
> recovery notifications can be delivered to clients and HW slewing commands can
> be sent from clients to the clock.
