<!--
    (C) Copyright 2018 The Fuchsia Authors. All rights reserved.
    Use of this source code is governed by a BSD-style license that can be
    found in the LICENSE file.
-->

# RAMdisk Device

This document is part of the [Driver Development Kit tutorial](ddk-tutorial.md) documentation.

## Overview

In this section, we'll examine a simplified RAM-disk driver.

This driver introduces:

*   the block protocol's **query()** and **queue()** ops
*   Virtual Memory Address Regions ([VMAR](../objects/vm_address_region.md)s)
    and Virtual Memory Objects ([VMO](../objects/vm_object.md)s)

The source is in `//zircon/system/dev/sample/ramdisk/demo-ramdisk.c`.

As with all drivers, the first thing to look at is how the driver initializes itself:

```c
static zx_status_t ramdisk_driver_bind(void* ctx, zx_device_t* parent) {
    zx_status_t status = ZX_OK;

    // (1) create the device context block
    ramdisk_device_t* ramdev = calloc(1, sizeof(*ramdev));
    if (ramdev == NULL) {
        return ZX_ERR_NO_MEMORY;
    }

    // (2) create a VMO
    status = zx_vmo_create(RAMDISK_SIZE, 0, &ramdev->vmo);
    if (status != ZX_OK) {
        goto cleanup;
    }

    // (3) map the VMO into our address space
    status = zx_vmar_map(zx_vmar_root_self(), 0, ramdev->vmo, 0, RAMDISK_SIZE,
                         ZX_VM_FLAG_PERM_READ | ZX_VM_FLAG_PERM_WRITE, &ramdev->mapped_addr);
    if (status != ZX_OK) {
        goto cleanup;
    }

    // (4) add the device
    device_add_args_t args = {
        .version = DEVICE_ADD_ARGS_VERSION,
        .name = "demo-ramdisk",
        .ctx = ramdev,
        .ops = &ramdisk_proto,
        .proto_id = ZX_PROTOCOL_BLOCK_IMPL,
        .proto_ops = &block_ops,
    };

    if ((status = device_add(parent, &args, &ramdev->zxdev)) != ZX_OK) {
        ramdisk_release(ramdev);
    }
    return status;

    // (5) clean up after ourselves
cleanup:
    zx_handle_close(ramdev->vmo);
    free(ramdev);
    return status;
}

static zx_driver_ops_t ramdisk_driver_ops = {
    .version = DRIVER_OPS_VERSION,
    .bind = ramdisk_driver_bind,
};

ZIRCON_DRIVER_BEGIN(ramdisk, ramdisk_driver_ops, "zircon", "0.1", 1)
    BI_MATCH_IF(EQ, BIND_PROTOCOL, ZX_PROTOCOL_MISC_PARENT),
ZIRCON_DRIVER_END(ramdisk)

```

At the bottom, you can see that this driver binds to a `ZX_PROTOCOL_MISC_PARENT` type of
protocol, and provides `ramdisk_driver_ops` as the list of operations supported.
This is no different from any of the other drivers we've seen so far.

The binding function, **ramdisk_driver_bind()**, does the following:

1.  Allocates the device context block.
2.  Creates a [VMO](../objects/vm_object.md).
    The [VMO](../objects/vm_object.md)
    is a kernel object that represents a chunk of memory.
    In this simplified RAM-disk driver, we're creating a
    [VMO](../objects/vm_object.md) that's `RAMDISK_SIZE`
    bytes long.
    This chunk of memory **is** the RAM-disk; that's where the data is stored.
    The [VMO](../objects/vm_object.md)
    creation call, [**zx_vmo_create()**](../syscalls/vmo_create.md),
    returns the [VMO](../objects/vm_object.md) handle through
    its third argument, which is a member in our context block.
3.  Maps the [VMO](../objects/vm_object.md) into our address space via
    [**zx_vmar_map()**](../syscalls/vmar_map.md).
    This function returns the address of a mapping in our
    [VMAR](../objects/vm_address_region.md)
    that covers the entire
    [VMO](../objects/vm_object.md) (because
    we specified `RAMDISK_SIZE` as the mapping size argument) and gives us read and
    write access (because of the `ZX_VM_FLAG_PERM_*` flags).
    The address is stored in our context block's `mapped_addr` member.
4.  Adds our device via **device_add()**,
    just like all the examples we've seen above.
    The difference here, though, is that we see two new members: `proto_id` and
    `proto_ops`.
    These are defined as "optional custom protocol" members.
    As usual, we store the newly created device in the `zxdev` member of our
    context block.
5.  Cleans up resources if there were any problems along the way.

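The error-handling shape of the bind function can be tried outside Zircon.
What follows is a minimal sketch, not the driver's code: `demo_ctx_t`, `demo_bind()`, and
the `fail_at` parameter are hypothetical, and plain `calloc()` allocations stand in for
the VMO and the mapping.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical stand-in for the device context block.
typedef struct demo_ctx {
    void* backing;  // stands in for the VMO
    void* mapping;  // stands in for the mapped address
} demo_ctx_t;

// Acquires resources in order; on failure, releases whatever was acquired.
// fail_at simulates a failure at step 2 or 3 (0 means no failure).
// Returns 0 on success and -1 on failure (stand-ins for zx_status_t values).
static int demo_bind(int fail_at, demo_ctx_t** out) {
    assert(out != NULL);
    *out = NULL;

    // (1) calloc() zeroes the block, so members not yet acquired are NULL
    demo_ctx_t* ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) {
        return -1;
    }

    // (2) first resource
    ctx->backing = (fail_at == 2) ? NULL : calloc(1, 64);
    if (ctx->backing == NULL) {
        goto cleanup;
    }

    // (3) second resource
    ctx->mapping = (fail_at == 3) ? NULL : calloc(1, 64);
    if (ctx->mapping == NULL) {
        goto cleanup;
    }

    *out = ctx;
    return 0;

cleanup:
    // free(NULL) is a no-op, so one label covers every failure point
    free(ctx->mapping);
    free(ctx->backing);
    free(ctx);
    return -1;
}
```

Zeroing the context block first is what lets a single `cleanup` label serve every
failure point: members that were never acquired hold a harmless null value.
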
For completeness, here's the context block:

```c
typedef struct ramdisk_device {
    zx_device_t*    zxdev;
    uintptr_t       mapped_addr;
    uint32_t        flags;
    zx_handle_t     vmo;
    bool            dead;
} ramdisk_device_t;
```

The fields are:

Type            | Field         | Description
----------------|---------------|----------------
`zx_device_t*`  | zxdev         | the ramdisk device
`uintptr_t`     | mapped_addr   | address of the mapping in our [VMAR](../objects/vm_address_region.md)
`uint32_t`      | flags         | device flags
`zx_handle_t`   | vmo           | a handle to our [VMO](../objects/vm_object.md)
`bool`          | dead          | set once the device is shutting down

### Operations

Where this device differs from the others that we've seen, though,
is that the **device_add()**
function adds two sets of operations: the "regular" one, and an
optional "protocol specific" one:

```c
static zx_protocol_device_t ramdisk_proto = {
    .version = DEVICE_OPS_VERSION,
    .ioctl = ramdisk_ioctl,
    .get_size = ramdisk_getsize,
    .unbind = ramdisk_unbind,
    .release = ramdisk_release,
};

static block_protocol_ops_t block_ops = {
    .query = ramdisk_query,
    .queue = ramdisk_queue,
};
```

The `zx_protocol_device_t` one handles control ops (**ramdisk_ioctl()**), device size
queries (**ramdisk_getsize()**), and device cleanups (**ramdisk_unbind()** and
**ramdisk_release()**).

> @@@ should I discuss the ioctls, or were they to have been removed as part of the simplification?

The `block_protocol_ops_t` one contains protocol operations particular to the
block protocol.
We bound these to the device in the `device_add_args_t` structure (step (4) above) via
the `.proto_ops` field.
We also set the `.proto_id` field to `ZX_PROTOCOL_BLOCK_IMPL`; this is what
identifies this driver as being able to handle block protocol operations.

Let's tackle the trivial functions first:

```c
static zx_off_t ramdisk_getsize(void* ctx) {
    return RAMDISK_SIZE;
}

static void ramdisk_unbind(void* ctx) {
    ramdisk_device_t* ramdev = ctx;
    ramdev->dead = true;
    device_remove(ramdev->zxdev);
}

static void ramdisk_release(void* ctx) {
    ramdisk_device_t* ramdev = ctx;

    if (ramdev->vmo != ZX_HANDLE_INVALID) {
        zx_vmar_unmap(zx_vmar_root_self(), ramdev->mapped_addr, RAMDISK_SIZE);
        zx_handle_close(ramdev->vmo);
    }
    free(ramdev);
}

static void ramdisk_query(void* ctx, block_info_t* bi, size_t* bopsz) {
    ramdisk_get_info(ctx, bi);
    *bopsz = sizeof(block_op_t);
}
```

**ramdisk_getsize()** is the easiest; it simply returns the size of the resource, in bytes.
In our simplified RAM-disk driver, this is hardcoded as a `#define` near the top of the file.

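The driver's actual constants aren't reproduced in this document, so here is a sketch of
how such definitions typically relate to one another.
The `DEMO_*` names and their values are assumptions chosen for illustration, as are the
two helper functions.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical geometry; the values in the driver source may differ.
#define DEMO_BLOCK_SIZE   512u
#define DEMO_BLOCK_COUNT  32768u
#define DEMO_RAMDISK_SIZE ((uint64_t)DEMO_BLOCK_SIZE * DEMO_BLOCK_COUNT)

// Reports the device size in bytes, as a get_size hook would.
static uint64_t demo_getsize(void) {
    return DEMO_RAMDISK_SIZE;
}

// Converts a block index into a byte offset from the start of the device.
static uint64_t demo_block_to_byte(uint64_t block) {
    assert(block < DEMO_BLOCK_COUNT);
    return block * DEMO_BLOCK_SIZE;
}
```

Deriving the total size from the block size and block count keeps the three values
consistent if one of them is changed later.
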
Next, **ramdisk_unbind()** and **ramdisk_release()** work together.
When the driver is being shut down, the **ramdisk_unbind()** hook is called.
It sets the `dead` flag to indicate that the driver is shutting down (this is checked
in the **ramdisk_queue()** handler, below).
It's expected that the driver will finish up any I/O operations that are in progress (there
won't be any in our RAM-disk), and it should call
**device_remove()**
to remove itself from the parent.

After **device_remove()** is called,
the driver's **ramdisk_release()** will be called.
Here we unmap the [VMAR](../objects/vm_address_region.md),
via [**zx_vmar_unmap()**](../syscalls/vmar_unmap.md), and close the
[VMO](../objects/vm_object.md),
via [**zx_handle_close()**](../syscalls/handle_close.md).
As our final act, we release the device context block.
At this point, the device is finished.

### Block Operations

The **ramdisk_query()** function is called by the block protocol in order to get
information about the device.
There's a data structure (the `block_info_t`) that's filled out by the driver:

```c
// from .../system/public/zircon/device/block.h:
typedef struct {
    uint64_t    block_count;        // The number of blocks in this block device
    uint32_t    block_size;         // The size of a single block
    uint32_t    max_transfer_size;  // Max size in bytes per transfer.
                                    // May be BLOCK_MAX_TRANSFER_UNBOUNDED if there
                                    // is no restriction.
    uint32_t    flags;
    uint32_t    reserved;
} block_info_t;

// our helper function
static void ramdisk_get_info(void* ctx, block_info_t* info) {
    ramdisk_device_t* ramdev = ctx;
    memset(info, 0, sizeof(*info));
    info->block_size = BLOCK_SIZE;
    info->block_count = BLOCK_COUNT;
    // Arbitrarily set, but matches the SATA driver for testing
    info->max_transfer_size = BLOCK_MAX_TRANSFER_UNBOUNDED;
    info->flags = ramdev->flags;
}
```

In this simplified driver, the `block_size`, `block_count`, and `max_transfer_size`
fields are hardcoded numbers.

The `flags` member is used to identify if the device is read-only (`BLOCK_FLAG_READONLY`,
otherwise it's read/write), removable (`BLOCK_FLAG_REMOVABLE`, otherwise it's not
removable) or has a bootable partition (`BLOCK_FLAG_BOOTPART`, otherwise it doesn't).

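Because each flag is a separate bit, a client can test them independently.
The sketch below shows the idea; the `DEMO_FLAG_*` bit values are hypothetical and
chosen only for illustration, so consult the block header for the real `BLOCK_FLAG_*`
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical bit assignments; the real BLOCK_FLAG_* values may differ.
#define DEMO_FLAG_READONLY  0x1u
#define DEMO_FLAG_REMOVABLE 0x2u
#define DEMO_FLAG_BOOTPART  0x4u

// Each flag must occupy its own bit so that flags can be combined freely.
static_assert((DEMO_FLAG_READONLY & DEMO_FLAG_REMOVABLE) == 0 &&
              (DEMO_FLAG_READONLY & DEMO_FLAG_BOOTPART) == 0 &&
              (DEMO_FLAG_REMOVABLE & DEMO_FLAG_BOOTPART) == 0,
              "flag bits must not overlap");

// Returns true if the given flag bit is set in flags.
static bool demo_has_flag(uint32_t flags, uint32_t flag) {
    return (flags & flag) != 0;
}

// A device with no flags set is read/write, fixed, and has no boot partition.
static bool demo_is_writable(uint32_t flags) {
    return !demo_has_flag(flags, DEMO_FLAG_READONLY);
}
```
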
The final value that **ramdisk_query()** returns is the "block operation size",
through the `bopsz` pointer.
This is the size of a host-maintained block that's big enough to contain the `block_op_t` *plus*
any additional data the driver wants (appended to the `block_op_t`), like an
extended context block.

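Our RAM-disk reports `sizeof(block_op_t)` because it keeps no per-operation state.
A driver that does want extra state could embed the protocol's structure as the first
member of its own. The following sketch illustrates that layout with hypothetical types
(`demo_block_op_t`, `demo_driver_op_t`) rather than the real `block_op_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, trimmed-down stand-in for block_op_t.
typedef struct demo_block_op {
    uint32_t command;
    uint32_t length;
    uint64_t offset_dev;
} demo_block_op_t;

// A driver-private operation: the protocol's structure comes first,
// followed by whatever per-operation state the driver wants.
typedef struct demo_driver_op {
    demo_block_op_t op;       // must be the first member
    uint32_t        retries;  // hypothetical driver-private state
} demo_driver_op_t;

// The size a query hook would report so that callers allocate enough room.
static size_t demo_op_size(void) {
    return sizeof(demo_driver_op_t);
}

// Recovers the driver-private view from the pointer the protocol hands back.
static demo_driver_op_t* demo_from_op(demo_block_op_t* op) {
    assert(op != NULL);
    return (demo_driver_op_t*)op;  // valid because op is the first member
}
```
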
### Reading and writing

Finally, it's time to discuss the actual "block" data transfers; that is, how does
data get read from / written to the device?

The second block protocol handler, **ramdisk_queue()**, performs that function.

As you might suspect from the name, it's intended that this hook starts whatever
transfer operation (a read or a write) is requested, but doesn't require that
the operation completes before the hook returns.
This is a little like what we saw in earlier chapters
in the **read()** and **write()** handlers
for devices like `/dev/misc/demo-fifo`; there, we could either return
data immediately, or put the client to sleep, waking it up later when data (or room
for data) became available.

With **ramdisk_queue()** we get passed a block operations structure that indicates
the expected operation: `BLOCK_OP_READ`, `BLOCK_OP_WRITE`, or `BLOCK_OP_FLUSH`.
The structure also contains additional fields telling us the offset and size of
the transfer (from `//zircon/system/ulib/ddk/include/ddk/protocol/block.h`):

```c
// simplified from original
struct block_op {
    struct {
        uint32_t    command;    // command and flags
        uint32_t    extra;      // available for temporary use
        zx_handle_t vmo;        // vmo of data to read or write
        uint32_t    length;     // transfer length in blocks (0 is invalid)
        uint64_t    offset_dev; // device offset in blocks
        uint64_t    offset_vmo; // vmo offset in blocks
        uint64_t*   pages;      // optional physical page list
    } rw;

    void (*completion_cb)(block_op_t* block, zx_status_t status);
};
```

The transfer takes place to or from the `vmo` in the structure: in the case of
a read, we transfer data to the [VMO](../objects/vm_object.md),
and vice versa for a write.
The `length` indicates the number of *blocks* (not bytes) to transfer, and the
two offset fields, `offset_dev` and `offset_vmo`, indicate the relative offsets (again,
in blocks not bytes) into the device and the [VMO](../objects/vm_object.md)
of where the transfer should take place.

The implementation is straightforward:

```c
static void ramdisk_queue(void* ctx, block_op_t* bop) {
    ramdisk_device_t* ramdev = ctx;

    // (1) see if we should still be handling requests
    if (ramdev->dead) {
        bop->completion_cb(bop, ZX_ERR_IO_NOT_PRESENT);
        return;
    }

    // (2) what operation are we performing?
    switch ((bop->command &= BLOCK_OP_MASK)) {
    case BLOCK_OP_READ:
    case BLOCK_OP_WRITE: {
        // (3) perform validation common for both
        if ((bop->rw.offset_dev >= BLOCK_COUNT)
            || ((BLOCK_COUNT - bop->rw.offset_dev) < bop->rw.length)
            || bop->rw.length * BLOCK_SIZE > MAX_TRANSFER_BYTES) {
            bop->completion_cb(bop, ZX_ERR_OUT_OF_RANGE);
            return;
        }

        // (4) compute address (do the arithmetic on the integer, then cast)
        void* addr = (void*) (ramdev->mapped_addr + bop->rw.offset_dev * BLOCK_SIZE);
        zx_status_t status;

        // (5) now perform actions specific to each
        if (bop->command == BLOCK_OP_READ) {
            status = zx_vmo_write(bop->rw.vmo, addr, bop->rw.offset_vmo * BLOCK_SIZE,
                                  bop->rw.length * BLOCK_SIZE);
        } else {
            status = zx_vmo_read(bop->rw.vmo, addr, bop->rw.offset_vmo * BLOCK_SIZE,
                                 bop->rw.length * BLOCK_SIZE);
        }

        // (6) indicate completion
        bop->completion_cb(bop, status);
        break;
    }

    case BLOCK_OP_FLUSH:
        bop->completion_cb(bop, ZX_OK);
        break;

    default:
        bop->completion_cb(bop, ZX_ERR_NOT_SUPPORTED);
        break;
    }
}
```

As usual, we establish a context block at the top by casting the `ctx` argument.
The `bop` argument is the "block operation" structure we saw above.
The `command` field indicates what the **ramdisk_queue()** function should do.

In step (1), we check to see if we've set the `dead` flag (**ramdisk_unbind()**
sets it when required).
If so, it means that our device is no longer accepting new requests, so we return
`ZX_ERR_IO_NOT_PRESENT` in order to encourage clients to close the device.

In step (3), we handle some common validation for both read and write:
neither should allow offsets that exceed the size of the device, nor transfer
more than the maximum transfer size.

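The order of the checks matters: the subtraction is only safe once the offset is known
to be inside the device.
Here is that validation as a standalone sketch; the `DEMO_*` limits and
`demo_range_ok()` are hypothetical and not taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical limits, chosen for illustration.
#define DEMO_BLOCK_SIZE         512u
#define DEMO_BLOCK_COUNT        1024u
#define DEMO_MAX_TRANSFER_BYTES (64u * 1024u)

static_assert(DEMO_MAX_TRANSFER_BYTES % DEMO_BLOCK_SIZE == 0,
              "transfer limit should be a whole number of blocks");

// Returns true if a transfer of `length` blocks starting at block
// `offset_dev` lies inside the device and within the transfer limit.
static bool demo_range_ok(uint64_t offset_dev, uint32_t length) {
    // The starting block must exist.
    if (offset_dev >= DEMO_BLOCK_COUNT) {
        return false;
    }
    // Cannot underflow: offset_dev < DEMO_BLOCK_COUNT was just established.
    if (DEMO_BLOCK_COUNT - offset_dev < length) {
        return false;
    }
    // Widen before multiplying so the product cannot wrap in 32 bits.
    if ((uint64_t)length * DEMO_BLOCK_SIZE > DEMO_MAX_TRANSFER_BYTES) {
        return false;
    }
    return true;
}
```

Comparing the remaining block count against `length`, rather than adding `offset_dev`
and `length`, avoids an overflow when a caller passes a very large offset.
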
Similarly, in step (4) we compute the device address (that is, we establish a
pointer into our mapping in the [VMAR](../objects/vm_address_region.md)
that's offset by the appropriate number of blocks as per the request).

In step (5) we perform either a [**zx_vmo_read()**](../syscalls/vmo_read.md)
or a [**zx_vmo_write()**](../syscalls/vmo_write.md), depending
on the command.
This is what transfers data between a pointer within our
[VMAR](../objects/vm_address_region.md) (`addr`)
and the client's [VMO](../objects/vm_object.md) (`bop->rw.vmo`).
Notice that in the read case, we *write* to the [VMO](../objects/vm_object.md),
and in the write case, we *read* from the [VMO](../objects/vm_object.md).

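The direction reversal is easier to see with ordinary memory.
In this sketch, byte arrays and `memcpy()` stand in for the mapped RAM-disk, the
client's VMO, and the two VMO syscalls; `demo_transfer()`, the tiny block size, and the
`DEMO_OP_*` values are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 4u  // deliberately tiny so results are easy to inspect
#define DEMO_OP_READ    1u  // hypothetical command values
#define DEMO_OP_WRITE   2u

// Copies `length` blocks between the device's storage and the client's
// buffer. Offsets and length are in blocks, as in the block protocol.
static void demo_transfer(uint32_t command, uint8_t* device, uint8_t* client,
                          uint64_t offset_dev, uint64_t offset_client,
                          uint32_t length) {
    assert(command == DEMO_OP_READ || command == DEMO_OP_WRITE);

    uint8_t* dev_addr = device + offset_dev * DEMO_BLOCK_SIZE;
    uint8_t* cli_addr = client + offset_client * DEMO_BLOCK_SIZE;
    size_t bytes = (size_t)length * DEMO_BLOCK_SIZE;

    if (command == DEMO_OP_READ) {
        // A device read fills the client's buffer (the driver's zx_vmo_write()).
        memcpy(cli_addr, dev_addr, bytes);
    } else {
        // A device write drains the client's buffer (the driver's zx_vmo_read()).
        memcpy(dev_addr, cli_addr, bytes);
    }
}
```
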
Finally, in step (6) (and the other two cases), we signal completion via the
`completion_cb` callback in the block ops structure.

The interesting thing about completion is that:

*   it doesn't have to happen right away; we could have queued this
    operation and signalled completion some time later,
*   it is allowed to be called before this function returns (like we did).

The last point simply means that we are not *forced* to defer completion until
after the queuing function returns.
This allows us to complete the operation directly in the function.
For our trivial RAM-disk example, this makes sense: we have the ability to
do the data transfer to or from media instantly; no need to defer.

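For contrast, deferred completion might look like the sketch below: the queue hook only
records the operation, and a separate routine completes it later.
All of the names here (`demo_op_t`, `demo_queue()`, `demo_drain()`) are hypothetical, and
the code is single-threaded for clarity; a driver that drains from another thread would
also need locking.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, trimmed-down operation with a completion callback.
typedef struct demo_op demo_op_t;
struct demo_op {
    int status;
    int done;
    void (*completion_cb)(demo_op_t* op, int status);
    demo_op_t* next;  // link used while the operation is pending
};

// FIFO list of operations that have been accepted but not yet completed.
typedef struct demo_queue {
    demo_op_t* head;
    demo_op_t* tail;
} demo_queue_t;

// Queue hook: records the operation and returns without completing it.
static void demo_queue(demo_queue_t* q, demo_op_t* op) {
    assert(q != NULL && op != NULL);
    op->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = op;
    } else {
        q->head = op;
    }
    q->tail = op;
}

// Called some time later: completes pending operations in FIFO order
// and returns how many were completed.
static int demo_drain(demo_queue_t* q) {
    assert(q != NULL);
    int completed = 0;
    while (q->head != NULL) {
        demo_op_t* op = q->head;
        q->head = op->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        op->completion_cb(op, 0);  // 0 stands in for ZX_OK
        completed++;
    }
    return completed;
}

// Example callback that records the result in the operation itself.
static void demo_mark_done(demo_op_t* op, int status) {
    op->status = status;
    op->done = 1;
}
```
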
## How is the real one more complicated?

The RAM-disk presented above is somewhat simplified from the "real" RAM-disk
device (present at `//zircon/system/dev/block/ramdisk/ramdisk.c`).

The real one adds the following functionality:

*   dynamic device creation via new VMO
*   ability to use an existing VMO
*   background thread
*   sleep mode

> @@@ how much, if anything, do we want to say about this one? I found the
> dynamic device creation of interest, for example...
