DMOP
====

Introduction
------------

The DMOP hypercall has a new ABI design to solve problems in the Xen
ecosystem.  First, the ABI is fully stable, to reduce the coupling between
device models and the version of Xen.  Specifically, device model software
using DMOP (be it user, stub domain or kernel software) need not be recompiled
to match the version of the running hypervisor.

Secondly, for device models in userspace, the ABI is designed specifically to
allow a kernel to audit the memory ranges used, without having to know the
internal structure of sub-ops.

The problem occurs when a device model issues a hypercall that
includes references to user memory other than the operation structure
itself, such as with Track dirty VRAM (as used in VGA emulation).
In this case, the address of this other user memory needs to be vetted,
to ensure it is not within restricted address ranges, such as kernel
memory.  The real problem comes down to how to vet this address: the
ideal place to do so is within the privcmd driver, without privcmd
having to have specific knowledge of the hypercall's semantics.

The Design
----------

The privcmd driver implements a new restriction ioctl, which takes a domid
parameter.  After that restriction ioctl is issued, all unaudited operations
on the privcmd driver will cease to function, including regular hypercalls.
DMOP hypercalls will continue to function as they can be audited.
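
For a userspace device model on Linux, the restriction would be applied
with an ioctl on an open privcmd file descriptor before the process drops
its other privileges.  The sketch below assumes an IOCTL_PRIVCMD_RESTRICT
ioctl taking a domid; the name and argument type are illustrative of the
kernel interface rather than mandated by this design:

#include <stdio.h>
#include <sys/ioctl.h>
#include <xen/xen.h>            /* domid_t */
#include <xen/sys/privcmd.h>    /* assumed to provide IOCTL_PRIVCMD_RESTRICT */

/*
 * Restrict an open privcmd file descriptor to a single domain.  Once
 * this succeeds, only auditable operations (i.e. DMOP) aimed at @domid
 * are accepted on @fd; regular hypercalls begin to fail.
 */
static int restrict_privcmd(int fd, domid_t domid)
{
    if (ioctl(fd, IOCTL_PRIVCMD_RESTRICT, &domid) < 0) {
        perror("IOCTL_PRIVCMD_RESTRICT");
        return -1;
    }

    return 0;
}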

A DMOP hypercall consists of a domid (which is audited to verify that it
matches any restriction in place) and an array of buffers and lengths,
with the first one containing the specific DMOP parameters.  These can
then reference further buffers from within the array.  Since the only
user buffers passed are those found within that array, they can all be
audited by privcmd.

The following code illustrates this idea:

struct xen_dm_op {
    uint32_t op;
};

struct xen_dm_op_buf {
    XEN_GUEST_HANDLE(void) h;
    unsigned long size;
};
typedef struct xen_dm_op_buf xen_dm_op_buf_t;

enum neg_errnoval
HYPERVISOR_dm_op(domid_t domid,
                 xen_dm_op_buf_t bufs[],
                 unsigned int nr_bufs)

@domid is the domain the hypercall operates on.
@bufs points to an array of buffers where @bufs[0] contains a struct
dm_op, describing the specific device model operation and its parameters.
@bufs[1..] may be referenced in the parameters for the purposes of
passing extra information to or from the domain.
@nr_bufs is the number of buffers in the @bufs array.

It is forbidden for the above struct (xen_dm_op) to contain any guest
handles.  If they are needed, they should instead be in
HYPERVISOR_dm_op->bufs.
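
As an illustration, consider a track-dirty-VRAM style sub-op, which must
return a dirty bitmap far too large to embed in the parameter structure.
The layout below is a hypothetical sketch, not the actual Xen definition:
the bitmap travels as a secondary buffer, @bufs[1], which privcmd can
audit without understanding the sub-op at all.

struct xen_dm_op_track_dirty_vram {
    /* IN - first pfn to track. */
    uint64_t first_pfn;
    /* IN - number of frames to track. */
    uint32_t nr_frames;
    /* Explicit padding keeps the layout identical across ABIs. */
    uint32_t pad;
    /*
     * OUT - the dirty bitmap, one bit per frame, is written to the
     * caller-supplied @bufs[1].  Note that no guest handle appears here.
     */
};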

Validation by privcmd driver
----------------------------

If the privcmd driver has been restricted to a specific domain (using the
new ioctl described above), then when it receives an op it will:

1. Check hypercall is DMOP.

2. Check domid == restricted domid.

3. For each of the @nr_bufs buffers in @bufs: check that @h and @size give
   a buffer wholly in the user space part of the virtual address space
   (e.g. Linux will use access_ok()), as sketched below.
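
A minimal sketch of step 3, shaped like a Linux privcmd helper, follows.
The privcmd_dm_op_buf structure (a kernel-side mirror of xen_dm_op_buf_t)
and the helper's name are illustrative, and access_ok() is shown in its
modern two-argument form:

/*
 * Audit a kernel copy of the caller's buffer array.  Every buffer must
 * lie wholly within userspace; no knowledge of individual sub-ops is
 * needed, which is the point of the design.
 */
static int privcmd_audit_dm_op_bufs(const struct privcmd_dm_op_buf *kbufs,
                                    unsigned int nr_bufs)
{
    unsigned int i;

    for (i = 0; i < nr_bufs; i++) {
        if (!access_ok(kbufs[i].uptr, kbufs[i].size))
            return -EFAULT;
    }

    return 0;
}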


Xen Implementation
------------------

Since DMOP buffers need to be copied from or to the guest, functions for
doing this would be written as below.  Note that care is taken to prevent
damage from buffer under- or over-run situations.  If the DMOP is called
with incorrectly sized buffers, zeros will be read, while extra data is
ignored.

static bool copy_buf_from_guest(xen_dm_op_buf_t bufs[],
                                unsigned int nr_bufs, void *dst,
                                unsigned int idx, size_t dst_size)
{
    size_t size;

    if ( idx >= nr_bufs )
        return false;

    /* Pre-zero the destination, so a short guest buffer reads as zeros. */
    memset(dst, 0, dst_size);

    /* Clamp to the smaller size, so a long guest buffer cannot overrun. */
    size = min_t(size_t, dst_size, bufs[idx].size);

    return !copy_from_guest(dst, bufs[idx].h, size);
}

static bool copy_buf_to_guest(xen_dm_op_buf_t bufs[],
                              unsigned int nr_bufs, unsigned int idx,
                              void *src, size_t src_size)
{
    size_t size;

    if ( idx >= nr_bufs )
        return false;

    /* Copy no more than the guest buffer can hold; any extra is ignored. */
    size = min_t(size_t, bufs[idx].size, src_size);

    return !copy_to_guest(bufs[idx].h, src, size);
}

This leaves the core dm_op function easy to implement, as below:

static int dm_op(domid_t domid,
                 unsigned int nr_bufs,
                 xen_dm_op_buf_t bufs[])
{
    struct domain *d;
    struct xen_dm_op op;
    bool const_op = true;
    long rc;

    rc = rcu_lock_remote_domain_by_id(domid, &d);
    if ( rc )
        return rc;

    if ( !is_hvm_domain(d) )
        goto out;

    rc = xsm_dm_op(XSM_DM_PRIV, d);
    if ( rc )
        goto out;

    if ( !copy_buf_from_guest(bufs, nr_bufs, &op, 0, sizeof(op)) )
    {
        rc = -EFAULT;
        goto out;
    }

    switch ( op.op )
    {
        /* Cases for individual sub-ops are dispatched from here. */
    default:
        rc = -EOPNOTSUPP;
        break;
    }

    if ( !rc &&
         !const_op &&
         !copy_buf_to_guest(bufs, nr_bufs, 0, &op, sizeof(op)) )
        rc = -EFAULT;

 out:
    rcu_unlock_domain(d);

    return rc;
}
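
As sub-ops are added, cases appear in the switch above.  A hypothetical
case for the track-dirty-VRAM sketch from earlier might look as follows,
assuming struct xen_dm_op grows a union 'u' of per-sub-op parameter
structures and that a track_dirty_vram() helper exists; none of these
names are definitive:

    case XEN_DMOP_track_dirty_vram:
    {
        const struct xen_dm_op_track_dirty_vram *data =
            &op.u.track_dirty_vram;

        rc = -EINVAL;
        if ( data->pad || nr_bufs < 2 )
            break;

        /* The handler writes the dirty bitmap directly to bufs[1]. */
        rc = track_dirty_vram(d, data->first_pfn, data->nr_frames,
                              &bufs[1]);
        break;
    }

The hypercall entry point, do_dm_op, then only has to copy in the buffer
array itself before handing over: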

long do_dm_op(domid_t domid,
              unsigned int nr_bufs,
              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
{
    struct xen_dm_op_buf nat[MAX_NR_BUFS];

    if ( nr_bufs > MAX_NR_BUFS )
        return -EINVAL;

    if ( copy_from_guest_offset(nat, bufs, 0, nr_bufs) )
        return -EFAULT;

    return dm_op(domid, nr_bufs, nat);
}
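
From the device model's side, the whole path is exercised through privcmd.
The sketch below reuses struct xen_dm_op from above and assumes a privcmd
DM_OP ioctl whose argument carries the domid and the buffer array; the
privcmd_dm_op and privcmd_dm_op_buf names mirror the Linux interface but
should be treated as illustrative here:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <xen/sys/privcmd.h>    /* assumed to provide IOCTL_PRIVCMD_DM_OP */

/* Issue a parameterless DMOP sub-op, passing a single buffer, bufs[0]. */
static int dm_op_simple(int fd, domid_t domid, uint32_t sub_op)
{
    struct xen_dm_op op;
    struct privcmd_dm_op_buf buf;
    struct privcmd_dm_op arg;

    memset(&op, 0, sizeof(op));
    op.op = sub_op;

    buf.uptr = &op;            /* audited by the kernel... */
    buf.size = sizeof(op);     /* ...together with this length */

    arg.dom = domid;
    arg.num = 1;
    arg.ubufs = &buf;

    return ioctl(fd, IOCTL_PRIVCMD_DM_OP, &arg);
}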