DMOP
====

Introduction
------------

The DMOP hypercall has a new ABI design to solve problems in the Xen
ecosystem. First, the ABI is fully stable, to reduce the coupling between
device models and the version of Xen. Specifically, device model software
using DMOP (be it user, stub domain or kernel software) need not be recompiled
to match the version of the running hypervisor.

Secondly, for device models in userspace, the ABI is designed specifically to
allow a kernel to audit the memory ranges used, without having to know the
internal structure of sub-ops.

The problem occurs when a device model issues a hypercall that includes
references to user memory other than the operation structure itself, such as
with Track dirty VRAM (as used in VGA emulation). In this case, the address
of this other user memory needs to be vetted, to ensure it is not within
restricted address ranges, such as kernel memory. The real problem comes down
to how to vet this address: the ideal place to do so is within the privcmd
driver, without privcmd having to have specific knowledge of the hypercall's
semantics.

The Design
----------

The privcmd driver implements a new restriction ioctl, which takes a domid
parameter. After that restriction ioctl is issued, all unaudited operations
on the privcmd driver will cease to function, including regular hypercalls.
DMOP hypercalls will continue to function, as they can be audited.

A DMOP hypercall consists of a domid (which is audited to verify that it
matches any restriction in place) and an array of buffers and lengths, with
the first buffer containing the specific DMOP parameters. These parameters
can then reference further buffers from within the array. Since the only
user buffers passed are those found within that array, they can all be
audited by privcmd.

The following code illustrates this idea:

struct xen_dm_op {
    uint32_t op;
};

struct xen_dm_op_buf {
    XEN_GUEST_HANDLE(void) h;
    unsigned long size;
};
typedef struct xen_dm_op_buf xen_dm_op_buf_t;

enum neg_errnoval
HYPERVISOR_dm_op(domid_t domid,
                 xen_dm_op_buf_t bufs[],
                 unsigned int nr_bufs)

@domid is the domain the hypercall operates on.
@bufs points to an array of buffers where @bufs[0] contains a struct
dm_op, describing the specific device model operation and its parameters.
@bufs[1..] may be referenced in the parameters for the purposes of
passing extra information to or from the domain.
@nr_bufs is the number of buffers in the @bufs array.

It is forbidden for the above struct (xen_dm_op) to contain any guest
handles. If they are needed, they should instead be in
HYPERVISOR_dm_op->bufs.

Validation by privcmd driver
----------------------------

If the privcmd driver has been restricted to a specific domain (using the
new ioctl), then when it receives an op, it will:

1. Check that the hypercall is a DMOP.

2. Check that domid == restricted domid.

3. For each of the @nr_bufs entries in @bufs: check that @h and @size
   describe a buffer wholly within the user space part of the virtual
   address space (e.g. Linux will use access_ok()).

A sketch of how a kernel driver might implement this is shown below.
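The following is a minimal sketch of the driver side, written as Linux-style
kernel code. The names privcmd_state, privcmd_dm_op_buf, the ioctl plumbing
and the two-argument form of access_ok() are assumptions made for
illustration only; this design does not mandate a particular driver
structure.

/*
 * Illustrative sketch only: the structure names and ioctl plumbing are
 * assumptions, not part of this design.
 */
#include <linux/uaccess.h>        /* access_ok() */
#include <xen/interface/xen.h>    /* domid_t, DOMID_INVALID */

/* Hypothetical per-file-handle state; DOMID_INVALID means "unrestricted". */
struct privcmd_state {
    domid_t restrict_domid;
};

/* The restriction ioctl simply records the domid (see "The Design"). */
static long privcmd_ioctl_restrict(struct privcmd_state *st, domid_t domid)
{
    /* Only allow restricting further, never relaxing. */
    if (st->restrict_domid != DOMID_INVALID && st->restrict_domid != domid)
        return -EINVAL;

    st->restrict_domid = domid;
    return 0;
}

/* Hypothetical representation of one user-supplied DMOP buffer. */
struct privcmd_dm_op_buf {
    void __user *uptr;
    size_t size;
};

/*
 * Audit a DMOP before issuing it.  Reaching this function at all is
 * step 1 (only the DMOP ioctl path leads here); the checks below are
 * steps 2 and 3 from the list above.
 */
static long privcmd_audit_dm_op(const struct privcmd_state *st,
                                domid_t domid,
                                const struct privcmd_dm_op_buf *kbufs,
                                unsigned int num)
{
    unsigned int i;

    /* Step 2: the target domain must match any restriction in place. */
    if (st->restrict_domid != DOMID_INVALID && st->restrict_domid != domid)
        return -EPERM;

    /* Step 3: each buffer must lie wholly within user address space. */
    for (i = 0; i < num; i++)
        if (!access_ok(kbufs[i].uptr, kbufs[i].size))
            return -EFAULT;

    return 0; /* Safe to issue the hypercall. */
}

Note that the audit needs no knowledge of which sub-op is being issued, nor
of the contents of any buffer: this is precisely the decoupling the ABI is
designed to provide.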
Xen Implementation
------------------

Since DMOP buffers need to be copied from or to the guest, functions for
doing this would be written as below. Note that care is taken to prevent
damage from buffer under- or over-run situations: if the DMOP is called
with an undersized buffer, the remainder of the destination reads as zeros,
while any excess guest data is ignored.

static bool copy_buf_from_guest(xen_dm_op_buf_t bufs[],
                                unsigned int nr_bufs, void *dst,
                                unsigned int idx, size_t dst_size)
{
    size_t size;

    if ( idx >= nr_bufs )
        return false;

    /* Zero-fill first, so an undersized guest buffer reads as zeros. */
    memset(dst, 0, dst_size);

    size = min_t(size_t, dst_size, bufs[idx].size);

    return !copy_from_guest(dst, bufs[idx].h, size);
}

static bool copy_buf_to_guest(xen_dm_op_buf_t bufs[],
                              unsigned int nr_bufs, unsigned int idx,
                              void *src, size_t src_size)
{
    size_t size;

    if ( idx >= nr_bufs )
        return false;

    /* Never write beyond the size the guest supplied. */
    size = min_t(size_t, bufs[idx].size, src_size);

    return !copy_to_guest(bufs[idx].h, src, size);
}

This leaves do_dm_op itself simple to implement, as below:

static int dm_op(domid_t domid,
                 unsigned int nr_bufs,
                 xen_dm_op_buf_t bufs[])
{
    struct domain *d;
    struct xen_dm_op op;
    bool const_op = true; /* Sub-ops that modify @op clear this. */
    long rc;

    rc = rcu_lock_remote_domain_by_id(domid, &d);
    if ( rc )
        return rc;

    rc = -EINVAL;
    if ( !is_hvm_domain(d) )
        goto out;

    rc = xsm_dm_op(XSM_DM_PRIV, d);
    if ( rc )
        goto out;

    if ( !copy_buf_from_guest(bufs, nr_bufs, &op, 0, sizeof(op)) )
    {
        rc = -EFAULT;
        goto out;
    }

    switch ( op.op )
    {
    default:
        rc = -EOPNOTSUPP;
        break;
    }

    /* Copy the (possibly updated) op back if a sub-op modified it. */
    if ( !rc &&
         !const_op &&
         !copy_buf_to_guest(bufs, nr_bufs, 0, &op, sizeof(op)) )
        rc = -EFAULT;

 out:
    rcu_unlock_domain(d);

    return rc;
}

long do_dm_op(domid_t domid,
              unsigned int nr_bufs,
              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
{
    struct xen_dm_op_buf nat[MAX_NR_BUFS];

    if ( nr_bufs > MAX_NR_BUFS )
        return -EINVAL;

    if ( copy_from_guest_offset(nat, bufs, 0, nr_bufs) )
        return -EFAULT;

    return dm_op(domid, nr_bufs, nat);
}
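As an illustration of how a concrete sub-op would slot into the switch above,
here is a hedged sketch based on the Track dirty VRAM example from the
introduction. The sub-op number, the structure xen_dm_op_track_dirty_vram,
the union member u, and the track_dirty_vram() handler signature are all
assumptions made for the purposes of this sketch, not part of the ABI
specified above.

/*
 * Sketch only: XEN_DMOP_track_dirty_vram, the parameter structure and
 * the handler signature are illustrative assumptions.
 */
#define XEN_DMOP_track_dirty_vram 1

struct xen_dm_op_track_dirty_vram {
    /* IN - number of pages to track, starting at first_pfn. */
    uint32_t nr;
    uint32_t pad;
    uint64_t first_pfn;
};

/* struct xen_dm_op would grow a union of per-sub-op parameter blocks: */
struct xen_dm_op {
    uint32_t op;
    uint32_t pad;
    union {
        struct xen_dm_op_track_dirty_vram track_dirty_vram;
    } u;
};

/* ... and the switch in dm_op() would gain a case such as: */
    case XEN_DMOP_track_dirty_vram:
    {
        const struct xen_dm_op_track_dirty_vram *data =
            &op.u.track_dirty_vram;

        /*
         * The dirty bitmap cannot live in struct xen_dm_op (guest
         * handles are forbidden there), so it travels in bufs[1],
         * which privcmd has already audited like any other buffer.
         */
        rc = -EINVAL;
        if ( nr_bufs < 2 )
            break;

        /* Hypothetical handler taking the bitmap buffer directly. */
        rc = track_dirty_vram(d, data->first_pfn, data->nr, &bufs[1]);
        break;
    }

Because the sub-op only reads @op and writes through bufs[1], const_op
remains true and no copy-back of bufs[0] is needed.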
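Finally, to show the calling convention from the device model's side, here is
a hedged sketch of a caller populating the buffer array for the hypothetical
sub-op above. MAX_PAGES, nr_pages, first_pfn and domid are assumed to be
defined by the caller; a userspace device model would reach
HYPERVISOR_dm_op() indirectly through privcmd rather than invoking it
directly, but the buffer layout is the same.

/*
 * Caller-side sketch (illustrative only).  bufs[0] carries the fixed
 * parameter block; bufs[1] carries the variable-sized bitmap that the
 * parameters in bufs[0] refer to.
 */
struct xen_dm_op op = {
    .op = XEN_DMOP_track_dirty_vram,
    .u.track_dirty_vram = {
        .nr = nr_pages,
        .first_pfn = first_pfn,
    },
};
uint8_t dirty_bitmap[(MAX_PAGES + 7) / 8];
xen_dm_op_buf_t bufs[2];
int rc;

set_xen_guest_handle(bufs[0].h, (void *)&op);
bufs[0].size = sizeof(op);

set_xen_guest_handle(bufs[1].h, (void *)dirty_bitmap);
bufs[1].size = (nr_pages + 7) / 8;

/*
 * Only the buffers in this array cross the privilege boundary, so a
 * restricted privcmd can audit the entire call without understanding
 * the sub-op.
 */
rc = HYPERVISOR_dm_op(domid, bufs, 2);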