-----------------------
XSM/FLASK Configuration
-----------------------

Xen provides a security framework called XSM, and FLASK is an implementation of
a security model using this framework (at the time of writing, it is the only
one). FLASK defines a mandatory access control policy providing fine-grained
controls over Xen domains, allowing the policy writer to define what
interactions between domains, devices, and the hypervisor are permitted.

Some examples of what FLASK can do:
 - Prevent two domains from communicating via event channels or grants
 - Control which domains can use device passthrough (and which devices)
 - Restrict or audit operations performed by privileged domains
 - Prevent a privileged domain from arbitrarily mapping pages from other domains

Some of these examples require dom0 disaggregation to be useful, since the
domain build process requires the ability to write to the new domain's memory.

Security Status of dom0 disaggregation
--------------------------------------

Xen supports disaggregation of various support and management
functions into their own domains, via the XSM mechanisms described in
this document.

However the implementations of these support and management interfaces
were originally written to be used only by the totally-privileged
dom0, and have not been reviewed for security when exposed to
supposedly-only-semi-privileged disaggregated management domains. But
such management domains are (in such a design) to be seen as
potentially hostile, e.g. due to privilege escalation following
exploitation of a bug in the management domain.

Until the interfaces have been properly reviewed for security against
hostile callers, the Xen.org security team intends (subject of course
to the permission of anyone disclosing to us) to handle these and
future vulnerabilities in these interfaces in public, as if they were
normal non-security-related bugs.

This applies only to bugs which do no more than reduce the security of
a radically disaggregated system to the security of a
non-disaggregated one. Here a "radically disaggregated system" is one
which uses the XSM mechanism to delegate the affected interfaces to
other-than-fully-trusted domains.

This policy does not apply to bugs which affect stub device models,
driver domains, or stub xenstored - even if those bugs do no worse
than reduce the security of such a system to one whose device models,
backend drivers, or xenstore, run in dom0.

For more information see http://xenbits.xen.org/xsa/advisory-77.html.

The following interfaces are covered by this statement. Interfaces
not listed here are considered safe for disaggregation; security
issues found in interfaces not listed here will be handled according
to the normal security problem response policy
http://www.xenproject.org/security-policy.html.

__HYPERVISOR_domctl (xen/include/public/domctl.h)

 All subops except the following are covered by this statement. (That
 is, only the subops below are considered safe for disaggregation.)

 * XEN_DOMCTL_ioport_mapping
 * XEN_DOMCTL_memory_mapping
 * XEN_DOMCTL_bind_pt_irq
 * XEN_DOMCTL_unbind_pt_irq

__HYPERVISOR_sysctl (xen/include/public/sysctl.h)

 All subops are covered by this statement. (That is, no subops are
 considered safe for disaggregation.)

__HYPERVISOR_memory_op (xen/include/public/memory.h)

 The following subops are covered by this statement. Subops not listed
 here are considered safe for disaggregation.

 * XENMEM_set_pod_target
 * XENMEM_get_pod_target
 * XENMEM_claim_pages

__HYPERVISOR_tmem_op (xen/include/public/tmem.h)

 The following tmem control ops, that is the sub-subops of
 TMEM_CONTROL, are covered by this statement.

 Note that TMEM is also subject to a similar policy arising from
 XSA-15 http://lists.xen.org/archives/html/xen-announce/2012-09/msg00006.html.
 Due to this existing policy all TMEM Ops are already subject to
 reduced security support.

 * TMEMC_THAW
 * TMEMC_FREEZE
 * TMEMC_FLUSH
 * TMEMC_DESTROY
 * TMEMC_LIST
 * TMEMC_SET_WEIGHT
 * TMEMC_SET_CAP
 * TMEMC_SET_COMPRESS
 * TMEMC_QUERY_FREEABLE_MB
 * TMEMC_SAVE_BEGIN
 * TMEMC_SAVE_GET_VERSION
 * TMEMC_SAVE_GET_MAXPOOLS
 * TMEMC_SAVE_GET_CLIENT_WEIGHT
 * TMEMC_SAVE_GET_CLIENT_CAP
 * TMEMC_SAVE_GET_CLIENT_FLAGS
 * TMEMC_SAVE_GET_POOL_FLAGS
 * TMEMC_SAVE_GET_POOL_NPAGES
 * TMEMC_SAVE_GET_POOL_UUID
 * TMEMC_SAVE_GET_NEXT_PAGE
 * TMEMC_SAVE_GET_NEXT_INV
 * TMEMC_SAVE_END
 * TMEMC_RESTORE_BEGIN
 * TMEMC_RESTORE_PUT_PAGE
 * TMEMC_RESTORE_FLUSH_PAGE


Setting up FLASK
----------------

Xen must be compiled with XSM and FLASK enabled; by default, the security
framework is disabled. To enable it, run 'make -C xen menuconfig' and enable
XSM and FLASK under 'Common Features'; this change requires a make clean and
rebuild.

FLASK uses only one domain configuration parameter (seclabel) defining the
full security label of the newly created domain. If using the example policy,
"seclabel='system_u:system_r:domU_t'" is an example of a normal domain. The
labels are in the same format as SELinux labels; see http://selinuxproject.org
for more details on the use of the user, role, and optional MLS/MCS labels.

FLASK policy overview
---------------------

Most of FLASK policy consists of defining the interactions allowed between
different types (domU_t would be the type in this example). For simple policies,
only type enforcement is used and the user and role are set to system_u and
system_r for all domains.

The FLASK security framework is mostly configured using a security policy file.
It relies on the SELinux compiler "checkpolicy"; if this is available, the
policy will be compiled as part of the tools build. If hypervisor support for a
built-in policy is enabled ("Compile Xen with a built-in security policy"), the
policy will be built during the hypervisor build.

The policy is generated from definition files in tools/flask/policy. Most
changes to security policy will involve creating or modifying modules found in
tools/flask/policy/modules/. The modules.conf file there defines what modules
are enabled and has short descriptions of each module.

If not using the built-in policy, the XSM policy file needs to be copied to
/boot and loaded as a module by grub. The exact position and filename of the
module does not matter as long as it is after the Xen kernel; it is normally
placed either just above the dom0 kernel or at the end. Once dom0 is running,
the policy can be reloaded using "xl loadpolicy".

The example policy included with Xen demonstrates most of the features of FLASK
that can be used without dom0 disaggregation. The main types for domUs are:

 - domU_t is a domain that can communicate with any other domU_t
 - isolated_domU_t can only communicate with dom0
 - prot_domU_t is a domain type whose creation can be disabled with a boolean
 - nomigrate_t is a domain that must be created via the nomigrate_t_building
   type, and whose memory cannot be read by dom0 once created

HVM domains with stubdomain device models also need a type for the stub domain.
The example policy defines dm_dom_t for the device model of a domU_t domain;
there are no device model types defined for the other domU types.

One disadvantage of using type enforcement to enforce isolation is that a new
type is needed for each group of domains.
The user field can be used to address
this for the most common case of groups that can communicate internally but not
externally; see "Users and roles" below.

Type transitions
----------------

Xen defines a number of operations such as memory mapping that are necessary for
a domain to perform on itself, but are also undesirable to allow a domain to
perform on every other domain of the same label. While it is possible to address
this by only creating one domain per type, this solution significantly limits
the flexibility of the type system. Another method to address this issue is to
duplicate the permission names for every operation that can be performed on the
current domain or on other domains; however, this significantly increases the
necessary number of permissions and complicates the XSM hooks. Instead, this is
addressed by allowing a distinct type to be used for a domain's access to
itself. The same applies for a device model domain's access to its designated
target, allowing the IS_PRIV_FOR checks used in Xen's DAC model to be
implemented in FLASK.

Upon domain creation (or relabel), a type transition is computed using the
domain's label as the source and target. The result of this computation is used
as the target when the domain accesses itself. In the example policy, this
computed type is the result of appending _self to a domain's type: domU_t_self
for domU_t. If no type transition rule exists, the domain will continue to use
its own label for both the source and target. An AVC message will look like:

    scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t_self

A similar type transition is done when a device model domain is associated with
its target using the set_target operation.
The transition is computed with the
target domain as the source and the device model domain as the target: this
ordering was chosen in order to preserve the original label for the target when
no type transition rule exists. In the example policy, these computed types are
the result of appending _target to the domain.

Type transitions are also used to compute the labels of event channels.

Users and roles
---------------

The default user and role used for domains are system_u and system_r. Users are
visible in the labels of domains and associated objects (event channels); when
the vm_role module is enabled, "user_1:vm_r:domU_t" is a valid label for a
domain created by the user_1 user.

Access control rules involving users and roles are defined in a module's
constraints file (for example, vm_role.cons). The vm_role module defines one
role (vm_r) and three users (user_1 .. user_3), along with constraints that
prevent different users from communicating using grants or event channels, while
still allowing communication with the system_u user where dom0 resides.

Resource Policy
---------------

The example policy also includes a resource type (nic_dev_t) for device
passthrough, configured to allow use by domU_t. To label the PCI device 3:2.0
for passthrough, run:

    tools/flask/utils/flask-label-pci 0000:03:02.0 system_u:object_r:nic_dev_t

This command must be rerun on each boot or after any policy reload.

When first loading or writing a policy, you should run FLASK in permissive mode
(flask=permissive on the command line) and check the Xen logs (xl dmesg) for AVC
denials before using it in enforcing mode (the default value of the boot
parameter, which can also be changed using xl setenforce).
When using the
default types for domains (domU_t), the example policy shipped with Xen should
allow the same operations on or between domains as when not using FLASK.


MLS/MCS policy
--------------

If you want to use the MLS policy, then set TYPE=xen-mls in the policy Makefile
before building the policy. Note that the MLS constraints in policy/mls
are incomplete and are only a sample.


AVC denials
-----------

XSM:Flask will emit avc: denied messages when a permission is denied by the
policy, just like SELinux. For example, if the HVM rules are removed from the
declare_domain and create_domain interfaces:

# xl dmesg | grep avc
(XEN) avc: denied { setparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { getparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { irqlevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pciroute } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { setparam } for domid=4 scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { cacheattr } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pcilevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm

Existing SELinux tools such as audit2allow can be applied to these denials, e.g.
xl dmesg | audit2allow

The generated allow rules can then be fed back into the policy by adding them to
a module, although manual review is advised and will often lead to adding
parameterized rules to the interfaces in xen.if to address the general case.
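
As a rough illustration of how denials map onto allow rules, the Python sketch
below groups denial lines by source type, target type, and class, and emits one
allow rule per group. It is not a replacement for audit2allow: the function
name denials_to_allow_rules and the regular expression are assumptions made for
this example, based only on the log format shown above.

```python
import re
from collections import defaultdict

# Assumed pattern for the denial lines shown above (not taken from Xen itself).
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scon>\S+)\s+tcontext=(?P<tcon>\S+)\s+tclass=(?P<tclass>\S+)"
)


def type_of(context):
    """Return the type field of a user:role:type[:mls] security label."""
    return context.split(":")[2]


def denials_to_allow_rules(log_text):
    """Group denied permissions by (source type, target type, class)."""
    grouped = defaultdict(set)
    for line in log_text.splitlines():
        match = AVC_RE.search(line)
        if match is None:
            continue  # not an AVC denial line
        key = (type_of(match["scon"]), type_of(match["tcon"]), match["tclass"])
        grouped[key].update(match["perms"].split())
    # One allow rule per group, with permissions sorted for stable output.
    return [
        f"allow {src} {tgt}:{cls} {{ {' '.join(sorted(perms))} }};"
        for (src, tgt, cls), perms in sorted(grouped.items())
    ]
```

Applied to the first two denial lines above, this yields the single rule
"allow dom0_t domU_t:hvm { getparam setparam };". As with audit2allow output,
such rules should be reviewed by hand before being added to a module.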


Device Labeling in Policy
-------------------------

FLASK is capable of labeling devices and enforcing policies associated with
them. There are two methods to label devices: dynamic labeling using
flask-label-pci or similar tools run in dom0, or static labeling defined in
policy. Static labeling will make security policy machine-specific and may
prevent the system from booting after any hardware changes (adding PCI cards,
memory, or even changing certain BIOS settings). Dynamic labeling requires that
the domain performing the labeling be trusted to label all the devices in the
system properly.

IRQs, PCI devices, I/O memory and x86 IO ports can all have labels defined.
There are examples commented out in tools/flask/policy/policy/device_contexts.

Device Labeling
---------------

The "lspci -vvn" command can be used to output all the devices and identifiers
associated with them. For example, to label an Intel e1000e ethernet card the
lspci output is:

00:19.0 0200: 8086:10de (rev 02)
        Subsystem: 1028:0276
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at febd9000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at ecc0 [size=32]
        Kernel modules: e1000e

The labeling can be done with these lines in device_contexts:

pirqcon 33 system_u:object_r:nicP_t
iomemcon 0xfebe0-0xfebff system_u:object_r:nicP_t
iomemcon 0xfebd9 system_u:object_r:nicP_t
ioportcon 0xecc0-0xecdf system_u:object_r:nicP_t
pcidevicecon 0xc8 system_u:object_r:nicP_t

The PCI device label must be computed as the 32-bit SBDF number for the PCI
device.
If the PCI device is aaaa:bb:cc.d or bb:cc.d, then the SBDF can be
calculated using:
    SBDF = (a << 16) | (b << 8) | (c << 3) | d

The AVC denials for IRQs, memory, ports, and PCI devices will normally contain
the ranges being denied to more easily determine what resources are required.
When running in permissive mode, only the first denial of a given
source/destination is printed to the log, so labeling devices using this method
may require multiple passes to find all required ranges.
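
The SBDF formula above can be sketched in Python as follows. The helper name
pci_sbdf is hypothetical and not part of the Xen tools; the sketch assumes the
address fields are hexadecimal, as printed by lspci, and treats a missing
segment as 0.

```python
def pci_sbdf(address):
    """Return the 32-bit SBDF for 'aaaa:bb:cc.d' or 'bb:cc.d' (hex fields)."""
    parts = address.split(":")
    if len(parts) == 3:
        seg, bus, devfn = parts
    elif len(parts) == 2:
        seg = "0"  # no segment given: assume segment 0
        bus, devfn = parts
    else:
        raise ValueError(f"unrecognised PCI address: {address!r}")
    dev, fn = devfn.split(".")
    a, b, c, d = (int(field, 16) for field in (seg, bus, dev, fn))
    # Field widths: 16-bit segment, 8-bit bus, 5-bit device, 3-bit function.
    if a > 0xFFFF or b > 0xFF or c > 0x1F or d > 0x7:
        raise ValueError(f"PCI address field out of range: {address!r}")
    return (a << 16) | (b << 8) | (c << 3) | d


if __name__ == "__main__":
    # Device used in the flask-label-pci example earlier in this document.
    print(f"pcidevicecon {pci_sbdf('0000:03:02.0'):#x} system_u:object_r:nic_dev_t")
```

For the device 0000:03:02.0 used with flask-label-pci earlier, this gives
(0x03 << 8) | (0x02 << 3) = 0x310, so the corresponding static label line would
be "pcidevicecon 0x310 system_u:object_r:nic_dev_t".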