-----------------------
XSM/FLASK Configuration
-----------------------

Xen provides a security framework called XSM, and FLASK is an implementation of
a security model using this framework (at the time of writing, it is the only
one). FLASK defines a mandatory access control policy providing fine-grained
controls over Xen domains, allowing the policy writer to define what
interactions between domains, devices, and the hypervisor are permitted.

Some examples of what FLASK can do:
 - Prevent two domains from communicating via event channels or grants
 - Control which domains can use device passthrough (and which devices)
 - Restrict or audit operations performed by privileged domains
 - Prevent a privileged domain from arbitrarily mapping pages from other domains

Some of these examples require dom0 disaggregation to be useful, since the
domain build process requires the ability to write to the new domain's memory.

Security Status of dom0 disaggregation
--------------------------------------

Xen supports disaggregation of various support and management
functions into their own domains, via the XSM mechanisms described in
this document.

However the implementations of these support and management interfaces
were originally written to be used only by the totally-privileged
dom0, and have not been reviewed for security when exposed to
supposedly-only-semi-privileged disaggregated management domains. But
such management domains are (in such a design) to be seen as
potentially hostile, e.g. due to privilege escalation following
exploitation of a bug in the management domain.

Until the interfaces have been properly reviewed for security against
hostile callers, the Xen.org security team intends (subject of course
to the permission of anyone disclosing to us) to handle these and
future vulnerabilities in these interfaces in public, as if they were
normal non-security-related bugs.

This applies only to bugs which do no more than reduce the security of
a radically disaggregated system to the security of a
non-disaggregated one. Here a "radically disaggregated system" is one
which uses the XSM mechanism to delegate the affected interfaces to
other-than-fully-trusted domains.

This policy does not apply to bugs which affect stub device models,
driver domains, or stub xenstored - even if those bugs do no worse
than reduce the security of such a system to one whose device models,
backend drivers, or xenstore, run in dom0.

For more information see https://xenbits.xen.org/xsa/advisory-77.html.

The following interfaces are covered by this statement. Interfaces
not listed here are considered safe for disaggregation; security
issues found in interfaces not listed here will be handled according
to the normal security problem response policy
https://www.xenproject.org/security-policy.html.

__HYPERVISOR_domctl (xen/include/public/domctl.h)

   All subops except the following are covered by this statement. (That
   is, only the subops below are considered safe for disaggregation.)

    * XEN_DOMCTL_ioport_mapping
    * XEN_DOMCTL_memory_mapping
    * XEN_DOMCTL_bind_pt_irq
    * XEN_DOMCTL_unbind_pt_irq

__HYPERVISOR_sysctl (xen/include/public/sysctl.h)

   All subops are covered by this statement. (That is, no subops are
   considered safe for disaggregation.)

__HYPERVISOR_memory_op (xen/include/public/memory.h)

   The following subops are covered by this statement. Subops not listed
   here are considered safe for disaggregation.

    * XENMEM_set_pod_target
    * XENMEM_get_pod_target
    * XENMEM_claim_pages


Setting up FLASK
----------------

Xen must be compiled with XSM and FLASK enabled; by default, the security
framework is disabled. Run 'make -C xen menuconfig' and enable XSM and
FLASK inside 'Common Features'; this change requires a make clean and
rebuild.

FLASK uses only one domain configuration parameter (seclabel) defining the
full security label of the newly created domain. If using the example policy,
"seclabel='system_u:system_r:domU_t'" is an example of a normal domain. The
labels are in the same format as SELinux labels; see http://selinuxproject.org
for more details on the use of the user, role, and optional MLS/MCS labels.

FLASK policy overview
---------------------

Most of FLASK policy consists of defining the interactions allowed between
different types (domU_t would be the type in this example). For simple policies,
only type enforcement is used and the user and role are set to system_u and
system_r for all domains.

The FLASK security framework is mostly configured using a security policy file.
It relies on the SELinux compiler "checkpolicy"; if this is available, the
policy will be compiled as part of the tools build. If hypervisor support for a
built-in policy is enabled ("Compile Xen with a built-in security policy"), the
policy will be built during the hypervisor build.

The policy is generated from definition files in tools/flask/policy. Most
changes to security policy will involve creating or modifying modules found in
tools/flask/policy/modules/. The modules.conf file there defines what modules
are enabled and has short descriptions of each module.

If not using the built-in policy, the XSM policy file needs to be copied to
/boot and loaded as a module by grub. The exact position and filename of the
module does not matter as long as it is after the Xen kernel; it is normally
placed either just above the dom0 kernel or at the end. Once dom0 is running,
the policy can be reloaded using "xl loadpolicy".

The example policy included with Xen demonstrates most of the features of FLASK
that can be used without dom0 disaggregation.
The main types for domUs are:

 - domU_t is a domain that can communicate with any other domU_t
 - isolated_domU_t can only communicate with dom0
 - prot_domU_t is a domain type whose creation can be disabled with a boolean
 - nomigrate_t is a domain that must be created via the nomigrate_t_building
   type, and whose memory cannot be read by dom0 once created

HVM domains with stubdomain device models also need a type for the stub domain.
The example policy defines dm_dom_t for the device model of a domU_t domain;
there are no device model types defined for the other domU types.

One disadvantage of using type enforcement to enforce isolation is that a new
type is needed for each group of domains. The user field can be used to address
this for the most common case of groups that can communicate internally but not
externally; see "Users and roles" below.

Type transitions
----------------

Xen defines a number of operations such as memory mapping that are necessary for
a domain to perform on itself, but are also undesirable to allow a domain to
perform on every other domain of the same label. While it is possible to address
this by only creating one domain per type, this solution significantly limits
the flexibility of the type system. Another method to address this issue is to
duplicate the permission names for every operation that can be performed on the
current domain or on other domains; however, this significantly increases the
necessary number of permissions and complicates the XSM hooks. Instead, this is
addressed by allowing a distinct type to be used for a domain's access to
itself. The same applies for a device model domain's access to its designated
target, allowing the IS_PRIV_FOR checks used in Xen's DAC model to be
implemented in FLASK.

Upon domain creation (or relabel), a type transition is computed using the
domain's label as the source and target.
The result of this computation is used as the target when the domain accesses
itself. In the example policy, this computed type is the result of appending
_self to a domain's type: domU_t_self for domU_t. If no type transition rule
exists, the domain will continue to use its own label for both the source and
target. An AVC message will look like:

    scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t_self

A similar type transition is done when a device model domain is associated with
its target using the set_target operation. The transition is computed with the
target domain as the source and the device model domain as the target: this
ordering was chosen in order to preserve the original label for the target when
no type transition rule exists. In the example policy, these computed types are
the result of appending _target to the domain.

Type transitions are also used to compute the labels of event channels.

Users and roles
---------------

The default user and role used for domains is system_u and system_r. Users are
visible in the labels of domains and associated objects (event channels); when
the vm_role module is enabled, "user_1:vm_r:domU_t" is a valid label for a
domain created by the user_1 user.

Access control rules involving users and roles are defined in a module's
constraints file (for example, vm_role.cons). The vm_role module defines one
role (vm_r) and three users (user_1 .. user_3), along with constraints that
prevent different users from communicating using grants or event channels, while
still allowing communication with the system_u user where dom0 resides.

Resource Policy
---------------

The example policy also includes a resource type (nic_dev_t) for device
passthrough, configured to allow use by domU_t.
To label the PCI device 0000:03:02.0 for passthrough, run:

    tools/flask/utils/flask-label-pci 0000:03:02.0 system_u:object_r:nic_dev_t

This command must be rerun on each boot or after any policy reload.

When first loading or writing a policy, you should run FLASK in permissive mode
(flask=permissive on the command line) and check the Xen logs (xl dmesg) for AVC
denials before using it in enforcing mode (the default value of the boot
parameter, which can also be changed using xl setenforce). When using the
default types for domains (domU_t), the example policy shipped with Xen should
allow the same operations on or between domains as when not using FLASK.

By default, flask-label-pci labels the device, I/O ports, memory and IRQ with
the provided label. These are all unique per-device, except for IRQs which
can be shared between devices. This leads to assignment problems, since a
domain of type vmA_t cannot access an IRQ labeled devB_t. To work around this
issue, flask-label-pci takes an optional third argument to label the IRQ:

    flask-label-pci 0000:03:02.0 system_u:object_r:nic_dev_t \
        system_u:object_r:shared_irq_t

The IRQ labeling only applies to the PIRQ - MSI/MSI-X interrupts are labeled
with the main device label.

The policy needs to define the shared_irq_t type with:
    type shared_irq_t, resource_type;

And the policy needs to be updated to allow domains appropriate access.

MLS/MCS policy
--------------

If you want to use the MLS policy, then set TYPE=xen-mls in the policy Makefile
before building the policy. Note that the MLS constraints in policy/mls
are incomplete and are only a sample.


AVC denials
-----------

XSM:Flask will emit avc: denied messages when a permission is denied by the
policy, just like SELinux.
For example, if the HVM rules are removed from the
declare_domain and create_domain interfaces:

# xl dmesg | grep avc
(XEN) avc: denied { setparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { getparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { irqlevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pciroute } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { setparam } for domid=4 scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { cacheattr } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pcilevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm

Existing SELinux tools such as audit2allow can be applied to these denials, e.g.
xl dmesg | audit2allow

The generated allow rules can then be fed back into the policy by adding them to
a module, although manual review is advised and will often lead to adding
parameterized rules to the interfaces in xen.if to address the general case.


Device Labeling in Policy
-------------------------

FLASK is capable of labeling devices and enforcing policies associated with
them. There are two methods to label devices: dynamic labeling using
flask-label-pci or similar tools run in dom0, or static labeling defined in
policy. Static labeling will make security policy machine-specific and may
prevent the system from booting after any hardware changes (adding PCI cards,
memory, or even changing certain BIOS settings).
Dynamic labeling requires that the domain performing the labeling be trusted
to label all the devices in the system properly.

IRQs, PCI devices, I/O memory and x86 IO ports can all have labels defined.
There are examples commented out in tools/flask/policy/policy/device_contexts.

Device Labeling
---------------

The "lspci -vvn" command can be used to output all the devices and identifiers
associated with them. For example, to label an Intel e1000e ethernet card the
lspci output is:

00:19.0 0200: 8086:10de (rev 02)
        Subsystem: 1028:0276
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at febd9000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at ecc0 [size=32]
        Kernel modules: e1000e

The labeling can be done with these lines in device_contexts:

pirqcon 33 system_u:object_r:nicP_t
iomemcon 0xfebe0-0xfebff system_u:object_r:nicP_t
iomemcon 0xfebd9 system_u:object_r:nicP_t
ioportcon 0xecc0-0xecdf system_u:object_r:nicP_t
pcidevicecon 0xc8 system_u:object_r:nicP_t

The PCI device label must be computed as the 32-bit SBDF number for the PCI
device. If the PCI device is aaaa:bb:cc.d or bb:cc.d (in which case a is 0),
then the SBDF can be calculated using:
    SBDF = (a << 16) | (b << 8) | (c << 3) | d

The AVC denials for IRQs, memory, ports, and PCI devices will normally contain
the ranges being denied to more easily determine what resources are required.
When running in permissive mode, only the first denial of a given
source/destination is printed to the log, so labeling devices using this method
may require multiple passes to find all required ranges.
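
The SBDF formula and the iomemcon page ranges above are easy to get wrong by
hand, so the following Python sketch computes both. It is only an illustration
and not part of the Xen tree: the function names sbdf() and iomem_pages() are
hypothetical, and the 4 KiB page size (a shift of 12 bits) is an assumption
inferred from the iomemcon example above, where the region at febe0000 of size
128K is labeled as 0xfebe0-0xfebff.

```python
import re
from typing import Tuple

# Matches "aaaa:bb:cc.d" or "bb:cc.d"; all fields are hexadecimal.
_PCI_ADDRESS = re.compile(
    r"^(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})\.([0-7])$"
)


def sbdf(address: str) -> int:
    """Return (a << 16) | (b << 8) | (c << 3) | d for a PCI address string."""
    match = _PCI_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError("not a PCI address: %r" % address)
    segment, bus, device, function = match.groups()
    a = int(segment, 16) if segment else 0  # segment is 0 when omitted
    b = int(bus, 16)
    c = int(device, 16)
    d = int(function, 16)
    if c > 0x1F:  # the device number occupies only 5 bits
        raise ValueError("device number out of range: %r" % address)
    return (a << 16) | (b << 8) | (c << 3) | d


def iomem_pages(base: int, size: int, page_shift: int = 12) -> Tuple[int, int]:
    """Return the first and last page frame covered by [base, base + size).

    page_shift=12 assumes 4 KiB pages (hypothetical default, see lead-in).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return base >> page_shift, (base + size - 1) >> page_shift


if __name__ == "__main__":
    # Device from the flask-label-pci example; labels are from the example policy.
    print("pcidevicecon %#x system_u:object_r:nic_dev_t" % sbdf("0000:03:02.0"))
    first, last = iomem_pages(0xFEBE0000, 128 * 1024)
    print("iomemcon %#x-%#x system_u:object_r:nicP_t" % (first, last))
```

For the device 0000:03:02.0 used with flask-label-pci earlier, sbdf() returns
0x310, which is the value a pcidevicecon line for that device would carry.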