                     -----------------------
                     XSM/FLASK Configuration
                     -----------------------

Xen provides a security framework called XSM, and FLASK is an implementation of
a security model using this framework (at the time of writing, it is the only
one). FLASK defines a mandatory access control policy providing fine-grained
controls over Xen domains, allowing the policy writer to define what
interactions between domains, devices, and the hypervisor are permitted.

Some examples of what FLASK can do:
 - Prevent two domains from communicating via event channels or grants
 - Control which domains can use device passthrough (and which devices)
 - Restrict or audit operations performed by privileged domains
 - Prevent a privileged domain from arbitrarily mapping pages from other domains

Some of these examples require dom0 disaggregation to be useful, since the
domain build process requires the ability to write to the new domain's memory.

Security Status of dom0 disaggregation
--------------------------------------

Xen supports disaggregation of various support and management
functions into their own domains, via the XSM mechanisms described in
this document.

However, the implementations of these support and management interfaces
were originally written to be used only by the totally-privileged
dom0, and have not been reviewed for security when exposed to
supposedly-only-semi-privileged disaggregated management domains.  But
such management domains are (in such a design) to be seen as
potentially hostile, e.g. due to privilege escalation following
exploitation of a bug in the management domain.

Until the interfaces have been properly reviewed for security against
hostile callers, the Xen.org security team intends (subject of course
to the permission of anyone disclosing to us) to handle these and
future vulnerabilities in these interfaces in public, as if they were
normal non-security-related bugs.

This applies only to bugs which do no more than reduce the security of
a radically disaggregated system to the security of a
non-disaggregated one.  Here a "radically disaggregated system" is one
which uses the XSM mechanism to delegate the affected interfaces to
other-than-fully-trusted domains.

This policy does not apply to bugs which affect stub device models,
driver domains, or stub xenstored - even if those bugs do no worse
than reduce the security of such a system to one whose device models,
backend drivers, or xenstore, run in dom0.

For more information, see http://xenbits.xen.org/xsa/advisory-77.html.

The following interfaces are covered by this statement.  Interfaces
not listed here are considered safe for disaggregation; security
issues found in interfaces not listed here will be handled according
to the normal security problem response policy
(http://www.xenproject.org/security-policy.html).

__HYPERVISOR_domctl (xen/include/public/domctl.h)

 All subops except the following are covered by this statement.  (That
 is, only the subops below are considered safe for disaggregation.)

 * XEN_DOMCTL_ioport_mapping
 * XEN_DOMCTL_memory_mapping
 * XEN_DOMCTL_bind_pt_irq
 * XEN_DOMCTL_unbind_pt_irq

__HYPERVISOR_sysctl (xen/include/public/sysctl.h)

 All subops are covered by this statement.  (That is, no subops are
 considered safe for disaggregation.)

__HYPERVISOR_memory_op (xen/include/public/memory.h)

 The following subops are covered by this statement.  Subops not listed
 here are considered safe for disaggregation.

 * XENMEM_set_pod_target
 * XENMEM_get_pod_target
 * XENMEM_claim_pages

__HYPERVISOR_tmem_op (xen/include/public/tmem.h)

 The following tmem control ops, that is the sub-subops of
 TMEM_CONTROL, are covered by this statement.

 Note that TMEM is also subject to a similar policy arising from
 XSA-15 http://lists.xen.org/archives/html/xen-announce/2012-09/msg00006.html.
 Due to this existing policy, all TMEM ops are already subject to
 reduced security support.

 * TMEMC_THAW
 * TMEMC_FREEZE
 * TMEMC_FLUSH
 * TMEMC_DESTROY
 * TMEMC_LIST
 * TMEMC_SET_WEIGHT
 * TMEMC_SET_CAP
 * TMEMC_SET_COMPRESS
 * TMEMC_QUERY_FREEABLE_MB
 * TMEMC_SAVE_BEGIN
 * TMEMC_SAVE_GET_VERSION
 * TMEMC_SAVE_GET_MAXPOOLS
 * TMEMC_SAVE_GET_CLIENT_WEIGHT
 * TMEMC_SAVE_GET_CLIENT_CAP
 * TMEMC_SAVE_GET_CLIENT_FLAGS
 * TMEMC_SAVE_GET_POOL_FLAGS
 * TMEMC_SAVE_GET_POOL_NPAGES
 * TMEMC_SAVE_GET_POOL_UUID
 * TMEMC_SAVE_GET_NEXT_PAGE
 * TMEMC_SAVE_GET_NEXT_INV
 * TMEMC_SAVE_END
 * TMEMC_RESTORE_BEGIN
 * TMEMC_RESTORE_PUT_PAGE
 * TMEMC_RESTORE_FLUSH_PAGE
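
The lists above use three different conventions: for domctl the list names the
subops that are safe, for memory_op and tmem_op it names the subops that are
covered, and sysctl is covered entirely. As a minimal sketch (the table and
function names are hypothetical, not part of Xen), the statement can be encoded
as a lookup. It models only this statement, not the separate XSA-15 policy for
TMEM:

```python
# Hypothetical encoding of the disaggregation statement; names are illustrative.

# domctl: every subop is covered EXCEPT these, which are considered safe.
DOMCTL_SAFE = frozenset({
    "XEN_DOMCTL_ioport_mapping",
    "XEN_DOMCTL_memory_mapping",
    "XEN_DOMCTL_bind_pt_irq",
    "XEN_DOMCTL_unbind_pt_irq",
})

# memory_op: ONLY these subops are covered.
MEMORY_OP_COVERED = frozenset({
    "XENMEM_set_pod_target",
    "XENMEM_get_pod_target",
    "XENMEM_claim_pages",
})

# tmem_op: only these TMEM_CONTROL sub-subops are covered.
TMEM_CONTROL_COVERED = frozenset("""
    TMEMC_THAW TMEMC_FREEZE TMEMC_FLUSH TMEMC_DESTROY TMEMC_LIST
    TMEMC_SET_WEIGHT TMEMC_SET_CAP TMEMC_SET_COMPRESS
    TMEMC_QUERY_FREEABLE_MB TMEMC_SAVE_BEGIN TMEMC_SAVE_GET_VERSION
    TMEMC_SAVE_GET_MAXPOOLS TMEMC_SAVE_GET_CLIENT_WEIGHT
    TMEMC_SAVE_GET_CLIENT_CAP TMEMC_SAVE_GET_CLIENT_FLAGS
    TMEMC_SAVE_GET_POOL_FLAGS TMEMC_SAVE_GET_POOL_NPAGES
    TMEMC_SAVE_GET_POOL_UUID TMEMC_SAVE_GET_NEXT_PAGE
    TMEMC_SAVE_GET_NEXT_INV TMEMC_SAVE_END TMEMC_RESTORE_BEGIN
    TMEMC_RESTORE_PUT_PAGE TMEMC_RESTORE_FLUSH_PAGE
""".split())


def covered_by_statement(hypercall: str, subop: str) -> bool:
    """True if bugs in this interface are handled in public under the
    disaggregation statement rather than the normal security policy."""
    if hypercall == "__HYPERVISOR_domctl":
        return subop not in DOMCTL_SAFE        # all except the safe list
    if hypercall == "__HYPERVISOR_sysctl":
        return True                            # every subop is covered
    if hypercall == "__HYPERVISOR_memory_op":
        return subop in MEMORY_OP_COVERED      # only the listed subops
    if hypercall == "__HYPERVISOR_tmem_op":
        return subop in TMEM_CONTROL_COVERED   # only the listed control ops
    return False                               # unlisted: normal policy
```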


Setting up FLASK
----------------

Xen must be compiled with XSM and FLASK enabled; by default, the security
framework is disabled. To enable it, run 'make -C xen menuconfig' and enable
XSM and FLASK under 'Common Features'; this change requires a 'make clean' and
a rebuild.

FLASK uses only one domain configuration parameter (seclabel), which defines
the full security label of the newly created domain. If using the example
policy, "seclabel='system_u:system_r:domU_t'" is an example of a normal domain.
The labels are in the same format as SELinux labels; see
http://selinuxproject.org for more details on the use of the user, role, and
optional MLS/MCS labels.
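
A label therefore has the form user:role:type, optionally followed by an
MLS/MCS level that may itself contain colons (for example s0:c1). The
following is a minimal sketch of splitting a seclabel value into these fields;
SecLabel and parse_seclabel are hypothetical helpers, and the MLS level used in
the usage example is illustrative only:

```python
from typing import NamedTuple, Optional


class SecLabel(NamedTuple):
    user: str
    role: str
    type_: str            # trailing underscore avoids shadowing the builtin
    level: Optional[str]  # MLS/MCS level; None when the label has none


def parse_seclabel(label: str) -> SecLabel:
    """Split a user:role:type[:level] label into its fields."""
    # maxsplit=3 keeps a level such as "s0:c1" in one piece.
    parts = label.split(":", 3)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected user:role:type[:level], got {label!r}")
    level = parts[3] if len(parts) == 4 else None
    return SecLabel(parts[0], parts[1], parts[2], level)
```

For example, parse_seclabel("system_u:system_r:domU_t") yields the type domU_t
and no level, while "system_u:system_r:domU_t:s0:c1" yields the level "s0:c1".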

FLASK policy overview
---------------------

Most of a FLASK policy consists of defining the interactions allowed between
different types (domU_t would be the type in this example). For simple policies,
only type enforcement is used and the user and role are set to system_u and
system_r for all domains.

The FLASK security framework is mostly configured using a security policy file.
It relies on the SELinux compiler "checkpolicy"; if this is available, the
policy will be compiled as part of the tools build.  If hypervisor support for a
built-in policy is enabled ("Compile Xen with a built-in security policy"), the
policy will be built during the hypervisor build.

The policy is generated from definition files in tools/flask/policy.  Most
changes to security policy will involve creating or modifying modules found in
tools/flask/policy/modules/.  The modules.conf file there defines which modules
are enabled and has short descriptions of each module.

If not using the built-in policy, the XSM policy file needs to be copied to
/boot and loaded as a module by grub.  The exact position and filename of the
module does not matter as long as it is after the Xen kernel; it is normally
placed either just above the dom0 kernel or at the end.  Once dom0 is running,
the policy can be reloaded using "xl loadpolicy".

The example policy included with Xen demonstrates most of the features of FLASK
that can be used without dom0 disaggregation. The main types for domUs are:

 - domU_t is a domain that can communicate with any other domU_t
 - isolated_domU_t can only communicate with dom0
 - prot_domU_t is a domain type whose creation can be disabled with a boolean
 - nomigrate_t is a domain that must be created via the nomigrate_t_building
   type, and whose memory cannot be read by dom0 once created

HVM domains with stubdomain device models also need a type for the stub domain.
The example policy defines dm_dom_t for the device model of a domU_t domain;
there are no device model types defined for the other domU types.

One disadvantage of using type enforcement to enforce isolation is that a new
type is needed for each group of domains. The user field can be used to address
this for the most common case of groups that can communicate internally but not
externally; see "Users and roles" below.

Type transitions
----------------

Xen defines a number of operations, such as memory mapping, that are necessary
for a domain to perform on itself but undesirable to allow a domain to perform
on every other domain of the same label. While it is possible to address this
by only creating one domain per type, this solution significantly limits the
flexibility of the type system. Another method to address this issue is to
duplicate the permission names for every operation that can be performed on the
current domain or on other domains; however, this significantly increases the
necessary number of permissions and complicates the XSM hooks. Instead, this is
addressed by allowing a distinct type to be used for a domain's access to
itself. The same applies for a device model domain's access to its designated
target, allowing the IS_PRIV_FOR checks used in Xen's DAC model to be
implemented in FLASK.

Upon domain creation (or relabel), a type transition is computed using the
domain's label as the source and target. The result of this computation is used
as the target when the domain accesses itself. In the example policy, this
computed type is the result of appending _self to a domain's type: domU_t_self
for domU_t. If no type transition rule exists, the domain will continue to use
its own label for both the source and target. An AVC message will look like:

    scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t_self

A similar type transition is done when a device model domain is associated with
its target using the set_target operation. The transition is computed with the
target domain as the source and the device model domain as the target: this
ordering was chosen in order to preserve the original label for the target when
no type transition rule exists. In the example policy, these computed types are
the result of appending _target to the domain's type.
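
The lookups described above can be modelled in a few lines. This is a
hypothetical sketch, not Xen's security server: the rule table stands in for
compiled type_transition rules, and it assumes that, for the domain class, the
result when no rule matches is the source type. That assumption is what makes
the set_target ordering preserve the target's original label.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for compiled rules of the form
#   type_transition <source> <target>:domain <new_type>;
Rules = Dict[Tuple[str, str], str]

EXAMPLE_RULES: Rules = {
    ("domU_t", "domU_t"): "domU_t_self",      # domU_t accessing itself
    ("domU_t", "dm_dom_t"): "domU_t_target",  # domU_t as target of dm_dom_t
}


def domain_transition(rules: Rules, source: str, target: str) -> str:
    """Computed type for the domain class; falls back to the source type
    when no rule matches (assumed default for this class)."""
    return rules.get((source, target), source)


def self_type(rules: Rules, dom_type: str) -> str:
    """Type used as tcontext when a domain accesses itself."""
    return domain_transition(rules, dom_type, dom_type)


def set_target_type(rules: Rules, target_dom: str, dm_dom: str) -> str:
    """set_target: the target domain is the source and the device model is
    the target, so the fallback keeps the target domain's own type."""
    return domain_transition(rules, target_dom, dm_dom)
```

With this table, a type that has no rules (isolated_domU_t, say) keeps its own
label both for self-access and after set_target.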

Type transitions are also used to compute the labels of event channels.

Users and roles
---------------

The default user and role used for domains are system_u and system_r.  Users are
visible in the labels of domains and associated objects (event channels); when
the vm_role module is enabled, "user_1:vm_r:domU_t" is a valid label for a
domain created by the user_1 user.

Access control rules involving users and roles are defined in a module's
constraints file (for example, vm_role.cons). The vm_role module defines one
role (vm_r) and three users (user_1 .. user_3), along with constraints that
prevent different users from communicating using grants or event channels, while
still allowing communication with the system_u user where dom0 resides.
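
The effect of such constraints can be pictured as an extra check applied on top
of the type enforcement decision: a constraint can only remove access that an
allow rule grants, never add it. The following is a hypothetical model of the
behaviour described above, not a transcription of vm_role.cons; the real
constraint expressions are written in the policy language and may differ in
detail.

```python
def label_user(label: str) -> str:
    """Return the user field of a user:role:type[:level] label."""
    return label.split(":", 1)[0]


def users_may_communicate(src_label: str, tgt_label: str) -> bool:
    """Hypothetical constraint: same user, or either side is system_u."""
    src_user, tgt_user = label_user(src_label), label_user(tgt_label)
    return src_user == tgt_user or "system_u" in (src_user, tgt_user)


def access_granted(te_allows: bool, src_label: str, tgt_label: str) -> bool:
    """Access needs BOTH a type enforcement allow rule and the constraint."""
    return te_allows and users_may_communicate(src_label, tgt_label)
```

Under this model, two domU_t domains owned by user_1 and user_2 are denied even
though type enforcement alone would allow domU_t to talk to domU_t.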

Resource Policy
---------------

The example policy also includes a resource type (nic_dev_t) for device
passthrough, configured to allow use by domU_t. To label the PCI device 03:02.0
for passthrough, run:

	tools/flask/utils/flask-label-pci 0000:03:02.0 system_u:object_r:nic_dev_t

This command must be rerun on each boot or after any policy reload.

When first loading or writing a policy, you should run FLASK in permissive mode
(flask=permissive on the Xen command line) and check the Xen logs (xl dmesg)
for AVC denials before using it in enforcing mode.  Enforcing mode is the
default value of the boot parameter; the mode can also be changed at runtime
using xl setenforce.  When using the default types for domains (domU_t), the
example policy shipped with Xen should allow the same operations on or between
domains as when not using FLASK.


MLS/MCS policy
--------------

If you want to use the MLS policy, then set TYPE=xen-mls in the policy Makefile
before building the policy.  Note that the MLS constraints in policy/mls
are incomplete and are only a sample.


AVC denials
-----------

XSM:Flask will emit "avc: denied" messages when a permission is denied by the
policy, just like SELinux. For example, if the HVM rules are removed from the
declare_domain and create_domain interfaces:

# xl dmesg | grep avc
(XEN) avc:  denied  { setparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc:  denied  { getparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc:  denied  { irqlevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc:  denied  { pciroute } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc:  denied  { setparam } for domid=4 scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc:  denied  { cacheattr } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc:  denied  { pcilevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm

Existing SELinux tools such as audit2allow can be applied to these denials,
e.g.:

	xl dmesg | audit2allow

The generated allow rules can then be fed back into the policy by adding them to
a module, although manual review is advised and will often lead to adding
parameterized rules to the interfaces in xen.if to address the general case.
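
To show roughly what such a tool does with the messages above, here is a
minimal stand-in (not audit2allow itself; denials_to_allow_rules is a
hypothetical helper). It extracts the permission, the type fields of scontext
and tcontext, and tclass from each denial, and merges permissions that share a
source type, target type, and class into one allow rule. Its output needs the
same manual review as audit2allow's.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Matches the "avc: denied" lines printed by xl dmesg.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}"
    r".*?scontext=(?P<scontext>\S+)"
    r"\s+tcontext=(?P<tcontext>\S+)"
    r"\s+tclass=(?P<tclass>\S+)"
)


def context_type(context: str) -> str:
    """user:role:type[:level] -> type"""
    return context.split(":")[2]


def denials_to_allow_rules(lines: Iterable[str]) -> List[str]:
    grouped: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for line in lines:
        m = AVC_RE.search(line)
        if m is None:
            continue  # not an AVC denial
        key = (context_type(m["scontext"]), context_type(m["tcontext"]),
               m["tclass"])
        grouped[key].update(m["perms"].split())
    # One rule per (source, target, class), permissions sorted for stability.
    return [
        f"allow {s} {t}:{c} {{ {' '.join(sorted(perms))} }};"
        for (s, t, c), perms in sorted(grouped.items())
    ]
```

Fed the seven denials above, it would produce one rule granting dom0_t six hvm
permissions on domU_t and one granting domU_t setparam on domU_t.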


Device Labeling in Policy
-------------------------

FLASK is capable of labeling devices and enforcing policies associated with
them. There are two methods to label devices: dynamic labeling using
flask-label-pci or similar tools run in dom0, or static labeling defined in
policy. Static labeling will make security policy machine-specific and may
prevent the system from booting after any hardware changes (adding PCI cards,
memory, or even changing certain BIOS settings). Dynamic labeling requires that
the domain performing the labeling be trusted to label all the devices in the
system properly.

IRQs, PCI devices, I/O memory and x86 I/O ports can all have labels defined.
There are examples commented out in tools/flask/policy/policy/device_contexts.

Device Labeling
---------------

The "lspci -vvn" command can be used to output all the devices and identifiers
associated with them.  For example, to label an Intel e1000e Ethernet card, the
lspci output is:

00:19.0 0200: 8086:10de (rev 02)
        Subsystem: 1028:0276
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at febd9000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at ecc0 [size=32]
        Kernel modules: e1000e

The labeling can be done with these lines in device_contexts:

pirqcon 33 system_u:object_r:nicP_t
iomemcon 0xfebe0-0xfebff system_u:object_r:nicP_t
iomemcon 0xfebd9 system_u:object_r:nicP_t
ioportcon 0xecc0-0xecdf system_u:object_r:nicP_t
pcidevicecon 0xc8 system_u:object_r:nicP_t
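
The memory and port values above follow from the lspci output: ioportcon takes
an inclusive range of port numbers, while iomemcon takes an inclusive range of
page frame numbers, that is, byte addresses shifted right by 12 for 4 KiB pages
(the 128K region at febe0000 becomes 0xfebe0-0xfebff). A minimal sketch of that
arithmetic, with hypothetical helper names and the 4 KiB page size as an
assumption:

```python
from typing import Tuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86


def iomem_frames(base: int, size: int) -> Tuple[int, int]:
    """Inclusive page frame range covering bytes [base, base + size)."""
    return base >> PAGE_SHIFT, (base + size - 1) >> PAGE_SHIFT


def ioport_range(base: int, size: int) -> Tuple[int, int]:
    """Inclusive I/O port range covering ports [base, base + size)."""
    return base, base + size - 1


def fmt_range(first: int, last: int) -> str:
    """Single value if the range has one element, otherwise first-last."""
    return f"{first:#x}" if first == last else f"{first:#x}-{last:#x}"


LABEL = "system_u:object_r:nicP_t"
lines = [
    f"iomemcon {fmt_range(*iomem_frames(0xfebe0000, 128 * 1024))} {LABEL}",
    f"iomemcon {fmt_range(*iomem_frames(0xfebd9000, 4 * 1024))} {LABEL}",
    f"ioportcon {fmt_range(*ioport_range(0xecc0, 32))} {LABEL}",
]
print("\n".join(lines))
```

Running it prints the iomemcon and ioportcon lines shown above for the e1000e
card.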

The PCI device label must be computed as the 32-bit SBDF number for the PCI
device. If the PCI device is aaaa:bb:cc.d or bb:cc.d (segment, bus, device, and
function, in hexadecimal; the segment is 0 when omitted), then the SBDF can be
calculated using:

	SBDF = (a << 16) | (b << 8) | (c << 3) | d
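
The formula translates directly into code. Below is a minimal sketch, with
pci_sbdf as a hypothetical helper name, that parses either address form and
packs the fields; as a sanity check, 0000:03:02.0 from the flask-label-pci
example gives (3 << 8) | (2 << 3) = 0x310.

```python
import re

# Optional hex segment, then bus:device.function (hex, hex, 0-7).
_PCI_ADDR = re.compile(
    r"(?:(?P<seg>[0-9a-fA-F]{1,4}):)?"
    r"(?P<bus>[0-9a-fA-F]{1,2}):"
    r"(?P<dev>[0-9a-fA-F]{1,2})\."
    r"(?P<fn>[0-7])"
)


def pci_sbdf(address: str) -> int:
    """Return (seg << 16) | (bus << 8) | (dev << 3) | fn for a PCI address."""
    m = _PCI_ADDR.fullmatch(address)
    if m is None:
        raise ValueError(f"not a PCI address: {address!r}")
    seg = int(m["seg"] or "0", 16)  # segment defaults to 0 for bb:cc.d
    bus = int(m["bus"], 16)
    dev = int(m["dev"], 16)
    fn = int(m["fn"])
    if dev > 0x1f:  # the device number occupies 5 bits
        raise ValueError(f"device number out of range: {address!r}")
    return (seg << 16) | (bus << 8) | (dev << 3) | fn
```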

The AVC denials for IRQs, memory, ports, and PCI devices will normally contain
the ranges being denied, making it easier to determine which resources are
required. When running in permissive mode, only the first denial of a given
source/destination is printed to the log, so labeling devices using this method
may require multiple passes to find all required ranges.