1.. _hld-security:
2
3Security High-Level Design
4##########################
5
6.. primary author: Bing Zhu
7   contributor: Yadong Qi
8
9Introduction
10************
11
12This document describes the security high-level design in ACRN,
13including information about:
14
15-  Secure booting in ACRN
16-  Hypervisor security enhancement, including memory management, secure
17   hypervisor interfaces, etc.
18-  Platform security features virtualization, such as the virtualization
19   of TPM (vTPM) and SGX (vSGX)
20
21This document is for developers, validation teams, architects, and
22maintainers of ACRN.
23
24Readers should be familiar with the basic concepts of system
25virtualization and the ACRN hypervisor implementation.
26
27
28Background
29**********
30
31The ACRN hypervisor is a type-1 hypervisor, built for running multiple
32guest OS instances, typical of an automotive infotainment system, on a
33single Apollo Lake-I SoC platform. See :numref:`security-ACRN`.
34
35.. figure:: images/security-image-HV-overview.png
36   :width: 900px
37   :align: center
38   :name: security-ACRN
39
40   ACRN Hypervisor Overview
41
42This document focuses only on the security part of the automotive
43system built on top of the ACRN hypervisor. This includes how to build a
44secure system as well as how to virtualize the security features that
45the system can provide.
46
47Usages
48======
49
50As shown in :numref:`security-vehicle`, the ACRN hypervisor can be
51used to build a Software Defined Cockpit (SDC) or an In-Vehicle Experience
52(IVE) Solution that consolidates multiple VMs together on a single Intel
53SoC in-vehicle platform.
54
55.. figure:: images/security-image13.png
56   :width: 900px
57   :align: center
58   :name: security-vehicle
59
60   SDC and IVE System In-Vehicle
61
62
63In this system, the ACRN hypervisor is running at the most privileged
64level, VMX root mode, in virtualization technology terms. The hypervisor
65has full control of platform resources, including the processor, memory,
66devices, and in some cases, secrets of the guest OS. The ACRN
67hypervisor supports multiple guest VMs running in parallel in the less
68privileged level called VMX non-root mode.
69
70The Service VM is a special VM. While it runs as a guest VM in
71VMX non-root mode, it behaves as a privileged guest VM controlling the
72behavior of other guest VMs. The Service VM can create a guest VM, suspend and
73resume a guest VM, and provide device mediation services (Device
74Models) for other guest VMs it creates.
75
76In an SDC system, the Service VM also contains safety-critical IC (Instrument
77Cluster) applications. ACRN is designed to make sure the IC applications
78are well isolated from other applications in the Service VM such as Device
Models (Mediators). A crash in another guest VM must not impact
the IC applications, and must not cause a DoS (Denial of Service) condition.
81Functional safety is out of scope of this document.
82
In :numref:`security-ACRN`, the other guest VMs are referred to as User VMs.
84These other VMs provide infotainment services (such as
85navigation, music, and FM/AM radio) for the front seat or rear seat.
86
87The User VM systems can be based on Linux (LaaG, Linux as a Guest) or
88Android (AaaG, Android as a Guest) depending on the customer's needs
and board configuration. They can also be a mix of Linux and Android
systems.
91
92In each User VM, a "side-car" OS system can accompany the normal OS system. We
93call these two OS systems "secure world" and
94"non-secure world", and they are isolated from each other by the
95hypervisor. The secure world has a higher "privilege level" than the non-secure
96world; for example, the secure world can access the non-secure world's
97physical memory but not vice versa. This document discusses how this
98security works and why it is required.
99
Carefully evaluate whether the Service
VM belongs in the Trusted Computing Base (TCB). The Service VM may be a
102fairly large system running many lines of code; thus, treating it as a
103TCB doesn't make sense from a security perspective. To achieve the
104design purpose of "defense in depth", system security designers
105should always ask themselves, "What if the Service VM is compromised?" and
106"What's the impact if this happens?" This HLD document discusses how to
107security-harden the Service VM system and mitigate attacks on the Service VM.
108
109ACRN High-Level Security Architecture
110*************************************
111
112This chapter provides a high-level architecture design overview of ACRN
113security features and their development.
114
115Secure / Verified Boot
116======================
117
118The security of the entire system built on top of the ACRN hypervisor
119depends on the security from platform boot to User VM launching. Each layer
120or module must verify the security of the next layer or module before
121transferring control to it. Verification can be checking a
122cryptographic signature on the executable of the next step before it is
123launched.
124
125Note that measured boot (as described well in this `boot security
126technologies document
127<https://firmwaresecurity.com/2015/07/29/survey-of-boot-security-technologies/>`_)
128is not supported for ACRN and its guest VMs.
129
130Boot Flow
131---------
132ACRN supports two verified boot sequences.
133
1341) Verified Boot Sequence With SBL
135~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
136As shown in :numref:`security-bootflow-sbl`, the Converged Security Engine
137Firmware (CSE FW) behaves as the root of trust in this platform boot
138flow. It authenticates and starts the BIOS (SBL), whereupon the SBL is
139responsible for authenticating and verifying the ACRN hypervisor image.
140The Service VM kernel is built together with the ACRN hypervisor as
141one image bundle, so this whole image signature is verified by SBL
142before launching.
143
144.. figure:: images/security-image-bootflow-sbl.png
145   :width: 900px
146   :align: center
147   :name: security-bootflow-sbl
148
149   ACRN Boot Flow with SBL
150
1512) Verified Boot Sequence With UEFI
152~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153As shown in :numref:`security-bootflow-uefi`, in this boot sequence, UEFI
154authenticates and starts the ACRN hypervisor. Then the hypervisor returns
155to the UEFI environment to authenticate and load the Service VM kernel
156bootloader.
157
158.. figure:: images/security-image-bootflow-uefi.png
159   :width: 900px
160   :align: center
161   :name: security-bootflow-uefi
162
163   ACRN Boot Flow with UEFI
164
Once the Service VM kernel starts, it loads all of its
subsystems. To launch a User VM, a DM process is
167started to launch the virtual BIOS (OVMF). Eventually, the OVMF is
168responsible for verifying and launching the User VM kernel (or the
169Android OS loader for an Android User VM).
170
171Secure Boot
172-----------
173
174In the entire boot flow, the chain of trust must be unbroken. This is
175achieved by the secure boot mechanism. Each module in the boot flow must
176authenticate and verify the next module by using a cryptographic digital
177signature algorithm.
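
The chain of trust described above can be sketched as a small model. This is
an illustration only: a SHA-256 digest comparison stands in for the RSA
signature check, and the stage names and the helpers ``build_chain()`` and
``verified_boot()`` are hypothetical, not ACRN or SBL code.

.. code-block:: python

   import hashlib


   def digest(image: bytes) -> str:
       """Return the SHA-256 digest of an image as a hex string."""
       return hashlib.sha256(image).hexdigest()


   def build_chain(images):
       """Build a boot chain from a list of (name, image_bytes) pairs.

       Each stage records the digest it expects from the next stage.
       Returns (stages, root_digest), where root_digest is what the
       root of trust expects from the first stage.
       """
       stages = []
       for i, (name, image) in enumerate(images):
           nxt = digest(images[i + 1][1]) if i + 1 < len(images) else None
           stages.append((name, image, nxt))
       return stages, digest(images[0][1])


   def verified_boot(stages, root_digest):
       """Verify every stage before it runs; stop at the first mismatch."""
       expected = root_digest
       launched = []
       for name, image, next_expected in stages:
           if digest(image) != expected:
               raise RuntimeError(f"verification failed for {name}; boot halted")
           launched.append(name)          # control is transferred only now
           expected = next_expected
       return launched

Replacing any image in the chain changes its digest, so ``verified_boot()``
raises before the tampered stage gets control.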
178
A typical image signing scheme combines cryptographic hashing with
public key cryptography using PKCS #1 v1.5 padding.
181
182The 2018 minimal requirements for cryptographic strength are:
183
184#. SHA256 for image cryptographic hashing.
185#. RSA2048 for cryptographic digital signature signing and verification.
186
187We strongly recommend that SHA512 and RSA3072+ be used for a product shipped
188in 2018, especially for a product that has a long production life such as
189an automotive vehicle.
190
191The CSE FW image is signed with an Intel RSA private key. All other
192images should be signed by the responsible OEM. Our customers and
193partners are responsible for image signing, ensuring the key strength
194meets security requirements, and storing the secret RSA private key
195securely.
196
197Guest Secure Boot With OVMF
198---------------------------
199Open Virtual Machine Firmware (OVMF) is an EDK II based project to enable UEFI
200support for virtual machines in a virtualized environment. In ACRN, OVMF is
201deployed to launch a User VM, as if the User VM is booted on a machine with
202UEFI firmware.
203
204UEFI Secure Boot defines how a platform's firmware can authenticate a digitally
205signed UEFI image, such as an operating system loader or a UEFI driver stored
206in an option ROM. This provides the capability to ensure that those UEFI images
207are only loaded in an owner-authorized fashion and provides a common means to
208ensure the platform's security and integrity over systems running UEFI-based
209firmware.
210UEFI Secure Boot is already supported by OVMF.
211
212:numref:`security-secure-boot-uefi` shows a Secure Boot overview in UEFI.
213
214.. figure:: images/security-image-secure-boot-uefi.png
215   :width: 500px
216   :align: center
217   :name: security-secure-boot-uefi
218
219   UEFI Secure Boot Overview
220
221UEFI Secure Boot is controlled by a set of UEFI Authenticated Variables that specify
222the UEFI Secure Boot Policy; the platform manufacturer or the platform owner enrolls the
policy objects, which include the n-tuple of keys {PK, KEK, db, dbx} as step 1.
224During each successive boot, the UEFI secure boot implementation will assess the
225policy in order to verify the signed images that are discovered in a host-bus adapter
226or on a disk. If the images pass the policy, they are invoked.
227
228UEFI Secure Boot implementations use these keys:
229
230#. Platform Key (PK) is the top-level key in Secure Boot; UEFI supports a single PK,
231   which is generally provided by the manufacturer.
232#. Key Exchange Key (KEK) is used to sign Signature and Forbidden Signature Database updates.
233#. Signature Database (db) contains keys and/or hashes of allowed EFI binaries.
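
How the allowed (db) and forbidden (dbx) databases combine can be sketched
as follows. This is a simplified model, not OVMF code: it assumes the
image's signature has already been verified cryptographically, represents a
signer by a plain name, and uses a hypothetical function ``image_allowed()``.

.. code-block:: python

   import hashlib


   def image_allowed(image: bytes, signer, db, dbx) -> bool:
       """Decide whether a UEFI image may run under a db/dbx policy.

       db and dbx are sets holding SHA-256 hex digests and/or signer names.
       signer is the already-verified signer of the image, or None if the
       image is unsigned.
       """
       image_hash = hashlib.sha256(image).hexdigest()
       # A match in the forbidden database always wins.
       if image_hash in dbx or (signer is not None and signer in dbx):
           return False
       # Otherwise the image needs a match in the allowed database.
       return image_hash in db or (signer is not None and signer in db)

In this model, an image signed by an enrolled db key is rejected as soon as
its hash is added to dbx, which is how a known-bad loader can be revoked
without replacing the signing key.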
234
Keys and certificates come in multiple formats:
236
237#. ``.key``  PEM format private keys for EFI binary and EFI signature list signing.
238#. ``.crt``  PEM format certificates for sbsign.
239#. ``.cer``  DER format certificates for firmware.
240
241In ACRN, User VM Secure Boot can be enabled as follows:
242
243#. Generate keys (PK/KEK/DB) with a key generation tool such as Ubuntu
244   KeyGeneration. ``PK.der``, ``KEK.der``, and ``db.der`` will be enrolled in UEFI
245   BIOS. ``db.key`` and ``db.crt`` will be used to sign the User VM
246   bootloader/kernel.
247#. Create a virtual disk to hold ``PK.der``, ``KEK.der``, and ``db.der``, then launch
248   the User VM with this virtual disk.
249#. Start the OVMF in writeback mode to ensure the keys are persistently stored
250   in the OVMF image.
251#. Enroll the keys in the OVMF GUI by following the Secure Boot configuration
252   flow and enable Secure Boot mode.
253#. Perform writeback via reset in OVMF.
254#. Sign the User VM images with ``db.key`` and ``db.crt``.
255#. Boot the User VM with Secure Boot enabled.
256
257.. _service_vm_hardening:
258
259Service VM Hardening
260--------------------
261
262In the ACRN project, the reference Service VM is based on Ubuntu.
263Customers may choose to use different open source OSes or their own
264proprietary OS systems. To minimize the attack surfaces and achieve the
265goal of "defense in depth", there are many common guidelines to ensure the
266security of the Service VM system.
267
268As shown in :numref:`security-bootflow-sbl` and
269:numref:`security-bootflow-uefi` above, the integrity of the User VM
270depends on the integrity of the DM module and vBIOS/vOSloader in the
271Service VM. Hence, Service VM integrity is critical to the entire User VM security.
If the Service VM system is compromised, all the User VMs may be
273jeopardized.
274
In practice, the Service VM designer and implementer should follow at least the
276following rules:
277
278#. Verify that the Service VM is a closed system and doesn't allow the user to
279   install any unauthorized third-party software or components.
280#. Verify that external peripherals are constrained.
281#. Enable kernel-based hardening techniques, for example, dm-verity (to
282   ensure the integrity of the DM and vBIOS/vOSloaders), and kernel module
283   signing.
284#. Enable system level hardening such as MAC (Mandatory Access Control).
285
286Detailed configurations and policies are out of scope for this document.
287For good references on OS system security hardening and enhancement,
288see `AGL security
289<https://docs.automotivelinux.org/en/lamprey/#2_Architecture_Guides/2_Security_Blueprint/0_Overview/>`_
290and `Android security <https://source.android.com/security/>`_.
291
292Hypervisor Security Enhancement
293===============================
294
295This section describes the ACRN hypervisor security enhancement for
296memory boundary access and interfaces between VMs and the hypervisor,
297such as Hypercall APIs, I/O emulations, and EPT violation handling.
298
The main security goal of the ACRN hypervisor design is to prevent
privilege escalation and enforce isolation. For example, the design must prevent:
301
302-  VMM privilege escalation (VMX non-root -> VMX root)
303-  Non-secure OS software (running in AaaG) accessing secure world TEE
304   assets
305-  Unauthorized software from executing in the hypervisor
306-  Cross-guest VM attacks
307-  Hypervisor secret information leakage
308
309Memory Management Enhancement
310-----------------------------
311
312Background
313~~~~~~~~~~
314
315The ACRN hypervisor has ultimate access control of all the platform
316memory spaces (see :ref:`memmgt-hld`). Note that on the APL platform,
317`SGX <https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html>`_ and `TME
318<https://itpeernetwork.intel.com/memory-encryption/>`_
319are not supported.
320
The hypervisor can read and write any physical memory space allocated
to any guest VM, and can even fetch instructions and execute the code in
the memory space of any guest VM. Because the hypervisor may have an MMU
misconfiguration or may be compromised by an attacker, its access to
guest memory must be constrained so that it cannot touch
guest memory space either maliciously or accidentally. As a best
security practice, the hypervisor must not trust any content from a
guest VM memory space. In other words, there must be a trust
boundary for memory space between the hypervisor and guest VMs.
330
331.. figure:: images/security-image14.png
332   :width: 500px
333   :align: center
334   :name: security-hgmem
335
336   Hypervisor and Guest Memory Layout
337
The hypervisor must appropriately configure the EPT tables to prevent
any guest from accessing (reading, writing, or executing) the memory space
owned by the hypervisor.
341
342Memory Access Restrictions
343~~~~~~~~~~~~~~~~~~~~~~~~~~
344
345The fundamental rules of restricting hypervisor memory access are:
346
347#. By default, prohibit any access to all guest VM memory. This means
348   that when the hypervisor initially sets up its own MMU paging tables
349   (HVA->HPA mapping), it only grants permissions for hypervisor memory
350   space (excluding guest VM memory).
351#. Grant access permission for the hypervisor to read/write a specific guest
352   VM memory region on demand. The hypervisor must never grant execution
353   permission for itself to fetch any code instructions from guest
354   memory space because there is no reason to do that.
355
356In addition to these rules, the hypervisor must also implement generic
357best-practice memory configurations for access to its own memory in host
358CR3 MMU paging tables, such as splitting hypervisor code and data
359(stack/heap) sections, and then applying W |oplus| X policy, which means if memory
360is Writable, then the hypervisor must make it non-eXecutable. The
361hypervisor must configure its code as read-only and executable, and
362configure its data as read/write. Optionally, if there are read-only
363data sections, it would be best if the hypervisor configures them as
364read-only.
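
The W |oplus| X rule can be checked mechanically. The following sketch is
illustrative: the section names and permission strings are hypothetical and
do not come from the ACRN linker script.

.. code-block:: python

   def violates_wxorx(sections):
       """Return the names of sections that are both writable and executable.

       sections maps a section name to a permission string made of the
       letters 'r', 'w', and 'x'.
       """
       return [name for name, perms in sections.items()
               if "w" in perms and "x" in perms]


   # Hypothetical hypervisor layout that follows the rules above.
   HV_LAYOUT = {
       ".text": "rx",     # code: read-only and executable
       ".rodata": "r",    # read-only data
       ".data": "rw",     # data: read/write, never executable
       ".bss": "rw",
       "stack": "rw",
   }

An empty result means that no region is both writable and executable.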
365
366The following sections focus on the rules mentioned above for
367memory access restriction on guest VM memory (not restrictions on the
368hypervisor's own memory access).
369
370SMAP/SMEP Enablement in the Hypervisor
371~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
372
373For the hypervisor to isolate access to the guest VM memory space,
374three typical solutions exist:
375
376#. **Configure the hypervisor/VMM MMU CR3 paging tables by removing the
377   execution permission (setting NX bit) or removing mapping completely
378   (setting not-present) for the guest memory space.**
379
   In practice, setting the NX bit works very well to disable
   instruction fetching from any guest memory space. However, removing
   the mapping is not suitable for read/write access isolation. For example,
   if the hypervisor removes the mapping to a guest memory page in the host
   CR3 paging tables, then whenever the hypervisor wants to access that
   guest memory page, it must first add the mapping back to its
   CR3 paging tables, and remove the mapping again
   after the access.
388
389   This remapping causes code complexity and a performance penalty and
390   may even require the hypervisor to flush the TLB. This solution won't
391   be used by the ACRN hypervisor.
392
393#. **Use CR0.WP (write-protection) bit.**
394
395   This processor feature allows
396   pages to be protected from supervisor-mode write access.
397   If the host/VMM CR0.WP = 0, supervisor-mode write access is
398   allowed to linear addresses with read-only access rights. If CR0.WP =
399   1, they are not allowed. User-mode write access is never allowed
400   for linear addresses with read-only access rights, regardless of the
401   value of CR0.WP.
402
   To implement this WP protection, the hypervisor must first configure
   all the guest memory space as "user-mode" accessible, read-only
   memory. In other words, in the host CR3 paging tables, the paging
   table entry for each guest memory page must have the U/S bit set and
   the R/W bit cleared.
408
409   .. figure:: images/security-image3.png
410      :width: 500px
411      :align: center
412      :name: security-gmem
413
414      Configure Guest Memory as User-accessible
415
416   This setting seems meaningless since all the code in the ACRN hypervisor
417   is running in Ring 0 (supervisor-mode), and no code in the hypervisor
418   will be executed in Ring 3 (no user-mode applications in the hypervisor /
419   vmx-root).
420
   However, these settings make it possible to use the CR0.WP
   protection capability. With CR0.WP = 1, if hypervisor code
   running in Ring 0 maliciously attempts to write a user-accessible
   read-only memory page (in guest memory space), the processor thwarts
   the attempt by raising a page fault (#PF) in the
   hypervisor. Whenever the hypervisor has a valid reason to write
   to user-accessible read-only memory (guest memory), it can
   clear CR0.WP before writing, and then set CR0.WP
   back to 1.
430
   This solution is better than the first solution above because it doesn't
   need to change the host CR3 paging tables to map or unmap guest memory
   pages and doesn't need to flush the TLB.
   However, it cannot prevent the hypervisor (running in Ring 0 mode) from
   reading guest memory space because the CR0.WP bit doesn't control read
   access. Read access protection is required
   because guest memory may contain secrets; if the
   hypervisor is compromised and reads that memory, the secrets may
   leak to attackers.
440
#. **Use processor SMEP and SMAP capabilities.**

   This is the best of the three solutions because SMAP can prevent the
   hypervisor from both reading and writing guest memory, and SMEP can
   prevent the hypervisor from fetching/executing code in guest memory. This
   solution also has minimal performance impact: like the CR0.WP
   protection, it doesn't require a TLB flush (which would incur a performance
   penalty), and it adds little code complexity.
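
The limitation of the CR0.WP approach can be seen in a small truth-table
model. This is a simplified sketch of supervisor-mode access to a present
page; it ignores SMAP and every other paging check, and the function names
are hypothetical.

.. code-block:: python

   def supervisor_write_allowed(rw_bit: int, cr0_wp: int) -> bool:
       """Can Ring 0 write a present page with the given R/W bit?

       With CR0.WP = 0, supervisor writes ignore the read-only setting.
       With CR0.WP = 1, a read-only page (R/W = 0) cannot be written.
       """
       return rw_bit == 1 or cr0_wp == 0


   def supervisor_read_allowed(rw_bit: int, cr0_wp: int) -> bool:
       """CR0.WP never restricts reads, whatever the R/W bit says."""
       return True

The second function always returns ``True``, which is why CR0.WP alone
cannot keep guest secrets from being read by compromised hypervisor code.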
449
The following sections focus on this SMEP/SMAP protection. SMEP
and SMAP are widely used by modern operating systems such as
Windows and Linux to isolate kernel and user memory, and can
mitigate many vulnerability exploits.
454
455Guest Memory Execution Prevention
456+++++++++++++++++++++++++++++++++
457
458SMEP is designed to prevent user memory malware (typically
459attacker-supplied) from being executed in the kernel (Ring 0) privilege
460level.  As long as the CR4.SMEP = 1, software operating in supervisor
461mode cannot fetch instructions from linear addresses that are accessible
462in user mode.
463
In the ACRN hypervisor, the attacker-supplied memory could be any guest
memory, because by design the hypervisor doesn't trust any data or code
from guest memory.
467
468In order to activate SMEP protection, the ACRN hypervisor must:
469
#. Configure all the guest memory as user-accessible memory (U/S = 1),
   regardless of the NX bit and R/W bit settings in the corresponding host
   CR3 paging tables.
#. Set the CR4.SMEP bit. This bit remains set for the entire life cycle
   of the hypervisor.
475
As an alternative, the NX feature can be used for this purpose by setting the
corresponding NX (no-execute) bit for all the guest memory mappings
in host CR3 paging tables.
479
480Since the hypervisor code never runs in Ring 3 mode, either of these two
481solutions works very well. Both solutions are enabled in the ACRN
482hypervisor.
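
The way the two mechanisms overlap can be modeled as a simple predicate.
This is a sketch of the architectural rule for a supervisor-mode instruction
fetch from a present page, not hypervisor code; the function name and
parameter names are hypothetical.

.. code-block:: python

   def supervisor_fetch_allowed(us_bit: int, nx_bit: int,
                                cr4_smep: int, efer_nxe: int = 1) -> bool:
       """Can Ring 0 fetch instructions from a present page?

       us_bit   -- U/S bit of the mapping (1 = user-accessible)
       nx_bit   -- NX bit of the mapping (1 = no-execute)
       cr4_smep -- CR4.SMEP
       efer_nxe -- IA32_EFER.NXE (the NX bit is honored only when set)
       """
       if nx_bit and efer_nxe:
           return False      # blocked by NX
       if us_bit and cr4_smep:
           return False      # blocked by SMEP
       return True

A guest page mapped with U/S = 1 and NX = 1 is blocked by either check on
its own, so a mistake in one setting is still covered by the other.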
483
484Guest Memory Access Prevention
485++++++++++++++++++++++++++++++
486
487Supervisor Mode Access Prevention (SMAP) is yet another powerful
488processor feature that makes it harder for malware to
489"trick" the kernel into using instructions or data from a user-space
490application program.
491
492This feature is controlled by the CR4.SMAP bit. When that bit is set,
493any attempt to access user-accessible memory pages while running in a
494privileged or kernel mode will lead to a page fault.
495
496However, there are times when the kernel legitimately needs to work with
user-accessible memory pages. The Intel processor defines a separate
"AC" flag (in the RFLAGS register) that controls SMAP enforcement. If the AC
flag is clear, SMAP protection is in force when CR4.SMAP=1; if the AC flag
is set, access to user-accessible memory pages is allowed even if CR4.SMAP=1.
In other words, the "AC" flag suppresses SMAP enforcement.
502
The STAC (set AC flag) and CLAC (clear AC flag) instructions were
introduced to manipulate that flag quickly. Note that
STAC and CLAC can only be executed in kernel mode (CPL=0).
506
507To activate SMAP protection in the ACRN hypervisor:
508
509#. Configure all the guest memory as user-writable memory (U/S bit = 1,
510   and R/W bit = 1) in corresponding host CR3 paging table entries, as
511   shown in :numref:`security-smap` below.
#. Set the CR4.SMAP bit. This bit remains set for the entire life cycle
   of the hypervisor.
#. When needed, use the STAC instruction to suppress SMAP protection, and
   use the CLAC instruction to restore SMAP protection.
516
517.. figure:: images/security-image5.png
518   :width: 500px
519   :align: center
520   :name: security-smap
521
522   Setting SMAP and Configuring U/S=1, R/W=1 for All Guest Memory Pages
523
For example, :numref:`security-hagm` shows a module of the hypervisor code
(running in Ring 0 mode) attempting to perform a legitimate read (or
write) access to a data area in a guest memory page.
527
528.. figure:: images/security-image4.png
529   :width: 500px
530   :align: center
531   :name: security-hagm
532
533   Hypervisor Access to Guest Memory
534
The hypervisor performs these steps:

#. Execute the STAC instruction to suppress SMAP protection.
#. Perform the read/write access on the guest DATA area.
#. Execute the CLAC instruction to restore SMAP protection.
540
541The attack surface can be minimized because there is only a
542very small window between step 1 and step 3 in which the guest memory
543can be accessed by hypervisor code running in ring 0.
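
The three steps above can be modeled as a short access window. This is a
behavioral sketch only: the ``Cpu`` class, ``PageFault`` exception, and
``guest_access_window()`` helper are hypothetical stand-ins for processor
state and are not part of the hypervisor.

.. code-block:: python

   from contextlib import contextmanager


   class PageFault(Exception):
       """Models a #PF raised by SMAP."""


   class Cpu:
       """Minimal model of the state that SMAP depends on."""

       def __init__(self):
           self.cr4_smap = 1   # set once, never cleared
           self.ac = 0         # RFLAGS.AC

       def stac(self):
           self.ac = 1         # suppress SMAP enforcement

       def clac(self):
           self.ac = 0         # restore SMAP enforcement

       def access(self, page_us_bit: int) -> bool:
           """Supervisor-mode data access to a page with the given U/S bit."""
           if self.cr4_smap and page_us_bit and not self.ac:
               raise PageFault("SMAP: supervisor access to user-accessible page")
           return True


   @contextmanager
   def guest_access_window(cpu: Cpu):
       """Open the window with STAC and always close it with CLAC."""
       cpu.stac()
       try:
           yield cpu
       finally:
           cpu.clac()

Wrapping the access in ``with guest_access_window(cpu):`` keeps the window
as small as possible and closes it even if the access itself fails.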
544
545Rules to Access Guest Memory in the Hypervisor
546~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
547
548In the ACRN hypervisor, functions ``stac()`` and ``clac()`` wrap
549STAC and CLAC instructions respectively, and functions
550``copy_to_gpa()`` and ``copy_from_gpa()`` can be used to copy
551an arbitrary amount of data to or from the VM memory area.
552
Whenever the hypervisor needs to perform legitimate read/write access to
guest memory pages, it must use one of the functions above. Otherwise, the
processor triggers a #PF to block malicious or
unintended access to the guest memory pages.
557
These functions must also validate the addresses internally. For example,
they must ensure that the input address has
a valid mapping (GVA->GPA mapping, GPA->HPA EPT mapping, and HVA->HPA
host MMU mapping) and is not in the range of the hypervisor memory.
Details of these ordinary checks are out of scope of this document.
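
As a rough illustration of such checks, the sketch below models a
copy-from-guest helper. It is not the ACRN implementation of
``copy_from_gpa()``: the function ``copy_from_gpa_model()``, the dictionary
used as an EPT, and the flat ``host_mem`` buffer are all assumptions made
for this example.

.. code-block:: python

   PAGE = 0x1000


   class GuestCopyError(Exception):
       """Raised when a guest address fails validation."""


   def copy_from_gpa_model(ept, host_mem, hv_range, gpa, size):
       """Copy size bytes starting at guest-physical address gpa.

       ept      -- dict mapping guest page frame number -> host page frame number
       host_mem -- bytearray modeling host physical memory
       hv_range -- (start, end) host-physical range owned by the hypervisor,
                   end exclusive
       """
       out = bytearray()
       addr, remaining = gpa, size
       while remaining > 0:
           gfn, offset = divmod(addr, PAGE)
           if gfn not in ept:
               raise GuestCopyError(f"no GPA->HPA mapping for {addr:#x}")
           hpa = ept[gfn] * PAGE + offset
           chunk = min(remaining, PAGE - offset)   # never cross a page blindly
           if hpa < hv_range[1] and hpa + chunk > hv_range[0]:
               raise GuestCopyError("address falls inside hypervisor memory")
           out += host_mem[hpa:hpa + chunk]
           addr += chunk
           remaining -= chunk
       return bytes(out)

The copy is validated page by page, so a request that starts in a mapped
page but runs into an unmapped page or into hypervisor memory is rejected.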
563
564
565Avoidance of Memory Information Leakage
566---------------------------------------
567
568Protecting the hypervisor's memory is critical to the security of the
569entire platform. The hypervisor must prevent any memory content (e.g.,
570stack or heap) from leaking to guest VMs. Some of the hypervisor memory
571content may contain platform secrets such as SEEDs, which are used as
the root key for its guest VMs. `Xen Advisories
<https://xenbits.xen.org/xsa/>`_ include many examples of past hypervisor
memory leaks. ACRN developers can refer to them to understand how
to avoid such leaks when coding.
576
Memory content from one guest VM might also be leaked to another guest VM.
In the ACRN and Device Model design, when a guest VM is destroyed or
crashes, its memory content should be scrubbed by either the hypervisor
or the Service VM Device Model process. Otherwise, if that memory is
re-allocated to another guest VM, the new VM could read the
previous guest VM's secrets.
583
584.. _secure-hypervisor-interface:
585
586Secure Hypervisor Interface
587---------------------------
588
589Hypercall API Interface Hardening
590~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
591
592The hypercall API is the primary interface between a guest VM and the
593hypervisor.
594
595.. figure:: images/security-image-HC-interface-restriction.png
596   :width: 900px
597   :align: center
598   :name: security-hir
599
600   Hypercall Interface Restriction
601
602As shown in :numref:`security-hir`, there are some restrictions for
603hypercall invocation in the hypervisor design:
604
605#. Hypercalls from ring 1~3 of any guest VM are not allowed. The
606   hypervisor must discard such hypercalls and inject ``#GP(0)`` instead. Only ring-0
607   hypercalls from the guest VM are handled by the hypervisor.
608#. All the hypercalls (except world\_switch hypercall) must be called
609   from the ring-0 driver of the Service VM.
610   World\_switch Hypercall is used by the TIPC (Trusty IPC) driver to
611   switch guest VM context between secure world and non-secure world.
612   Further details will be discussed in the :ref:`secure_trusty` section.
613   When a vCPU issues an unpermitted hypercall, the hypervisor shall either
614   inject ``#UD`` (if the VM cannot issue hypercalls at all) or return ``-EINVAL``
615   (if the VM is allowed to issue hypercalls but not this specific one).
#. Some hypercalls, such as ``hcall_create_vm()`` or
   ``hcall_destroy_vm()``, may leave hypervisor data in an inconsistent
   state when they are executed concurrently. A spinlock is used to ensure
   that the hypervisor processes these hypercalls serially.
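
The restrictions above can be summarized in a small dispatch model. This is
a sketch under stated assumptions, not the ACRN dispatcher: the VM
dictionaries, hypercall names, and handler table are hypothetical, a
``threading.Lock`` stands in for the hypervisor spinlock, and the order in
which the ring check and the permission checks run is assumed.

.. code-block:: python

   import threading

   EINVAL = 22
   _vm_lock = threading.Lock()              # stands in for the spinlock
   SERIALIZED = {"create_vm", "destroy_vm"}


   def handle_hypercall(vm, cpl, name, handlers):
       """Return ("inject", exception) or ("ret", value) for one hypercall.

       vm       -- dict with an "allowed" set of hypercall names
       cpl      -- privilege level (ring) of the calling vCPU
       handlers -- dict mapping hypercall name -> callable
       """
       if cpl != 0:
           return ("inject", "#GP(0)")      # rule 1: ring 0 only
       if not vm["allowed"]:
           return ("inject", "#UD")         # VM may not issue hypercalls at all
       if name not in vm["allowed"] or name not in handlers:
           return ("ret", -EINVAL)          # this hypercall is not permitted
       if name in SERIALIZED:
           with _vm_lock:                   # rule 3: process serially
               return ("ret", handlers[name]())
       return ("ret", handlers[name]())

In this model a User VM that is only permitted to issue the world switch
hypercall gets ``-EINVAL`` for any other hypercall, while a VM with no
permitted hypercalls gets ``#UD``.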
620
621In addition to the above rules, there are other regular checks in the
622hypercall implementation to prevent hypercalls from being misused. For
623example, all the parameters must be sanitized, unexpected hypervisor
624memory overwrite must be avoided, any hypervisor memory content/secrets
625must not be leaked to guests, and any memory/code injection must be
626eliminated.
627
628I/O Emulation Handler
629~~~~~~~~~~~~~~~~~~~~~
630
631I/O port monitoring is also widely used by the ACRN hypervisor to
632emulate legacy I/O access behaviors.
633
Typically, the I/O instructions are IN, INS/INSB/INSW/INSD, OUT, and
OUTS/OUTSB/OUTSW/OUTSD with an arbitrary port (although not all the I/O
ports are monitored by the hypervisor). As with other interfaces (e.g.,
637hypercalls), the hypervisor performs security checks for all the I/O
638access parameters to make sure the emulation behaviors are correct.
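
The kind of parameter check meant here can be sketched as follows. The
source does not list the exact checks that ACRN performs, so the rules
below (valid access widths, port range, value range) and the function
``pio_request_valid()`` are assumptions chosen for illustration.

.. code-block:: python

   VALID_WIDTHS = (1, 2, 4)     # byte, word, and doubleword accesses
   PORT_SPACE = 0x10000         # the x86 I/O port space has 64K ports


   def pio_request_valid(port: int, width: int, value=None) -> bool:
       """Validate an emulated port I/O request coming from a guest.

       value is the data for an OUT access, or None for an IN access.
       """
       if width not in VALID_WIDTHS:
           return False
       if port < 0 or port + width > PORT_SPACE:
           return False          # access would run past the port space
       if value is not None and not 0 <= value < (1 << (8 * width)):
           return False          # data does not fit the access width
       return True
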
639
640EPT Violation Handler
641~~~~~~~~~~~~~~~~~~~~~
642
643The Extended Page Table (EPT) is typically used by the hypervisor to
644monitor MMIO (or other types of ordinary memory access) operation from a
guest VM. The hypervisor then emulates the MMIO instructions according
to the designed behaviors.
647
As with I/O emulation, this interface could also be manipulated by
malware in a guest VM to compromise system security.
650
651Other VMEXIT Handlers
652~~~~~~~~~~~~~~~~~~~~~
653
There are some other VMEXIT handlers in the hypervisor that might take
untrusted parameters and registers from a guest VM, such as the MSR write
VMEXIT and the APIC VMEXIT.

The hypervisor performs sanity checks to avoid security issues when
handling those special VMEXITs.
660
661Guest Instruction Emulation
662~~~~~~~~~~~~~~~~~~~~~~~~~~~
663
Instruction emulation implemented by the hypervisor must also be reviewed
for security. Emulating x86 instructions is complicated, and many
security CVEs related to instruction emulation have been reported in the
KVM, Xen, and QEMU communities. This is a "hotspot" where the hypervisor
may potentially have vulnerabilities.

The security validation process and secure code review must ensure all the
671instruction emulations behave as defined in the `IA32 SDM
672document <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html>`_.
673
674Virtual Power Life Cycle Management
675-----------------------------------
676
In a virtualization environment, each User VM can have its
virtual power managed just like native behavior. For example, if a User VM
is required to enter S3 (Suspend to RAM) to save power,
then the hypervisor and the DM process in the Service VM must handle it correctly.
Similarly, virtual cold/warm reboot is also supported. How to implement
virtual power life cycle management is out of scope of this document.
683
684This subsection is intended to describe the security issues for those
685power cycles.
686
687User VM Power On and Shutdown
688~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
689
The memory of the User VM is allocated dynamically by the DM
process in the Service VM before the User VM is launched. When the User VM
is shut down (or crashes), its memory is freed to the Service VM memory space.
Later on, when a new User VM is launched, the DM may allocate
the same memory region (or an overlapping one) to this new User VM.
695
696In the virtualization environment, a security goal is to ensure User VM
697isolation, not only for runtime memory isolation (e.g., with EPT),
698but also for data at rest isolation.
699
In this situation, if the memory content of a previous User VM is not
scrubbed by either the DM or the hypervisor, then the newly launched User VM could
702access the previous User VM's secrets by scanning the memory regions
703allocated for the new User VM.
704
705In ACRN, the memory content is scrubbed in the Device Model after the guest
706VM is shut down.
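
The scrub-on-release rule can be illustrated with a minimal sketch. It is
written in Python for brevity and is not the Device Model code;
``GuestMemoryPool``, ``allocate``, and ``release`` are hypothetical names
chosen only for this example, and a single ``bytearray`` stands in for the
memory that the Service VM hands out.

.. code-block:: python

   class GuestMemoryPool:
       """Toy model of Service VM memory that is reused across User VMs."""

       def __init__(self, size: int) -> None:
           self._backing = bytearray(size)  # stands in for physical memory
           self._in_use = False

       def allocate(self) -> memoryview:
           """Hand the region to a newly launched User VM."""
           if self._in_use:
               raise RuntimeError("region is still owned by a running User VM")
           self._in_use = True
           return memoryview(self._backing)

       def release(self, region: memoryview) -> None:
           """Scrub the region before it returns to the pool."""
           region[:] = bytes(len(region))  # overwrite every byte with zero
           region.release()
           self._in_use = False


   pool = GuestMemoryPool(16)
   vm1 = pool.allocate()
   vm1[:6] = b"secret"        # the first User VM stores sensitive data
   pool.release(vm1)          # shutdown path: scrub, then free

   vm2 = pool.allocate()      # a later User VM receives the same memory
   leaked = bytes(vm2)        # contains only zero bytes

Scrubbing on the release path, rather than on the next allocation, means the
secret does not linger in freed memory between the two launches.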
707
708User VM Reboot
709~~~~~~~~~~~~~~
710
A **cold** reboot of a User VM behaves the same as the virtual power-on
and shutdown events described above. There is one special case:
virtual **warm** reboot.
714
715When a User VM encounters a panic, its kernel may trigger a warm reboot, so
716that in the next power cycle, a special purpose-built OS image is
717launched to dump the memory content for debugging analysis. In a warm
718reboot, the memory content must be preserved after a virtual power
719cycle. However, this violates the security rules above.
720
721This typically is fine in project ACRN, because in the next virtual
722power cycle, the same memory content won't be re-allocated to another
723User VM.
724
725But there is a new issue when the secure world (TEE/Trusty) is considered,
726because the memory content of the secure world must not be dumped by a
727non-secure world User VM. More details will be discussed in
728the section on :ref:`platform_root_of_trust`.
729
730Normally, this warm reboot (crashdump) feature is a debug feature, and
731must be disabled in a production release. Users who want to use this
732feature must possess the private signing key to re-sign the image after
733enabling the configuration.
734
735.. _user_vm_suspend_resume:
736
737User VM Suspend/Resume
738~~~~~~~~~~~~~~~~~~~~~~
739
There are no special design considerations for normal User VMs without secure
world support, as long as the EPT/VT-d memory protection/isolation is
active during the entire suspended time.
743
744The secure world (Trusty/TEE) is a special case for virtual suspend. Unlike
745the non-secure world of User VMs, whose memory content can be read/written by
746the Service VM, the memory content of the secure world of User VMs must not be
747visible to the Service VM. This is designed for security with defense in depth.
748
749During the entire process of User VM sleep/suspend, the memory protection
750for the secure world is preserved too. The physical memory region of the
751secure world is removed from EPT paging tables of any guest VM,
752even including the Service VM.
753
754Third-Party Libraries
755---------------------
756
757All the third-party libraries must be examined before use to verify
758there are no known vulnerabilities in the library source code.
759Typically, the CVE site https://cve.mitre.org/cve/search_cve_list.html
760can be used to search for known vulnerabilities.
761
762.. _platform_root_of_trust:
763
764Platform Root of Trust Key/Seed Derivation
765==========================================
766
767For security reasons, each guest VM requires a root key, which is used to
768derive many other individual keys for different purposes, for example,
769secure storage encryption, keystore master key, and HMAC keys.
770
On the APL platform, the CSE FW generates a platform SEED (pSEED, 256-bit)
that is unique per device because it is derived from a unique chipset secret
burned into the chip.

Then on each boot, the SBL BIOS is responsible for retrieving the pSEED
from the CSE FW and deriving two other seeds from it (dSEED and uSEED).
777
778.. figure:: images/security-image-platform-seed-derivation.png
779   :width: 900px
780   :align: center
781   :name: security-seed
782
783   Platform SEED (pSEED) Derivation
784
785As shown in :numref:`security-seed` above, the hypervisor then derives
786multiple child SEEDs for multiple guest VMs. A guest VM must not be able
787to know the SEEDs of any other guest VMs.
788
789The algorithm used in the hypervisor to derive keys is HKDF (HMAC-based
790Extract-and-Expand Key Derivation Function), `RFC5869
791<https://tools.ietf.org/html/rfc5869>`_.  The crypto library `mbedtls
792<https://github.com/ARMmbed/mbedtls>`_ has been chosen for project ACRN.
793
The parameters of HKDF derivation in the hypervisor are:

#. VMInfo = VM name (from the hypervisor configuration file)
#. theHash = SHA-256
#. OutSeedLen = 64 (in bytes)
#. Guest Dev and User SEED (dvSEED/uvSEED)

   ``dvSEED = HKDF(theHash, nil, dSEED, VMInfo | "devseed", OutSeedLen)``

   ``uvSEED = HKDF(theHash, nil, uSEED, VMInfo | "userseed", OutSeedLen)``
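
The derivation above can be sketched with the Python standard library. This
is an illustration of HKDF as defined in RFC 5869, not the hypervisor's
mbedtls-based code. The function names are hypothetical, the seed values are
placeholders, and two details are assumptions: the VM name is encoded as
bytes, and the info string is the VM name concatenated with the label.

.. code-block:: python

   import hashlib
   import hmac

   HASH = hashlib.sha256
   HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256
   OUT_SEED_LEN = 64              # OutSeedLen, in bytes


   def hkdf(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
       """HKDF (RFC 5869): extract a PRK from ikm, then expand it."""
       if length > 255 * HASH_LEN:
           raise ValueError("requested output is too long for HKDF")
       # Extract: a nil salt is treated as HASH_LEN zero bytes.
       prk = hmac.new(salt or bytes(HASH_LEN), ikm, HASH).digest()
       # Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated and truncated.
       okm, block, counter = b"", b"", 1
       while len(okm) < length:
           block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
           okm += block
           counter += 1
       return okm[:length]


   def derive_vm_seeds(d_seed: bytes, u_seed: bytes, vm_info: bytes):
       """Derive (dvSEED, uvSEED) for one guest VM from the platform seeds."""
       dv_seed = hkdf(d_seed, vm_info + b"devseed", OUT_SEED_LEN)
       uv_seed = hkdf(u_seed, vm_info + b"userseed", OUT_SEED_LEN)
       return dv_seed, uv_seed


   # Placeholder inputs for illustration only; real seeds come from the SBL.
   d_seed = bytes(32)
   u_seed = bytes(range(32))
   dv_seed, uv_seed = derive_vm_seeds(d_seed, u_seed, b"vm1")
   other_dv_seed, _ = derive_vm_seeds(d_seed, u_seed, b"vm2")

Because the VM name is part of the info string, two guest VMs derive different
seeds from the same dSEED, and HKDF gives no practical way to compute one
VM's seed from another's.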
804
805.. _secure_trusty:
806
807Secure Isolated World (Trusty)
808==============================
809
810This section explains how to build a secure isolated world in a specific
811guest VM such as the Android User VM. (See :ref:`trusty_tee` for more
812information.)
813
On the APL platform, the secure world is used to run a
virtualization-based Trusty TEE in an isolated world that serves
Android as a Guest (AaaG), so that the platform can obtain Google's
Android-related certifications by fulfilling the Android CDD requirements.
There is also a plan to support Trusty as a provider of security services
for LaaG User VMs.

Refer to this Google website for `Trusty details
<https://source.android.com/security/trusty/>`_ and for `Android CDD
documents <https://source.android.com/compatibility/cdd>`_.
823
824Secure World Architecture Design
825--------------------------------
826
827To support a VT-TEE (Virtualization Technology based TEE) Trusty on
828ACRN, the hypervisor creates an isolated secure world in a User VM.
829
830.. figure:: images/security-image-secure-world.png
831   :width: 900px
832   :align: center
833   :name: security-secure-world
834
835   Secure World
836
837In :numref:`security-secure-world`, the Trusty OS runs in the User VM secure
838world and a Linux- or Android-based User VM runs in the non-secure world.
839
By design, the secure world is able to read and write to all the non-secure
world's memory space, but non-secure world applications cannot
access the secure world's memory. This is guaranteed by switching
to a different EPT table when a world switch (WS) hypercall is invoked. The
WS hypercall can carry parameters that specify the command ID of the service
requested by the non-secure world.

To implement the "one VM, two worlds" architecture, the hypervisor keeps a
single VM structure per User VM, but two vCPU structures that
save the virtual logical processor states of the non-secure world and the
secure world respectively.
851
Whenever the non-secure world issues a WS hypercall, the hypervisor
copies the non-secure world CPU context from the guest VMCS to the non-secure
world vCPU structure, copies the secure world CPU context from the secure
world vCPU structure to the guest VMCS, and then executes
VMRESUME to enter the secure world. A switch in the other direction works the
same way with the roles reversed. The EPTP is also
updated accordingly in the VMCS (not shown in
:numref:`security-secure-world`).
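
The save-and-load sequence can be modeled in a few lines. This is a
simplified sketch under stated assumptions, not the hypervisor
implementation: the CPU context is reduced to two registers, the VMCS is
reduced to a guest context plus an EPTP value, and all names
(``CpuContext``, ``UserVm``, ``world_switch``) are hypothetical.

.. code-block:: python

   from dataclasses import dataclass, field, replace

   NON_SECURE, SECURE = 0, 1


   @dataclass
   class CpuContext:
       rip: int = 0
       rsp: int = 0


   @dataclass
   class UserVm:
       """One VM structure that holds two per-world vCPU contexts."""
       eptp: tuple  # (non-secure world EPTP, secure world EPTP)
       vcpu: list = field(default_factory=lambda: [CpuContext(), CpuContext()])
       world: int = NON_SECURE
       vmcs_guest: CpuContext = field(default_factory=CpuContext)
       vmcs_eptp: int = 0

       def __post_init__(self) -> None:
           self.vmcs_eptp = self.eptp[self.world]


   def world_switch(vm: UserVm) -> None:
       """Handle a WS hypercall: save the running world, load the other one."""
       leaving = vm.world
       entering = SECURE if leaving == NON_SECURE else NON_SECURE
       vm.vcpu[leaving] = replace(vm.vmcs_guest)      # VMCS -> vCPU structure
       vm.vmcs_guest = replace(vm.vcpu[entering])     # vCPU structure -> VMCS
       vm.vmcs_eptp = vm.eptp[entering]               # select the world's EPT
       vm.world = entering
       # A real hypervisor would execute VMRESUME at this point.


   vm = UserVm(eptp=(0x1000, 0x2000))
   vm.vmcs_guest = CpuContext(rip=0x100, rsp=0x200)       # non-secure state
   vm.vcpu[SECURE] = CpuContext(rip=0x5000, rsp=0x6000)   # saved secure state

   world_switch(vm)                                   # enter the secure world
   secure_state = (vm.vmcs_guest.rip, vm.vmcs_eptp)
   world_switch(vm)                                   # return to non-secure

Swapping the EPTP together with the register state is what makes the memory
view change at the same moment as the execution context.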
859
860Trusty (Secure World) Memory Mapping View
861-----------------------------------------
862
863As per the secure world design, Trusty can have read/write access to the
864non-secure world's memory, but the non-secure world cannot access the Trusty
865secure world's memory. In the hypervisor EPT configuration shown in
866:numref:`security-mem-view` below, the secure world EPTP page table
867hierarchy must contain the non-secure world address space, while the Trusty
868world's address space must be removed from the non-secure world EPTP
869page table hierarchy.
870
Since there is no need to allow Trusty to execute code from the non-secure
world, for security reasons, the execution (X) permission must be removed
for the non-secure world address space in the secure world EPT table
configuration.
875
876To save page tables and share the mappings for the non-secure world address
877space, the hypervisor relocates the secure world's GPA to a very high
878position: 511G-512G. Hence, the PML4 for Trusty World is separated from the
879non-secure world. PDPT/PD/PT for low memory (<511G) are shared in both the
880Trusty World's EPT and non-secure world's EPT. PDPT/PD/PT for high
881memory (>=511G) are valid for the Trusty World's EPT only.
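
The position of the 511G-512G window in the paging hierarchy can be checked
with the standard 4-level index arithmetic (9 index bits per level, 4 KiB
pages). This is a worked calculation only; ``ept_indices`` is a hypothetical
helper, not an ACRN function.

.. code-block:: python

   GIB = 1 << 30


   def ept_indices(gpa: int) -> dict:
       """Split a guest-physical address into 4-level EPT table indices."""
       return {
           "pml4": (gpa >> 39) & 0x1FF,  # each PML4 entry covers 512 GiB
           "pdpt": (gpa >> 30) & 0x1FF,  # each PDPT entry covers 1 GiB
           "pd": (gpa >> 21) & 0x1FF,    # each PD entry covers 2 MiB
           "pt": (gpa >> 12) & 0x1FF,    # each PT entry covers 4 KiB
       }


   trusty_first = ept_indices(511 * GIB)      # first byte of the Trusty window
   trusty_last = ept_indices(512 * GIB - 1)   # last byte of the Trusty window
   low_memory = ept_indices(1 * GIB)          # an address in the shared range

Both ends of the Trusty window resolve to PML4 index 0 and PDPT index 511, so
the whole relocated region sits under a single 1 GiB PDPT slot, while every
address below 511G uses PDPT indices 0 through 510.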
882
883.. figure:: images/security-image8.png
884   :width: 900px
885   :align: center
886   :name: security-mem-view
887
888   Memory View for User VM Non-secure World and Secure World
889
Trusty/TEE Hypercalls
891---------------------
892
893Two hypercalls are introduced to assist in secure world (Trusty/TEE)
894execution on top of the hypervisor.
895
896Hypercall - Trusty Initialization
897~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
898
899When a User VM is created by the DM in the Service VM, if this User VM
900supports a secure isolated world, then this hypercall will be invoked
901by OSLoader (it could be the Android OS loader in
902:numref:`security-bootflow-sbl` and
903:numref:`security-bootflow-uefi` above) to create or initialize the
904secure world (Trusty/TEE).
905
906.. figure:: images/security-image9.png
907   :width: 900px
908   :align: center
909   :name: security-start-flow
910
911   Secure World Start Flow
912
In :numref:`security-start-flow` above, the OSLoader is responsible for
loading the TEE/Trusty image to a dedicated and reserved memory region,
locating the entry point of the TEE/Trusty executable, and then executing a
hypercall that exits to the hypervisor handler.

For security, the hypervisor removes the GPA->HPA
mapping of the secure world from the EPT paging tables of both the User VM
non-secure world and the Service VM. This prevents the
non-secure world and the Service VM from accessing the memory region of the
secure world, as previously mentioned.
923
After the hypervisor completes its setup, including vCPU context
initialization, it executes VMRESUME (step 4 in
:numref:`security-start-flow` above) to the entry point of the secure world
TEE/Trusty. The Trusty OS then starts in VMX non-root mode,
initializes itself, and loads its TAs (Trusted Applications) so that the
security services are ready before the non-secure OS starts.
930
931After the Trusty OS completes its initialization, a world switching (WS, see
932subsection below) hypercall is invoked (step 5 in
933:numref:`security-start-flow` above), and then the hypervisor takes
934control back, and resumes to the OSLoader (step 6 in
935:numref:`security-start-flow` above) to continue execution in the guest
936VM non-secure world context.
937
938Note that this Trusty initialization hypercall can only be called once
939in the User VM life cycle.
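
The two security properties of the initialization hypercall (unmapping the
secure region from other views, and allowing only one call per User VM life
cycle) can be sketched as follows. This is a toy model, not hypervisor code:
each EPT is represented as a set of guest page numbers, and the class and
method names are hypothetical.

.. code-block:: python

   class TrustyInitError(Exception):
       """Raised when Trusty initialization is requested more than once."""


   class UserVmWorlds:
       def __init__(self, ns_pages: set, service_vm_pages: set) -> None:
           self.ns_ept = set(ns_pages)                # non-secure world view
           self.service_vm_ept = set(service_vm_pages)
           self.secure_ept = set()                    # secure world view
           self.trusty_initialized = False

       def hc_initialize_trusty(self, secure_pages: set, entry_point: int) -> int:
           """Create the secure world and return the address to resume at."""
           if self.trusty_initialized:
               raise TrustyInitError("Trusty is already initialized")
           # Hide the secure region from the non-secure world and Service VM.
           self.ns_ept -= secure_pages
           self.service_vm_ept -= secure_pages
           # The secure world sees its own pages plus the non-secure pages.
           self.secure_ept = self.ns_ept | secure_pages
           self.trusty_initialized = True
           return entry_point


   all_pages = set(range(8))
   trusty_pages = {6, 7}
   vm = UserVmWorlds(ns_pages=all_pages, service_vm_pages=all_pages)
   resume_at = vm.hc_initialize_trusty(trusty_pages, entry_point=0x6000)

   try:
       vm.hc_initialize_trusty(trusty_pages, entry_point=0x6000)
       second_call_rejected = False
   except TrustyInitError:
       second_call_rejected = True

Rejecting the second call keeps a compromised non-secure world from
re-running initialization to replace the secure world image.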
940
941Hypercall - Trusty Switching
942~~~~~~~~~~~~~~~~~~~~~~~~~~~~
943
944There is another special hypercall introduced only for world switching
945between the non-secure world and secure world in a User VM.
946
947.. figure:: images/security-image-world-switching-HC.png
948   :width: 900px
949   :align: center
950   :name: security-ws
951
952   World Switching Hypercall
953
954Whenever this hypercall is invoked in a User VM, the hypervisor will
955unconditionally switch to the other world. For example, if it is called
956in the non-secure world, the hypervisor will then switch context to the secure
957world. After the secure world completes its security tasks (or an external
958interrupt occurs), this hypercall will be called again, then the hypervisor
959will switch context back to the non-secure world.
960
During the entire world switching process, the Service VM is not involved. This
hypercall is only available to a User VM that supports two worlds.
963
964Secure Storage Virtualization
965-----------------------------
966
Secure storage is one of the security services provided by the secure world
(TEE/Trusty). In the current implementation, secure storage is built
on the RPMB partition in eMMC (or in UFS or NVMe storage). Details of how
RPMB works are out of scope for this document.

Since the eMMC in APL SoC platforms has only a single RPMB
partition for tamper-resistant and anti-replay secure storage, the
secure storage (RPMB) must be virtualized in order to support multiple
guest User VMs. Although future generations of flash storage
(e.g., UFS 3.0 and NVMe) support multiple RPMB partitions, this
document focuses only on the virtualization solution for
single-RPMB flash storage devices in APL SoC platforms.
979
:numref:`security-storage` gives a high-level architecture overview of
secure storage virtualization.
982
983.. figure:: images/security-image-secure-storage-virt.png
984   :width: 900px
985   :align: center
986   :name: security-storage
987
988   Secure Storage Virtualization
989
In :numref:`security-storage`, the rKey is the physical RPMB
authentication key used for authenticated data read/write access between
the Service VM kernel and the physical RPMB controller in the eMMC device.  The
VrKey is the virtual RPMB authentication key used for authentication
between the DM module in the Service VM and its corresponding User VM secure
software. Each User VM (if secure storage is supported) has its own VrKey,
which is generated randomly when the DM process starts and is securely
distributed to the User VM secure world on each reboot. The rKey is fixed on a
specific platform unless the eMMC is replaced with another one.
999
1000The details of physical RPMB key (rKey) provisioning are out of scope.  In
1001the current project ACRN on APL platforms, the rKey is provisioned by
1002BIOS (SBL) right after a production device ends its manufacturing process.
1003
On each reboot, the BIOS/SBL retrieves the rKey from the CSE FW
(or generates it from a special SEED that is retrieved from the CSE FW; refer
to :ref:`platform_root_of_trust`). The SBL hands the rKey over to the
ACRN hypervisor, and the hypervisor in turn sends it to the Service VM kernel.

As an example, the secure storage virtualization workflow for a data write
access is as follows:
1011
1012#. User VM secure world (e.g., Trusty) packs the encrypted data and signs it
1013   with the vRPMB authentication key (VrKey), and sends the data along
1014   with its signature over the RPMB FE driver in the User VM non-secure world.
1015#. After the DM process in the Service VM receives the data and signature, the
1016   vRPMB module in the DM verifies them with the shared secret (vRPMB
1017   authentication key, VrKey).
1018#. If verification is successful, the vRPMB module does data address remap
1019   (remembering that the multiple User VMs share a single physical RPMB
1020   partition), and forwards the data to the Service VM kernel. The kernel packs
1021   the data and signs it with the physical RPMB authentication key
1022   (rKey). Eventually, the data and its signature will be sent to the
1023   physical eMMC device.
1024#. If the verification is successful in the eMMC RPMB controller, the
1025   data will be written into the storage device.
1026
The workflow for an authenticated data read is very similar to the write
flow above, but in reverse order.
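
The write workflow above can be sketched as follows. This is a simplified
model with several assumptions: the MAC is an HMAC-SHA256 over only the block
address and data (a real RPMB frame also carries fields such as a write
counter), the DM-side class also performs the re-signing that the Service VM
kernel does in the actual design, and all class and function names are
hypothetical.

.. code-block:: python

   import hashlib
   import hmac


   def sign(key: bytes, addr: int, data: bytes) -> bytes:
       """HMAC-SHA256 over (address, data); stands in for an RPMB frame MAC."""
       return hmac.new(key, addr.to_bytes(2, "big") + data, hashlib.sha256).digest()


   class PhysicalRpmb:
       """Toy RPMB controller that accepts only writes signed with the rKey."""

       def __init__(self, rkey: bytes) -> None:
           self._rkey = rkey
           self.blocks = {}

       def write(self, addr: int, data: bytes, mac: bytes) -> bool:
           if not hmac.compare_digest(mac, sign(self._rkey, addr, data)):
               return False
           self.blocks[addr] = data
           return True


   class VirtualRpmb:
       """vRPMB instance for one User VM, with its own VrKey and block range."""

       def __init__(self, vrkey, rkey, base, size, device) -> None:
           self.vrkey, self.rkey = vrkey, rkey
           self.base, self.size, self.device = base, size, device

       def write(self, vaddr: int, data: bytes, mac: bytes) -> bool:
           if not 0 <= vaddr < self.size:
               return False  # outside this User VM's slice of the partition
           if not hmac.compare_digest(mac, sign(self.vrkey, vaddr, data)):
               return False  # step 2: VrKey verification failed
           paddr = self.base + vaddr  # step 3: remap to the physical block
           return self.device.write(paddr, data, sign(self.rkey, paddr, data))


   R_KEY = b"r" * 32
   device = PhysicalRpmb(R_KEY)
   vm1 = VirtualRpmb(b"a" * 32, R_KEY, base=0, size=4, device=device)
   vm2 = VirtualRpmb(b"b" * 32, R_KEY, base=4, size=4, device=device)

   ok = vm2.write(1, b"data", sign(b"b" * 32, 1, b"data"))      # physical block 5
   forged = vm2.write(1, b"evil", sign(b"a" * 32, 1, b"evil"))  # wrong VrKey

The address remap and the range check together keep each User VM inside its
own slice of the single physical partition.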
1029
1030Note that there are some security considerations in this design:
1031
#. Protecting the rKey is critical in this system. If it is
   leaked, an attacker can overwrite the data on RPMB, which
   defeats the "tamper-resistant & anti-replay" capability.
#. Typically, the vRPMB module in the DM process of the Service VM can
   filter data access, preventing one User VM from reading or writing
   the data of another User VM. If the vRPMB module in the DM process is
   compromised, one User VM may change or overwrite the secure data of
   another User VM.
1041
1042Keeping the Service VM system as secure as possible is a very important goal in
1043the system security design. Follow the recommendations in
1044:ref:`service_vm_hardening`.
1045
1046SEED Derivation
1047---------------
1048
1049Refer to the previous section: :ref:`platform_root_of_trust`.
1050
1051Trusty/TEE S3 (Suspend to RAM)
1052------------------------------
1053
1054Secure world S3 design is not yet finalized. However, there is a
1055temporary solution as explained below to make it work on top of ACRN.
1056
Two new hypercalls are introduced: one saves the secure world processor
contexts/states, and the other restores them.

The save-state hypercall is called only from the secure world (Trusty/TEE OS),
when Trusty receives a signal that the entire system is about to enter S3
(the non-secure OS actually issues this power event). The
restore-state hypercall is called only by the vBIOS when the User VM is ready
to resume from the suspended state.
1066
1067For security design considerations of handling secure world S3,
1068read the previous section: :ref:`user_vm_suspend_resume`.
1069
1070Platform Security Feature Virtualization and Enablement
1071=======================================================
1072
This section describes how the hypervisor enables host CPU features
(e.g., SGX) and platform features (e.g., HECI) so that guest
VMs can use those features.
1076
1077TPM 2.0 Virtualization (vTPM)
1078-----------------------------
1079
1080On APL platforms, Intel |reg| PTT (Platform Trust Technology) implements TPM
1081functionalities based on the TCG TPM 2.0 industry standard specification.
1082PTT exposes the TPM hardware interface as CRB (Command Response Buffer)
1083defined in the TCG TPM 2.0 spec.
1084
1085However, in project ACRN, TPM virtualization doesn't assume it is based
1086on PTT or discrete TPM; both TPMs (2.0) are supported by design.
1087Customers are free to use either PTT or discrete TPM (but not at the same
1088time). PTT, however, is a built-in TPM 2.0 implementation in APL
1089platforms and does not require extra BOM cost (unlike discrete TPM).
1090
1091Note that the underlying CSE FW/HW implements PTT functionalities;
1092however, this TPM 2.0 feature does not rely on MEI/HECI virtualization.
1093
Unlike virtualizing regular hardware, virtualizing a TPM must
address both security and trust.

The goal of virtualization is to provide TPM functionality to each guest
VM, such as:

#. Allow programs to interact with a TPM in a virtual system the same
   way they interact with a TPM on the physical system.
#. Give each User VM its own unique, emulated, software TPM, including,
   for example, vPCR and vNVRAM.
#. Maintain a one-to-one mapping between running vTPM instances and the
   logical vTPM in each VM.
1106
1107
1108