# Nested SVM (AMD) CPUID requirements

The first step in making nested SVM production-ready is to make sure
that all features are implemented and well-tested. To make this
tractable, we will initially be limiting the "supported" range of
nested virt to a specific subset of host and guest features. This
document describes the criteria for deciding on features, and the
rationale behind each feature.

For AMD, all virtualization-related features can be found in CPUID
leaf 8000000A:edx.

# Criteria

- Processor support: At a minimum we want to support processors from
  the last 5 years. All things being equal, we'd prefer to cover
  older processors than not. Bits 0:7 were available in the very
  earliest processors; and even through bit 15 we should be pretty
  good support-wise.

- Faithfulness to hardware: We need the behavior of the "virtual cpu"
  from the L1 hypervisor's perspective to be as close as possible to
  the original hardware. In particular, the behavior of the hardware
  on error paths 1) is not easy to understand or test, and 2) can be
  the source of surprising vulnerabilities. (See XSA-7 for an example
  of a case where subtle error-handling differences can open up a
  privilege escalation.) We should avoid emulating any bit of the
  hardware with complex error paths if we can at all help it.

- Cost of implementation: We want to minimize the cost of
  implementation (where this includes bringing an existing sub-par
  implementation up to speed). All things being equal, we'll favor a
  configuration which does not require any new implementation.

- Performance: All things being equal, we'd prefer to choose a set of
  L0 / L1 CPUID bits that is faster rather than slower.


# Bits

- 0 `NP` *Nested Paging*: Required both for L0 and L1.

  Based primarily on faithfulness and performance, as well as
  potential cost of implementation. Available on earliest hardware,
  so no compatibility issues.

- 1 `LbrVirt` *LBR / debugging virtualization*: Require for L0 and L1.

  For L0 this is required for performance: There's no way to tell the
  guests not to use the LBR-related registers; and if the guest does,
  then you have to save and restore all LBR-related registers on
  context switch, which is prohibitive. Furthermore, the additional
  emulation risks introducing a security-relevant difference.

  Providing it to L1 when we have it in L0 is basically free, and
  already implemented.

  Just require it and provide it.

- 2 `SVML` *SVM Lock*: Not required for L0, not provided to L1

  Seems to be about enabling an operating system to prevent "blue
  pill" attacks against itself.

  Xen doesn't use it, nor provide it; so it would need to be
  implemented. The best way to protect a guest OS is to leave nested
  virt disabled in the tools.

- 3 `NRIPS` *NRIP Save*: Require for both L0 and L1

  If NRIPS is not present, the software interrupt injection
  functionality can't be used; and Xen has to emulate it. That's
  another source of potential security issues. If hardware supports
  it, then providing it to the guest is basically free.

- 4 `TscRateMsr`: Not required by L0, not provided to L1

  The main putative use for this would be trying to maintain an
  invariant TSC across cores with different clock speeds, or after a
  migrate. Unlike others, this doesn't have an error path to worry
  about compatibility-wise; and according to tests done when nestedSVM
  was first implemented, it's actually faster to emulate TscRateMsr in
  the L0 hypervisor than for L1 to attempt to emulate it itself.

  However, using this properly in L0 will take some implementation
  effort; and composing it properly with L1 will take even more
  effort. Just leave it off for now.

- 5 `VmcbClean` *VMCB Clean Bits*: Not required by L0, provide to L1

  This is a pure optimization, both on the side of the L0 and L1.
  The implementation for L1 is entirely Xen-side, so can be provided
  even on hardware that doesn't provide it. And it's purely an
  optimization, so could be "implemented" by ignoring the bits
  entirely.

  As such, we don't need to require it for L0; and as it's already
  implemented, no reason not to provide it to L1. Before this feature
  was available those bits were marked SBZ ("should be zero"); setting
  them was already advertised to cause unpredictable behavior.

- 6 `FlushByAsid`: Require for L0, provide to L1

  This is cheap and easy to use for L0 and to provide to the L1;
  there's no reason not to just pass it through.

- 7 `DecodeAssists`: Require for L0, provide to L1

  Using it in L0 reduces the chance that we'll make some sort of error
  in the decode path. And if hardware supports it, it's easy enough
  to provide to the L1.
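
The decisions for bits 0 through 7 above can be summarised as two bit
masks over CPUID leaf 8000000A:edx. The following is a minimal Python
sketch of that policy, not Xen's actual implementation: the names
`FEATURES`, `mask`, `L0_REQUIRED`, `L1_PROVIDED`,
`missing_l0_features` and `l1_svm_features` are hypothetical, and only
the bits discussed in this section are covered.

```python
# Illustrative sketch only: all names here are hypothetical, not Xen code.
# Feature names indexed by bit position in CPUID leaf 8000000A:edx.
FEATURES = [
    "NP",             # bit 0
    "LbrVirt",        # bit 1
    "SVML",           # bit 2
    "NRIPS",          # bit 3
    "TscRateMsr",     # bit 4
    "VmcbClean",      # bit 5
    "FlushByAsid",    # bit 6
    "DecodeAssists",  # bit 7
]


def mask(*names):
    """Build an EDX bit mask from feature names."""
    value = 0
    for name in names:
        value |= 1 << FEATURES.index(name)
    return value


# Bits the host (L0) must have before nested SVM is offered.
L0_REQUIRED = mask("NP", "LbrVirt", "NRIPS", "FlushByAsid", "DecodeAssists")

# Bits shown to the L1 hypervisor. VmcbClean is provided regardless of
# the host, since the L1 side is implemented entirely in software.
# SVML and TscRateMsr are neither required nor provided.
L1_PROVIDED = L0_REQUIRED | mask("VmcbClean")


def missing_l0_features(host_edx):
    """Return the names of required features absent from the host's EDX."""
    absent = L0_REQUIRED & ~host_edx
    return [name for bit, name in enumerate(FEATURES) if absent & (1 << bit)]


def l1_svm_features(host_edx):
    """Return the EDX value to expose to L1, or raise if L0 falls short."""
    missing = missing_l0_features(host_edx)
    if missing:
        raise ValueError("host lacks required SVM features: " + ", ".join(missing))
    return L1_PROVIDED
```

Because every pass-through bit is also a required bit, the value
exposed to L1 is the same on every host that qualifies: under these
assumptions `L0_REQUIRED` is `0xCB` and `l1_svm_features` returns
`0xEB`.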