# Entropy collection TODOs

I'm writing this at the end of my internship to record some of the things I didn't get to.

[TOC]

## Proper use of RdRand

On x86, `RdRand` reads from a deterministic CPRNG (which is seeded from a hardware entropy source).
The newer `RdSeed` instruction reads from the underlying entropy source directly (well, with some
post-processing). Currently, we prefer to use `RdSeed` but if that isn't available we fall back on
`RdRand`. However, we just draw random bits directly from `RdRand`, in contravention of the Intel
HWRNG guide
([online here](https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide);
see section 4.2.5 "Guaranteeing DRBG Reseeding"). We should fix that.

Googlers: see issue ZX-983

## Reseeding the CPRNG during runtime

My hacky virtio driver will reseed the CPRNG on qemu (on a five minute recurring timer). I think
that's the only entropy source that is currently used to reseed after system startup.

As a start, we should be able to use the entropy sources built into the kernel (RdRand and
jitterentropy). Just running these on a periodic timer would improve our reseeding story. Note that
once every 5 minutes is probably more often than we need.

We've talked about reseeding more often if large amounts of data have been drawn from the CPRNG (on
the order of 2^48 bits, I think).

## Monitoring entropy sources

Entropy sources can potentially fail, either totally or partially.

Total failures like "the device was unplugged" or "the device is not responding to I/O" will
hopefully be reported by the hardware layer.

Partial failures, where the device returns data but with less entropy than expected, are scarier. We
should run simple health tests to try to detect partial failures. See for example the continuous
health tests in NIST SP800-90B, section 4.4.
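
As a rough illustration of one such test, here is a minimal Python sketch of the repetition count
test from that section (4.4.1 in the final SP 800-90B text, as I read it). The class name, the
`feed` method, and the default false-positive rate of 2^-20 are assumptions made for illustration
and are not taken from our kernel; a real version would be written in C++ next to the collectors.
The only state it keeps is the last sample and a run counter.

```python
import math


class RepetitionCountTest:
    """Flags an entropy source whose output gets stuck on a single value."""

    def __init__(self, min_entropy_per_sample, alpha_exp=20):
        # Cutoff C = 1 + ceil(-log2(alpha) / H), where alpha = 2**-alpha_exp is
        # the acceptable false-positive rate and H is the assumed min-entropy
        # per sample, in bits.
        self.cutoff = 1 + math.ceil(alpha_exp / min_entropy_per_sample)
        self.last = None  # most recent sample value
        self.run = 0      # how many times in a row we have seen it

    def feed(self, sample):
        """Returns True if the sample is acceptable, False on a test failure."""
        if sample == self.last:
            self.run += 1
        else:
            self.last = sample
            self.run = 1
        # The test fails once the run length reaches the cutoff.
        return self.run < self.cutoff
```
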
The health tests there are pretty simple and require
minimal resources. They do require storing some statistics about recent entropy source outputs,
which presents some security risk.

The NIST SP also suggests (well, requires, but I'm not aware of any immediate plans for
certification) running startup tests. The NIST startup tests involve running the continuous tests
over at least 4096 samples (see section 4.3 #12), after which these samples may be reused to seed
the CPRNG.

Once monitoring is in place, we need to decide how to respond to entropy source failures. If one of
six different entropy sources fails, we might treat that as a minor hardware failure that gets
logged. If the system has only one entropy source and it fails, we need to take more drastic action
(on the order of shutting off the CPRNG or halting the system).

## Userspace RNG drivers

Once DDK settles down, we should add to and improve our RNG drivers. Currently, there are two
RNG-related drivers: TPM and virtio-rng.

An important requirement is to restrict access to the `zx_cprng_add_entropy` syscall, via a Resource
or similar mechanism. We should also use this to differentiate between the devices providing
entropy, for monitoring purposes. It would also be nice if the kernel can send start/stop signals to
the drivers through this Resource.

Here are some currently unused entropy sources to consider:

- There's an existing TPM driver, which calls `cprng_add_entropy` in its `bind()` callback. We
  should add support for TPM 2.0, for better coverage.

- There are plenty of commercially available hardware RNGs, often connecting over USB. We could add
  drivers for those, but it probably makes sense to expect third party drivers instead.

- There's also apparently a hardware RNG built into the SoC in Raspberry Pis, according to
  [the Raspberry Pi forums](https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=19334&p=273944#p273944).
  In general we could check other specific targets (i.e. not "pc-x86-64") for hardware RNGs and wire
  those up. If we're lucky, many of these will be accessible from the kernel for use during or
  immediately after boot.

- Finally, we could record entropy from hardware IRQs, especially for hard disks, network cards,
  input devices, and other classic entropy sources. This won't be anywhere near as fast as a
  dedicated hardware RNG, but it's attractive since a few lines of code added in the right places in
  our driver stack should enable entropy collection from a wide variety of very common devices.

Googlers: SEC-29

## Jitterentropy

### Replace the noise-generating functions by assembly, and remove '-O0'

Right now, jitterentropy is compiled at optimization level `-O0` (as per the author's
documentation). The reason is the two noise-generating functions: `jent_lfsr_time` and
`jent_memaccess`. We should replace these C functions by assembly code (probably by compiling with
flags `-S -O0`), then compile the rest of jitterentropy with optimizations enabled. After this, we
should re-test to make sure our entropy estimates remain accurate.

Googlers: SEC-14

### Test jitterentropy more thoroughly

I've been testing on the same handful of physical devices. We should test jitterentropy on a few
other PCs, RPis, etc.

Googlers: SEC-22

### Test jitterentropy at runtime

Right now, jitterentropy only runs (and was only tested) during the single-core part of the boot
sequence. We should test jitterentropy during SMP runtime, and consider whether we need to (say)
disable interrupts or pin ourselves to a CPU inside jitterentropy.

Googlers: ZX-1024

### More tuning

See [the tuning doc](jitterentropy/config-tuning.md). The current universally hard-coded parameters
seem to be decent, so this probably isn't incredibly urgent.
Still, since jitterentropy is on the
critical path for every single boot and since it will run at runtime as well (hopefully soon!),
it's probably worth optimizing at some point.

We should probably at least tune jitterentropy on a per-architecture basis, and ideally per-target.
Note that right now, the `entropy_per_1000_bytes` statistic in
`kernel/lib/crypto/entropy/jitterentropy_collector.cpp` is hard-coded and not arch/target dependent.
That should probably also be configurable.

Googlers: ZX-1022

## Cloning the NIST test suite

We may want to clone the NIST test suite into Fuchsia third\_party. This would help us to automate
the testing and analysis of our entropy sources (jitterentropy in particular).
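
To give a feel for the kind of analysis we'd be automating, here is a minimal Python sketch of the
most-common-value min-entropy estimate, which I believe is one of the simpler estimators described
in SP 800-90B (section 6.3.1 of the final text). The function name and the choice of Python are
assumptions for illustration only; this is not the NIST suite's own code, and the real suite runs
several estimators and reports the lowest result.

```python
import math
from collections import Counter


def mcv_min_entropy(samples):
    """Most-common-value min-entropy estimate, in bits per sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Observed frequency of the most common sample value.
    p_hat = Counter(samples).most_common(1)[0][1] / n
    # Upper end of a 99% confidence interval on that probability, capped at 1.
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_upper)
```

A source stuck on one value gets an estimate of 0 bits per sample, while evenly distributed byte
samples get an estimate somewhat below 8 bits, because the confidence bound is deliberately
conservative.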