# Entropy collection TODOs

I'm writing this at the end of my internship to record some of the things I didn't get to.

[TOC]
## Proper use of RdRand

On x86, `RdRand` reads from a deterministic CPRNG (which is seeded from a hardware entropy source).
The newer `RdSeed` instruction reads from the underlying entropy source directly (well, with some
post-processing). Currently, we prefer to use `RdSeed`, but if that isn't available we fall back on
`RdRand`. In that fallback we just draw random bits directly from `RdRand`, without doing anything
to force the underlying generator to reseed, in contravention of the Intel HWRNG guide
([online here](https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide);
see section 4.2.5 "Guaranteeing DBRG Reseeding"). We should fix that.
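Here is a rough sketch of what following that section could look like. My recollection of the guide is that the generator is guaranteed to reseed after at most 511 128-bit samples, so drawing 512 128-bit samples (1024 64-bit `RdRand` results) and compressing them spans at least one reseed, and that it suggests up to 10 retries per draw. Both numbers should be checked against the guide before relying on them. `DrawFn`, `MixFn`, and `DrawAcrossReseed` are hypothetical names; the mixing step (the guide suggests AES-CBC-MAC) is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical callback: returns one 64-bit RdRand result, or nullopt if the
// instruction reported failure.
using DrawFn = std::function<std::optional<uint64_t>()>;
// Hypothetical callback: absorbs one 64-bit word into a compression function.
using MixFn = std::function<void(uint64_t)>;

// Assumption to verify against the guide: 512 128-bit samples per output seed.
constexpr size_t kDrawsPerSeed = 512 * 2;
// Assumption to verify against the guide: retries per failed draw.
constexpr int kMaxRetries = 10;

// Draws kDrawsPerSeed words and feeds each one to `mix`.
// Returns false if some draw still fails after kMaxRetries attempts.
bool DrawAcrossReseed(const DrawFn& draw, const MixFn& mix) {
  for (size_t i = 0; i < kDrawsPerSeed; ++i) {
    std::optional<uint64_t> word;
    for (int attempt = 0; attempt < kMaxRetries && !word; ++attempt) {
      word = draw();
    }
    if (!word) {
      return false;
    }
    mix(*word);
  }
  return true;
}
```

Taking the draw as a callback keeps the counting logic testable on machines without `RdRand`; the real version would wrap the instruction itself.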
Googlers: see issue ZX-983

## Reseeding the CPRNG during runtime

My hacky virtio driver will reseed the CPRNG on QEMU (on a recurring five-minute timer). I think
that's the only entropy source that is currently used to reseed after system startup.

As a start, we should be able to use the entropy sources built into the kernel (RdRand and
jitterentropy). Just running these on a periodic timer would improve our reseeding story. Note that
once every five minutes is probably more often than we need.

We've talked about also reseeding whenever a large amount of data has been drawn from the CPRNG (on
the order of 2^48 bits, I think).
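To make the two triggers concrete (elapsed time, and volume of data drawn), here is a minimal sketch of a policy object. Everything in it is hypothetical: the class name, the interface, and the idea of tracking bytes rather than bits. The thresholds are left as constructor arguments because the right values haven't been decided.

```cpp
#include <cstdint>

// Hypothetical reseed policy; the class name and both thresholds are
// illustrative, not existing kernel code.
class ReseedPolicy {
 public:
  ReseedPolicy(uint64_t max_interval_ns, uint64_t max_bytes_drawn)
      : max_interval_ns_(max_interval_ns), max_bytes_drawn_(max_bytes_drawn) {}

  // Record that `bytes` bytes were drawn from the CPRNG (saturating add).
  void OnDraw(uint64_t bytes) {
    bytes_drawn_ = (bytes > UINT64_MAX - bytes_drawn_) ? UINT64_MAX
                                                       : bytes_drawn_ + bytes;
  }

  // True once either limit is reached. Assumes `now_ns` is monotonic.
  bool ShouldReseed(uint64_t now_ns) const {
    return bytes_drawn_ >= max_bytes_drawn_ ||
           now_ns - last_reseed_ns_ >= max_interval_ns_;
  }

  // Reset both counters after a successful reseed.
  void OnReseed(uint64_t now_ns) {
    last_reseed_ns_ = now_ns;
    bytes_drawn_ = 0;
  }

 private:
  const uint64_t max_interval_ns_;
  const uint64_t max_bytes_drawn_;
  uint64_t last_reseed_ns_ = 0;
  uint64_t bytes_drawn_ = 0;
};
```

With a limit of 2^48 bits, `max_bytes_drawn` would be `(1ULL << 48) / 8`; the interval is whatever we settle on once we decide how much longer than five minutes is acceptable.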
## Monitoring entropy sources

Entropy sources can potentially fail, either totally or partially.

Total failures like "the device was unplugged" or "the device is not responding to I/O" will
hopefully be reported by the hardware layer.

Partial failures, where the device returns data but with less entropy than expected, are scarier. We
should run simple health tests to try to detect partial failures. See for example the continuous
health tests in NIST SP 800-90B, section 4.4. The health tests there are pretty simple and require
minimal resources. They do require storing some statistics about recent entropy source outputs,
which presents some security risk, since that state is derived from the same data that feeds the
CPRNG.
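As an illustration of how small these tests are, here is a sketch of the Repetition Count Test from SP 800-90B, which fails when the same sample value repeats too many times in a row. The cutoff is C = 1 + ceil(-log2(alpha) / H), where H is the assessed min-entropy per sample and alpha is the acceptable false-positive probability. The class name, the 8-bit sample type, and the interface are my assumptions, not existing code.

```cpp
#include <cmath>
#include <cstdint>

// Sketch of the SP 800-90B Repetition Count Test over 8-bit samples.
class RepetitionCountTest {
 public:
  // min_entropy_per_sample: assessed min-entropy H, in bits per sample.
  // log2_alpha: log2 of the false-positive probability, e.g. -20 for 2^-20.
  RepetitionCountTest(double min_entropy_per_sample, double log2_alpha)
      : cutoff_(1 + static_cast<uint32_t>(
                        std::ceil(-log2_alpha / min_entropy_per_sample))) {}

  // Feeds one sample. Returns false once the current run of identical
  // samples reaches the cutoff, which signals a health-test failure.
  bool Update(uint8_t sample) {
    if (have_last_ && sample == last_) {
      ++run_length_;
    } else {
      last_ = sample;
      run_length_ = 1;
      have_last_ = true;
    }
    return run_length_ < cutoff_;
  }

  uint32_t cutoff() const { return cutoff_; }

 private:
  const uint32_t cutoff_;
  uint8_t last_ = 0;
  uint32_t run_length_ = 0;
  bool have_last_ = false;
};
```

Note that the only retained state is the last sample and a run length, which is the kind of stored statistic the security concern above is about.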
The NIST SP also suggests (well, requires, but I'm not aware of any immediate plans for
certification) running startup tests. The NIST startup tests involve running the continuous tests
over at least 4096 samples (see section 4.3, item 12), after which these samples may be reused to
seed the CPRNG.
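A minimal sketch of that startup flow, assuming the sample count from the text above: collect the samples, run each one through a continuous health test, and only hand the samples back for seeding if every one passed. `RunStartupTest` and both callbacks are hypothetical, not existing interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Sample count taken from the text above.
constexpr size_t kStartupSamples = 4096;

// Hypothetical startup test. `next_sample` reads one raw sample from the
// entropy source; `health_ok` is one step of a continuous health test and
// returns false on failure.
// Returns the collected samples if every one passed, otherwise nullopt.
std::optional<std::vector<uint8_t>> RunStartupTest(
    const std::function<uint8_t()>& next_sample,
    const std::function<bool(uint8_t)>& health_ok) {
  std::vector<uint8_t> samples;
  samples.reserve(kStartupSamples);
  for (size_t i = 0; i < kStartupSamples; ++i) {
    const uint8_t sample = next_sample();
    if (!health_ok(sample)) {
      return std::nullopt;  // Do not seed from a source that failed startup.
    }
    samples.push_back(sample);
  }
  return samples;
}
```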
Once monitoring is in place, we need to decide how to respond to entropy source failures. If one of
six different entropy sources fails, we might treat that as a minor hardware failure that gets
logged. If the system has only one entropy source and it fails, we need to take more drastic action
(on the order of shutting off the CPRNG or halting the system).
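One way to phrase that decision is as a function of how many healthy sources remain after the failure. This is only a sketch of the shape such a policy might take; the enum, the function, and the threshold parameter are all hypothetical, and the real policy is still undecided.

```cpp
#include <cstddef>

// Hypothetical responses to an entropy source failure.
enum class FailureAction {
  kLogOnly,    // Minor hardware failure: record it and carry on.
  kStopCprng,  // Drastic action: shut off the CPRNG or halt the system.
};

// Chooses a response based on how many sources are still healthy after the
// failure. `min_healthy_sources` is an assumed tunable threshold.
FailureAction OnSourceFailure(size_t healthy_sources_remaining,
                              size_t min_healthy_sources) {
  return healthy_sources_remaining >= min_healthy_sources
             ? FailureAction::kLogOnly
             : FailureAction::kStopCprng;
}
```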
## Userspace RNG drivers

Once the DDK settles down, we should add to and improve our RNG drivers. Currently, there are two
RNG-related drivers: TPM and virtio-rng.

An important requirement is to restrict access to the `zx_cprng_add_entropy` syscall, via a Resource
or similar mechanism. We should also use this to differentiate between the devices providing
entropy, for monitoring purposes. It would also be nice if the kernel could send start/stop signals
to the drivers through this Resource.

Here are some entropy sources to consider, most of them currently unused:

- There's an existing TPM driver, which calls `cprng_add_entropy` in its `bind()` callback. We
  should add support for TPM 2.0, for better coverage.

- There are plenty of commercially available hardware RNGs, often connecting over USB. We could add
  drivers for those, but it probably makes sense to expect third-party drivers instead.

- There's also apparently a hardware RNG built into the SoC in Raspberry Pis, according to
  [the Raspberry Pi forums](https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=19334&p=273944#p273944).
  In general we could check other specific targets (i.e., anything other than "pc-x86-64") for
  hardware RNGs and wire those up. If we're lucky, many of these will be accessible from the kernel
  for use during or immediately after boot.

- Finally, we could record entropy from hardware IRQs, especially for hard disks, network cards,
  input devices, and other classic entropy sources. This won't be anywhere near as fast as a
  dedicated hardware RNG, but it's attractive since a few lines of code added in the right places in
  our driver stack should enable entropy collection from a wide variety of very common devices.
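To give a sense of the "few lines of code" the IRQ idea might need, here is a sketch of a collector that a driver could call from its interrupt path. It keeps the low byte of each inter-interrupt time delta and flushes a small buffer to a sink. The class, the buffer size, and the sink callback are all hypothetical; a real version would hand the bytes to the CPRNG and would need a conservative estimate of how much entropy they actually carry.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical collector for interrupt timing jitter.
class IrqEntropyCollector {
 public:
  // Receives a full buffer of raw timing bytes.
  using Sink = std::function<void(const uint8_t*, size_t)>;

  explicit IrqEntropyCollector(Sink sink) : sink_(std::move(sink)) {}

  // Call on each interrupt with a high-resolution timestamp.
  void OnIrq(uint64_t timestamp_ticks) {
    const uint64_t delta = timestamp_ticks - last_ticks_;
    last_ticks_ = timestamp_ticks;
    // Keep only the low 8 bits, where most of the jitter should be.
    buffer_[used_++] = static_cast<uint8_t>(delta);
    if (used_ == buffer_.size()) {
      sink_(buffer_.data(), used_);
      used_ = 0;
    }
  }

 private:
  Sink sink_;
  uint64_t last_ticks_ = 0;
  std::array<uint8_t, 32> buffer_{};
  size_t used_ = 0;
};
```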
Googlers: SEC-29

## Jitterentropy

### Replace the noise-generating functions with assembly, and remove '-O0'

Right now, jitterentropy is compiled at optimization level `-O0` (as per the author's
documentation). The reason is the two noise-generating functions, `jent_lfsr_time` and
`jent_memaccess`, whose timing-dependent work must not be optimized away. We should replace these C
functions with assembly code (probably generated by compiling them with flags `-S -O0`), then
compile the rest of jitterentropy with optimizations enabled. After this, we should re-test to make
sure our entropy estimates remain accurate.

Googlers: SEC-14
### Test jitterentropy more thoroughly

I've been testing on the same handful of physical devices. We should test jitterentropy on a few
other PCs, RPis, etc.

Googlers: SEC-22
### Test jitterentropy at runtime

Right now, jitterentropy only runs (and was only tested) during the single-core part of the boot
sequence. We should test jitterentropy during SMP runtime, and consider whether we need to (say)
disable interrupts or pin ourselves to a CPU inside jitterentropy.

Googlers: ZX-1024
### More tuning

See [the tuning doc](jitterentropy/config-tuning.md). The current universally hard-coded parameters
seem to be decent, so this probably isn't incredibly urgent. Still, since jitterentropy is on the
critical path for every single boot and since it will also run at runtime (hopefully soon!),
it's probably worth optimizing at some point.

We should probably at least tune jitterentropy on a per-architecture basis, and ideally per-target.
Note that right now, the `entropy_per_1000_bytes` statistic in
`kernel/lib/crypto/entropy/jitterentropy_collector.cpp` is hard-coded and not arch/target dependent.
That should probably also be configurable.
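To show why that statistic matters for tuning, here is a sketch of how a per-target value would translate into the amount of raw data we need to collect. I'm assuming the statistic means bits of entropy per 1000 bytes of raw jitterentropy output. The struct and function are hypothetical, and any numbers plugged in would have to come from per-target measurements.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-target tuning record.
struct JitterentropyTuning {
  // Assumed meaning: estimated bits of entropy per 1000 bytes of raw output.
  uint32_t entropy_per_1000_bytes;
};

// Number of raw bytes to collect to reach `entropy_bits` bits of entropy,
// rounded up. Requires entropy_per_1000_bytes > 0.
constexpr size_t BytesNeeded(const JitterentropyTuning& tuning,
                             size_t entropy_bits) {
  return (entropy_bits * 1000 + tuning.entropy_per_1000_bytes - 1) /
         tuning.entropy_per_1000_bytes;
}
```

A target with a lower measured estimate needs proportionally more raw bytes per seed, so a single hard-coded value either wastes boot time on good targets or under-collects on bad ones.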
Googlers: ZX-1022

## Cloning the NIST test suite

We may want to clone the NIST test suite into Fuchsia third\_party. This would help us automate
the testing and analysis of our entropy sources (jitterentropy in particular).