# Jitterentropy: basic configuration

The jitterentropy library is written by Stephan Mueller, is available at
<https://github.com/smuellerDD/jitterentropy-library>, and is documented at
<http://www.chronox.de/jent.html>. In Zircon, it's used as a simple entropy
source to seed the system CPRNG.

This document describes and analyzes two (independent) configuration options of
jitterentropy:

1. Whether to use a variable, pseudorandom number of iterations in the noise
   generating functions.
2. Whether to post-process the raw noise samples with jitterentropy's internal
   processing routines.

I consider these basic configuration options because they affect the basic
process that jitterentropy uses. I'm contrasting them to tunable parameters
(like the precise value used for loop counts if they are not chosen
pseudorandomly, or the size of the scratch memory used internally by
jitterentropy), since the tunable parameters don't greatly affect the means by
which jitterentropy collects entropy, just the amount it collects and the time
it takes.

My full conclusions are at the end of this document, but in summary I think
that we should avoid both choosing pseudorandom iteration counts and using
jitterentropy's post-processed data.

[TOC]

## Brief explanation of jitterentropy

The author's documentation is available in HTML form at
<http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html>, or in PDF form at
<http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.pdf>. In brief, the library
collects random bits from variations in CPU instruction timing.

Jitterentropy maintains a random state, in the form of a 64-bit number that is
affected by many of the jitterentropy functions, and ultimately is used as the
output randomness.

There are two noise sources, both of which are blocks of relatively slow-running
code whose precise runtime is measured (using a system clock, requiring roughly
nanosecond resolution). The precise time to complete these blocks of code will
vary. We test these times to ensure that they are unpredictable; while we can't
be perfectly certain that they are, the test results (including the results
below) are encouraging. Note however that the purpose of this document is not to
justify our estimates for the min-entropy in jitterentropy samples, but rather
to discuss the two configuration options listed above.

The first of the code blocks used as a noise source is a CPU-intensive LFSR
loop, implemented in
[the `jent_lfsr_time` function](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#185).
The number of times the LFSR logic is repeated is controlled by the
`kernel.jitterentropy.ll` cmdline ("`ll`" stands for "LFSR loops"). If `ll = 0`,
a pseudorandom count is used; otherwise, the value of `ll` is used. Looking at
the source code, the outer loop repeats according to the `ll` parameter, while
the inner loop advances an LFSR by 64 steps, each time XOR-ing in one bit from
the most recent time sample. Passing the time sample through the LFSR this way
serves as a processing step, generally tending to whiten the random time
samples. As described in the
[entropy quality testing doc](../entropy_quality_tests.md), it's important to
skip this processing when testing the entropic content of the CPU time
variations. It's also the case that enabling the processing increases the
entropy estimates by a suspicious amount in some cases (see
[the "Effects of processing the raw samples" section](#effects-of-processing-the-raw-samples)).

The second noise source is a memory access loop, in
[the `jent_memaccess` function](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#261).
The memory access loop is repeated according to the `kernel.jitterentropy.ml`
cmdline ("`ml`" for "memory loops"), where again a value of 0 activates the
pseudorandom loop count, and any non-zero value overrides the pseudorandom
count. Each iteration of the actual memory access loop both reads and writes a
relatively large chunk of memory, divided into `kernel.jitterentropy.bc`-many
blocks of size `kernel.jitterentropy.bs` bytes each. The default values at the
time of writing are `bc = 1024` and `bs = 64`; up-to-date defaults should be
documented in
[the cmdline document](../kernel_cmdline.md). For comparison, the defaults in
the jitterentropy source code are `bc = 64` and `bs = 32`,
[defined here](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/include/lib/jitterentropy/jitterentropy.h#79).
Per the comment above the `jent_memaccess` function, the total memory size
should be larger than the L1 cache size of the target machine. Confusingly,
`bc = 64` and `bs = 32` produce a memory size of 2048 bytes, which is much
smaller than even most L1 caches (I couldn't find any CPU with more than 0 bytes
but less than 4KB of L1). Using `bs = 64` and `bc = 1024` results in 64KB of
memory, which is usually enough to overflow L1 data caches.
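
A paraphrased sketch of the access pattern, matching the description above
(the name `memaccess_sketch` is mine, and the pseudorandom loop count is
omitted):

```
#include <stddef.h>
#include <stdint.h>

/* Paraphrased sketch of the memory-access noise source: perform a
 * read-modify-write on one byte per step, striding by bs - 1 with wrap-around
 * so that successive accesses land in different blocks and, once bs * bc
 * exceeds the L1 size, frequently miss the cache. */
static void memaccess_sketch(uint8_t *mem, size_t bs, size_t bc, uint64_t loops) {
    size_t wrap = bs * bc;                   /* 64 * 1024 = 64KB with our defaults */
    size_t loc = 0;
    for (uint64_t i = 0; i < loops * wrap; i++) {
        mem[loc] = (uint8_t)(mem[loc] + 1);  /* one memory read and one write */
        loc = (loc + bs - 1) % wrap;         /* advance with wrap-around */
    }
}
```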

### Option 1: Pseudorandom loop counts

Jitterentropy was originally designed so that the two noise generating functions
run a pseudorandom number of times. Specifically,
[the `jent_loop_shuffle` function](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#125)
mixes together (1) the time read from the high-resolution clock and (2)
jitterentropy's internal random state in order to decide how many times to run
the noise sources.
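
The selection logic is compact; the following is a paraphrased sketch (the
name `loop_shuffle_sketch` and the simplified prototype for the clock read are
mine):

```
#include <stdint.h>

struct rand_data { uint64_t data; /* 64-bit random state; other fields omitted */ };

void jent_get_nstime(uint64_t *out);  /* high-resolution clock read */

/* Paraphrased sketch of jent_loop_shuffle: XOR the clock with the random
 * state, fold the 64-bit result into a `bits`-wide value, and add a minimum
 * so the loop count is never zero. With bits = 7 this yields 1..128 (the `ml`
 * range); with bits = 4 it yields 1..16 (the `ll` range). */
static uint64_t loop_shuffle_sketch(struct rand_data *ec, unsigned bits, unsigned min) {
    uint64_t time = 0, shuffle = 0;
    uint64_t mask = (1ULL << bits) - 1;

    jent_get_nstime(&time);
    if (ec)
        time ^= ec->data;                    /* mix in the random state */
    for (unsigned i = 0; i < (64 + bits - 1) / bits; i++) {
        shuffle ^= time & mask;              /* fold one slice into the count */
        time >>= bits;
    }
    return shuffle + (1ULL << min);          /* min = 0 makes every count >= 1 */
}
```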

We added the ability to override these pseudorandom loop counts, and tested
jitterentropy's performance both with and without the override. The results are
discussed in more depth in
[the "Effects of pseudorandom loop counts" section](#effects-of-pseudorandom-loop-counts),
but in summary: the statistical tests suggested that the pseudorandom loop
counts increased the entropy far more than expected. This makes me mistrust
these higher entropy estimates, so I recommend using the lower estimates and
preferring deterministic loop counts to pseudorandom ones.

### Jitterentropy's random data processing

As mentioned above, jitterentropy can process its random data, which makes the
data look "more random". Specifically, the processing should decrease (and
ideally remove) the deviation of the random data from the uniform distribution,
and reduce (ideally, remove) any intercorrelations between random bytes.

The main function of interest for generating processed samples is
[`jent_gen_entropy`](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#462),
which is called in a loop by
[`jent_read_entropy`](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#544)
to produce an arbitrarily large number of random bytes.
In essence, `jent_gen_entropy` calls the noise functions in a loop 64 times.
Each of the 64 invocations of `jent_lfsr_time` mixes the noisy time measurement
into the jitterentropy random state.
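
A paraphrased sketch of the flow (the name `gen_entropy_sketch` and the
simplified prototypes are mine; oversampling and the "stuck"-sample checks are
omitted):

```
#include <stdint.h>

struct rand_data { uint64_t data; uint64_t prev_time; /* other fields omitted */ };

void jent_get_nstime(uint64_t *out);
void jent_memaccess(struct rand_data *ec, uint64_t loop_cnt);
void jent_lfsr_time(struct rand_data *ec, uint64_t time, uint64_t loop_cnt);

/* Paraphrased sketch of jent_gen_entropy: run both noise sources once per bit
 * of the 64-bit state, mixing each noisy time delta into ec->data. */
static void gen_entropy_sketch(struct rand_data *ec) {
    for (unsigned k = 0; k < 64; k++) {
        uint64_t now = 0;
        jent_memaccess(ec, 0);                 /* noise source 1: memory access */
        jent_get_nstime(&now);
        uint64_t delta = now - ec->prev_time;  /* the noisy time measurement */
        ec->prev_time = now;
        jent_lfsr_time(ec, delta, 0);          /* noise source 2, mixes into state */
    }
    /* ec->data now holds one 64-bit block of processed output. */
}
```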

After these 64 iterations, the random state is optionally "stirred" in
[`jent_stir_pool`](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#403)
by XOR-ing with a "mixer" value, itself dependent on the jitterentropy random
state. As noted in the source code, this operation cannot increase or decrease
the entropy in the pool (since XOR is bijective), but it can potentially improve
the statistical appearance of the random state.
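
The bijectivity claim is easy to see: for any fixed mixer value, XOR is its
own inverse, so stirring merely permutes the 2^64 possible states. A toy
illustration (not jitterentropy code):

```
#include <assert.h>
#include <stdint.h>

int main(void) {
    uint64_t state = 0x0123456789abcdefULL;  /* stand-in for the random state */
    uint64_t mixer = 0xa5a5a5a5a5a5a5a5ULL;  /* stand-in for the stir value */
    uint64_t stirred = state ^ mixer;
    assert((stirred ^ mixer) == state);  /* invertible: no entropy gained or lost */
    return 0;
}
```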

In principle, invoking the noise source functions 64 times should produce 64
times as much entropy, up to the maximum of 64 bits that the random state can
hold. This assumes that the mixing operation in `jent_lfsr_time` is
cryptographically sound. I'm not an expert in cryptanalysis, but an LFSR by
itself is not a cryptographically secure RNG, since 64 successive bits reveal
the entire state of a 64-bit LFSR, after which all past and future values can
be easily computed. I am not sure that the jitterentropy scheme &mdash; XOR-ing
the time measurement into the "bottom" of the LFSR as the LFSR is shifted
&mdash; is more secure. Without careful cryptographic examination of this
scheme (which for all I know may exist, but I did not see it mentioned in the
jitterentropy documentation), I would lean towards using unprocessed samples,
and mixing them into our system entropy pool in a known-good way (e.g. SHA-2,
as we do now).
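
To illustrate the concern about bare LFSRs: in a 64-bit Fibonacci LFSR, the
first 64 output bits are literally the initial state bits, since feedback bits
take 64 clocks to reach the output end. A toy demonstration (not
jitterentropy's code; the taps are an arbitrary maximal-length choice):

```
#include <stdint.h>
#include <stdio.h>

static uint64_t lfsr = 0x123456789abcdef1ULL;  /* "secret" initial state */

/* One clock of a right-shifting 64-bit Fibonacci LFSR: output the low bit,
 * shift, and insert the feedback bit at the top. */
static int lfsr_step(void) {
    int out = (int)(lfsr & 1);
    uint64_t fb = ((lfsr >> 0) ^ (lfsr >> 1) ^ (lfsr >> 3) ^ (lfsr >> 4)) & 1;
    lfsr = (lfsr >> 1) | (fb << 63);
    return out;
}

int main(void) {
    uint64_t recovered = 0;
    for (int i = 0; i < 64; i++)                  /* observe 64 consecutive outputs */
        recovered |= (uint64_t)lfsr_step() << i;
    /* `recovered` equals the initial state, so all past and future outputs
     * can now be computed. */
    printf("recovered state: 0x%016llx\n", (unsigned long long)recovered);
    return 0;
}
```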

That said, I did run the NIST test suite against processed data samples. My
results are in
[the "Effects of processing the raw samples" section](#effects-of-processing-the-raw-samples)
below.

## Testing process

The procedure for running entropy source quality tests is documented in
[the entropy quality tests document](../entropy_quality_tests.md).

These preliminary results were gathered on a Zircon debug build on Raspberry Pi
3, built from commit
[18358de5e90a012cb1e042efae83f5ea264d1502](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d)
"\[virtio]\[entropy] Basic virtio-rng driver". The following flags were set in
my `local.mk` file when building:

```
ENABLE_ENTROPY_COLLECTOR_TEST=1
ENTROPY_COLLECTOR_TEST_MAXLEN=1048576
```

I ran the boot-time tests after netbooting the debug kernel on the Pi with the
following kernel cmdline, varying the values of `$ML`, `$LL`, and `$RAW`:

```
kernel.entropy-test.src=jitterentropy
kernel.jitterentropy.bs=64
kernel.jitterentropy.bc=1024
kernel.jitterentropy.ml=$ML
kernel.jitterentropy.ll=$LL
kernel.jitterentropy.raw=$RAW
```

## Test results and analysis

### Effects of pseudorandom loop counts

#### Raw Data

Following the logic in the jitterentropy source code (search for
[`MAX_FOLD_LOOP_BIT`](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#191)
and
[`MAX_ACC_LOOP_BIT`](https://fuchsia.googlesource.com/zircon/+/a1a80a6a7d/third_party/lib/jitterentropy/jitterentropy-base.c#265)),
the pseudorandom loop counts vary within these ranges:

```
ml: 1 .. 128 (inclusive)
ll: 1 .. 16 (inclusive)
```

I have included the overall min-entropy estimate from the NIST suite in this
table, as well as two contributing estimates: the compression estimate and the
Markov estimate. The NIST min-entropy estimate is the minimum of 10 different
estimates, including these two. The compression estimate is generally the
smallest for jitterentropy raw samples with deterministic loop counts, and the
Markov estimate is generally smallest for jitterentropy with other
configurations.

| `ml`              | `ll`             | min-entropy (bits / byte) | Compression estimate | Markov estimate |
|:-----------------:|:----------------:|:-------------------------:|:--------------------:|:---------------:|
| random (1 .. 128) | random (1 .. 16) | 5.77                      | 6.84                 | 5.77            |
| 128               | 16               | 1.62                      | 1.62                 | 3.60            |
| 1                 | 1                | 0.20                      | 0.20                 | 0.84            |

In other words, varying the loop counts pseudorandomly increased the min-entropy
estimate for raw samples by 4.15 bits per byte (roughly 250%), compared to the
deterministic version that always used the maximum values from the pseudorandom
ranges.

#### Analysis

The pseudorandom loop count values are determined by adding one extra time
sample per noise function. First, these time samples are not independent of the
noise function time measurements, since the gaps between the loop count time
samples correspond predictably to the noise function time measurements. As a
result, it would be highly questionable to assume that they increase the
min-entropy of the output data at all. Second, it is absurd to imagine that the
loop count time samples were somehow roughly 250% more random than the noise
function time measurements, since both rely on the same noise source, except
that the very first loop count time samples may get a small boost from the
random amount of time needed to boot the system far enough to run the test.

Consequently, I suspect that what happened is that the pseudorandom loop counts
were enough to "fool" the particular suite of statistical tests and
predictor-based tests in the NIST suite, but that a predictor test written with
specific knowledge of how the jitterentropy pseudorandom loop counts are derived
could in fact predict the output with far better accuracy. I think the "true"
min-entropy in the pseudorandom loop count test, against an adversary that's
specifically targeting our code, is within the bounds of the two deterministic
tests, i.e. between about 0.20 and 1.62 bits per byte.

Using pseudorandom counts forces us to make an additional decision: do we
conservatively estimate the actual entropy content at 0.20 bits per byte (as if
the pseudorandom count function always chose `ml = 1` and `ll = 1`)? Or do we
choose an average entropy content (there is probably a more intelligent
averaging technique than computing (1.62 + 0.20) / 2 = 0.91 bits / byte, but
that will serve for the purpose of this discussion) and risk the pseudorandom
loop counts occasionally causing us to undershoot this average? If we are too
conservative, we will spend more time collecting entropy than is needed; if we
are too optimistic, we might have a security vulnerability. Ultimately, this
forces a trade-off between security (which prefers conservative entropy
estimates) and efficiency (which prefers optimistic entropy estimates).
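
To make the trade-off concrete, here is a hypothetical helper (not Zircon
code) that converts a per-byte min-entropy estimate into the number of raw
bytes needed for a 256-bit seed:

```
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: raw bytes needed to accumulate `target_bits` of
 * min-entropy at an estimated `bits_per_byte`. */
static double bytes_needed(double bits_per_byte, double target_bits) {
    return ceil(target_bits / bits_per_byte);
}

int main(void) {
    printf("conservative (0.20 b/B): %.0f bytes\n", bytes_needed(0.20, 256)); /* 1280 */
    printf("averaged     (0.91 b/B): %.0f bytes\n", bytes_needed(0.91, 256)); /*  282 */
    printf("optimistic   (1.62 b/B): %.0f bytes\n", bytes_needed(1.62, 256)); /*  159 */
    return 0;
}
```

The conservative estimate costs roughly eight times as much collection work as
the optimistic one, which is the crux of the trade-off.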

### Effects of processing the raw samples

#### Raw Data

I repeated the three tests reported above, but with jitterentropy's internal
processing turned on (with `kernel.jitterentropy.raw = false` instead of the
default value `true`). For convenience, the tables below include both the raw
sample results (copied from above) in the top three rows, and the processed
results (newly added) in the bottom three rows.

| `ml`              | `ll`             | raw   | min-entropy (bits / byte) | Compression estimate | Markov estimate |
|:-----------------:|:----------------:|:-----:|:-------------------------:|:--------------------:|:---------------:|
| random (1 .. 128) | random (1 .. 16) | true  | 5.77                      | 6.84                 | 5.77            |
| 128               | 16               | true  | 1.62                      | 1.62                 | 3.60            |
| 1                 | 1                | true  | 0.20                      | 0.20                 | 0.84            |

| `ml`              | `ll`             | raw   | min-entropy (bits / byte) | Compression estimate | Markov estimate |
|:-----------------:|:----------------:|:-----:|:-------------------------:|:--------------------:|:---------------:|
| random (1 .. 128) | random (1 .. 16) | false | 5.79                      | 6.59                 | 5.79            |
| 128               | 16               | false | 5.78                      | 6.97                 | 5.78            |
| 1                 | 1                | false | 5.77                      | 6.71                 | 5.77            |

#### Analysis

The post-processing min-entropy estimates are all essentially equal (up to
slight variations easily explained by randomness), and also equal to the
min-entropy estimate for raw samples with pseudorandom loop counts.

Recall that jitterentropy's processed entropy is formed from 64 separate random
data samples, mixed together in a 64-bit internal state buffer. Each of the raw
samples corresponds to a sample in the `raw = true` table. In particular, it's
absurd to think that combining 64 samples with `ml = 1` and `ll = 1` and then
processing these could produce (5.77 \* 8) = 46.2 bits of entropy per 8 bytes of
processed output, since that would imply (46.2 / 64) = 0.72 bits of entropy per
unprocessed sample, as opposed to the measured value of 0.20 bits.

This argument applies against the `ml = 1`, `ll = 1`, `raw = false` measurement,
but does *not* apply to `ml = 128`, `ll = 16`, `raw = false`. In particular,
combining 64 raw samples with `ml = 128` and `ll = 16` could in principle
collect (1.62 \* 64 / 8) = 13.0 bits of entropy per processed byte, except that
of course there is a hard limit at 8 bits per byte.

Interestingly, the minimum entropy estimate switches from the compression
estimate to the Markov estimate. My theory is that the additional "confusion"
from post-processing is enough to "fool" the compression estimate. If there is a
cryptographic vulnerability in the jitterentropy processing routine, it may be
possible to write a similar estimator that reveals a significantly smaller
min-entropy. If we use the general-purpose tests to decide how many raw samples
to collect in order to have 256 bits of min-entropy, but an adversary uses a
targeted attack, then (relative to this targeted attack) our system may have
less entropy in its entropy pool than we expect. This is a security
vulnerability.

If there is a very bad weakness in the jitterentropy processing routine, it may
in fact be reducing the "true" entropy in jitterentropy's internal pool. The
arithmetical argument regarding `ml = 1` and `ll = 1` shows that we can't trust
the NIST test suite to accurately measure the actual min-entropy in the
processed data, so it is possible that the processing actually reduces
min-entropy and our tools just can't detect the loss. This would exacerbate the
vulnerability described in the previous paragraph.

## Conclusions

Jitterentropy's pseudorandom loop counts are of questionable benefit at best,
and if used they force us to make a security/efficiency trade-off. Unless we can
show convincing evidence that the pseudorandom times really do drastically
increase entropy estimates rather than just defeating the NIST test suite, we
should use deterministic loop counts, ideally tuned for performance on a
per-target basis.

Jitterentropy's processing is also questionable, since (to my knowledge) it
hasn't been subjected to enough cryptographic analysis and testing to be
trusted. Furthermore, we can't directly measure the min-entropy in the
post-processed data via the NIST test suite, so if there is a cryptographic
vulnerability we can't easily detect it. I think we should instead rely on the
entropy mixing code in the Zircon CPRNG (based on SHA-2), and leave
jitterentropy's processing disabled.

## TODOs

1. Repeat the tests reported above against different versions of Zircon, and
   ensure that the entropy estimates remain consistent.
2. Repeat the tests on different platforms and targets (note: x86 targets don't
   currently have access to a system clock during early boot, so the early boot
   entropy tests and early boot CPRNG seeding don't yet support jitterentropy on
   x86).
3. Automate the process of running the tests and generating the reports in this
   document. Specifically, the tests should compare:

   - pseudorandom loop counts versus various deterministic loop count values
   - raw samples versus processed data