# Jitterentropy: tuning the configuration

The jitterentropy library, written by Stephan Mueller, is available at
<https://github.com/smuellerDD/jitterentropy-library> and documented at
<http://www.chronox.de/jent.html>. In Zircon, it's used as a simple entropy
source to seed the system CPRNG.

[The companion document about basic configuration options to jitterentropy](config-basic.md)
describes two options that fundamentally affect how jitterentropy runs. This document instead
describes the numeric parameters that control how fast jitterentropy runs and how much entropy it
collects, without fundamentally altering its principles of operation. It also describes how to
test various parameters and what to look for in the output (e.g. when adding support for a new
device, or when doing a more thorough job of optimizing the parameters).

[TOC]
16
## A rundown of jitterentropy's parameters

The following tunable parameters control how fast jitterentropy runs, and how fast it collects
entropy:

### [`kernel.jitterentropy.ll`](../kernel_cmdline.md#kernel_jitterentropy_ll_num)

"`ll`" stands for "LFSR loops". Jitterentropy uses a (deliberately inefficient) implementation of
an LFSR to exercise the CPU, as part of its noise generation. The inner loop shifts the LFSR 64
times; the outer loop repeats `kernel.jitterentropy.ll`-many times.
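As a rough sketch of that loop structure (the feedback taps and the way the timing delta is folded
in below are simplified stand-ins, not the actual upstream implementation):

```c
#include <stdint.h>

// Simplified sketch of the LFSR noise loop: the inner loop shifts a
// 64-bit LFSR once per bit, and the outer loop repeats it `ll` times.
// The feedback taps (0x1B) and the delta folding are illustrative only.
static uint64_t lfsr_loops(uint64_t state, uint64_t time_delta, unsigned ll) {
    for (unsigned outer = 0; outer < ll; outer++) {
        for (unsigned bit = 0; bit < 64; bit++) {
            state ^= (time_delta >> bit) & 1;  // fold in one bit of the delta
            uint64_t msb = state >> 63;
            state <<= 1;
            if (msb)
                state ^= 0x1B;  // illustrative feedback polynomial
        }
    }
    return state;
}
```

The point of the real code is not the LFSR output itself but how long the intentionally
unoptimized loop takes to run; `ll` scales that running time linearly.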

In my experience, the LFSR code significantly slows jitterentropy down, but doesn't generate very
much entropy. I tested this on RPi3 and qemu-arm64 with qualitatively similar results, but it hasn't
been tested on x86 yet. This is something to consider when tuning: using fewer LFSR loops tends to
lead to better overall performance.

Note that setting `kernel.jitterentropy.ll=0` causes jitterentropy to choose the number of LFSR
loops in a "random-ish" way. As described in [the basic config doc](config-basic.md), I discourage
the use of `kernel.jitterentropy.ll=0`.

### [`kernel.jitterentropy.ml`](../kernel_cmdline.md#kernel_jitterentropy_ml_num)

"`ml`" stands for "memory access loops". Jitterentropy walks through a moderately large chunk of
RAM, reading and writing each byte. The size of the chunk and the access pattern are controlled by
the two parameters below. The memory access loop is repeated `kernel.jitterentropy.ml`-many times.

In my experience, the memory access loops are a good source of raw entropy. Again, I've only tested
this on RPi3 and qemu-arm64 so far.

Much like `kernel.jitterentropy.ll`, setting `kernel.jitterentropy.ml=0` makes jitterentropy
choose a "random-ish" value for the memory access loop count. I discourage this as well.

### [`kernel.jitterentropy.bs`](../kernel_cmdline.md#kernel_jitterentropy_bs_num)

"`bs`" stands for "block size". Jitterentropy divides its chunk of RAM into blocks of this size.
The memory access loop starts with byte 0 of block 0, then "byte -1" of block 1 (which is actually
the last byte of block 0), then "byte -2" of block 2 (i.e. the second-to-last byte of block 1), and
so on. This pattern ensures that every byte gets hit, and that most accesses go into different blocks.
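In index terms, access *k* touches byte `(k * (bs - 1)) % (bs * bc)`: the walk advances by one
block minus one byte at each step. A sketch that checks the coverage claim (this models the index
pattern only; the real loop also reads and writes each byte):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Model of the access pattern: step through a bs*bc-byte buffer with a
// stride of (bs - 1) bytes, and report whether one full pass (bs*bc
// accesses) touches every byte.
static bool pattern_covers_all(size_t bs, size_t bc) {
    enum { kMaxSize = 4096 };  // enough for small demo sizes
    unsigned char hit[kMaxSize];
    size_t size = bs * bc;
    if (size == 0 || size > kMaxSize)
        return false;
    memset(hit, 0, sizeof(hit));
    size_t idx = 0;
    for (size_t k = 0; k < size; k++) {
        hit[idx] = 1;
        idx = (idx + bs - 1) % size;  // "byte -k of block k", modulo the chunk
    }
    for (size_t i = 0; i < size; i++) {
        if (!hit[i])
            return false;
    }
    return true;
}
```

Every byte gets hit in one pass exactly when `gcd(bs - 1, bs * bc) == 1`; e.g. `bs=64` works with
a power-of-two block count (stride 63 is odd), whereas `bs=7, bc=3` would cycle through only a
third of the bytes (gcd(6, 21) = 3).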

I have usually tested jitterentropy with `kernel.jitterentropy.bs=64`, based on the size of a cache
line. I haven't yet tested whether there's a better option on some or all platforms.

### [`kernel.jitterentropy.bc`](../kernel_cmdline.md#kernel_jitterentropy_bc_num)

"`bc`" stands for "block count". Jitterentropy uses this many blocks of RAM, each of size
`kernel.jitterentropy.bs`, in its memory access loops.

Since I choose `kernel.jitterentropy.bs=64`, I usually choose `kernel.jitterentropy.bc=1024`.
This means using 64KB of RAM, which is enough to overflow the L1 cache.

The comment before `jent_memaccess` in the
[jitterentropy source code](../../third_party/lib/jitterentropy/jitterentropy-base.c#234)
suggests choosing the block size and count so that the RAM used is bigger than L1. Confusingly, the
default values in upstream jitterentropy (block size = 32, block count = 64) aren't big enough to
overflow L1.
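To make the footprint arithmetic concrete (the 32 KiB figure used in the checks below is an
assumed typical L1 data cache size, not something the jitterentropy source specifies):

```c
#include <stdint.h>

// RAM footprint of the memory access loop is simply block size * block
// count. With bs=64, bc=1024 this is 64 KiB (bigger than an assumed
// 32 KiB L1d); the upstream defaults bs=32, bc=64 give only 2 KiB.
static uint32_t footprint_bytes(uint32_t bs, uint32_t bc) {
    return bs * bc;
}
```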

## Tuning process

The basic idea is simple: on a particular target device, try different values for the parameters.
Collect a large amount of data for each parameter set (ideally around 1MB), then
[run the NIST test suite to analyze the data](../entropy_quality_tests.md#running-the-nist-test-suite).
Determine which parameters give the best entropy per unit time. The time taken to draw the entropy
samples is logged on the system under test.

One complication is the startup testing built into jitterentropy. This essentially draws and
discards 400 samples, performing some basic analysis along the way (mostly making sure that the
clock is monotonic and has a high enough resolution and variability). A more accurate test would
reboot twice for each set of parameters: once to collect around 1MB of data for analysis, and a
second time to boot with the "right" amount of entropy (as computed from the entropy estimate in
the first phase, with appropriate safety margins; see
["Determining the entropy\_per\_1000\_bytes statistic"](#determining-the-entropy_per_1000_bytes-statistic),
below). This second phase of testing simulates a real boot, including the startup tests. After
completing the second phase, choose the parameter set that boots fastest. Of course, each phase of
testing should be repeated a few times to reduce random variation.

## Determining the entropy\_per\_1000\_bytes statistic

The `crypto::entropy::Collector` interface in
[kernel/lib/crypto/include/lib/crypto/entropy/collector.h](../../kernel/lib/crypto/include/lib/crypto/entropy/collector.h)
requires a parameter `entropy_per_1000_bytes` from its instantiations. The value relevant to
jitterentropy is currently hard-coded in
[kernel/lib/crypto/entropy/jitterentropy\_collector.cpp](../../kernel/lib/crypto/entropy/jitterentropy_collector.cpp).
This value is meant to measure how much min-entropy is contained in each byte of data produced by
jitterentropy (since the bytes aren't independent and uniformly distributed, this will be less than
8 bits). The "per 1000 bytes" part simply makes it possible to specify fractional amounts of
entropy, like "0.123 bits / byte", without requiring fractional arithmetic (since `float` is
disallowed in kernel code, and fixed-point arithmetic is confusing).
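As a sketch of the integer arithmetic this scaling enables (the function name here is mine, not the
kernel's): to accumulate `b` bits of min-entropy, a collector must draw
`ceil(b * 1000 / entropy_per_1000_bytes)` bytes from the source:

```c
#include <stdint.h>

// How many bytes must be drawn from the source to accumulate
// `bits_wanted` bits of min-entropy, given entropy_per_1000_bytes
// (bits of min-entropy per 1000 bytes of output). Rounds up, using
// integer arithmetic only. Illustrative, not the kernel's actual code.
static uint64_t bytes_needed(uint64_t bits_wanted, uint64_t entropy_per_1000_bytes) {
    return (bits_wanted * 1000 + entropy_per_1000_bytes - 1) / entropy_per_1000_bytes;
}
```

For example, at 0.123 bits/byte (`entropy_per_1000_bytes = 123`), drawing 256 bits of entropy
requires 2082 bytes of output.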

The value should be determined by using the NIST test suite to analyze random data samples, as
described in
[the entropy quality tests document](../entropy_quality_tests.md#running-the-nist-test-suite).
The test suite produces an estimate of the min-entropy; repeated tests of the same RNG have (in my
experience) varied by a few tenths of a bit (which is pretty significant when entropy values can be
around 0.5 bits per byte of data!). After getting good, consistent results from the test suite,
apply a safety factor (i.e. divide the entropy estimate by 2), and update the value of
`entropy_per_1000_bytes` (don't forget to multiply by 1000).
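That recipe (halve the estimate, then multiply by 1000) stays in integer arithmetic if the
measured estimate is expressed in millibits per byte. A sketch, with the function name my own:

```c
#include <stdint.h>

// Convert a measured min-entropy estimate, expressed in millibits per
// byte (e.g. 0.62 bits/byte -> 620), into entropy_per_1000_bytes with
// the 2x safety factor applied. Since entropy_per_1000_bytes equals
// bits-per-byte * 1000 = millibits per byte, the conversion reduces to
// a halving. Illustrative, not the kernel's actual code.
static uint32_t safe_entropy_per_1000_bytes(uint32_t measured_millibits_per_byte) {
    return measured_millibits_per_byte / 2;
}
```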

Note that `entropy_per_1000_bytes` should probably eventually be made configurable instead of
hard-coded in jitterentropy\_collector.cpp. A kernel cmdline or even a preprocessor symbol could work.

## Notes about the testing script

The `scripts/entropy-test/jitterentropy/test-tunable` script automates the process of looping
through a large test matrix. The downside is that tests run in sequence on a single machine, so (1)
an error will stall the test pipeline, meaning supervision *is* required, and (2) the machine is
constantly rebooted rather than cold-booted (plus it's a netboot-reboot), which could conceivably
confound the tests. Still, it beats hitting power-off/power-on a thousand times by hand!

Some happy notes:

1. When netbooting, the script leaves bootserver running while waiting for netcp to successfully
   export the data file. If the system hangs, you can power it off and back on, and the existing
   bootserver process will restart the failed test.

2. If the test is going to run (say) 16 combinations of parameters 10 times each, it will go like
   this:

       test # 0: ml = 1   ll = 1  bc = 1  bs = 1
       test # 1: ml = 1   ll = 1  bc = 1  bs = 64
       test # 2: ml = 1   ll = 1  bc = 32 bs = 1
       test # 3: ml = 1   ll = 1  bc = 32 bs = 64
       ...
       test #15: ml = 128 ll = 16 bc = 32 bs = 64
       test #16: ml = 1   ll = 1  bc = 1  bs = 1
       test #17: ml = 1   ll = 1  bc = 1  bs = 64
       ...

   (The output files are numbered starting with 0, so I started with 0 above.)

   So, if test #17 fails, you can delete the results from tests #16 and #17 and re-run 9 more
   iterations of each test, keeping the complete results from the first iteration. In theory, the
   tests could be smarter and also keep the existing result from test #16, but the current shell
   scripts aren't that sophisticated.

The scripts don't implement the two-phase process suggested in the ["Tuning process"](#tuning-process)
section above. It's certainly possible, but again, the existing scripts aren't that sophisticated.

## Open questions

### How much do we trust the low-entropy extreme?

It's *a priori* possible that we maximize entropy per unit time by choosing small parameter values.
The most extreme case is of course `ll=1, ml=1, bs=1, bc=1`, but even something like `ll=1, ml=1,
bs=64, bc=32` is an example of what I'm thinking of. Part of the concern is the variability in the
test suite: if hypothetically the tests are only accurate to within 0.2 bits of entropy per byte,
and they're reporting 0.15 bits of entropy per byte, what do we make of it? Hopefully running the
same test a few hundred times in a row will reveal a clear modal value, but it's still a little
risky to rely on that low estimate being accurate.

The NIST publication states (line 1302, page 35, second draft) that the estimators "work well when
the entropy-per-sample is greater than 0.1". This is fairly low, so hopefully it isn't an issue in
practice. Still, the fact that there is a lower bound means we should probably leave a fairly
conservative envelope around it.

### How device-dependent is the optimal choice of parameters?

There's evidently a significant difference in the actual "bits of entropy per byte" metric on
different architectures or different hardware. Is it possible that most systems are optimal at
similar parameter values (so that we can just hard-code these values into
`kernel/lib/crypto/entropy/jitterentropy_collector.cpp`)? Or do we need to put the parameters into
MDI or into a preprocessor macro, so that we can use different defaults on a per-platform basis (or
at whatever level of granularity is appropriate)?

### Can we even record optimal parameters with enough granularity?

As mentioned above, one of our targets is "x86", which is what runs on any x86 PC. Naturally, x86
PCs can vary quite a bit. Even if we did something like add preprocessor symbols like
`JITTERENTROPY_LL_VALUE` etc. to the build, customized in `kernel/project/target/pc-x86.mk`, could
we pick a good value for *all PCs*?

If not, what are our options?

1. We could store a lookup table based on values accessible at runtime (like the exact CPU model,
   the core memory size, cache line size, etc.). This seems rather unwieldy. Maybe if we could find
   one or two simple properties to key off of, say "CPU core frequency" and "L1 cache size", we
   could make this relatively non-terrible.

2. We could try an adaptive approach: monitor the quality of the entropy stream, and adjust the
   parameters accordingly on the fly. This would take a lot of testing and justification if we want
   to trust it.

3. We could settle for "good enough" parameters on most devices, with the option to tune via kernel
   cmdlines or a similar mechanism. This seems like the most likely outcome to me. I expect that
   "good enough" parameters will be easy to find, and not disruptive enough to justify extreme
   solutions.
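Option 1 could look something like the sketch below; the struct layout, the table entries, and the
choice of keying on L1 size alone are all hypothetical placeholders, not tuned or measured values:

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical lookup table for option 1, keyed on the detected L1 data
// cache size. Every name and number here is an illustrative placeholder.
struct jent_params {
    uint32_t ll, ml, bs, bc;
};

struct l1_keyed_entry {
    uint32_t l1_bytes;        // detected L1d size
    struct jent_params params;
};

static const struct l1_keyed_entry kParamTable[] = {
    {16 * 1024, {1, 32, 64, 512}},   // placeholder values
    {32 * 1024, {1, 32, 64, 1024}},
    {64 * 1024, {1, 32, 64, 2048}},
};

// Return the entry whose L1 size matches exactly; fall back to the last
// (largest) entry when nothing matches.
static struct jent_params lookup_params(uint32_t l1_bytes) {
    size_t n = sizeof(kParamTable) / sizeof(kParamTable[0]);
    for (size_t i = 0; i < n; i++) {
        if (kParamTable[i].l1_bytes == l1_bytes)
            return kParamTable[i].params;
    }
    return kParamTable[n - 1].params;
}
```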