
########################
BL1 Immutable bootloader
########################

:Author: Raef Coles
:Organization: Arm Limited
:Contact: raef.coles@arm.com

************
Introduction
************

Some devices that use TF-M will require initial boot code that is stored in ROM.
There are a variety of reasons that this might happen:

- The device cannot access flash memory without a driver, so needs some setup
  to be done before main images on flash can be booted.
- The device has no on-chip secure flash, and therefore cannot otherwise
  maintain a tamper-resistant root of trust.
- The device has a security model that requires an immutable root of trust.

Henceforth any bootloader stored in ROM will be referred to as BL1, as it would
necessarily be the first stage in the boot chain.

TF-M provides a reference second-stage flash bootloader BL2, in order to allow
easier integration. This bootloader implements all secure boot functionality
needed to provide a secure chain of trust.

A reference ROM bootloader BL1 has now been added with the same motivation: to
allow easier integration of TF-M on platforms that do not have their own BL1
and require one.

****************************
BL1 Features and Motivations
****************************

The reference ROM bootloader provides the following features:

- A split between code being stored in ROM and in other non-volatile memory.

  - This can allow significant cost reduction in fixing bugs compared to
    ROM-only bootloaders.

- A secure boot mechanism that allows upgrading the next boot stage (which
  would usually be BL2).

  - This allows for the fixing of any bugs in the BL2 image.
  - Alternately, this could allow the removal of BL2 in some devices that are
    constrained in flash space but have ROM.

- A post-quantum resistant asymmetric signature scheme for verifying the next
  boot stage image.

  - This can allow devices to be securely updated even if attacks
    involving quantum computers become viable. This could extend the lifespans
    of devices that might be deployed in the field for many years.

- A mechanism for passing boot measurements to the TF-M runtime so that they
  can be attested.
- Tooling to create and sign images.
- Fault Injection (FI) and Differential Power Analysis (DPA) mitigations.

*********************************
BL1_1 and BL1_2 split bootloaders
*********************************

BL1 is split into two distinct boot stages: BL1_1, which is stored in ROM, and
BL1_2, which is stored in other non-volatile storage. This storage would usually
be either trusted or untrusted flash, but on platforms without flash memory it
can be OTP. As BL1_2 is verified against a hash stored in OTP, it is immutable
after provisioning even if stored in mutable storage.

Bugs in ROM bootloaders usually cannot be fixed once a device is provisioned or
in the field. Because ROM code is immutable, the only option is to fix the bug
in newly manufactured devices.

However, it can be very expensive to change the ROM code of devices once
manufacturing has begun, as it requires changes to the photolithography masks
that are used to create the device. This cost varies depending on the complexity
of the device and of the process node that it is being fabricated on, but can be
large, both in engineering time and material/process costs.

By placing the majority of the immutable bootloader in other storage, we can
mitigate the costs associated with changing ROM code, as a new BL1_2 image can
be used at provisioning time with minimal changeover cost. BL1_1 contains a
minimal codebase responsible mainly for the verification of the BL1_2 image.

The bootflow is as follows. For simplicity this assumes that the boot stage
after BL1 is BL2, though this is not necessarily the case:

1) BL1_1 begins executing in place from ROM
2) BL1_1 copies BL1_2 into RAM
3) BL1_1 verifies BL1_2 against the hash stored in OTP
4) BL1_1 jumps to BL1_2, if the hash verification has succeeded
5) BL1_2 copies the primary BL2 image from flash into RAM
6) BL1_2 verifies the BL2 image using asymmetric cryptography
7) If verification fails, BL1_2 repeats 5 and 6 with the secondary BL2 image
8) BL1_2 jumps to BL2, if either image has successfully verified
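
Steps 2 to 4 of this flow can be illustrated with a short Python model. This is
a sketch only, not the BL1_1 implementation: the function name
``bl1_1_load_bl1_2`` is hypothetical, and the use of SHA-256 for the hash held
in OTP is an assumption made for the example.

.. code-block:: python

    import hashlib
    import hmac


    def bl1_1_load_bl1_2(stored_image: bytes, otp_hash: bytes) -> bytes:
        """Model of steps 2-4: copy BL1_2 to RAM, hash it, compare with OTP.

        Returns the RAM copy that would be jumped to; raises on mismatch.
        SHA-256 is an assumption of this sketch.
        """
        ram_copy = bytes(stored_image)  # step 2: copy out of flash or OTP
        measured = hashlib.sha256(ram_copy).digest()  # step 3: hash the copy
        # The RAM copy is what gets hashed, so what is verified is what runs.
        if not hmac.compare_digest(measured, otp_hash):
            raise RuntimeError("BL1_2 hash mismatch: refusing to boot")
        return ram_copy  # step 4: the caller jumps to this image


    image = b"example BL1_2 image"
    provisioned = hashlib.sha256(image).digest()  # value written at provisioning
    booted = bl1_1_load_bl1_2(image, provisioned)

Because the comparison is made against a value that cannot change after
provisioning, any modification to the stored BL1_2 image causes the check to
fail, which is what makes BL1_2 effectively immutable.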

.. Note::
    The BL1_2 image is not encrypted, so if it is placed in untrusted flash it
    will be possible to read the data in the image.

Some optimizations have been made specifically for the case where BL1_2 has been
stored in OTP:

OTP can be very expensive in terms of chip area, though new technologies like
antifuse OTP decrease this cost. Because of this, the code size of BL1_2 has
been minimized. Code-sharing has been configured so that BL1_2 can call
functions stored in ROM. OTP should still be sized so that it can hold
replacement versions of the functions used via code-sharing, in case the ROM
functions contain bugs. Less space is needed than if all code were duplicated,
as it is assumed that most functions will not contain bugs.

As OTP memory frequently has low performance, BL1_2 is copied into RAM before
it is executed. BL1_2 also copies the next stage image into RAM before
authenticating it, which allows the next stage to be stored in untrusted flash.
This requires that the device have sufficient RAM to contain both the BL1_2
image and the next stage image at the same time. Note that this is done even if
BL1_2 is located in XIP-capable flash, as it both allows the use of untrusted
flash and simplifies the image upgrade logic.

.. Note::
   BL1_2 enables TF-M to be used on devices that contain no secure flash, though
   the ITS service will not be available. Other services that depend on ITS will
   not be available without modification.

*************************************
Secure boot / Image upgrade mechanism
*************************************

BL1_2 verifies the authenticity of the next stage image via asymmetric
cryptography, using a public key that is provisioned into OTP.

BL1_2 implements a rollback protection counter in OTP, which is used to prevent
the next stage image being downgraded to a less secure version.

BL1_2 has two image slots, which allows image upgrades to be performed. The
primary slot is always tried first. If verification of the primary image fails
(either due to an invalid signature or due to a version lower than the rollback
protection counter), the secondary slot is tried, subject to the same checks.
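
The slot selection rule can be modelled in a few lines of Python. This is a
sketch under stated assumptions rather than the BL1_2 code: ``Image``,
``select_image`` and ``toy_verify`` are hypothetical names, and ``toy_verify``
is a placeholder that stands in for the real asymmetric signature check.

.. code-block:: python

    from typing import Callable, NamedTuple, Optional, Sequence


    class Image(NamedTuple):
        version: int  # security version carried by the image
        payload: bytes
        signature: bytes


    def select_image(
        slots: Sequence[Optional[Image]],
        rollback_counter: int,
        verify_signature: Callable[[bytes, bytes], bool],
    ) -> Optional[Image]:
        """Return the first slot passing both checks, or None if none does."""
        for image in slots:  # primary first, then secondary
            if image is None:
                continue  # empty slot
            if not verify_signature(image.payload, image.signature):
                continue  # invalid signature
            if image.version < rollback_counter:
                continue  # version lower than the rollback counter
            return image
        return None


    def toy_verify(payload: bytes, signature: bytes) -> bool:
        # Placeholder only: the real check is an asymmetric signature.
        return signature == b"valid"


    primary = Image(version=2, payload=b"old BL2", signature=b"valid")
    secondary = Image(version=3, payload=b"new BL2", signature=b"valid")
    chosen = select_image([primary, secondary], rollback_counter=3,
                          verify_signature=toy_verify)
    print(chosen.payload)  # b'new BL2'

In this example the primary image carries a valid signature but is rejected
because its version is below the counter, so the secondary image is selected.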

BL1_2 contains no image upgrade logic. For OTA update of the next stage image
to be implemented, a later stage in the system must handle downloading new
images and placing them in the required slot.

********************************************
Post-Quantum signature verification in BL1_2
********************************************

BL1_2 uses a post-quantum asymmetric signature scheme to verify the next stage.
The scheme used is Leighton-Micali Signatures (henceforth LMS). LMS is
standardised in `NIST SP800-208
<https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-208.pdf>`_
and `IETF RFC8554 <https://datatracker.ietf.org/doc/html/rfc8554>`_.

LMS is a stateful hash-based signature scheme, meaning that:

 1) It is constructed from a cryptographic hash function, in this case SHA256.

    - This function can be accelerated by existing hardware accelerators, which
      can make LMS verification relatively fast compared to other post-quantum
      signature schemes that cannot be accelerated in hardware yet.

 2) Each private key can only be used to sign a certain number of images.

    - BL1_2 uses the SHA256_H10 parameter set, meaning each key can sign 1024
      images.

The main downside, the limited number of possible signatures, can be mitigated
by limiting the number of image upgrades that are performed. As BL2 is often
currently not upgradable, it is not anticipated that this limit will be
problematic. If BL1 is being used to directly boot a TF-M/NS combined image, the
limit is more likely to be problematic, and care should be taken to examine the
likely update frequency.
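
As a rough aid for that estimate, the following Python sketch works out how long
a single key lasts at a given signing rate. The helper name
``years_until_exhausted`` and the example rates are illustrative assumptions,
not figures from TF-M.

.. code-block:: python

    TREE_HEIGHT = 10  # the "H10" in the SHA256_H10 parameter set
    MAX_SIGNATURES = 2 ** TREE_HEIGHT  # 1024 signatures per private key


    def years_until_exhausted(signed_images_per_year: float) -> float:
        """Years a single LMS private key lasts at a steady signing rate."""
        if signed_images_per_year <= 0:
            raise ValueError("signing rate must be positive")
        return MAX_SIGNATURES / signed_images_per_year


    print(years_until_exhausted(4))    # quarterly releases: 256.0
    print(years_until_exhausted(365))  # one signed build per day: under 3 years

Every signature produced with the key counts towards the limit, so development
or test builds signed with the production key reduce the number of releases
that can be signed later.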

LMS public keys are 32 bytes in size, and LMS signatures are 1912 bytes in size.
The signature size is larger than some asymmetric schemes, though most devices
should have enough space in flash to accommodate this.

The main upside of LMS, aside from the security against attacks involving
quantum computers, is that it is relatively simple to implement. The software
implementation that is used by BL1 is ~3KiB in size, which is considerably
smaller than the corresponding RSA implementation, which is at least 6.5KiB.
This simplicity of implementation helps to avoid bugs.

BL1_2 uses the Mbed TLS implementation of LMS. More information about it is
available in the `Mbed TLS documentation
<https://mbed-tls.readthedocs.io/projects/api/en/development/api/file/lms_8h/>`_.

*********************
BL1 boot measurements
*********************

BL1 outputs boot measurements in the same format as BL2, utilising the same
shared memory area. These measurements can then be included in the attestation
token, allowing the attestation of the version of the boot stage after BL1.

***********
BL1 tooling
***********

Image signing scripts are provided for BL1_1 and BL1_2:

- ``bl1/bl1_1/scripts/create_bl1_2_img.py``
- ``bl1/bl1_2/scripts/create_bl2_img.py``

These sign (and, in the case of ``create_bl2_img.py``, encrypt) a given image
file and append the required headers. Although the second script is named
``create_bl2_img.py``, it can be used for any next stage image.

**************************
BL1 FI and DPA mitigations
**************************

BL1 reuses the FI countermeasures used in the TF-M runtime, which are found in
``lib/fih/``.

BL1 implements countermeasures against fault injection. The functions with these
countermeasures are found in ``bl1/bl1_1/shared_lib/util.c``.

``bl_fih_memeql`` tests whether two memory regions have the same value:

- It inserts random delays to improve resilience to FI attacks
- It performs loop integrity checks
- It uses FIH constructs
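
The first two properties can be illustrated with a simplified Python model. This
is not the ``bl_fih_memeql`` implementation: ``memeql_hardened`` is a
hypothetical name, the busy loop is only a stand-in for a hardware random delay,
and the FIH constructs from ``lib/fih/`` are not modelled at all.

.. code-block:: python

    import secrets


    def memeql_hardened(a: bytes, b: bytes) -> bool:
        """Compare two buffers without exiting early on the first mismatch."""
        if len(a) != len(b):
            return False
        diff = 0
        iterations = 0
        for x, y in zip(a, b):
            # Stand-in for a random delay: makes the timing of each
            # comparison harder for an attacker to predict.
            for _ in range(secrets.randbelow(4)):
                pass
            diff |= x ^ y  # accumulate differences instead of returning early
            iterations += 1
        # Loop integrity check: detects a fault that cut the loop short.
        if iterations != len(a):
            raise RuntimeError("loop integrity check failed")
        return diff == 0


    print(memeql_hardened(b"digest-1", b"digest-1"))  # True
    print(memeql_hardened(b"digest-1", b"digest-2"))  # False

The loop counter check matters because a fault that skips iterations would
otherwise leave ``diff`` at zero and make unequal buffers appear equal.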

**************************
Using BL1 on new platforms
**************************

New platforms must define the following macros in their ``region_defs.h``:

- ``BL1_1_HEAP_SIZE``
- ``BL1_1_STACK_SIZE``
- ``BL1_2_HEAP_SIZE``
- ``BL1_2_STACK_SIZE``
- ``BL1_1_CODE_START``
- ``BL1_1_CODE_LIMIT``
- ``BL1_1_CODE_SIZE``
- ``BL1_2_CODE_START``
- ``BL1_2_CODE_LIMIT``
- ``BL1_2_CODE_SIZE``
- ``PROVISIONING_DATA_START``
- ``PROVISIONING_DATA_LIMIT``
- ``PROVISIONING_DATA_SIZE``

The ``PROVISIONING_DATA_*`` defines are used to locate where the data to be
provisioned into OTP can be found. These are required as the provisioning bundle
needs to contain the entire BL1_2 image, usually >= 8KiB in size, which is too
large to be placed in the static data area as is done for all other dummy
provisioning data. On development platforms with reprogrammable ROM, this is
often placed in unused ROM. On production platforms, this should be located in
RAM and then filled with provisioning data. The format of the provisioning data
that should be located in the ``PROVISIONING_DATA_*`` region can be found in
``bl1/bl1_1/lib/provisioning.c`` in the struct
``bl1_assembly_and_test_provisioning_data_t``.

If the platform is storing BL1_2 in flash, it must set
``BL1_2_IMAGE_FLASH_OFFSET`` to the flash offset of the start of BL1_2.

The platform must also implement the HAL functions defined in the following
headers:

- ``bl1/bl1_1/shared_lib/interface/trng.h``
- ``bl1/bl1_1/shared_lib/interface/crypto.h``
- ``bl1/bl1_1/shared_lib/interface/otp.h``

If the platform integrates a CryptoCell-312, then it can reuse the existing
implementation.

***********
BL1 Testing
***********

New tests have been written to test both the HAL implementation and the
integration of those functions for verifying images. These tests are stored in
the ``tf-m-tests`` repository, under the ``test/bl1/`` directory, and are
further subdivided into BL1_1 and BL1_2 tests.

--------------

*Copyright (c) 2022-2024, Arm Limited. All rights reserved.*