% % Copyright 2014, General Dynamics C4 Systems % % SPDX-License-Identifier: GPL-2.0-only % \chapter{\label{ch:io}Hardware I/O} \section{Interrupt Delivery} \label{sec:interrupts} Interrupts are delivered as notifications. A thread may configure the kernel to signal a particular \obj{Notification} object each time a certain interrupt triggers. Threads may then wait for interrupts to occur by calling \apifunc{seL4\_Wait}{sel4_wait} or \apifunc{seL4\_Poll}{sel4_poll} on that \obj{Notification}. \obj{IRQHandler} capabilities represent the ability of a thread to configure a certain interrupt. They have three methods: \begin{description} \item[\apifunc{seL4\_IRQHandler\_SetNotification}{irq_handlersetnotification}] specifies the \obj{Notification} the kernel should \apifunc{signal}{sel4_signal} when an interrupt occurs. A driver may then call \apifunc{seL4\_Wait}{sel4_wait} or \apifunc{seL4\_Poll}{sel4_poll} on this notification to wait for interrupts to arrive. \item[\apifunc{seL4\_IRQHandler\_Ack}{irq_handleracknowledge}] informs the kernel that the userspace driver has finished processing the interrupt and the microkernel can send further pending or new interrupts to the application. \item[\apifunc{seL4\_IRQHandler\_Clear}{irq_handlerclear}] de-registers the \obj{Notification} from the \obj{IRQHandler} object. \end{description} When the system first starts, no \obj{IRQHandler} capabilities are present. Instead, the initial thread's CSpace contains a single \obj{IRQControl} capability. This capability may be used to produce a single \obj{IRQHandler} capability for each interrupt available in the system. Typically, the initial thread of a system will determine which IRQs are required by other components in the system, produce an \obj{IRQHandler} capability for each interrupt, and then delegate the resulting capabilities as appropriate. Methods on \obj{IRQControl} can be used for creating \obj{IRQHandler} capabilities for interrupt sources. \ifxeightsix \section{x86-Specific I/O} \subsection{Interrupts} \label{sec:x86_interrupts} In addition to managing \obj{IRQHandler} capabilities, x86 platforms require the delivery location in the CPU vectors to be configured. Regardless of where an interrupt comes from (IOAPIC, MSI, etc) it must be assigned a unique vector for delivery, ranging from VECTOR\_MIN to VECTOR\_MAX. The rights to allocate a vector are effectively given through the \obj{IRQControl} capability and can be considered as the kernel outsourcing the allocation of this namespace to user level. \begin{description} \item[\apifunc{seL4\_IRQControl\_GetIOAPIC}{x86_irq_handler_getioapic}] creates an \obj{IRQHandler} capability for an IOAPIC interrupt \item[\apifunc{seL4\_IRQControl\_GetMSI}{x86_irq_handler_getmsi}] creates an \obj{IRQHandler} capability for an MSI interrupt \end{description} \subsection{I/O Ports} \label{sec:ioports} On x86 platforms, seL4 provides access to I/O ports to user-level threads. Access to I/O ports is controlled by \obj{IO Port} capabilities. Each \obj{IO Port} capability identifies a range of ports that can be accessed with it. Reading from I/O ports is accomplished with the \apifunc{seL4\_X86\_IOPort\_In8}{x86_io_port_in8}, \apifunc{seL4\_X86\_IOPort\_In16}{x86_io_port_in16}, and \apifunc{seL4\_X86\_IOPort\_In32}{x86_io_port_in32} methods, which allow for reading of 8-, 16- and 32-bit quantities. Similarly, writing to I/O ports is accomplished with the \apifunc{seL4\_X86\_IOPort\_Out8}{x86_io_port_out8}, \apifunc{seL4\_X86\_IOPort\_Out16}{x86_io_port_out16}, and \apifunc{seL4\_X86\_IOPort\_Out32}{x86_io_port_out32} methods. Each of these methods takes as arguments an \obj{IO Port} capability and an unsigned integer~\texttt{port}, which indicates the I/O port to read from or write to, respectively. In each case, \texttt{port} must be within the range of I/O ports identified by the given \obj{IO Port} capability in order for the method to succeed. The I/O port methods return error codes upon failure. A \texttt{seL4\_IllegalOperation} code is returned if port access is attempted outside the range allowed by the \obj{IO Port} capability. Since invocations that read from I/O ports are required to return two values -- the value read and the error code -- a structure containing two members, \texttt{result} and \texttt{error}, is returned from these API calls. At system initialisation, the initial thread's \obj{CSpace} contains the \obj{IOPortControl} capability, which can be used to \apifunc{seL4\_X86\_IOPort\_Issue}{x86_ioport_issue} \obj{IO Port} capabilities to sub ranges of I/O ports. Any range that is issued may not have overlap with any existing issued \obj{IO Port} capability. \subsection{I/O Space} \label{sec:iospace} I/O devices capable of DMA present a security risk because the CPU's MMU is bypassed when the device accesses memory. In seL4, device drivers run in user space to keep them out of the trusted computing base. A malicious or buggy device driver may, however, program the device to access or corrupt memory that is not part of its address space, thus subverting security. To mitigate this threat, seL4 provides support for the IOMMU on Intel x86-based platforms. An IOMMU allows memory to be remapped from the device's point of view. It acts as an MMU for the device, restricting the regions of system memory that it can access. More information can be obtained from Intel's IOMMU documentation \cite{extra:vtd}. Two new objects are provided by the kernel to abstract the IOMMU: \begin{description} \item[\obj{IOSpace}] This object represents the address space associated with a hardware device on the PCI bus. It represents the right to modify a device's memory mappings. \item[\obj{IOPageTable}] This object represents a node in the multilevel page-table structure used by IOMMU hardware to translate hardware memory accesses. \end{description} \obj{Page} capabilities are used to represent the actual frames that are mapped into the I/O address space. A \obj{Page} can be mapped into either a \obj{VSpace} or an \obj{IOSpace} but never into both at the same time. \obj{IOSpace} and \obj{VSpace} fault handling differ significantly. \obj{VSpace} page faults are redirected to the thread's exception handler (see \autoref{sec:faults}), which can take the appropriate action and restart the thread at the faulting instruction. There is no concept of an exception handler for an \obj{IOSpace}. Instead, faulting transactions are simply aborted; the device driver must correct the cause of the fault and retry the DMA transaction. An initial master \obj{IOSpace} capability is provided in the initial thread's CSpace. An \obj{IOSpace} capability for a specific device is created by using the \apifunc{seL4\_CNode\_Mint}{cnode_mint} method, passing the PCI identifier of the device as the low 16 bits of the \texttt{badge} argument, and a Domain ID as the high 16 bits of the \texttt{badge} argument. PCI identifiers are explained fully in the PCI specification \cite{Shanley:PCISA}, but are briefly described here. A PCI identifier is a 16-bit quantity. The first 8 bits identify the bus that the device is on. The next 5 bits are the device identifier: the number of the device on the bus. The last 3 bits are the function number. A single device may consist of several independent functions, each of which may be addressed by the PCI identifier. Domain IDs are explained fully in the Intel IOMMU documentation \cite{extra:vtd}. There is presently no way to query seL4 for how many Domain IDs are supported by the IOMMU and the \apifunc{seL4\_CNode\_Mint}{cnode_mint} method will fail if an unsupported value is chosen. The IOMMU page-table structure has three levels. Page tables are mapped into an \obj{IOSpace} using the \apifunc{seL4\_X86\_IOPageTable\_Map}{x86_io_page_table_map} method. This method takes the \obj{IOPageTable} to map, the \obj{IOSpace} to map into and the address to map at. Three levels of page tables must be mapped before a frame can be mapped successfully. A frame is mapped with the \apifunc{seL4\_X86\_Page\_MapIO}{x86_page_map_io} method whose parameters are analogous to the corresponding method that maps \obj{Page}s into \obj{VSpaces} (see \autoref{ch:vspace}), namely \apifunc{seL4\_X86\_Page\_Map}{x86_page_map}. Unmapping is accomplished with the usual unmap (see \autoref{ch:vspace}) API call, \apifunc{seL4\_X86\_Page\_Unmap}{x86_page_unmap}. More information about seL4's IOMMU abstractions can be found in \cite{Palande:M}. \fi \section{Arm-Specific I/O} \subsection{Arm SMMU version 2.0} \label{sec:smmuv2} seL4 provides an API for programming the Arm System MMU (SMMU) version 2.0, which allows system software to manage access rights and address translation for devices that can initiate direct memory accesses (DMA). An Arm SMMU v2.0 implementation allows device memory transactions to be associated with an identifier (StreamID) that is used to direct the transaction through a SMMU translation context bank (CB). A translation context bank can perform address translation, memory protection and memory attribute transformation. The standard specifies different types of address translations that correspond to stages in the ArmV8 virtual memory system architecture such as either non-secure EL0, EL1 first and second stage translations, Hyp mode translations or secure mode translations. It is possible to associate different StreamIDs with the same context bank and it is possible to share address translation tables between a context bank and software MMU address space if the stage and type of translation is the same. Faults that occur when a memory transaction conflicts with a StreamID or CB configuration happen asynchronously with respect to a processor element's execution. When this occurs an interrupt is used to allow a PE to handle the SMMU fault. Faults are reported through registers in the SMMU that can be queried in an interrupt handler. TLB maintenance operations are required to keep SMMU translation caches consistent when there are changes to any valid page table mapping entries. An SMMU implementation usually has a maximum number of StreamIDs that it supports. The specificiation allows StreamIDs to be up to 16bits wide. There are also a fixed number of context banks, up to a maximum of 128. Context banks can be generic or support only a single address translation stage. This information is reported by ID registers in each implementation. The seL4 API allows system software to manage an SMMU by assigning StreamIDs to context banks, bind context banks to page translation structures, implement SMMU fault handling and also perform explicit TLB maintenance. This allows system software to ensure that a device is only able to access and modify memory contents that it has been explicitly given access to and allow devices to be presented with a virtualized address space for performing DMA. All the StreamIDs and context banks are accessible via capabilities. Control capabilities are used to create capabilities referring to each StreamID and context bank in a system. The kernel tracks the allocation of StreamIDs and context banks with two static CNodes, one for each resource type. These CNodes track which VSpace a context bank has bound to it, and which context bank a StreamID is bound to. The capabilities allow access control policies to be implemented by a user thread. When StreamID, context bank capabilities are revoked, the kernel will disable the context banks or StreamID mappings. TLB maintenance is handled by the kernel via tracking which context banks are associated with a particular VSpace. Any TLB maintenance operations that the kernel performs on VSpace invocations are also applied to associated context banks. SMMU fault handling is delegated to user level via invocations that allow fault statuses to be queried and cleared for each context bank and for the SMMU globally. SMMU fault interrupts can be handled the same as other platform level interrupts. The kernel implementation only uses translation stages matching what translation the kernel is performing for VSpace objects. When seL4 is operating in EL1, the SMMU only uses stage 1 translation (ASID), that is "stage 1 with stage 2 bypass" in the context bank attribute configuration. When hypervisor mode is enabled, and seL4 is operating in EL2, the SMMU only does stage 2 translations. Four capabilities types provide access to SMMU resources: \begin{description} \item[\obj{seL4\_ARM\_SID}] A capability granting access to a single transaction stream, which can be used to bind and unbind a stream to a single context bank. \item[\obj{seL4\_ARM\_CB}] A capbility representing a single specific context bank. It can be used to bind and unbind a VSpace to assign what page tables the context bank should use for translation, assign StreamIDs and process context bank faults. \item[\obj{seL4\_ARM\_SIDControl}] A control capability which can be used to create \obj{seL4\_ARM\_SID} capabilities to specific transaction streams. The \obj{seL4\_ARM\_SIDControl} cap is used for managing rights on StreamID configurations. This capability is provided in the initial thread's CSpace. \item[\obj{seL4\_ARM\_CBControl}] A control capability that can be used to derive \obj{seL4\_ARM\_CB} capabilities. The \obj{seL4\_ARM\_CBControl} cap is used for managing rights on context bank configurations. This capability is provided in the initial thread's CSpace. \end{description} \subsubsection{Creating \obj{seL4\_ARM\_SID} capabilities} \label{sec:smmuv2-creating-sel4-arm-sid-capabilities} The Arm SMMU 2.0 specification doesn't specify how StreamIDs need to correspond to different devices. Each platform can define its own policy for how StreamIDs are allocated. A \obj{seL4\_ARM\_SIDControl} capability can be used to create a capability to any valid StreamID for the SMMU and delegate access to other tasks in the system. \begin{description} \item[\apifunc{seL4\_ARM\_SIDControl\_GetSID}{arm_sid_controlgetsid}] uses the \obj{seL4\_ARM\_SIDControl} capability to create a new \obj{seL4\_ARM\_SID} capability that represents a single StreamID. This new capbility is placed in the provided slot. It is expected that whatever thread controls an \obj{seL4\_ARM\_SIDControl} capability knows about how StreamIDs are allocated in a system. \end{description} The Arm SMMU 2.0 specification descibes many ways of associating StreamIDs with context banks. Currently only direct mapping of a StreamID to a context bank is supported. \subsubsection{Creating \obj{seL4\_ARM\_CB} capabilities} \label{sec:smmuv2-creating-sel4-arm-cb-capabilities} Each context bank allows the SMMU to maintain an active translation context with it's own registers for holding context specific information. An SMMU has a fixed number of context banks available for use and these are allocated using the \obj{seL4\_ARM\_CBControl} capability. \begin{description} \item[\apifunc{seL4\_ARM\_CBControl\_GetCB}{arm_cb_controlgetcb}] uses the \obj{seL4\_ARM\_CBControl} capability to create a new \obj{seL4\_ARM\_CB} capability that represents a single context bank. This new capability is placed in the provided slot. It is expected that whatever thread controls a \obj{seL4\_ARM\_CBControl} capability has knowledge of the properties of each context bank that each index refers to. \end{description} \subsubsection{Configuring context banks} \label{sec:smmuv2-configuring-context-banks} By providing a \obj{seL4\_ARM\_CB} cap, a user-level thread can configure the VSpace used by the bank with the following API: \begin{description} \item[\apifunc{seL4\_ARM\_CB\_AssignVspace}{arm_cb_assignvspace}] configures the context bank to use the provided VSapce root for translations. \item[\apifunc{seL4\_ARM\_CB\_UnassignVspace}{arm_cb_unassignvspace}] removes the configured VSpace and conducting a TLB invalidation. \end{description} The SMMU-v2 uses the same paging structure as the MMU (AArch\_64 and AArch\_32 formats). Therefore, there is no need to provide a new set of page structure caps nor a separate set of map and unmap functions. To manage the assignment, the kernel has an internal CNode, called smmuStateCBNode, that stores copies of the \obj{VSpace\_cap} created by executing the above API. The copy of the \obj{VSpace\_cap} contains its assigned ContextBank number. Therefore the kernel can conduct context bank invalidation if the \obj{VSpace\_cap} is revoked. \subsubsection{Configuring streams (transactions)} \label{sec:smmuv2-configuring-streams-transactions} A user-level thread can bind a context bank with an \obj{seL4\_ARM\_SID} capability with: \begin{description} \item[\apifunc{seL4\_ARM\_SID\_BindCB}{arm_sid_bindcb}] configures the stream to use given context bank for translation. To simplify the process, the binding also enables the stream ID. \obj{seL4\_ARM\_SID\_BindCB} generates a copy of the \obj{seL4\_ARM\_CB} cap in kernel's internal CNode. This allows the stream ID to be disabled if the \obj{seL4\_ARM\_CB} cap is revoked. \item[\apifunc{seL4\_ARM\_SID\_UnbindCB}{arm_sid_unbindcb}] removes the \obj{seL4\_ARM\_CB} cap from the kernel's internal CNode and disables the stream ID. The kernel provides this API for the conveniences of sharing a stream ID among multiple VSpaces. \end{description} If there are any exceptions after the stream ID is enabled, the user-level software should use the fault handling mechanisms to resolve them. \subsubsection{Copying and Deleting caps} \label{sec:smmuv2-copying-and-deleting-caps} The kernel allows copying both \obj{ARM\_SID} cap and \obj{seL4\_ARM\_CB} cap. This allows capabilites to be delegated to different threads. The kernel does not allow copying neither the \obj{seL4\_ARM\_SIDControl} nor the \obj{seL4\_ARM\_CBControl} capabilities. Deleting a \obj{seL4\_ARM\_CB} cap that contains a valid capBindSID field will: \begin{itemize} \item invalidate the streamID to ContextBank assignment in hardware. \end{itemize} Deleting the last \obj{seL4\_ARM\_CB} cap will: \begin{itemize} \item perform an \apifunc{seL4\_ARM\_CB\_UnassignVspace}{arm_cb_unassignvspace}, removing any configured VSpace, \item conduct a TLB invalidation. \end{itemize} Similarly, deleting a VSpace\_cap that contains assigned context bank number will: \begin{itemize} \item invalidate the context bank \item conduct a TLB invalidation \end{itemize} Deleting the last ARM\_SID cap will: \begin{itemize} \item Perform an \apifunc{seL4\_ARM\_SID\_UnbindCB}{arm_sid_unbindcb}, (deleting the copy of the assigned \obj{seL4\_ARM\_CB} cap) \item Disable the stream ID. \end{itemize} \subsubsection{TLB invalidation} \label{sec:smmuv2-tlb-invalidation} The kernel is expected to perform all required SMMU TLB maintenance operations as part of the API implementation. In addition, the kernel provides two system calls for explicitly performing invalidations: \begin{description} \item[\apifunc{seL4\_ARM\_CBControl\_TLBInvalidateAll}{arm_cb_controltlbinvalidate}] invalidates all TLB entries in all context banks. \item[\apifunc{seL4\_ARM\_CB\_TLBInvalidate}{arm_cb_tlbinvalidate}] invalidates all TLB entries in a context bank. \end{description} The kernel does not impose any restrictions on how a VSpace is used by user-level applications, hence a VSpace can be shared by normal threads and drivers. Sharing a VSpace between threads and drivers also means sharing all mappings in that VSpace between MMUs in CPU cores and SMMU used by device transactions. Moreover, multiple context banks in SMMU can share a VSpace. Therefore, maintaining the coherency between the TLB in MMU and the TLB in SMMU's context banks is important. The kernel keeps a record of Vspace's usage in context banks in SMMU by maintaining: the number of context banks using a given ASID, and the ASID that a given context bank is using. There are a few reasons behind this design. \begin{itemize} \item First, the ASID is efficient for representing a VSpace. In seL4, each VSpace has an ASID which is assigned before the VSpace is ready to be used and will never change until the VSpace is deleted. Recording how many context banks are using a VSpace's ASID is equivalent to recording the VSpace's usage in context banks. \item Second, all TLB invalidation operation requires knowledge of the ASID. There are two types of TLB invalidation operations: invalidating a page table entry using its ASID (triggered by updating a page table entry, e.g. unmapping a page), and invalidating all mappings of an ASID (triggered by deleting a VSpace). \item Third, the kernel can easily find a context banks' ASID on all occasions, which is useful to either conduct TLB invalidation requests or unassign VSpace from a context bank. \end{itemize} By knowing how many context banks are using an ASID, the kernel can easily check in every TLB invalidation operation and invoke TLB invalidation in SMMU if the value is not zero. In SMMU's TLB invalidation operation, the kernel searches the context banks using the ASID, and conducts TLB invalidation in those context banks. Ideally, the SMMU shares the same ASID or VMID name space with the rest of the system. This allows the SMMU to maintain TLB coherency by listening for TLB broadcasting messages. This means the context banks should be configured with the correct ASID or VMID when the StreamID is enabled. This is not a problem for stage 1 translation, as there are a large number of ASID bits and an ASID can be assigned to a vspace root with existing APIs. However, the VMID used in stage 2 only has 8 bits, and the kernel allocates them on demand and can reclaim a VSpace's hardware ASID to reuse if there are more VSpaces than available ASIDS. While it is possible to do this when the VSpace is only used in an MMU, it is not possible with multiple active context banks. Due to this, the context bank in SMMU cannot be configured with the correct VMID. Currently, the SMMU driver uses private VMID space, and uses the context bank number as the corresponding VMID number. \subsubsection{Fault handling} \label{sec:smmuv2-fault-handling} The number of IRQs used for reporting transaction faults is hardware dependent. There are two kinds of faults: global faults ( general configuration and transaction faults), or context bank faults. For transaction faults, the SMMU reports faulty stream IDs. The global faults reports: \begin{itemize} \item Invalid context fault. \item Unidentified stream fault. \item Stream match conflict fault. \item Unimplemented context bank fault. \item Unimplemented context interrupt fault. \item Configuration access fault. \item External fault. \end{itemize} Each context bank contains registers to report faults on address translation, for example, faulty addresses, or permission errors. The SMMU driver identifies the cause of a fault by first reading the global fault registers (one state register and three fault syndrome registers), then by reading corresponding context bank fault registers. Note, the SMMU reports the faulty transaction (stream) ID, which can be used to identify its context bank ID. \begin{itemize} \item System assumption: Both the SMMU's IRQ handler and the owner of the \obj{seL4\_ARM\_SIDControl} cap (controlling stream ID distributions) are trusted. \item SMMU interrupts are handled as same as other IRQs, i,e, the kernel does not treat the SMMU IRQs special, reporting the interrupt via IRQ notifications. \item The kernel provides a API for reading the global fault registers: \apifunc{seL4\_ARM\_SIDControl\_GetFault}{arm_sid_controlgetfault}. Because the IRQ notification can only deliver information via the badge, the owner of the StreamControl\_cap can retrieve more information via this API. \item If the fault is related to a transaction, the owner of the \obj{seL4\_ARM\_SIDControl} cap will notify the holder of the corresponding stream ID cap, which should also have a copy of the context bank cap bound to this transaction. \item The kernel provides an API for reading the context bank fault registers: \apifunc{seL4\_ARM\_CB\_CBGetFault}{arm_cb_getfault}, used by a context bank cap holder (the \obj{seL4\_ARM\_CB} cap holder). \item Once the fault handling is done, the server can call \apifunc{seL4\_ARM\_CB\_CBClearFault}{arm_cb_clearfault} to clear the fault status on a context bank, and \apifunc{seL4\_ARM\_SIDControl\_ClearFault}{arm_sid_controlclearfault} to clear the fault status on SMMU. \end{itemize}