.. _rt_performance_tuning:

ACRN Real-Time (RT) Performance Analysis
########################################

This document describes methods to collect trace and performance data for
analyzing the real-time performance of an ACRN real-time VM (RTVM). Two
parts are included:

- Method to trace ``vmexit`` occurrences for analysis.
- Method to collect Performance Monitoring Unit (PMU) counter data for
  performance tuning.

vmexit Analysis for ACRN RT Performance
***************************************

``vmexit`` events are triggered in response to certain instructions and
events and are a key source of performance degradation in virtual machines.
During the runtime of a hard RTVM of ACRN, the following events impact
real-time deterministic latency:

  - CPUID
  - TSC_Adjust read/write
  - TSC write
  - APICID/LDR read
  - ICR write

Generally, we don't want to see any ``vmexit`` occur during the critical
section of the RT task.

The methodology of ``vmexit`` analysis is straightforward. First, clearly
identify the **critical section** of the RT task: the span of time during
which no ``vmexit`` should occur. Different RT tasks have different critical
sections. This document uses the cyclictest benchmark as an example of how
to do ``vmexit`` analysis.
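
For reference, here is a typical cyclictest invocation pinned to a single
CPU inside the RTVM; the parameter values are illustrative, not
prescriptive:

.. code-block:: console

   # One measurement thread on CPU 1 at RT priority 99, with memory
   # locked, a 1 ms interval, 100000 loops, and quiet output
   cyclictest -a 1 -t 1 -p 99 -m -i 1000 -l 100000 -q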

The Critical Sections
=====================

Here is example pseudocode of a cyclictest implementation:

.. code-block:: none

   while (!shutdown) {
         ...
         clock_nanosleep(&next)
         clock_gettime(&now)
         latency = calcdiff(now, next)
         ...
         next += interval
   }

Time point ``now`` is the actual point at which the cyclictest app is woken
up and scheduled. Time point ``next`` is the expected point at which we want
cyclictest to be awakened and scheduled. The latency is therefore
``now - next``. We don't want to see a ``vmexit`` between ``next`` and
``now``, so we define the starting point of the critical section as ``next``
and the ending point as ``now``.

Log and Trace Data Collection
=============================

#. Add time stamps (in TSC) at ``next`` and ``now``, as in the sketch
   following this list.
#. Capture the log with the above time stamps in the RTVM.
#. Capture the ``acrntrace`` log in the Service VM at the same time.
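
The following minimal C sketch shows one way to add these TSC time stamps,
assuming an x86 RTVM with an invariant TSC; the interval, loop count, and
output format are illustrative:

.. code-block:: c

   #include <inttypes.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <time.h>
   #include <x86intrin.h>   /* __rdtsc() */

   #define LOOPS    1000
   #define INTERVAL 1000000L   /* 1 ms, illustrative */

   int main(void)
   {
      static uint64_t tsc_start[LOOPS], tsc_end[LOOPS];
      struct timespec next;

      clock_gettime(CLOCK_MONOTONIC, &next);

      for (int i = 0; i < LOOPS; i++) {
         next.tv_nsec += INTERVAL;
         if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
         }

         /* Stamp just before the timed sleep and just after the actual
          * wakeup; the [next, now] critical window lies inside this
          * TSC interval.
          */
         tsc_start[i] = __rdtsc();
         clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
         tsc_end[i] = __rdtsc();
      }

      /* Print after the measurement loop so that I/O stays out of the
       * critical sections; one "start end" pair per line.
       */
      for (int i = 0; i < LOOPS; i++)
         printf("%" PRIu64 " %" PRIu64 "\n", tsc_start[i], tsc_end[i]);

      return 0;
   }

Run the binary pinned to the real-time vCPU (for example, with
``taskset -c 1``) so that its stamps and the ``acrntrace`` data for that
physical CPU line up.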

Offline Analysis
================

#. Convert the raw trace data to human readable format.
#. Merge the RTVM logs and the ACRN hypervisor trace based on time stamps
   (in TSC); see the merge example below.
#. Check whether any ``vmexit`` occurred within the critical sections. The
   pattern is as follows:

   .. figure:: images/vm_exits_log.png
      :align: center
      :name: vm_exits_log
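
Because both logs are ordered by TSC once converted, the merge in step 2 can
be done with standard tools. A minimal sketch, assuming each converted file
carries the TSC value in its first column (the file names are hypothetical):

.. code-block:: console

   # Merge two TSC-sorted logs numerically by their first field
   sort -m -n -k 1,1 rtvm_timestamps.log acrntrace_readable.log > merged.log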

Collecting Performance Monitoring Counters Data
***********************************************

Performance Monitoring Unit (PMU) Support for the RTVM
======================================================

By default, the ACRN hypervisor exposes the PMU-related CPUID and MSRs to
the RTVM. Note that Precise Event Based Sampling (PEBS) is not yet enabled
in the VM.

Perf/PMU Tools in Performance Analysis
======================================

Because the PMU-related CPUID and MSRs are already exposed to the VM by
default, performance analysis tools such as ``perf`` and ``pmu-tools`` can
be used inside the VM to locate the bottleneck of the application.
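
As a quick sanity check that the counters are reachable from inside the
RTVM, a basic ``perf stat`` run (the workload here is arbitrary) should
report non-zero hardware event counts:

.. code-block:: console

   perf stat -e cycles,instructions -- sleep 1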

``perf`` is a profiler tool for Linux 2.6+ based systems that abstracts away
CPU hardware differences in Linux performance measurements and presents a
simple command-line interface. Perf is based on the ``perf_events`` interface
exported by recent versions of the Linux kernel.

``pmu-tools`` is a collection of tools for profile collection and
performance analysis on Intel CPUs, built on top of Linux perf. Refer to the
following links for perf usage:

  - https://perf.wiki.kernel.org/index.php/Main_Page
  - https://perf.wiki.kernel.org/index.php/Tutorial

Refer to https://github.com/andikleen/pmu-tools for pmu-tools usage.
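
For example, ``toplev.py`` from pmu-tools can run a top-down analysis
directly; this sketch assumes pmu-tools is checked out locally and
``./my_rt_app`` stands in for the real workload:

.. code-block:: console

   # Level-1 top-down analysis of a workload
   ./pmu-tools/toplev.py -l1 ./my_rt_app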

Top-Down Microarchitecture Analysis Method (TMAM)
==================================================

The top-down microarchitecture analysis method (TMAM), based on top-down
characterization methodology, aims to provide insight into whether you
have made wise choices with your algorithms and data structures. See the
Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual
<http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
Appendix B.1 for more details on TMAM. Refer to this `technical paper
<https://fd.io/docs/whitepapers/performance_analysis_sw_data_planes_dec21_2017.pdf>`_
that adopts TMAM for systematic performance benchmarking and analysis
of compute-native Network Function data planes executed on
commercial-off-the-shelf (COTS) servers using available open-source
measurement tools.

Example: Using ``perf`` to analyze TMAM level 1 on CPU core 1:

   .. code-block:: console

      perf stat --topdown -C 1 taskset -c 1 dd if=/dev/zero of=/dev/null count=10
      10+0 records in
      10+0 records out
      5120 bytes (5.1 kB, 5.0 KiB) copied, 0.00336348 s, 1.5 MB/s

      Performance counter stats for 'CPU(s) 1':

                 retiring   bad speculation   frontend bound   backend bound
      S0-C1  1      10.6%              1.5%             3.9%           84.0%

      0.006737123 seconds time elapsed