.. _rt_performance_tuning:

ACRN Real-Time (RT) Performance Analysis
########################################

This document describes methods to collect trace and event data for ACRN
real-time VM (RTVM) performance analysis. Two parts are included:

- Method to trace ``vmexit`` occurrences for analysis.
- Method to collect Performance Monitoring Counters (PMC) data for tuning,
  based on the Performance Monitoring Unit (PMU).

vmexit Analysis for ACRN RT Performance
***************************************

``vmexit`` events are triggered in response to certain instructions and events
and are a key source of performance degradation in virtual machines. During
the runtime of a hard RTVM of ACRN, the following events impact real-time
deterministic latency:

- CPUID
- TSC_Adjust read/write
- TSC write
- APICID/LDR read
- ICR write

Generally, we don't want to see any ``vmexit`` occur during the critical
section of the RT task.

The methodology of ``vmexit`` analysis is straightforward. First, we clearly
identify the **critical section** of the RT task. The critical section is
the duration of time in which we do not want to see any ``vmexit`` occur.
Different RT tasks have different critical sections. This document uses
the cyclictest benchmark as an example of how to do ``vmexit`` analysis.

The Critical Sections
=====================

Here is example pseudocode of a cyclictest implementation.

.. code-block:: none

   while (!shutdown) {
       ...
       clock_nanosleep(&next)
       clock_gettime(&now)
       latency = calcdiff(now, next)
       ...
       next += interval
   }

Time point ``now`` is the actual point at which the cyclictest app is woken up
and scheduled. Time point ``next`` is the expected point at which we want
the cyclictest to be awakened and scheduled. Here we can get the latency as
``now - next``. We don't want to see a ``vmexit`` between ``next`` and
``now``. So, we define the starting point of the critical section as ``next``
and the ending point as ``now``.

Log and Trace Data Collection
=============================

#. Add time stamps (in TSC) at ``next`` and ``now``, as shown in the sketch
   after this list.
#. Capture the log with the above time stamps in the RTVM.
#. Capture the ``acrntrace`` log in the Service VM at the same time.
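The sketch below illustrates step 1, assuming an x86 Linux toolchain that
provides the ``__rdtsc()`` intrinsic. A TSC reading just before the sleep and
another just after the wake bracket the ``[next, now]`` critical section;
logging is deferred until after the measurement loop so that it does not
perturb the latency being measured. The period, loop count, and log format
are illustrative only and are not part of cyclictest or the ACRN tooling.

.. code-block:: c

   #include <inttypes.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <time.h>
   #include <x86intrin.h>        /* __rdtsc() on x86 GCC/Clang */

   #define NSEC_PER_SEC 1000000000L
   #define LOOPS        1000
   #define INTERVAL_NS  1000000L /* 1 ms period, illustrative only */

   static uint64_t tsc_before[LOOPS], tsc_after[LOOPS];

   static void timespec_add_ns(struct timespec *t, long ns)
   {
       t->tv_nsec += ns;
       while (t->tv_nsec >= NSEC_PER_SEC) {
           t->tv_nsec -= NSEC_PER_SEC;
           t->tv_sec++;
       }
   }

   int main(void)
   {
       struct timespec next;
       int i;

       clock_gettime(CLOCK_MONOTONIC, &next);
       for (i = 0; i < LOOPS; i++) {
           timespec_add_ns(&next, INTERVAL_NS);

           /* TSC reading just before the sleep; together with the
            * reading after the wake, it bounds the critical section. */
           tsc_before[i] = __rdtsc();
           clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
           tsc_after[i] = __rdtsc();   /* actual wake point: 'now' */
       }

       /* Log outside the measurement loop so that printing does not
        * add latency to the samples themselves. */
       for (i = 0; i < LOOPS; i++)
           printf("%d next_tsc=%" PRIu64 " now_tsc=%" PRIu64 "\n",
                  i, tsc_before[i], tsc_after[i]);
       return 0;
   }

Because the readings are taken in the guest, they are in the same TSC domain
as the ``acrntrace`` log, so the two logs can later be merged by sorting on
the TSC values.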
Offline Analysis
================

#. Convert the raw trace data to human readable format.
#. Merge the logs in the RTVM and the ACRN hypervisor trace based on time
   stamps (in TSC).
#. Check whether any ``vmexit`` occurred within the critical sections. The
   pattern is as follows:

   .. figure:: images/vm_exits_log.png
      :align: center
      :name: vm_exits_log

Collecting Performance Monitoring Counters Data
***********************************************

Performance Monitoring Unit (PMU) Support for the RTVM
======================================================

By default, the ACRN hypervisor exposes the PMU-related CPUID and MSRs to the
RTVM. Note that Precise Event Based Sampling (PEBS) is not yet enabled in the
VM.

Perf/PMU Tools in Performance Analysis
======================================

Because the PMU-related CPUID/MSRs are already exposed to the VM by default,
performance analysis tools such as ``perf`` and PMU tools can be used inside
the VM to locate the bottleneck of the application.

``Perf`` is a profiler tool for Linux 2.6+ based systems that abstracts away
CPU hardware differences in Linux performance measurements and presents a
simple command-line interface. Perf is based on the ``perf_events`` interface
exported by recent versions of the Linux kernel.

``PMU tools`` is a collection of tools for profile collection and performance
analysis on Intel CPUs, built on top of Linux ``perf``. Refer to the following
links for ``perf`` usage:

- https://perf.wiki.kernel.org/index.php/Main_Page
- https://perf.wiki.kernel.org/index.php/Tutorial

Refer to https://github.com/andikleen/pmu-tools for PMU tools usage.

Top-Down Microarchitecture Analysis Method (TMAM)
=================================================

The top-down microarchitecture analysis method (TMAM), based on the top-down
characterization methodology, aims to provide insight into whether you have
made wise choices with your algorithms and data structures. See the
Intel |reg| 64 and IA-32 `Architectures Optimization Reference Manual
<http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>`_,
Appendix B.1, for more details on TMAM. Also refer to this `technical paper
<https://fd.io/docs/whitepapers/performance_analysis_sw_data_planes_dec21_2017.pdf>`_,
which adopts TMAM for systematic performance benchmarking and analysis of
compute-native Network Function data planes executed on
commercial-off-the-shelf (COTS) servers using available open-source
measurement tools.

Example: using ``perf`` to analyze TMAM level 1 on CPU core 1:

.. code-block:: console

   perf stat --topdown -C 1 taskset -c 1 dd if=/dev/zero of=/dev/null count=10
   10+0 records in
   10+0 records out
   5120 bytes (5.1 kB, 5.0 KiB) copied, 0.00336348 s, 1.5 MB/s

    Performance counter stats for 'CPU(s) 1':

                retiring    bad speculation    frontend bound    backend bound
   S0-C1  1        10.6%               1.5%              3.9%            84.0%

         0.006737123 seconds time elapsed
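``perf`` is the usual front end, but the ``perf_events`` interface mentioned
above can also be used directly from an application inside the RTVM. Below is
a minimal sketch following the common ``perf_event_open(2)`` man page pattern;
the workload loop is a placeholder, and the syscall fails at runtime if the
PMU is not exposed to the VM.

.. code-block:: c

   #include <inttypes.h>
   #include <stdio.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/ioctl.h>
   #include <sys/syscall.h>
   #include <linux/perf_event.h>

   /* Thin wrapper: glibc provides no perf_event_open() stub. */
   static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
   {
       return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
   }

   int main(void)
   {
       struct perf_event_attr attr;
       uint64_t count;
       volatile uint64_t sink = 0;
       int fd, i;

       memset(&attr, 0, sizeof(attr));
       attr.size = sizeof(attr);
       attr.type = PERF_TYPE_HARDWARE;
       attr.config = PERF_COUNT_HW_INSTRUCTIONS;
       attr.disabled = 1;
       attr.exclude_kernel = 1;
       attr.exclude_hv = 1;

       /* Count instructions retired by this thread on any CPU. */
       fd = perf_event_open(&attr, 0, -1, -1, 0);
       if (fd == -1) {
           perror("perf_event_open"); /* PMU not exposed, or no permission */
           return 1;
       }

       ioctl(fd, PERF_EVENT_IOC_RESET, 0);
       ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

       for (i = 0; i < 1000000; i++)  /* placeholder workload */
           sink += i;

       ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
       if (read(fd, &count, sizeof(count)) == sizeof(count))
           printf("instructions retired: %" PRIu64 "\n", count);
       close(fd);
       return 0;
   }

Counting around a specific code region like this narrows the measurement to
exactly the code of interest, which ``perf stat`` can only approximate at
process or CPU granularity.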