| In recent years, multi-core processors have become mainstream, and the domestic Loongson line has also launched a quad-core processor, the Loongson-3A. To take full advantage of multi-core resources, parallel programming has become increasingly important, but writing parallel programs remains difficult. Converting a serial program into a parallel one requires not only analyzing the algorithm and understanding the program's behavior, but also understanding how the hardware behaves. Programmers therefore need a performance analysis tool that exposes the behavior of both programs and hardware. Most modern processors provide hardware performance counters (a performance monitoring unit, PMU); the performance information they collect about applications, the operating system, and the processor can help programmers find hot spots and bottlenecks in applications. Based on the performance counters of the Loongson-3A platform, we implemented a performance analysis tool, TProfiler. Drawing on the principles of the existing performance analysis tools VTune, OProfile, and perf, TProfiler uses single-process sampling. The main work is as follows. (1) Determine the design of TProfiler by weighing the advantages and disadvantages of OProfile and perf against the hardware features of the Loongson-3A. (2) Design the software architecture of TProfiler. TProfiler is divided into two modules, a front end and a back end. The front end runs in user space and is responsible for analyzing the performance information collected by the back end; the back end runs in kernel space and is responsible for controlling the performance counters and collecting the hardware event information of applications. 
(3) Implement the functionality of the front end and the back end. To support single-process sampling, we add functions to process creation and context switching in the kernel, and add data structures for the performance counters (PMU) to the process descriptor. We also add a file-mapping function for transferring data between user space and the kernel. Finally, using compiler techniques and analysis of the binary file, TProfiler presents the information that is useful to the programmer. In summary, this paper implements TProfiler, a performance analysis tool based on the performance counters (PMU) of the Loongson-3A, which provides most of the functions of OProfile. A comparison of the experimental results of OProfile and TProfiler shows that the sampling data collected by TProfiler are more accurate and cover a wider range. The paper also identifies shortcomings of TProfiler and of the Loongson-3A performance counters, and offers preliminary proposals for improving them. |
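
The idea behind single-process sampling, saving and restoring counter state at each context switch so that events are charged to the process that caused them, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not TProfiler's kernel code: the names `TaskPmuState`, `SimulatedPmu`, `context_switch`, and `run` are hypothetical, and the hardware counter is modeled as a plain integer that yields one sample every `period` events.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPmuState:
    """Hypothetical per-process PMU state kept in the process descriptor."""
    saved_count: int = 0                          # counter value at last switch-out
    samples: list = field(default_factory=list)   # program counters sampled so far


class SimulatedPmu:
    """A single simulated hardware counter (an assumption for illustration)."""

    def __init__(self, period: int):
        self.period = period  # number of events between two samples
        self.count = 0        # current value of the hardware counter


def context_switch(pmu: SimulatedPmu, prev: TaskPmuState, nxt: TaskPmuState) -> None:
    """Save the outgoing task's counter value and restore the incoming task's."""
    prev.saved_count = pmu.count
    pmu.count = nxt.saved_count


def run(pmu: SimulatedPmu, task: TaskPmuState, events: int, pc: int) -> None:
    """Let `task` generate `events` hardware events while executing at address `pc`."""
    for _ in range(events):
        pmu.count += 1
        if pmu.count >= pmu.period:
            # Counter overflow: record a sample for the running task and rearm.
            task.samples.append(pc)
            pmu.count = 0


def demo():
    """Interleave two tasks and return their PMU state after the run."""
    pmu = SimulatedPmu(period=100)
    a, b = TaskPmuState(), TaskPmuState()
    run(pmu, a, 60, pc=0x1000)   # a: 60 events so far
    context_switch(pmu, a, b)
    run(pmu, b, 70, pc=0x2000)   # b: 70 events, below the period
    context_switch(pmu, b, a)
    run(pmu, a, 40, pc=0x1000)   # a reaches 100 events and is sampled
    return a, b
```

In this run the only sample is attributed to task `a`, which is the task that actually accumulated 100 events. Without the save and restore step, the shared counter would overflow while `b` was running and the sample would be charged to the wrong process.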