| Performance analysis and optimization has long been an attractive direction in computing research. Hotspot functions and instructions, which usually make up only a small fraction of a program, typically account for most of its execution time. The goal of performance analysis and optimization is to locate program hotspots, identify where the bottleneck lies, and choose a suitable optimization method to make them execute faster. As embedded systems become more and more complex, performance analysis becomes more difficult. To make this work more effective, a series of auxiliary tools have been developed. At present, each analysis tool usually has a limited scope of application, so a systematic analysis requires several tools to be used together to obtain accurate and comprehensive results. This dissertation focuses on performance analysis and optimization for the Godson embedded system. First, we make an intensive study of popular approaches to performance analysis and, after comparing them, point out their strengths and weaknesses. Then we implement an instruction analyzer, 'My-Analysis', which can be applied to assembly programs and to trace records output by simulators. We also develop a script program, 'SeeProgram2008', by integrating several open-source resources available on the web; it draws a graph of the caller-callee relationships in a program based on the information output by the gprofile tool. Finally, to address the complexity and efficiency problems of the full-system analyzer OProfile, we develop a tool named E-profile, which solves these problems and supports continuous automatic sampling of large test benchmarks. Using the OProfile tool, the Sim-godson simulator, and the My-Analysis instruction analyzer, with the EEMBC benchmark as the workload, we analyze CPU events such as IPC, cache miss rate, and branch misprediction in the Godson embedded system. 
Later, a Four-Phase Manual Optimization (FPMO) method for software pipelining is proposed, based on a detailed performance analysis and optimization of the EEMBC Autocorrelation benchmark. In our experiment, the FPMO method achieves a 40.678% performance improvement while increasing code size by only 2.04%, whereas pure compiler automatic optimization achieves a 38.33% performance improvement at the cost of a 33.35% code size expansion. In conclusion, the FPMO method resolves the program optimization problem of embedded systems at a limited cost in hardware resources. |