The Multiprocessor System-on-Chip (MPSoC) is the most promising way to meet today's performance and power-consumption constraints, and the Network-on-Chip (NoC), which offers better scalability, lower power, and higher reliability and performance for on-chip communication, makes it possible to increase the scale and complexity of MPSoCs. However, this hardware framework is only the foundation of a complete solution: the key requirement is that end users can exploit the MPSoC effectively, which ultimately depends on a better programming model.

After analyzing the shortcomings of the traditional MPSoC design flow, an optimized flow is proposed that separates parallel programming from the iterative exploration process. To realize this design flow, a flexible parallel programming model, MMPI (Multiprocessor Message Passing Interface), is proposed. Targeting high portability, high scalability, and low design and implementation cost, MMPI adopts the MPI standard, which extends the language through an API rather than new syntax. To improve design efficiency, MMPI introduces a "mapping file" mechanism and a layered communication protocol stack, which decouple the program from the task-mapping pattern and the underlying hardware.

The MMPI-based programming and communication system is then implemented on a full-system co-simulation MPSoC platform. Aiming at high performance and high resource utilization, temporal and spatial task parallelism are introduced and the communication mode of the MPSoC is defined; both point-to-point and collective communication are realized with an efficient reordering mechanism. MMPI and its implementation are thoroughly evaluated, which shows that the programming model not only solves the parallel-programming problem but also determines performance and resource utilization. By analyzing the composition and characteristics of the communication cost, useful experience and reference points are provided for software designers and MMPI programmers.

Finally, HAL-based, algorithm-based, and hierarchical communication modes are introduced to optimize broadcast communication by reducing data copying, improving communication parallelism, and reducing network communication, respectively. These optimizations not only improve performance but also offer several efficient options for optimizing collective communication.
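To make the programming style concrete, the sketch below shows a minimal program written against the standard MPI API in C. Since the abstract states that MMPI adopts the MPI standard as an API-level extension, an MMPI program would presumably look very similar, perhaps with differently prefixed primitive names; the exact MMPI names are not given here, so standard MPI names are used.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int value = 0;

    MPI_Init(&argc, &argv);                  /* bring up the message-passing runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* logical rank of this processor       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processors           */

    if (rank == 0)
        value = 42;

    /* Point-to-point: rank 0 sends its value to rank 1. */
    if (size > 1) {
        if (rank == 0)
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Collective: every rank receives rank 0's value. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d of %d has value %d\n", rank, size, value);

    MPI_Finalize();
    return 0;
}
```

Because the program refers only to logical ranks and never to physical tiles, the same source can be re-run under any placement, which is the portability the abstract targets.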
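The "mapping file" mechanism is described only at a high level. One plausible reading, sketched below as a C table, is a small description that binds each logical rank to a physical NoC tile, so that changing the placement means editing the mapping file rather than the application source. The field names, the 2x2 mesh, and the processor labels are illustrative assumptions, not taken from the thesis.

```c
/* Illustrative in-memory form of a parsed mapping file (all names assumed). */
typedef struct {
    int         rank;    /* logical rank used by the MMPI program */
    int         tile_x;  /* physical NoC tile coordinates         */
    int         tile_y;
    const char *cpu;     /* processor label on that tile          */
} mmpi_map_entry;

/* Example placement of 4 ranks on a 2x2 mesh.  Remapping the application
 * means editing this table (i.e. the mapping file), not the program. */
static const mmpi_map_entry mapping[] = {
    { 0, 0, 0, "cpu0" },
    { 1, 1, 0, "cpu1" },
    { 2, 0, 1, "cpu2" },
    { 3, 1, 1, "cpu3" },
};

/* Resolve a logical rank to its tile, as a lower layer of the protocol
 * stack might do when it has to route a message across the NoC. */
static const mmpi_map_entry *lookup_tile(int rank)
{
    for (unsigned i = 0; i < sizeof(mapping) / sizeof(mapping[0]); i++)
        if (mapping[i].rank == rank)
            return &mapping[i];
    return 0;  /* unknown rank */
}
```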
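The algorithm-based broadcast optimization in the final paragraph is likewise not detailed in the abstract. A common way to raise the communication parallelism of a broadcast is a binomial-tree schedule, in which the number of cores holding the data doubles at every step instead of the root sending to each destination in turn; the sketch below illustrates that general idea on top of standard MPI point-to-point calls and is not claimed to be the scheme used in the thesis.

```c
#include <mpi.h>

/* Binomial-tree broadcast sketch with the root fixed at rank 0.  The set of
 * informed ranks doubles each step, so the broadcast completes in about
 * log2(size) steps instead of the size-1 sequential sends of a naive loop. */
static void bcast_binomial(void *buf, int count, MPI_Datatype type,
                           int rank, int size, MPI_Comm comm)
{
    int mask;

    /* Phase 1: every non-root rank waits for the data from its parent in
     * the binomial tree (its own rank with the lowest set bit cleared). */
    for (mask = 1; mask < size; mask <<= 1) {
        if (rank & mask) {
            MPI_Recv(buf, count, type, rank - mask, 0, comm,
                     MPI_STATUS_IGNORE);
            break;
        }
    }

    /* Phase 2: forward the data to the children below the level at which
     * this rank received it. */
    for (mask >>= 1; mask > 0; mask >>= 1) {
        if (rank + mask < size) {
            MPI_Send(buf, count, type, rank + mask, 0, comm);
        }
    }
}
```

A hierarchical variant would apply the same idea in two levels, for example broadcasting once per cluster over the NoC and then distributing locally, which matches the abstract's goal of reducing network communication.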