| The increasing complexity of embedded applications and the prevalence of heterogeneous multiprocessor systems-on-chip (MPSoCs) make it challenging for designers to achieve both performance and programmability in embedded systems. Software programmers must address several problems, such as designing communication between threads, avoiding deadlock in multithreaded programs, adapting software to different processors, and implementing software for different communication protocols. An automatic multithreaded code generator that adapts to different MPSoC architectures and communication protocols can be an effective solution. However, the increasing number of processors drives software designers toward finer-grained multithreaded software, in which communication between threads occurs more frequently and system communication and synchronization costs increase. Communication is becoming a major factor in system performance. To generate efficient multithreaded code, this work therefore focuses on communication in fine-grained multithreaded systems. Furthermore, task mapping, thread partitioning, and scheduling are key issues in fine-grained multithreaded systems and directly affect code efficiency. The research covers the following three aspects: 1) Communication optimization techniques for fine-grained multithreaded systems. To keep the results independent of task mapping, thread partitioning, and scheduling, this part of the research assumes a fixed system model (i.e., an application has already been partitioned into multiple threads, and the threads have been mapped to different processors). We first propose a method that combines message aggregation and communication pipelining, which significantly reduces communication cost. We then introduce a multi-entry communication buffer technique to further increase processor utilization, and apply it on the basis of static analysis and dynamic emulation. However, cyclic dependencies may hinder the effectiveness of these techniques.
We further propose a set of optimizations, including re-partitioning based on strongly connected components (SCCs) and pre-processing strategies, to reduce the number of communication channels that cannot be optimized. 2) Task mapping, thread partitioning, and scheduling. We first propose static mapping and scheduling approaches based on integer linear programming (ILP) that take processor workload balance and system communication cost into account in order to obtain an optimal schedule. Furthermore, software pipelining is introduced at the mapping stage to further raise processor utilization. The mapping approach considers workload balance as well as the dependency topology between processors in order to avoid pipeline stalls. 3) Building on the research above, we combine the optimizations for communication, task mapping, thread partitioning, and scheduling to further improve system performance. |
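
To make the SCC-based re-partitioning idea more concrete, the following Python sketch shows one plausible reading of it; it is not the implementation used in this work. It assumes that threads are nodes of a directed communication graph, that each channel is a directed edge, and that re-partitioning merges all threads in the same strongly connected component into a single thread so that the remaining channels form an acyclic graph. The function names `strongly_connected_components` and `repartition`, and the thread names in the usage note, are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm (iterative). Returns a dict: node -> SCC representative."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        fwd[src].append(dst)
        rev[dst].append(src)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish time.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in comp:
                    comp[prev] = root
                    stack.append(prev)
    return comp


def repartition(nodes, edges):
    """Assumed re-partitioning step: merge each SCC into one thread.

    Returns the node -> merged-thread mapping and the channels that remain
    between merged threads. Channels inside an SCC become thread-internal.
    """
    comp = strongly_connected_components(nodes, edges)
    merged = sorted({(comp[s], comp[d]) for s, d in edges if comp[s] != comp[d]})
    return comp, merged
```

For example, with threads `t0` to `t3` and channels `t0→t1`, `t1→t2`, `t2→t1`, `t2→t3`, calling `repartition` groups `t1` and `t2` into one thread, because they depend on each other cyclically, and leaves two acyclic channels between the merged threads. Under the stated assumption, these remaining channels are the ones to which aggregation and pipelining could then be applied.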