Research On MPI Collective Communication For Multi-Clusters

Posted on:2014-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Cheng

GTID:2248330395496756

Subject:Computer system architecture

Abstract/Summary:

In recent years, high-performance computing (HPC) has made great achievements in fields such as engineering, science, and the military. International scholars generally agree that the enormous challenges of the 21st century, such as global climate forecasting, genetic recombination, and geological resource exploration, can only be solved with HPC. HPC has therefore drawn great attention both at home and abroad. In the 1990s, the United States formulated research programs in the HPC field, for instance ASCI and HPCC, and the Chinese government has also attached great importance to it. Massively parallel computing, experiment, and theory are now the three pillars by which humans understand the natural world.

MPI (Message Passing Interface) is the predominant message-passing standard for HPC applications, and it is independent of language and platform. Because MPI is provided as a library, programmers can call its interfaces directly from Fortran and C. MPICH is an open-source implementation of the MPI standard; it is widely used in industry and has also attracted the attention of the academic community. MSU and Argonne National Laboratory made great contributions to MPICH. The emergence of MPICH makes it easier to write efficient parallel applications and has promoted the development of high-performance computing technology.

MPI provides many communication primitives, which can be classified as point-to-point or collective. Profiling shows that in some applications the time spent in collective operations accounts for a large proportion of the communication time. To obtain high speedups for parallel applications, the performance of collective communication must therefore be improved. MPI_Bcast is one of the most frequently used collective primitives: a so-called root process sends its message to all other processes.

In this thesis, we optimize collective operations on multi-core computing nodes based on MPICH2 and propose the Balance_Bcast algorithm for multi-clusters. The algorithm is mainly concerned with two key issues of the multi-cluster environment:
(1) Cluster size. The number of computing nodes may differ from cluster to cluster.
(2) Link delay. Within a cluster, computing nodes are connected by a high-speed network (such as InfiniBand), but the links between clusters differ from one another in delay and are relatively slower than those inside a cluster.
We focus on making the intra-cluster broadcasts of all clusters finish at the same time.

For small messages, MPI_Bcast in MPICH2 uses a binomial tree algorithm. This algorithm does not take the links between nodes into account, so slow links may be used frequently, which can degrade performance. Optimizing the collective operation simply by treating each cluster as a communicator ignores differences in cluster size, so the total time of the algorithm may come to depend on one particular cluster. The performance of this approach is therefore not always good.

We consider the two key issues (cluster size and link delay) together and propose the Balance_Bcast algorithm. The algorithm estimates the time each cluster needs to complete its broadcast and sorts the clusters by this estimate. The message is sent to the most important cluster first, and the estimation and sorting are repeated until every cluster contains at least one node holding the message, at which point the scheduling stops. Finally, a binomial tree algorithm is used to broadcast within each cluster.

Owing to the limitations of the available network conditions, we use the NS2 simulation software to generate the network topology and test the proposed algorithm. Detailed simulation shows that Balance_Bcast performs better than the algorithms mentioned above and achieves the expected goals of this work.
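
The scheduling idea described above can be illustrated with a small sketch. It is only one possible reading of the abstract, not the thesis's implementation: the function names, the cost model (a fixed per-round time `t_local` inside a cluster, a delay matrix `delay` between clusters, one forwarding node per cluster that is busy until its message is delivered), and the rule that the "most important" cluster is the one with the latest estimated completion time are all assumptions made for illustration.

```python
def binomial_rounds(n):
    """Rounds a binomial-tree broadcast needs to reach n processes.

    Each round doubles the number of processes holding the message,
    so the count is ceil(log2(n)), computed here with integer arithmetic.
    """
    return (n - 1).bit_length()


def balance_schedule(sizes, delay, root, t_local):
    """Greedy inter-cluster schedule (hypothetical cost model).

    sizes   -- number of nodes in each cluster
    delay   -- delay[i][j] is the assumed link delay from cluster i to j
    root    -- index of the cluster that holds the message at time 0
    t_local -- assumed time of one intra-cluster binomial-tree round
    Returns (order, makespan), where order lists (sender, receiver, arrival).
    """
    k = len(sizes)
    # Estimated intra-cluster broadcast time of every cluster.
    local = [binomial_rounds(s) * t_local for s in sizes]
    free_at = {root: 0.0}  # informed cluster -> time it can send again
    arrival = {root: 0.0}  # informed cluster -> time the message arrived
    order = []
    while len(arrival) < k:
        best = None
        for c in range(k):
            if c in arrival:
                continue
            # Earliest time any informed cluster could deliver to c.
            t, s = min((free_at[s] + delay[s][c], s) for s in free_at)
            finish = t + local[c]  # estimated completion time of c
            # Assumption: the cluster that would finish last goes first.
            if best is None or finish > best[0]:
                best = (finish, t, s, c)
        _, t, s, c = best
        order.append((s, c, t))
        arrival[c] = t
        free_at[s] = t  # sender is modeled as busy until delivery
        free_at[c] = t  # receiver can forward from now on
    makespan = max(arrival[c] + local[c] for c in range(k))
    return order, makespan


# Example with made-up numbers: three clusters of 4, 16 and 2 nodes.
sizes = [4, 16, 2]
delay = [[0, 5, 2],
         [5, 0, 3],
         [2, 3, 0]]
order, makespan = balance_schedule(sizes, delay, root=0, t_local=1.0)
print(order, makespan)
```

In this example the 16-node cluster is served first even though its link is the slowest, because its longer local broadcast makes it the cluster that would otherwise finish last. A real implementation would replace the assumed cost model with measured link delays and would run the intra-cluster step with MPI_Bcast on a per-cluster communicator.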

Keywords/Search Tags:

HPC, Collective Communication Optimization, Multi-Core, Multi-Cluster

Related items

1	Research On Performance Optimization For Parallel Discrete Event Simulation On Multi-core Cluster
2	The Research And Implementation Of Optimization To Collective Communication In Cluster Computing Environment
3	Research On Key Technology Of Multi-Core Processor
4	A Research Of Dataflow Programming Model Oriented Multi-core CPU/Many-core GPU Heterogeneous Cluster
5	Research And Implementation Based On Multi-core PC Cluster Parallel Rendering System
6	The Design And Implementation Of Embedded Multi-core Processor Communication Methods
7	Multicore Processor Core Communications Technology Research
8	Design and evaluation of efficient collective communications on modern interconnects and multi-core clusters
9	Research On Inter-communication For Multi-core Processors
10	Optimization Of MPI Communication Library On KD60 Platform