Bioinformatics is a young scientific field that draws on several disciplines, including biology, computer science, and mathematics. It aims to reveal the biological significance of large amounts of biological data and to explore the mechanisms underlying life activities. Assembling the DNA sequence of a whole genome is an important task in bioinformatics research, and sequence assembly is an essential but time-consuming step in the commonly used shotgun sequencing method. The key question of this research is how to improve the speed of sequence assembly.

After analyzing existing assembly methods and software, this thesis presents a novel parallel algorithm for DNA sequence assembly in a distributed-memory environment. The serial procedure and the parallel algorithm are described for each of the three assembly phases: Overlap, Layout, and Consensus. Several parallelization schemes are presented and compared by analyzing how the fragment data set can be divided and how parallelizable each part of the serial procedure is.

Based on this algorithm, a software package named PL_Nphrap is implemented, and its data structures, sequence assembly procedure, and communication optimizations are described in detail. The issues addressed in the assembly procedure include fragment alignment for Match and Read Pairs, the Smith-Waterman algorithm, computation of LLR, dynamic output of overlaps, computation of offsets for Layout, voting among fragments to form the consensus sequence, parallelism, and communication.

Finally, experiments with the parallel algorithm are presented. The test results indicate that the proposed algorithm is highly efficient.
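
Of the steps listed above, the Smith-Waterman algorithm is the standard dynamic-programming method for local alignment, and a minimal sketch may help show why the Overlap phase is costly. The function name `smith_waterman_score`, the scoring values (match 2, mismatch -1, gap -2), and the linear gap penalty are illustrative assumptions; they are not taken from PL_Nphrap, which may score alignments differently.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative sketch only: scoring values and the linear gap penalty
    are assumptions, not the parameters used by PL_Nphrap.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each pair of fragments costs time proportional to the product of their lengths, so comparing many fragment pairs dominates the running time. Because pairs can be scored independently of one another, this step is a natural candidate for distribution across processes.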