| Bioinformatics is an interdisciplinary field that emerged with the launch of the Human Genome Project (HGP); it uses computers to store, search, and analyze biological data. With the completion of the HGP, the life sciences entered the post-genomic era, in which genomics and proteomics are the main research focus. Proteomics studies all of the proteins that exist and act in cells. The traditional approach of studying one protein at a time, however, cannot keep pace with the post-genomic era, and bioinformatics has become increasingly important for resolving the higher-order structure of proteins. Protein research is now in full swing, yet what is known about human proteins is still only the tip of the iceberg. Characterizing every protein expressed by the genes through experiments is unrealistic; a practical alternative is to infer the function of a newly discovered protein by aligning its sequence or structure with those of known proteins. Such an alignment gives insight into the biological function of the new protein. During biological evolution, protein structure is more conserved than sequence: sequence changes do not necessarily change the structure, proteins with similar structures may have different sequences, and proteins with similar structures often have similar functions. Structure therefore deserves particular attention. In living cells, proteins carry out almost all major functions by combining appropriately with other molecules, and the structure of a protein plays a significant role in its function. Structural genomics arose in this context: a steadily improving understanding of protein structure helps refine the methods that infer the function of new proteins by structure alignment. Comparing protein structures and evaluating their similarity is an active field in structural biology.
Clarifying the evolutionary relationships between proteins is a very important question for biologists, and it can be addressed by comparing protein structures. In this paper, I summarize existing protein structure alignment methods, including pairwise and multiple structure comparison, and then describe in detail MatAlign, a structure alignment algorithm based on protein distance matrices. The idea of aligning distance matrices to obtain an alignment of protein structures was previously used in the DALI method, but MatAlign adopts a different approach. DALI subdivides a distance matrix into overlapping 6×6 sub-matrices, finds matching sub-matrix pairs from the two proteins, and assembles these matching pairs into the final alignment by means of Monte Carlo optimization. MatAlign, on the other hand, uses dynamic programming at two levels, first for row–row alignment and second for consolidating the row–row scores into an initial alignment, and then iteratively refines the initial alignment into the final one according to an objective alignment score function. Although MatAlign uses a two-level dynamic programming strategy, it differs substantially from the double dynamic programming of SSAP: the two methods differ in their data representation, superimposition, and score accumulation strategies. In addition, unlike DALI and many other methods such as VAST, MatAlign does not use any secondary structure information, so its alignment results are not affected by the choice of secondary structure annotation method. MatAlign can also be parallelized easily. Most of its running time is spent on the all-against-all alignment of rows from the two matrices; because this step consists of many mutually independent dynamic programming procedures, the running time can be reduced simply by running them in parallel.
The time complexity of the MatAlign algorithm is O(N^4); given the huge amount of protein data, improving the dynamic programming step to achieve a lower time complexity would be of great practical significance. On the basis of MatAlign, this work therefore develops SortMatAlign, a protein structure alignment algorithm based on sorted distance matrices: the distance matrix is first sorted with quicksort, and the two-level dynamic programming algorithm is then applied. SortMatAlign produces alignment results comparable to those of MatAlign, but runs 18.276 times faster. To assess the biological relevance of SortMatAlign, queries were run against the SCOP protein structure database; the query results are similar to those of DALI and better than those of CE, while the computing speed is greatly improved compared with both methods. The experiments show that SortMatAlign offers not only good alignment results and performance but also good biological relevance, and it is therefore of real biological significance. SortMatAlign was also applied to protein structure classification at the superfamily level of the SCOP database, where it achieves classification accuracy similar to that of DALI, CE, and MatAlign with greatly improved classification speed. Current multiple structure alignment algorithms are built on pairwise structure alignment; since SortMatAlign gives good results and good biological relevance, applying it to multiple structure alignment should be a considerable advantage. For unknown or newly discovered protein molecules, structural analysis can guide the design of biological experiments to confirm their function.
Structural analysis of proteins can also identify functional units or domains, providing targets for genetic manipulation, a reliable basis for designing new proteins or modifying existing ones, and reasonable target molecular structures for the molecular design of new drugs. |
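
The two-level dynamic programming idea described for MatAlign can be illustrated with a minimal Python sketch. It is not the published MatAlign or SortMatAlign implementation: the function names, the tolerance `tol`, the similarity `tol - |d1 - d2|`, and the free-gap scoring are all assumptions chosen for illustration, and the iterative refinement and sorting steps are omitted. Level 1 scores every row of one distance matrix against every row of the other, which is where the O(N^4) cost arises; level 2 runs dynamic programming over those row–row scores to obtain an initial residue alignment.

```python
import numpy as np


def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between residues."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dp_table(sim):
    """Fill a DP table for an order-preserving alignment with free gaps.

    table[i, j] is the best total similarity using the first i items of one
    sequence and the first j items of the other.
    """
    n, m = sim.shape
    table = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i, j] = max(table[i - 1, j],
                              table[i, j - 1],
                              table[i - 1, j - 1] + sim[i - 1, j - 1])
    return table


def row_row_scores(d1, d2, tol=2.0):
    """Level 1: score every row of d1 against every row of d2."""
    scores = np.zeros((d1.shape[0], d2.shape[0]))
    for i, row_a in enumerate(d1):
        for j, row_b in enumerate(d2):
            # Assumed similarity: positive when two distances differ by < tol.
            sim = tol - np.abs(row_a[:, None] - row_b[None, :])
            scores[i, j] = dp_table(sim)[-1, -1]
    return scores


def align(coords_a, coords_b, tol=2.0):
    """Level 2: align residues by running DP over the row-row scores."""
    scores = row_row_scores(distance_matrix(coords_a),
                            distance_matrix(coords_b), tol)
    table = dp_table(scores)
    # Trace back from the bottom-right corner to recover matched residues.
    pairs, i, j = [], scores.shape[0], scores.shape[1]
    while i > 0 and j > 0:
        if table[i, j] == table[i - 1, j - 1] + scores[i - 1, j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif table[i, j] == table[i - 1, j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Calling `align(coords_a, coords_b)` with two arrays of residue coordinates (shape `(n, 3)` and `(m, 3)`, for example C-alpha positions) returns a list of matched residue index pairs. Because only intra-molecular distances are compared, the result does not depend on how either structure is positioned in space. The row–row scores in level 1 are mutually independent, so that loop is the natural place to parallelize.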