The Research On Similarity Analysis Methods Of DNA And Protein Sequences

Posted on: 2010-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Liu
Full Text: PDF
GTID: 2120360275481833
Subject: Computer Science and Technology
Abstract/Summary:
With the completion of genome projects, the volume of molecular sequence data has grown rapidly, and the focus of biological research has shifted from data accumulation to data analysis and interpretation. Bioinformatics emerged in this context. Its research content is rich, and much of it concerns giving reasonable mathematical descriptions of molecular sequences. Over the past two decades, mathematical descriptions of DNA and protein sequences have played an increasingly important role in the comparative analysis of biological sequences, and corresponding numerical characterizations and similarity analysis methods have been proposed one after another. This thesis focuses on the mathematical representation of DNA and protein sequences, similarity analysis, and the construction of phylogenetic trees. The main work is as follows.

After summarizing in detail the existing mathematical methods for representing DNA and protein sequences, the thesis presents a 2D graphical representation of DNA sequences based on dinucleotides, together with distribution curves of the sequences drawn according to a classification of the nucleotides. Based on the characteristics of these graphs, we construct a new covariance matrix CM and extract a 2-dimensional feature vector from it to characterize the information carried by each species' sequence. We verify the effectiveness of the method experimentally on the β-globin gene coding sequences of 11 species.

Based on a new classification of the amino acids, the thesis presents a 5D mathematical representation of protein sequences and derives a matrix M from it. We then compute mathematical invariants of M, which yield a five-dimensional feature vector. Using the angle between two such vectors, we analyze the similarity of N protein sequences from 13 coronaviruses. With the PHYLIP software, we construct a phylogenetic tree and compare the results with those of the traditional approach. The experimental results show that the mathematical model is simple, has low computational complexity, and gives better results. This approach to mathematical representation and similarity analysis offers a new impetus for protein sequence comparison.
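
The angle-based comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's feature vectors or the exact formula, so the code below simply assumes the standard angle between two vectors computed from their dot product; the function names `angle_between` and `pairwise_angles` and the toy vectors are hypothetical and are not data from the thesis.

```python
import math


def angle_between(u, v):
    """Return the angle in radians between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def pairwise_angles(features):
    """Map each unordered pair of sequence names to the angle between their vectors."""
    names = sorted(features)
    return {
        (a, b): angle_between(features[a], features[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical 5-dimensional feature vectors, for illustration only.
    toy = {
        "seq_A": [1.0, 0.0, 0.0, 0.0, 0.0],
        "seq_B": [0.0, 1.0, 0.0, 0.0, 0.0],
        "seq_C": [2.0, 0.0, 0.0, 0.0, 0.0],
    }
    for pair, theta in pairwise_angles(toy).items():
        print(pair, round(theta, 4))
```

Under this assumption, a smaller angle indicates more similar sequences, and the resulting pairwise values could serve as a distance matrix for tree-building software.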
Keywords/Search Tags: DNA sequence, Protein, Graphical representation, Bioinformatics, Analysis of similarity, Sequence alignment