The Research On Similarity Analysis Methods Of DNA And Protein Sequences

Posted on:2010-01-13

Degree:Master

Type:Thesis

Country:China

Candidate:Z B Liu

Full Text:PDF

GTID:2120360275481833

Subject:Computer Science and Technology

Abstract/Summary:


With the completion of genome projects, the volume of molecular sequence data keeps growing, and the focus of biological research has shifted from accumulating data to analyzing and interpreting it. Bioinformatics emerged in this context. Its research topics are rich and varied, and much of the work seeks suitable mathematical descriptions of molecular sequences. Over the past two decades, mathematical descriptions of DNA and protein sequences have played an increasingly important role in the comparative analysis of biological sequences, and corresponding numerical characterizations and similarity analysis methods have been proposed one after another. This thesis focuses on the mathematical representation of DNA and protein sequences, similarity analysis, and phylogenetic tree construction. The main work is as follows.

After a detailed survey of mathematical methods for representing DNA and protein sequences, the thesis presents a 2D graphical representation of DNA sequences based on dinucleotides, along with distribution curves of the sequences drawn according to the classification of nucleotides. Based on the characteristics of these graphs, we construct a new covariance matrix, CM, and extract from it a 2-dimensional feature vector that characterizes the information content of each species. We verify the effectiveness of the method experimentally on the β-globin gene coding sequences of 11 species.

Based on a new classification of the amino acids, the thesis presents a 5D mathematical representation of protein sequences and derives a matrix M from it. From M we compute a mathematical invariant, namely a five-dimensional feature vector. Using the angle between two feature vectors, we analyze the similarity of the N protein sequences of 13 coronaviruses. With the PHYLIP software, we construct a phylogenetic tree and compare the result with that of traditional methods. The experiments show that the mathematical model is simple, has low computational complexity, and gives better results. This approach to the mathematical representation and similarity analysis of protein sequences gives new impetus to the comparison of protein sequences.
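
The angle-based comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each sequence has already been reduced to a numeric feature vector. The function names and the example vectors below are hypothetical placeholders, not values or code from the thesis, and the construction of the matrix M and its invariants is not reproduced here.

```python
import math

def angle_between(u, v):
    """Return the angle in radians between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

def angle_matrix(vectors):
    """Return a dict mapping each ordered pair of names to the angle between their vectors."""
    names = list(vectors)
    return {(a, b): angle_between(vectors[a], vectors[b])
            for a in names for b in names}

# Hypothetical five-dimensional feature vectors, for illustration only.
features = {
    "seq_A": [0.21, 0.35, 0.10, 0.18, 0.16],
    "seq_B": [0.20, 0.36, 0.11, 0.17, 0.16],
    "seq_C": [0.05, 0.10, 0.40, 0.30, 0.15],
}

angles = angle_matrix(features)
for (a, b), theta in sorted(angles.items()):
    if a < b:
        print(f"{a} vs {b}: {theta:.4f} rad")
```

Under this sketch, a smaller angle means the two vectors point in more similar directions, so the corresponding sequences are treated as more similar. The pairwise angles form a symmetric matrix with a zero diagonal, which is the general shape of input that distance-based tree-building programs expect.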

Keywords/Search Tags:

DNA sequence, Protein, Graphical representation, Bioinformatics, Analysis of similarity, Sequence alignment


Related items

1	The Research On Graphical Representation Of Protein Sequence And Application
2	The Algorithms Of Sequence Alignment In Bioinformatics
3	Analysis Of Biological Sequences Similarity And Research On κ-Word Model
4	Graphical Representation Of Protein Sequence And Its Similarity Analysis
5	Evolutionary Tree Algorithm Based On Similarity Analysis Of DNA Sequence 4D Study
6	The Research Of Graphical Representation Of Protein Sequences And Its Application
7	A Novel Graphical Representation Of DNA Based On Physico-chemical Properties Of Amino Acids And Similarity Analysis
8	Mathematical Description Of The Biological Macromolecules And Its Applications
9	The Prediction Model Of Protein Subcellular Localization Based On Graphical Representation
10	Graphical Representation Of DNA Sequences And Their Similarity Analysis