A Novel Model For Alignment-free DNA Sequence Similarity Analysis Based On Characterizations Of Complex Networks

Posted on: 2018-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Zhong
Full Text: PDF
GTID: 2310330533966817
Subject: Computer Science and Technology
Abstract/Summary:
With the completion of the Human Genome Project, and especially with the development of high-throughput sequencing technology, a large volume of DNA and protein sequence data has accumulated. Analyzing and understanding the features, functions, structures, and evolution of these sequences is a challenging problem that biologists currently need to solve. In this thesis, we study the evolutionary characteristics of DNA and protein sequences from a new point of view, namely sequence similarity: we construct complex networks from the DNA and protein sequences and then study the similarity of the sequences using the features of those networks.

For the DNA sequences, we first construct five complex networks for each DNA sequence, guided by the central dogma of molecular biology. Taking the mitochondrial DNA sequences of nine species as an example, we construct a vector for each DNA sequence from the characteristics of each network, construct a similarity matrix from the Euclidean distance and the vector cosine, and build the phylogenetic tree from the similarity matrix. The resulting genetic relationships among the nine species are consistent with the known relationships. We then analyze the similarity of the DNA sequences using the local features of the complex network based on the 5 CIS nucleotide sequences constructed from the DNA sequences. Finally, we illustrate the similarity of the DNA sequences using the topological coefficients of the five nucleotide sequences of the nine species as examples.

For the protein sequences, we first construct two networks for each protein sequence, one based on sequences of two amino acids and one based on sequences of three amino acids. We then use the fourteen characteristics of each network to construct a vector for each protein sequence, construct a similarity matrix from the Euclidean distance between the vectors, and build the phylogenetic tree from the similarity matrix. The resulting genetic relationships among the ten species are consistent with the known relationships. We also analyze the similarity of the protein sequences, again using the Euclidean distance between vectors, by investigating the global features of the three-amino-acid CIS network alone. We then compare the two-amino-acid and three-amino-acid sequence networks on the basis of their global features. The results show that the phylogenetic trees of the ten species built from the Euclidean-distance similarity matrices are basically the same. Finally, we analyze the similarity of the protein sequences using the local features of the complex networks of the two CIS amino acid sequences.
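The general pipeline described in the abstract (sequence, then network, then feature vector, then pairwise distances) can be sketched as follows. The abstract does not say how the networks are built or which features are used, so everything specific here is an assumption made for illustration: the network is a simple k-mer transition graph, the four features are stand-ins for the thesis's actual characteristics, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmer_network(seq, k=2):
    """Assumed construction: nodes are k-mers, and a weighted directed edge
    u -> v counts how often k-mer v starts one position after k-mer u."""
    edges = Counter()
    for i in range(len(seq) - k):
        edges[(seq[i:i + k], seq[i + 1:i + k + 1])] += 1
    return edges


def feature_vector(edges):
    """Illustrative global features (not the thesis's feature set):
    node count, edge count, density with self-loops allowed, mean edge weight."""
    if not edges:
        return [0.0, 0.0, 0.0, 0.0]
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    return [float(n), float(m), m / (n * n), sum(edges.values()) / m]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def distance_matrix(vectors, dist):
    """Pairwise matrix of dist(v_i, v_j) over all vectors."""
    return [[dist(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Toy sequences invented for this sketch; they are not real species data.
    toy = {
        "seq_a": "ACGTACGTTGCAACGT",
        "seq_b": "ACGTACGATGCAACGA",
        "seq_c": "TTTTGGGGCCCCAAAA",
    }
    vectors = [feature_vector(kmer_network(s, k=2)) for s in toy.values()]
    for name, row in zip(toy, distance_matrix(vectors, euclidean)):
        print(name, [round(d, 3) for d in row])
```

A matrix like this could then be handed to any standard distance-based tree-building method; the abstract does not state which one the thesis uses. In practice the features would likely need rescaling before computing Euclidean distances, since raw counts and ratios sit on very different scales.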
Keywords/Search Tags: DNA, protein, network, similarity analysis, phylogenetic tree