Study Of Biological Sequence Comparison Algorithm

Posted on: 2013-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: X D Guo
Full Text: PDF
GTID: 2230330371461869
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the completion of the Human Genome Project (HGP) and the implementation of genome projects for several model organisms, the volume of biological data has increased dramatically. This flood of data raises a new question: how to analyze and process the data so as to extract information that is useful to people. This is a major challenge for scientists exploring the mysteries of life. To meet it, the focus of biological research is shifting from accumulating data to analyzing and interpreting data, which has given rise to bioinformatics.

Bioinformatics is an interdisciplinary field. It uncovers the biological knowledge contained in large and complex biological data sets by drawing on biology, computer science and information technology. Its research area is broad and includes sequence comparison, gene recognition, molecular evolution, RNA and protein structure prediction, and so on. Most of these topics are based on sequence comparison, so biological sequence comparison is one of the most basic and important subjects and has a far-reaching effect on the study of life science. This dissertation studies algorithms for biological sequence comparison. The main contents are summarized as follows:

1. Chapter 2 summarizes the state of research on biological sequence comparison, focusing on sequence alignment algorithms and some classic alignment-free approaches. We compare these methods and clarify their differences and similarities, which provides the theoretical basis for the work in this dissertation.

2. Chapter 3 presents a novel method for analyzing DNA sequences based on LZ complexity and the dynamic programming algorithm. First, any DNA sequence is broken into a word set, following the segmentation idea of the LZ complexity algorithm. Then, motivated by the dynamic programming algorithm, we measure the degree of similarity between DNA sequences by the information shared among their word sets. Finally, we test the proposed method experimentally and compare its performance with traditional measures.

3. Chapter 4 introduces a novel visualization-based method for representing and comparing biological sequences. Based on the overall distribution of dual bases, we propose a polar coordinates representation that maps a biological sequence to a closed curve. According to the characteristics of the closed curves, we use a segmentation algorithm to build a curve tree rather than a distance matrix, which reduces the loss of information during the conversion. A tree matching function is then proposed to estimate the difference between two biological sequences. Finally, we use two standard data sets and a suitable comparison procedure to assess the effectiveness of the proposed method.
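
The word-set idea in chapter 3 can be illustrated with a minimal sketch. It assumes the standard Lempel-Ziv (1976) exhaustive-history parsing, in which each new word is the shortest substring that has not appeared earlier in the sequence. The abstract does not specify the thesis's own segmentation rule or its dynamic-programming similarity measure, so the function names `lz_words` and `word_set_similarity` are hypothetical, and the Jaccard overlap used here is a simple stand-in for "shared information among word sets", not the author's measure.

```python
def lz_words(seq):
    """Split seq into words by LZ76 exhaustive-history parsing (assumed rule)."""
    words = []
    i, n = 0, len(seq)
    while i < n:
        k = 1
        # Grow the candidate word while it can still be copied from
        # the text that precedes its last character.
        while i + k <= n and seq[i:i + k] in seq[:i + k - 1]:
            k += 1
        # Slicing clamps at n, so the final word may be a repeat.
        words.append(seq[i:i + k])
        i += k
    return words


def word_set_similarity(a, b):
    """Jaccard overlap of the two LZ word sets (stand-in, not the thesis measure)."""
    wa, wb = set(lz_words(a)), set(lz_words(b))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


if __name__ == "__main__":
    print(lz_words("AACGTACCATTG"))
    # ['A', 'AC', 'G', 'T', 'ACC', 'AT', 'TG']
    print(word_set_similarity("AACGTACCATTG", "AACGTACCATTG"))
    # 1.0
```

The number of words returned by `lz_words` is the LZ complexity of the sequence under this parsing. A set-overlap score ignores word order and partial matches, which is presumably where a dynamic-programming comparison of the word sets would add information.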
Keywords/Search Tags: bioinformatics, biological sequence comparison, sequence alignment, LZ complexity, polar coordinates representation, curve tree