A Study Of DNA Sequence Classification Based On Hidden Markov Model

Posted on: 2016-04-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y M Guo  Full Text: PDF
GTID: 2297330473456948  Subject: Statistics
Abstract/Summary:
With the implementation of the Human Genome Project, bioinformatics research has developed continuously. In recent years, the most prominent characteristic of developments in biology has been the exponential growth of biological information. This explosive growth of data has given rise to a series of new problems: how can these huge amounts of information be managed, interpreted, and fully used? Studying DNA sequences is important for interpreting the structure of the human genome and its hidden functions, but DNA sequences are quite different from numeric data. A DNA sequence is formed from non-numeric symbols, so traditional distance measures cannot be applied directly. At the same time, because DNA bases are correlated, studying DNA sequences with traditional methods loses information contained in the sequence. As a result of these characteristics, many classification methods that perform well on numeric data fail to achieve good results when applied to DNA sequences, so special methods are needed for DNA sequence classification.

This thesis starts from the biological characteristics of DNA sequences and from statistical models, then introduces the probabilistic and statistical characteristics of DNA sequences. On this basis, it analyzes DNA sequences in depth around two key issues: feature representation and model-based classification. A new DNA sequence feature representation method is proposed for classifying sequences on the basis of the hidden Markov model. The application of second-order hidden Markov models to DNA sequence classification is also discussed. Finally, in view of the massive amounts of biological data, a method is proposed that combines ensemble learning with model-based sequence classification, which has both theoretical significance and practical application value. The main work and contributions are as follows:

1. Because classification accuracy is easily affected by incomplete bases in DNA sequences, we propose a new DNA sequence feature representation method and construct a k-NN classifier on top of it to classify DNA sequences.
2. Based on an analysis of the biological structural characteristics of DNA sequences, we put forward a new second-order hidden Markov model for DNA sequence classification, and propose a new Bayesian classification method based on this model.
3. With massive amounts of biological data, the disadvantages of batch learning become apparent. We propose an incremental second-order hidden Markov model combined with ensemble learning to classify DNA sequences. It learns DNA sequence classification patterns incrementally and improves the capacity to process huge amounts of data.
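
The idea of Bayesian classification on top of hidden Markov models can be illustrated with a minimal sketch. It is not the thesis's method: it uses an ordinary first-order HMM rather than the proposed second-order model, and the class labels, state counts, and all probabilities below are hypothetical values chosen only for illustration. Each class gets its own HMM, the scaled forward algorithm gives log P(sequence | class), and the predicted class maximizes log prior plus log likelihood.

```python
import math

BASES = "ACGT"  # emission columns are ordered A, C, G, T


def log_likelihood(seq, start, trans, emit):
    """Scaled forward algorithm: log P(seq | model) for a first-order HMM.

    Assumes seq is non-empty and contains only A, C, G, T; incomplete
    bases such as 'N' are not handled in this sketch.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    n_states = len(start)
    # Initialisation: alpha_1(i) = start[i] * emit[i][x_1]
    x = BASES.index(seq[0])
    alpha = [start[i] * emit[i][x] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * trans[i][j] * emit[j][x_t]
            x = BASES.index(seq[t])
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][x]
                for j in range(n_states)
            ]
        # Rescale to avoid underflow; the log of the scale factors sums to log P(seq)
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def classify(seq, models, priors):
    """Return the label maximising log prior + log likelihood (Bayes rule)."""
    scores = {
        label: math.log(priors[label]) + log_likelihood(seq, *params)
        for label, params in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical two-state models for two made-up classes.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
MODELS = {
    "AT_rich": (START, TRANS, [[0.4, 0.1, 0.1, 0.4], [0.3, 0.2, 0.2, 0.3]]),
    "GC_rich": (START, TRANS, [[0.1, 0.4, 0.4, 0.1], [0.2, 0.3, 0.3, 0.2]]),
}
PRIORS = {"AT_rich": 0.5, "GC_rich": 0.5}

print(classify("ATATTAAT", MODELS, PRIORS))
print(classify("GCGCCGGC", MODELS, PRIORS))
```

In practice the per-class parameters would be estimated from labeled training sequences rather than written by hand. A second-order variant would condition each transition on the two preceding hidden states, which this sketch does not attempt.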
Keywords/Search Tags: DNA sequence, classification, hidden Markov model, incremental learning, ensemble learning