
Recognition Of Splice Sites Based On Variable Length Markov Model

Posted on: 2012-08-02; Degree: Master; Type: Thesis
Country: China; Candidate: S Y Li
GTID: 2210330368992252; Subject: Computer application technology
Abstract/Summary:
With the completion of the Human Genome Project, research has entered the post-genomic era, and the focus of genome research has shifted to the analysis of genomic information. Splicing is an important part of eukaryotic genome information analysis. Meanwhile, with the emergence of massive biological data, bioinformatics has become a core technology of the post-genomic era.

Since the 1990s, a number of pattern recognition methods have been applied to splice site recognition, such as Support Vector Machines, Hidden Markov Models, and Neural Networks, and they have achieved very good results. However, some problems remain when these methods are applied to splice site identification: the parameters of the characteristic sequence must be set manually, the selected input features are heterogeneous, and the models do not reflect the probabilistic correlation between sites. Under certain circumstances, these issues may affect the generalization ability of the models and the classification results.

To address these problems, this thesis studies splice site identification based on the variable length Markov model in depth. The main results are as follows:

1. The advantages and disadvantages of the variable length Markov model in splice site recognition are analyzed and summarized.
2. A model based on the variable length Markov model with KL divergence is proposed to optimize feature selection. By using KL divergence to guide the direction in which the sequence is extended, the model effectively improves the ability to identify characteristic sequences.
3. A probabilistic suffix tree (PST) algorithm is provided to train the conditional probabilities of the variable length Markov model (VLMM). The algorithm lets the model handle sequences of variable length and variable order, and it saves a large amount of storage space.
4. An experimental splice site recognition system is built on the proposed method, and it verifies the effectiveness of the method.

Finally, the research work of the thesis is summarized and future developments are discussed.
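
As a rough illustration of points 2 and 3, the sketch below builds a small variable-order Markov model over DNA and scores a sequence with it. It is not the thesis's algorithm. The abstract does not state the exact criterion, so the rule used here (keep a longer context only when the KL divergence between its next-base distribution and that of its one-base-shorter suffix exceeds a threshold) is an assumption. The names `build_model`, `log_likelihood`, `max_order`, `min_count`, `kl_threshold`, and `pseudo` are hypothetical, and a flat dictionary keyed by context stands in for an explicit suffix tree.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def count_contexts(seqs, max_order):
    """Count next-base occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i, sym in enumerate(s):
            for k in range(min(max_order, i) + 1):
                counts[s[i - k:i]][sym] += 1
    return counts


def distribution(symbol_counts, pseudo=0.5):
    """Turn raw counts into a smoothed probability distribution over ALPHABET."""
    total = sum(symbol_counts.get(a, 0) for a in ALPHABET) + pseudo * len(ALPHABET)
    return {a: (symbol_counts.get(a, 0) + pseudo) / total for a in ALPHABET}


def kl(p, q):
    """KL divergence D(p || q) in nats; both inputs are smoothed, so no zeros."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in ALPHABET)


def build_model(seqs, max_order=3, min_count=5, kl_threshold=0.05):
    """Keep a context only if it is frequent enough and its next-base
    distribution differs from that of its shorter suffix (assumed criterion)."""
    counts = count_contexts(seqs, max_order)
    model = {"": distribution(counts[""])}  # order-0 fallback is always kept
    for ctx in sorted(counts, key=len):
        if not ctx or sum(counts[ctx].values()) < min_count:
            continue
        child = distribution(counts[ctx])
        parent = distribution(counts[ctx[1:]])  # drop the oldest base
        if kl(child, parent) > kl_threshold:
            model[ctx] = child
    return model


def log_likelihood(model, seq, max_order=3):
    """Score seq, using the longest retained context at each position."""
    total = 0.0
    for i, sym in enumerate(seq):
        for k in range(min(max_order, i), -1, -1):
            ctx = seq[i - k:i]
            if ctx in model:
                total += math.log(model[ctx][sym])
                break
    return total
```

Because only contexts that change the prediction are stored, the dictionary stays much smaller than a full fixed-order table, which is the storage saving point 3 refers to. One common way to turn such a model into a classifier is to train one model on true splice sites and one on decoy sites, then compare the two log-likelihoods of a candidate window. Whether the thesis does this is not stated in the abstract.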
Keywords/Search Tags: Splice Site Recognition, Variable Length Markov Model, KL Divergence, Probabilistic Suffix Tree