Eukaryotic Gene Promoter Recognition Based On Optimized Support Vector Machine

Posted on: 2016-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
Full Text: PDF
GTID: 2310330512471039
Subject: Applied Mathematics
Abstract/Summary:
The promoter is an important component in gene expression. Promoter research aims to establish the regulatory mechanism of gene transcription, and promoter recognition is a research topic in computational bioinformatics. This thesis has five chapters. First, we introduce the relevant background on promoters and summarize the research progress in promoter recognition. Second, we present several efficient feature extraction methods. Third, we introduce machine learning methods and propose the PSO-SVM and GA-SVM algorithms for promoter recognition, which optimize the support vector machine with particle swarm optimization and a genetic algorithm, respectively. Fourth, we describe the databases that supply the experimental data and the evaluation method. Finally, the machine learning methods are applied in simulation experiments, and the experimental results are summarized and analyzed.

The main work of this thesis can be summarized in two parts.

1. Using efficient feature extraction methods. These mainly include the component likelihood score, the position correlation weight matrix score, the physical structure specificity index, and the PZ curve biological specificity index. The features cover the signal, spatial, and graphical information of the sequence, and reflect the differences among different functional fragments of a DNA sequence.

2. Using an optimized support vector machine algorithm. The support vector machine has deficiencies in parameter selection, so we optimize its parameters with a genetic algorithm and with particle swarm optimization. The extreme learning machine, the support vector machine, and the random forest are commonly used machine learning algorithms. Simulation experiments are carried out on a large number of original sequence data, and the promoter recognition results of the optimized support vector machine are compared with those of the above-mentioned machine learning algorithms.

The experimental results show that the support vector machine based on particle swarm optimization is more effective in identifying promoter sequences. The 5-fold cross-validation accuracy rates for the promoter-exon, promoter-intron, and promoter-intergenic classifications are all above 96%, and the Matthews correlation coefficients are all above 0.93.
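
The parameter search described above can be illustrated with a minimal particle swarm sketch. This is not the thesis implementation: the abstract does not say which SVM parameters are tuned or how the swarm is configured, so the two search dimensions (taken here to be the logarithms of a penalty parameter and a kernel width), the swarm coefficients, and the function names `pso_minimize` and `toy_cv_error` are all assumptions. The objective is a toy stand-in for a cross-validated classification error, so that the example runs with the standard library alone.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(params) inside box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(params):
    """Hypothetical stand-in for a cross-validated SVM error surface.

    In a real experiment this would train an SVM with the candidate
    parameters and return the cross-validation error; here it is a
    smooth bowl with its minimum at (2.0, -3.0).
    """
    log_c, log_gamma = params
    return (log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2


# Assumed search ranges for the two log-scaled parameters.
bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best, err = pso_minimize(toy_cv_error, bounds)
print(best, err)
```

To tune a real classifier, `toy_cv_error` would be replaced by a function that trains the model on each set of training folds and returns the mean validation error. Searching in log scale is a common choice for such parameters, though the thesis may have done otherwise.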
Keywords/Search Tags: Particle swarm optimization, Support vector machine, Random forest, Extreme learning machine, Promoter prediction