Research On TA Protein Targeting Prediction Method Based On Intelligent Algorithm

Posted on: 2020-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y L He
GTID: 2430330575953800
Subject: Computer software and theory
Abstract/Summary:
With the completion of gene sequencing, gene recognition faces great challenges. Intelligent algorithms have unique advantages in handling large volumes of data, noisy patterns, and problems that lack a unified theory, and studies have shown that they can successfully solve such bioinformatics problems. However, many problems and challenges remain in gene sequence recognition and protein targeting prediction. For example, the accuracy of gene sequence recognition needs to be improved, and TA protein targeting prediction in plants has not yet been implemented with intelligent algorithms. Gene sequence recognition is often the research basis for applying intelligent algorithms, and CpG island recognition is an important topic within gene recognition.

In response to these problems, this thesis studies CpG island identification and TA protein targeting prediction in depth. Many studies have shown that existing classical algorithms achieve relatively low recognition and classification accuracy, below what research such as gene recognition and TA protein targeting prediction requires. This thesis therefore takes CpG islands as the example for algorithm research and TA protein targeting as the follow-up study. Intelligent algorithms address these problems well and improve recognition and classification accuracy, and good results were obtained for both CpG island recognition and TA protein targeting. The main research contents are as follows:

1. A CpG island recognition method is proposed that combines a genetic algorithm (GA) with a Hidden Markov Model (HMM). The GA optimizes the HMM parameters, and the resulting model is better suited to CpG island recognition.
2. Through rigorous screening, 428 eukaryotic TA proteins were obtained for TA protein targeting prediction. Seven TA protein sequence feature extraction methods were used, and hydrophobicity and charge were added as further features for model training.
3. A naive Bayesian feature extraction method was constructed to extract TA protein sequence features. The mRMR algorithm is used to select features from the protein data, and a support vector machine is then used to train the model. During training, the parameters and the penalty coefficient C are optimized by grid search, and the experimental results are analyzed.
4. Five machine learning models are trained and their performance compared: Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Gradient Boosting Decision Tree (GBDT).

Overall, the global search ability of the genetic algorithm is used to optimize the HMM parameters, which improves the accuracy and recall of CpG island recognition. For TA protein targeting prediction, a feature extraction method is used to extract protein sequence features, and integrating the classification results of multiple algorithms gives a better prediction of TA protein targeting to subcellular organelles. The prediction accuracy is 84%.
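
The abstract does not state how the classification results of the individual models are integrated. One common possibility is a plain majority vote, sketched below under that assumption. The function name `majority_vote`, the tie-breaking rule, and the organelle labels in the usage note are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions: list of lists, one inner list per model, all of equal
    length (one predicted label per sample).
    Ties are broken in favour of the earliest model in the list
    (an assumption made for this sketch).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*predictions) yields, for each sample, the labels from every model.
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Walk the votes in model order so ties go to the earliest model.
        for label in votes:
            if counts[label] == top:
                combined.append(label)
                break
    return combined
```

For example, with three models and three proteins, `majority_vote([["ER", "mito", "ER"], ["ER", "ER", "mito"], ["mito", "mito", "ER"]])` keeps, for each protein, the label that most models agree on. A weighted vote or a stacked meta-classifier would be alternative ways to integrate the models.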
Keywords/Search Tags:Genetic Algorithm, HMM, CpG Islands, Terminal-Anchored Proteins, Feature Extraction, Model Ensemble