
Prediction Of Protein Subcellular Localization By Using Machine Learning Method And Its Application

Posted on: 2019-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
Full Text: PDF
GTID: 2310330566465860
Subject: Statistics
Abstract/Summary:
The development of proteomics marks a milestone in the entry of life science research into the post-genome era, and proteomics is another “big data science” following genomics. Predicting protein subcellular localization is an important part of proteomics and an active topic in bioinformatics. The study of single-site and multi-site protein subcellular localization matters for understanding the pathogenesis of certain diseases and for drug design and discovery. On the topic of predicting protein subcellular localization with machine learning methods, the main work of this thesis is as follows:

1. We propose a novel method for predicting the subcellular localization of apoptosis proteins, called PsePSSM-DCCA-LFDA. First, features are extracted from the protein sequences by combining the pseudo-position specific scoring matrix (PsePSSM) with the detrended cross-correlation analysis coefficient (DCCA coefficient). The dimensionality of the extracted features is then reduced by local Fisher discriminant analysis (LFDA). Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of the apoptosis proteins. Under the jackknife test on three widely used datasets, the method gives promising predictions in comparison with other state-of-the-art methods. The experimental results indicate that the method is accurate enough to become a promising tool for further proteomics studies.

2. We propose a novel method for protein subcellular localization prediction, called PseAAC-PsePSSM-WD. First, features are extracted from the protein sequences by combining Chou's pseudo amino acid composition (PseAAC) with the pseudo-position specific scoring matrix (PsePSSM). The extracted features are then denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of apoptosis proteins. Under the jackknife test on three widely used datasets, the method gives promising predictions in comparison with other state-of-the-art methods. The results show that the proposed method can significantly improve the prediction accuracy for the subcellular localization of apoptosis proteins, and it is expected to be applicable to predicting other protein properties.

3. We propose a novel method for protein subcellular localization prediction based on multi-label learning, called DMLDA-LocLIFT. First, features are extracted from the protein sequences with five encodings: pseudo amino acid composition, pseudo-position specific scoring matrix, encoding based on grouped weight, dipeptide composition and GO information. The five feature sets are then combined. The dimensionality of the combined features is reduced by direct multi-label linear discriminant analysis (DMLDA). Finally, the optimal feature vectors are input to the multi-label learning with label-specific features (LIFT) classifier to predict the subcellular locations of multi-label proteins. Under the jackknife test on the Gram-negative bacteria, Gram-positive bacteria and plant datasets, the method achieves the highest accuracy among the compared prediction methods. The results show that the proposed method can effectively predict protein subcellular localization based on multi-label learning.
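All three methods are evaluated with the jackknife test, which is leave-one-out cross-validation: each protein is held out once and predicted by a model trained on all the others. The sketch below illustrates that protocol together with dipeptide composition, one of the sequence encodings named above. It is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the toy sequences and labels are hypothetical, and a simple nearest-neighbour rule stands in for the SVM and LIFT classifiers.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional frequency vector of adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def nearest_neighbour_predict(train_x, train_y, query):
    """Stand-in classifier (an assumption; the thesis uses SVM and LIFT)."""
    best = min(range(len(train_x)), key=lambda j: dist(train_x[j], query))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Leave-one-out: hold out each sample once, train on all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy, made-up sequences and labels, purely for illustration.
toy_sequences = ["AAAAAAAA", "AAAAAAAC", "KKKKKKKK", "KKKKKKKR"]
toy_labels = ["cytoplasm", "cytoplasm", "nucleus", "nucleus"]
toy_features = [dipeptide_composition(s) for s in toy_sequences]
toy_accuracy = jackknife_accuracy(toy_features, toy_labels)
```

Passing the classifier as the `predict` argument keeps the evaluation loop independent of the model, so a trained SVM or a multi-label learner could be substituted without changing the jackknife procedure itself.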
Keywords/Search Tags: subcellular localization, multi-label learning, support vector machine, two-dimensional wavelet denoising, multi-label learning with label-specific features