
A Method And Its Application Research For Protein Subcellular Localization Prediction Based On Multi-label Learning

Posted on: 2022-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2480306548997009
Subject: Mathematics
Abstract/Summary:
Proteins participate in the growth, development and reproduction of living organisms. Studies have found that multi-label proteins, which occur in more than one subcellular location, carry out more cellular functions. Studying multi-label protein subcellular localization (PSL) can help people understand the pathogenesis of more diseases, provide reference information for the prevention and diagnosis of cellular diseases, and reveal the important role of multi-label proteins in the development of pharmaceutical compounds. With the continuous growth of multi-label protein data, traditional research methods can no longer serve as researchers' main means of exploration, so using machine learning to predict the PSL of multi-label proteins is becoming increasingly important. This thesis studies multi-label PSL prediction in depth. The main contributions are as follows:

1. A new method for multi-label PSL prediction, called MpsLDA-ProSVM, is proposed. First, four encoding algorithms (pseudo-position-specific scoring matrix, gene ontology, conjoint triad and pseudo-amino acid composition) extract information from the sequences, and the resulting features are fused. Then, a weighted multi-label linear discriminant analysis framework based on the entropy weight form is used, for the first time, to refine and purify the features. Finally, the optimal feature subset is fed into the multi-label learning with label-specific features and multi-label k-nearest neighbor algorithms to obtain a combined ranking of the relevant labels, and the Prediction and Relevance Ordering based SVM (ProSVM) classifier then predicts the PSLs. Under leave-one-out cross-validation, the overall actual accuracy (OAA) values on the virus, plant, Gram-positive bacteria and Gram-negative bacteria datasets are 98.06%, 98.97%, 99.81% and 98.49%, which are 0.56% to 9.16%, 1.07% to 30.87%, 0.21% to 6.91% and 3.99% to 8.59% higher, respectively, than those of other state-of-the-art methods. These comparisons show that MpsLDA-ProSVM can effectively predict the specific locations of multi-label proteins in cells.

2. A multi-label PSL prediction method based on multi-view learning, called Mps-mvRBRL, is proposed. First, pseudo-position-specific scoring matrix, dipeptide composition, position-specific scoring matrix-transition probability composition, gene ontology and pseudo-amino acid composition algorithms extract numerical information from different views. Based on the contribution of each of the five feature extraction methods, differential evolution is used, for the first time, to learn the weight of each single feature, and the original features are then fused by weighted combination. Second, a weighted linear discriminant analysis framework based on the binary weight form is applied to the fused high-dimensional features to eliminate irrelevant information. Finally, the optimal feature vector is fed into the joint ranking SVM and binary relevance with robust low-rank learning (RBRL) classifier to predict the PSLs. Under leave-one-out cross-validation, both the OAA and the overall locative accuracy (OLA) of Mps-mvRBRL on the Gram-positive bacteria training set are 99.81%. On the virus, plant and Gram-negative bacteria test sets, the OAA values are 98.55%, 97.24% and 98.20%, and the OLA values are 97.62%, 97.16% and 98.28%, respectively. The results show that the proposed model achieves good prediction performance and can effectively predict the subcellular locations of multi-label proteins.
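The sequence encodings named in the abstract turn a variable-length amino-acid sequence into a fixed-length numeric vector. As an illustration, here is a minimal Python sketch of two of the simpler ones, dipeptide composition (400 dimensions) and conjoint triad (343 dimensions). It is not the thesis's implementation: it assumes the 20 standard amino acids and the commonly used seven-class residue grouping for the conjoint triad, the function names are hypothetical, and normalizing by the number of pairs or triads is an assumption, since some formulations rescale the triad counts differently.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); other residues are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Position of each ordered pair "AA", "AC", ..., "YY" in the 400-D vector.
DC_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}

# Seven-class grouping commonly used for the conjoint triad (assumed here).
CT_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CT_CLASS = {aa: k for k, group in enumerate(CT_GROUPS) for aa in group}


def dipeptide_composition(seq):
    """400-D vector of relative frequencies of adjacent amino-acid pairs."""
    counts = [0] * len(DC_INDEX)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DC_INDEX:  # skip pairs containing non-standard residues
            counts[DC_INDEX[pair]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def conjoint_triad(seq):
    """343-D vector of relative frequencies of consecutive residue-class triads.

    Normalizing by the number of triads is an assumption; other
    formulations rescale the counts differently.
    """
    counts = [0] * 7 ** 3
    total = 0
    for i in range(len(seq) - 2):
        triad = seq[i:i + 3]
        if all(aa in CT_CLASS for aa in triad):
            a, b, c = (CT_CLASS[aa] for aa in triad)
            counts[49 * a + 7 * b + c] += 1  # base-7 index of the class triad
            total += 1
    return [x / total if total else 0.0 for x in counts]


print(conjoint_triad("AGVC")[0], conjoint_triad("AGVC")[6])  # 0.5 0.5
```

In the example, "AGVC" contains two triads, AGV (classes 0, 0, 0) and GVC (classes 0, 0, 6), so each gets a frequency of 0.5. Residues outside the standard 20 are simply skipped here; a real pipeline might handle them differently.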
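Both methods reduce the fused features with a weighted linear discriminant analysis, in which each training protein contributes to the class statistics of every label it carries, scaled by a weight. The pure-Python sketch below builds the weight matrix for two weight forms and computes the weighted between-class and within-class scatter matrices. It assumes that the binary form gives every relevant label a weight of 1 and that the entropy form spreads a total weight of 1 evenly over a protein's labels (1/|Y_i| each); the thesis's exact definitions may differ, and all function names are hypothetical. The final step, solving the generalized eigenproblem S_b v = λ S_w v for the projection directions, is left out because it needs a linear-algebra library.

```python
def weight_matrix(labels, form="binary"):
    """Turn an n x q 0/1 label matrix into an n x q weight matrix.

    Assumed forms: "binary" gives weight 1 to every relevant label;
    "entropy" gives 1/|Y_i| to each of the |Y_i| relevant labels.
    """
    weights = []
    for y in labels:
        k = sum(y)
        if form == "binary":
            weights.append([float(v) for v in y])
        elif form == "entropy":
            weights.append([v / k if k else 0.0 for v in y])
        else:
            raise ValueError(f"unknown weight form: {form}")
    return weights


def weighted_class_means(X, W):
    """Mean of each label class, with instance i weighted by W[i][c]."""
    n, d, q = len(X), len(X[0]), len(W[0])
    means = []
    for c in range(q):
        n_c = sum(W[i][c] for i in range(n))
        means.append([sum(W[i][c] * X[i][j] for i in range(n)) / n_c if n_c else 0.0
                      for j in range(d)])
    return means


def weighted_scatter(X, W):
    """Weighted between-class (Sb) and within-class (Sw) scatter matrices."""
    n, d, q = len(X), len(X[0]), len(W[0])
    means = weighted_class_means(X, W)
    class_w = [sum(W[i][c] for i in range(n)) for c in range(q)]
    total_w = sum(class_w)
    # Overall mean as the weight-averaged class means.
    mu = [sum(class_w[c] * means[c][j] for c in range(q)) / total_w for j in range(d)]
    Sb = [[0.0] * d for _ in range(d)]
    Sw = [[0.0] * d for _ in range(d)]
    for c in range(q):
        diff = [means[c][j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                Sb[a][b] += class_w[c] * diff[a] * diff[b]
        for i in range(n):
            if W[i][c] == 0:
                continue  # instance i does not belong to class c
            dev = [X[i][j] - means[c][j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    Sw[a][b] += W[i][c] * dev[a] * dev[b]
    return Sb, Sw


# Toy data: 3 proteins, 2 features, 2 labels; protein 2 carries both labels.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
Y = [[1, 0], [1, 1], [0, 1]]
W_ent = weight_matrix(Y, "entropy")  # rows: [1, 0], [0.5, 0.5], [0, 1]
Sb, Sw = weighted_scatter(X, W_ent)
```

With the entropy form, a protein carrying two labels pulls each class mean only half as strongly as a single-label protein; with the binary form it counts fully toward both classes.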
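In Mps-mvRBRL, differential evolution learns one weight per feature view before the views are fused. The sketch below shows the classic DE/rand/1/bin scheme together with a weighted combination that scales each view's vector by its weight and then concatenates the views. The concatenation rule, the parameter values (population 20, F = 0.5, CR = 0.9, 200 generations) and the toy fitness function are assumptions for illustration, and the function names are hypothetical. The thesis drives the weights by each view's contribution; one plausible fitness would be the cross-validated prediction error on the fused features, not the stand-in distance used here.

```python
import random


def weighted_fusion(views, weights):
    """Scale each view's feature vector by its weight, then concatenate (assumed rule)."""
    fused = []
    for vec, w in zip(views, weights):
        fused.extend(w * x for x in vec)
    return fused


def differential_evolution(fitness, dim, bounds=(0.0, 1.0), pop_size=20,
                           f=0.5, cr=0.9, generations=200, seed=0):
    """Minimize `fitness` over [lo, hi]^dim with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                # Binomial crossover between the mutant and the target.
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            s = fitness(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy usage: hypothetical "ideal" weights for five views stand in for a real fitness.
TARGET = [0.9, 0.4, 0.7, 0.6, 0.8]
best_w, best_score = differential_evolution(
    lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET)), dim=5)
```

Once the weights are learned, `weighted_fusion(per_view_features, best_w)` would produce the fused vector passed on to the discriminant-analysis step. Scaling before concatenation keeps each view's internal geometry while letting the learned weights control how much each view influences later distance- or margin-based steps.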
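The two evaluation metrics reported above can be computed directly from the true and predicted location sets. The sketch below assumes the definitions commonly used in the multi-label PSL literature: OAA counts a protein as correct only when its predicted location set exactly matches its true set, while OLA divides the number of correctly predicted (protein, location) pairs by the total number of true pairs. The function names and the example labels are hypothetical.

```python
def overall_actual_accuracy(true_sets, pred_sets):
    """OAA (assumed definition): share of proteins whose predicted set equals the true set."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return hits / len(true_sets)


def overall_locative_accuracy(true_sets, pred_sets):
    """OLA (assumed definition): correctly predicted (protein, location) pairs over all true pairs."""
    correct = sum(len(set(t) & set(p)) for t, p in zip(true_sets, pred_sets))
    total = sum(len(set(t)) for t in true_sets)
    return correct / total


# Toy example with hypothetical location labels.
true_locs = [{"cytoplasm"}, {"nucleus", "cytoplasm"}, {"membrane"}]
pred_locs = [{"cytoplasm"}, {"nucleus"}, {"membrane", "nucleus"}]
print(overall_actual_accuracy(true_locs, pred_locs))    # only protein 1 is exact: 1/3
print(overall_locative_accuracy(true_locs, pred_locs))  # 3 of 4 true pairs hit: 0.75
```

In the example, the third protein's extra predicted location lowers OAA but not OLA: under these definitions OLA never penalizes over-prediction, whereas OAA does.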
Keywords/Search Tags: multi-label protein subcellular localization, machine learning, multi-information fusion, differential evolution algorithm, weighted multi-label linear discriminant analysis framework, ProSVM classifier, RBRL classifier