The Classification Prediction Of High Dimensional Data Of Membrane Protein Based On Multi-feature Fusion

Posted on: 2019-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: S C Peng
GTID: 2370330548973461
Subject: Computer technology
Abstract/Summary:
Membrane proteins play an important role in proteomics research as the carriers and executors of protein function. Studies have shown that the onset of certain diseases is closely related to the function and structure of membrane proteins, so accurately predicting membrane protein types has become a crucial research topic. For large volumes of membrane protein sequence data, machine learning methods save considerable time and effort and make better use of the sequence data. Given the complexity of protein sequence information, this thesis extracts effective features mainly from the physicochemical properties, sequence-order correlation, and evolutionary information of membrane proteins. It proposes a feature-fusion representation of membrane protein sequences and applies dimensionality reduction to the resulting high-dimensional features. Finally, several classifiers and ensemble methods are compared experimentally; the results reach state-of-the-art performance and demonstrate the effectiveness of the feature fusion method. The main contributions are as follows:

(1) Based on an analysis of membrane protein sequence information, a feature extraction method is proposed that fuses pseudo-amino acid composition, dipeptide composition, amino acid attribute groups, and the position-specific scoring matrix (PSSM). Each original membrane protein sequence is transformed into an 853-dimensional feature vector. This vector carries rich sequence information and lays a solid foundation for building a reliable prediction model.

(2) Fusing membrane protein features also introduces information redundancy and the curse of dimensionality. Two dimensionality reduction algorithms are therefore used: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Experiments show that the fused representation after dimensionality reduction improves both computational efficiency and classification performance.

(3) To further improve prediction performance, Stacking, an ensemble learning framework, is introduced. Its base classifiers are K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF), and its meta-classifier is Multivariate Logistic Regression (MLR). Experiments show that this scheme achieves higher prediction accuracy.
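The fusion in contribution (1) amounts to computing several descriptor blocks per sequence and concatenating them into one vector. The following is a minimal sketch of that idea for only two of the four components, amino acid composition (20 values) and dipeptide composition (400 values). The pseudo-amino acid composition, attribute-group and PSSM blocks, and the exact 853-dimensional layout used in the thesis, are not reproduced. The function names `aac`, `dpc` and `fuse` are hypothetical, and the sketch assumes sequences over the 20 standard residues, skipping any other characters.

```python
from itertools import product
from typing import List

STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(STANDARD_RESIDUES, repeat=2)]


def aac(seq: str) -> List[float]:
    """Amino acid composition: relative frequency of each of the 20 standard residues."""
    residues = [r for r in seq.upper() if r in STANDARD_RESIDUES]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in STANDARD_RESIDUES]


def dpc(seq: str) -> List[float]:
    """Dipeptide composition: relative frequency of each of the 400 adjacent residue pairs.

    Only adjacent pairs whose two residues are both standard are counted.
    """
    s = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:  # skips pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def fuse(*blocks: List[float]) -> List[float]:
    """Hypothetical feature fusion: concatenate descriptor blocks into one vector."""
    return [value for block in blocks for value in block]


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence, not from the thesis
vector = fuse(aac(seq), dpc(seq))
print(len(vector))  # 20 + 400 = 420
```

Under this scheme the remaining blocks would simply be further arguments to `fuse`; it is this concatenation that makes the vector high-dimensional and redundant, which is the problem contribution (2) addresses.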
Keywords: Membrane protein type prediction, Feature fusion, Dimension reduction algorithm, Stacking ensemble learning
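Contributions (2) and (3) describe a pipeline of dimensionality reduction followed by a Stacking ensemble. The following is a minimal scikit-learn sketch of that structure, assuming scikit-learn is installed. It uses synthetic data from `make_classification` with 853 features in place of real membrane protein vectors, and every hyperparameter (50 PCA components, `cv=5`, network size, tree count) is an illustrative assumption rather than a setting from the thesis. `LogisticRegression` stands in for the MLR meta-classifier; on multiclass data with its default `lbfgs` solver it fits a multinomial model. The accuracies it prints say nothing about membrane proteins.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 853-dimensional fused feature vectors (NOT real data).
X, y = make_classification(
    n_samples=400, n_features=853, n_informative=40, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_stacking():
    """Stacking ensemble: KNN, SVM, NN and RF base learners, logistic-regression meta-learner."""
    base_learners = [
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm", SVC(kernel="rbf")),
        ("nn", MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
    # The meta-learner is fit on out-of-fold predictions of the base learners (cv=5).
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Two interchangeable reducers; LDA yields at most n_classes - 1 = 3 components.
reducers = {
    "pca": PCA(n_components=50, random_state=0),
    "lda": LinearDiscriminantAnalysis(n_components=3),
}

scores = {}
for name, reducer in reducers.items():
    model = make_pipeline(StandardScaler(), reducer, build_stacking())
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Putting the scaler and reducer inside the pipeline keeps the test split out of their fit and lets PCA (unsupervised) and LDA (supervised, capped at one fewer component than there are classes) be swapped without touching the ensemble.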