
Research On Protein Submitochondria Location Prediction Based On Machine Learning

Posted on: 2020-06-06  Degree: Master  Type: Thesis
Country: China  Candidate: W Y Qiu  Full Text: PDF
GTID: 2370330590952908  Subject: Statistics
Abstract/Summary:
In the era of big data, the exponentially growing sequence data in protein databases contain important biological regularities. Using machine learning to accurately predict the submitochondrial location of proteins has become a challenging task in bioinformatics and proteomics research. Moreover, the study of protein submitochondrial localization plays an important role in understanding the structure and function of proteins, and has long-term research significance for the evolution of life and the mechanisms of disease. This thesis studies protein submitochondrial localization based on machine learning methods. The main research results are as follows:

1. We propose a new method, PseAAC-PsePSSM-WD, for protein submitochondrial localization prediction. First, the pseudo-amino acid composition (PseAAC) and the pseudo-position specific scoring matrix (PsePSSM) are fused to extract features from submitochondrial protein sequences. Then, the extracted feature vectors are processed using two-dimensional wavelet denoising (WD). Finally, the optimal denoised feature vectors are passed to a support vector machine (SVM) to predict protein submitochondrial location. The method is evaluated with the jackknife test and compared with other prediction methods. The results show that the proposed method is significantly better than existing methods and can provide a new approach for predicting protein localization in other subcellular organelles.

2. We propose a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. First, four feature extraction methods, g-gap dipeptide composition (g-gap DC), PseAAC, the auto-correlation function (ACF) and the Bi-gram position specific scoring matrix (Bi-gram PSSM), are used to extract feature information from submitochondrial protein sequences. Because the datasets M317, M983 and M495 are imbalanced, all three are processed with the synthetic minority oversampling technique (SMOTE). Then the ReliefF algorithm is used to reduce the dimensionality of the high-dimensional feature vectors. Finally, the optimal feature vectors are classified using eXtreme gradient boosting (XGBoost) to predict protein submitochondrial localization. The method achieves satisfactory prediction results under the jackknife test and is compared with other prediction methods. The results show that the proposed method predicts significantly better than existing methods and can provide a new tool for the prediction of other substructures.
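As an illustration of two steps named above, the following is a minimal sketch of g-gap dipeptide composition and of the index splits used in a jackknife (leave-one-out) test. It is not the thesis implementation: the function names `g_gap_dipeptide_composition` and `jackknife_splits` are hypothetical, and the sketch assumes that pairs containing non-standard residues are skipped and that frequencies are normalized by the number of counted pairs (which equals L - g - 1 for a sequence of length L made up only of the 20 standard residues).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def g_gap_dipeptide_composition(sequence, g=0):
    """Return 400 frequencies of residue pairs separated by g positions.

    Assumption (not from the thesis): pairs containing a non-standard
    residue are skipped, and counts are divided by the number of pairs
    actually counted.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def jackknife_splits(n_samples):
    """Yield (train_indices, held_out_index), holding out one sample per round."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out
```

In a full pipeline, each protein sequence would be turned into a feature vector by functions such as the one above, and the classifier would be retrained once per jackknife round, with the held-out sample used only for prediction.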
Keywords/Search Tags:machine learning, submitochondrial localization, multi-information fusion, synthetic minority oversampling technique, support vector machine, eXtreme gradient boosting