Research On Protein Subcellular Localization Based On Feature Extraction

Posted on: 2023-06-11

Degree: Master

Type: Thesis

Country: China

Candidate: D Y Jin

Full Text: PDF

GTID: 2530306818997449

Subject: Mathematics

Abstract/Summary:

With the rapid development of sequencing technology, a large number of protein sequences have emerged, and traditional biological experiments can no longer meet the demand for accurate localization prediction. Bioinformatics-based prediction of protein subcellular localization has become one of the most important approaches to location prediction and an important topic in proteomics. Predicting protein subcellular localization is important not only for the study of protein structure and function but also for the design and development of new drugs. This thesis studies feature extraction methods and classification algorithms for protein subcellular localization in depth. The main work is as follows:

1. Based on protein evolutionary information and the segmented distribution of data, a new method for predicting the subcellular localization of single-location proteins is explored. Focusing on the evolutionary information of homologous protein sequences, a novel feature extraction method, PSSM-GSD, is proposed on the basis of the protein position-specific score matrix (PSSM). This feature reflects the segmented distribution of amino acids' evolutionary information along the protein sequence and thereby adds more local information. PSSM-GSD is fused with the AAO and AAPSSM methods, and the fused features are fed into a support vector machine for subcellular localization. To address the imbalance of the datasets, the SMOTE algorithm is used to generate synthetic samples for the minority protein classes. Experiments on the Gram-positive dataset Gpos-mPLoc and the Gram-negative dataset Gneg-mPLoc yield overall accuracies of 82.0% and 79.5%, respectively.

2. Based on feature selection and a dynamic classifier chain algorithm, a novel method for predicting the subcellular localization of multiple-location proteins is further explored. There is considerable evidence that some proteins exist at two or more subcellular sites, and the localization of these proteins is particularly important. This thesis constructs MULoc EL, a novel ensemble classifier for the subcellular localization of multi-label proteins. Three feature extraction methods, AAOD, SDPP and CSPPC, are proposed based on the central tendency and dispersion of the data. Seven feature extraction methods in total are integrated to extract protein sequence information, evolutionary information and amino acid physicochemical information. The PageRank algorithm is used to integrate multiple feature selection methods, and a forward adding strategy is used to screen an optimal 106-dimensional feature subset from the original 702 dimensions. Building on the traditional classifier chain algorithm, the order of the labels is adjusted dynamically according to the CEF index, which is constructed from conditional entropy and the F1 value; the resulting algorithm is the dynamic classifier chain (DCC). On this basis, the final classification result is obtained by Bagging. MULoc EL achieves an overall accuracy of 84.0% on the Gram-negative dataset Gneg-mPLoc and can effectively predict the subcellular locations of multi-label proteins.
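The forward adding strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so the following rests on assumptions: the features are taken to be ranked already (for example, by the PageRank-integrated scores), and a scoring function such as cross-validated accuracy evaluates each candidate subset. The names `forward_add`, `toy_score` and `ranked_features` are hypothetical, and the toy scorer only stands in for a real classifier evaluation.

```python
from typing import Callable, List, Sequence, Tuple


def forward_add(ranked: Sequence[int],
                score: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Add ranked features one at a time and keep the best-scoring prefix."""
    best_subset: List[int] = []
    best_score = float("-inf")
    current: List[int] = []
    for feature in ranked:
        current.append(feature)
        s = score(current)
        # Strict comparison: on a tie, the smaller subset found earlier is kept.
        if s > best_score:
            best_score = s
            best_subset = list(current)
    return best_subset, best_score


def toy_score(subset: List[int]) -> float:
    """Hypothetical scorer: features 3 and 0 help, every feature has a small cost."""
    useful = {3, 0}
    return sum(1.0 for f in subset if f in useful) - 0.1 * len(subset)


# Assumed ranking, best feature first (in practice, the output of feature ranking).
ranked_features = [3, 0, 4, 1, 2]
subset, value = forward_add(ranked_features, toy_score)
print(subset, round(value, 2))
```

In this toy run the score peaks after the two informative features are added, so the selected subset is the first two ranked features. In the setting described in the abstract, the same loop would run over 702 ranked dimensions, with the scoring function replaced by an evaluation of the actual classifier.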

Keywords/Search Tags:

Position specific score matrix, Feature extraction, Feature dimension reduction, Multi-label learning, Ensemble classifier

Related items

1	A Multi-label Classifier Based On PSSM And GO For Predicting Protein Subcellular Localization
2	Protein Subcellular Localization Prediction From Multi-label Learning
3	Predicting Subnuclear Location Of Proteins And Subcellular Location Of ncRNAs Based On Multi-Information Fusion And Multi-Label Ensemble Classifier
4	Using Multi-label Learning Methods To Study Protein Subcellular Localization Prediction
5	The Research On Feature Extraction For The Prediction Of Amyloid Sequences Regions
6	Research On Multi-site Protein Subcellular Localization Prediction Method Based On Fusion Feature And Multi-label Deep Forest Model
7	Classification Methods For Hyperspectral Image By Multi-Classifier Ensemble And Spectral-Spatial Feature Combination
8	The Classification Prediction Of High Dimensional Data Of Membrane Protein Based On Multi-feature Fusion
9	Research On A New Intelligent Prediction Method For Multifunctional Enzymes
10	Research On Sequence Information Extraction Methods And Subcellular Location Prediction Of Proteins