
Identifying DNA-binding Proteins Based On Feature Construction And Feature Selection

Posted on: 2018-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lin
Full Text: PDF
GTID: 2310330515952779
Subject: Control Engineering
Abstract/Summary:
DNA-binding proteins play a vital role in molecular biology, affecting gene regulation, DNA replication and other processes. However, identifying DNA-binding proteins by experimental techniques is time-consuming and expensive, so data analysis has become an important way to identify them. To improve identification performance, this thesis studies the feature engineering of the DNA-binding protein identification problem, including feature construction and feature selection, according to the characteristics of the data. This feature engineering effectively improves identification performance and provides a simple and efficient method for identifying DNA-binding proteins. The main work of this thesis is as follows.

1) Feature construction: DNA-binding proteins have low sequence similarity, are independent of one another and differ in sequence length. To address these characteristics, we describe DNA-binding proteins from five aspects to construct high-dimensional features: physicochemical properties, chaos game representation, fractal dimension, position-specific scoring matrix (PSSM) and spectrum analysis.

2) Feature selection: To reduce redundant features and the feature dimension, we use several classical feature selection methods, including recursive feature elimination based on SVM (SVM-RFE), recursive feature elimination based on PLS, and the margin-based ReliefF algorithm. We also analyze why SVM-RFE is superior to the other feature selection methods.

3) Identification analysis: We adopt an SVM to identify DNA-binding proteins. The proposed method is validated on three independent public datasets, and its reliability is tested by 30 repetitions of 10-fold cross-validation. The performance of the classifier is evaluated with four performance indexes: accuracy, specificity, sensitivity and Matthews correlation coefficient. The experimental results are analyzed from three perspectives: performance index, degree of dispersion and feature distribution.

Through the analysis and comparison of different feature construction and feature selection methods, the results show that combining multiple classes of constructed features with SVM-RFE feature selection effectively improves the identification accuracy of DNA-binding proteins. Analysis of the distribution of the selected feature subsets shows that various physicochemical characteristics and the PSSM play an important role in the identification of DNA-binding proteins. Comparison with DNA-Prot and hybrid fractal features also shows that the proposed method is superior to them. In addition, the proposed method is suitable not only for the feature engineering of DNA-binding proteins, but also for protein structure prediction and the identification of other proteins.
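The four performance indexes named in the abstract (accuracy, specificity, sensitivity and Matthews correlation coefficient) have standard definitions in terms of the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the thesis's own code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute four standard binary-classification indexes.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives (illustrative inputs, not
    results from the thesis).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of actual positives that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that are recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a 100-sample test fold.
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

In a repeated cross-validation setting such as the 30 repetitions of 10-fold cross-validation mentioned above, these indexes would typically be computed per fold and then summarized, for example by their mean and spread.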
Keywords/Search Tags:DNA-binding proteins, feature construction, feature selection, identification performance