DNA-binding proteins are essential for DNA to function properly; they play fundamental roles in many biological processes. Traditional experimental methods for identifying DNA-binding proteins are accurate but expensive. The development of effective computational tools for identifying DNA-binding proteins has therefore become highly desirable in recent years. Based on their input data requirements, computational prediction methods fall into two categories. The first category is structure-based methods. These methods can provide better prediction results when homologous structures are available. However, the requirement for tertiary structure information greatly limits their scope of application. The second category is sequence-based methods. Prediction from the protein sequence alone is more difficult, because the sequence provides much less information than the structure. However, the easy accessibility of protein sequences makes these methods widely applicable in practice. In this paper, we propose a new sequence-based method to identify DNA-binding proteins. First, we collected informative properties that have been shown to be related to DNA-binding interactions. Second, to reduce the influence of noisy features, we introduced the binary firefly algorithm to perform feature selection. Third, a support vector machine (SVM) was employed to build the prediction model. With multiple informative features and an effective optimization algorithm, our method achieves accuracies of 0.808 and 0.910 on two independent benchmark datasets, outperforming many state-of-the-art predictors.
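
To make the feature selection step concrete, the following is a minimal sketch of one common variant of the binary firefly algorithm, not necessarily the exact formulation used in this work. Each firefly is a 0/1 mask over the features, and dimmer fireflies move toward brighter ones. The function name `binary_firefly`, the parameters `beta0`, `gamma`, and `alpha`, the sigmoid transfer function, and the toy fitness function are all assumptions made for illustration. In the setting described above, the fitness of a mask would instead be the accuracy of an SVM trained on the selected features.

```python
import math
import random


def binary_firefly(fitness, n_features, n_fireflies=12, n_iter=30,
                   beta0=1.0, gamma=0.1, alpha=0.5, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` (illustrative sketch)."""
    rng = random.Random(seed)
    # Each firefly is a random 0/1 mask over the features.
    swarm = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_fireflies)]
    light = [fitness(mask) for mask in swarm]
    best = max(range(n_fireflies), key=lambda k: light[k])
    best_mask, best_fit = swarm[best][:], light[best]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue  # firefly i only moves toward brighter fireflies
                # For 0/1 vectors, squared Euclidean distance equals Hamming distance.
                r2 = sum(a != b for a, b in zip(swarm[i], swarm[j]))
                beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                new_mask = []
                for a, b in zip(swarm[i], swarm[j]):
                    # Continuous move toward firefly j plus a small random perturbation.
                    step = a + beta * (b - a) + alpha * (rng.random() - 0.5)
                    # Assumed transfer function: sigmoid centered at 0.5 maps the
                    # continuous position to the probability of setting the bit.
                    prob = 1.0 / (1.0 + math.exp(-6.0 * (step - 0.5)))
                    new_mask.append(1 if rng.random() < prob else 0)
                swarm[i] = new_mask
                light[i] = fitness(new_mask)
                if light[i] > best_fit:
                    best_mask, best_fit = new_mask[:], light[i]
    return best_mask, best_fit


# Hypothetical toy fitness: features 0, 2 and 5 are "informative", and every
# selected feature carries a small cost. This stands in for SVM accuracy.
INFORMATIVE = (0, 2, 5)


def toy_fitness(mask):
    return sum(mask[k] for k in INFORMATIVE) - 0.1 * sum(mask)


mask, score = binary_firefly(toy_fitness, n_features=8)
print(mask, score)
```

The fitness function is passed in as a callable so that the search loop stays independent of the classifier: replacing `toy_fitness` with a function that trains and validates an SVM on the masked feature matrix would give a wrapper-style feature selector of the kind outlined above.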