
Research On Enzyme Identification Based On Feature Selection

Posted on: 2012-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wei
Full Text: PDF
GTID: 2120330332990192
Subject: Computer application technology
Abstract/Summary:
In bioinformatics, identifying enzymes among proteins is a prerequisite for further enzyme research. The usual approach is to take known enzymes as the research object, find a method that identifies them with high accuracy, and then apply that method to identify unknown enzymes. The traditional method for enzyme identification is alignment. Although much work has gone into improving alignment, it still requires large amounts of storage space and computing time. In recent years, machine learning has been applied in this area. The support vector machine (SVM), a machine learning method based on statistical learning theory, has become a focus of this research because it has no local minima and resists over-fitting, and it has performed quite well in enzyme identification.

To get good results from machine learning, researchers need to design a complete and effective scheme based on the actual problem. This thesis adopts feature selection as its machine learning scheme and uses a suitable number of features as training data to build an identifier with high accuracy. The reason is that functional domains are used as features in this experiment, and not all functional domains contribute positively to a good enzyme identifier. We also suggest that some functional domains act as noise. Such counterproductive functional domains should therefore be removed from the feature set.

For these reasons, two feature selection methods, 1-rule and information gain, were used to compute an information value for each feature. We then sorted the features by these values and tested different numbers of features, taken in that order, to determine which number of features yields the identifier with the best accuracy. To evaluate the results overall, we used the self-consistency test and the leave-one-out test, both of which are widely used.
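The feature-scoring step can be illustrated with a small sketch. It assumes each protein is encoded as a row of 0/1 flags marking which functional domains it contains, and that "1-rule" refers to the OneR criterion (predict the majority class for each feature value and score the feature by how often that rule is right). The toy data and all function names are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Label entropy minus the expected label entropy after splitting on one feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def one_rule_accuracy(column, labels):
    """Training accuracy of the rule 'predict the majority class per feature value'."""
    correct = 0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        correct += Counter(subset).most_common(1)[0][1]
    return correct / len(labels)


def rank_features(X, y, score):
    """Return feature indices sorted from the highest score to the lowest."""
    columns = list(zip(*X))
    scores = [score(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: rows are proteins, columns are domain-presence flags,
# labels are 1 for enzyme and 0 for non-enzyme.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

ig_order = rank_features(X, y, information_gain)
one_r_order = rank_features(X, y, one_rule_accuracy)
print(ig_order, one_r_order)
```

On this toy data both criteria place feature 0 first, because it separates the two classes perfectly, and place feature 2 last, because it carries no information about the label. On real data the two rankings may differ.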
Analysis of the experimental results shows that the identifier built on 1-rule or information gain feature selection performed much better than identifiers built with other methods. By adopting SVM as the learning machine, our method also computes faster than alignment. Through feature selection and the improved machine learning scheme, the enzyme identifier performed much better than previous ones.
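The evaluation procedure described in the abstract, trying different numbers of top-ranked features and scoring each with a leave-one-out test, might look roughly like the sketch below. To stay dependency-free it swaps in a 1-nearest-neighbour classifier (Hamming distance) in place of the SVM used in the thesis. The toy data, the precomputed `ranking`, and all function names are assumptions made for illustration only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def predict_1nn(train_X, train_y, x):
    """Stand-in classifier (not an SVM): label of the closest training row."""
    best = min(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    return train_y[best]


def leave_one_out_accuracy(X, y, features):
    """Hold out each sample once, train on the rest, score the held-out prediction."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[j] for j in features] for k, row in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        test_x = [X[i][j] for j in features]
        hits += predict_1nn(train_X, train_y, test_x) == y[i]
    return hits / len(X)


def best_feature_count(X, y, ranking):
    """Evaluate the top-k ranked features for every k and keep the most accurate k."""
    results = {
        k: leave_one_out_accuracy(X, y, ranking[:k])
        for k in range(1, len(ranking) + 1)
    }
    # On ties in accuracy, prefer the smaller feature set.
    best_k = max(results, key=lambda k: (results[k], -k))
    return best_k, results


# Hypothetical toy data: rows are proteins, columns are domain-presence flags,
# labels are 1 for enzyme and 0 for non-enzyme.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]
ranking = [0, 1, 2]  # assumed output of a feature-scoring step, best feature first

best_k, results = best_feature_count(X, y, ranking)
print(best_k, results)
```

In this toy example the single best-ranked feature gives the highest leave-one-out accuracy, and adding the lower-ranked features reduces it, which is the kind of effect the abstract attributes to noisy functional domains.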
Keywords/Search Tags: enzyme identification, support vector machine, feature selection, self-consistency test, leave-one-out test