Research On Enzyme Identification Based On Feature Selection

Posted on:2012-11-01

Degree:Master

Type:Thesis

Country:China

Candidate:J X Wei

Full Text:PDF

GTID:2120330332990192

Subject:Computer application technology

Abstract/Summary:

In bioinformatics, identifying enzymes among proteins is a prerequisite for further enzyme research. The usual approach is to take known enzymes as the research object, find a method that identifies them with high accuracy, and then apply that method to unknown proteins. The traditional method for enzyme identification is alignment. Although much work has gone into improving alignment, the method still requires large storage space and long computing time. In recent years, machine learning has been applied in this area. The SVM (Support Vector Machine), a machine learning method based on statistical learning theory, has become a focus of research because it has no local minima and resists over-fitting, and it has performed quite well in enzyme identification.
To obtain good results with machine learning, researchers need to design a complete and effective scheme based on the actual problem. This thesis adopts feature selection as the machine learning scheme and uses a suitable number of features as training data to build an identifier with high accuracy. The reason is that, in this experiment, functional domains are taken as features, and not all functional domains contribute positively to a good enzyme identifier. We also suggest that some functional domains act as noise. Such counterproductive functional domains should therefore be removed from the feature set.
For these reasons, two feature selection methods, 1-rule and information gain, were adopted to compute the information carried by each feature. We then sorted the features by these values and tested different numbers of features in order, to determine how many features give the identifier the best accuracy. To evaluate the results as a whole, we adopted the widely used self-consistency test and leave-one-out test.
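As an illustration of the information gain ranking described above, here is a minimal sketch. It assumes each protein is encoded as a binary vector (1 if a functional domain is present, 0 otherwise) with a label of 1 for enzyme and 0 for non-enzyme. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(X, y):
    """Return feature indices sorted by decreasing information gain."""
    n_features = len(X[0])
    scores = [
        (information_gain([row[j] for row in X], y), j)
        for j in range(n_features)
    ]
    # Ties are broken by feature index so the ranking is deterministic.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]


# Hypothetical toy data: rows are proteins, columns are functional domains.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
y = [1, 1, 0, 0]  # 1 = enzyme, 0 = non-enzyme

ranking = rank_features(X, y)
top_k = ranking[:1]  # keep only the k highest-ranked domains (k = 1 here)
```

In this toy set only the first domain separates the two classes, so it ranks first. Varying `k` and retraining on the selected columns corresponds to testing different numbers of features.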
Analysis of the experimental results shows that the identifier built on 1-rule or information gain feature selection performed much better than identifiers built with other methods. Because it uses an SVM as the learning machine, our method also computes faster than alignment. Through feature selection and the improved machine learning scheme, the enzyme identifier performed much better than previous ones.
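The two evaluation protocols named in the abstract can be sketched as follows. To keep the example dependency-free, a 1-nearest-neighbour classifier with Hamming distance stands in for the SVM. This is a substitution made for illustration only, and the function names and data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def train_1nn(X, y):
    """Return a predict function; 1-NN is a stand-in for the SVM."""
    def predict(row):
        nearest = min(range(len(X)), key=lambda i: hamming(X[i], row))
        return y[nearest]
    return predict


def self_consistency_accuracy(X, y, train):
    """Train on all samples, then test on the same samples."""
    predict = train(X, y)
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def leave_one_out_accuracy(X, y, train):
    """Hold out each sample in turn and train on the remaining ones."""
    correct = 0
    for i in range(len(y)):
        predict = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(X[i]) == y[i]
    return correct / len(y)


# Hypothetical toy data: binary functional-domain vectors and labels.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1]]
y = [1, 1, 0, 0, 0]

sc_acc = self_consistency_accuracy(X, y, train_1nn)
loo_acc = leave_one_out_accuracy(X, y, train_1nn)
```

Self-consistency reuses the training samples for testing, so it tends to give an optimistic estimate, while leave-one-out tests each sample with a model that has never seen it. Any other `train` function with the same interface, such as a wrapper around an SVM, could be passed in instead.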

Keywords/Search Tags:

enzyme identification, support vector machine, feature selection, self-consistency test, leave-one-out test

Related items

1	Application Of Runway Test And K - S Test In Gene Selection
2	Feature Selection And Application Based On High-dimensional Independence Test
3	Digital Insect Identification Based On Support Vector Machine
4	Prediction Of Plant MicroRNA Using Support Vector Machine
5	Two Models Of Parameter Selection In V-Support Vector Machine
6	Dimension Reduction Of EEG Signals Based On Feature Selection
7	Identification Of The A-to-I RNA Editing Sites Based On Support Vector Machine And Large-scale Detection Of Human Tissue-specific A-to-I RNA Editing Events
8	Identification Of Prokaryotic Cell Wall Lyases Based On Feature Selection
9	Extraction Of Thematic Information Based On SVM Remote Sensing Data
10	Research On Feature Selection For Gene Expression Data