
The Application of Feature Extraction and Classification Algorithms to the Membrane Protein Classification Problem

Posted on: 2011-10-28, Degree: Doctor, Type: Dissertation
Country: China, Candidate: L P Wang
GTID: 1480303341971359, Subject: Control theory and control engineering
Abstract/Summary:
Genes are units of self-replication and preservation, and their physiological function is expressed in the form of proteins. About 30% of the proteins in a cell are membrane proteins. As one of the main components of biomembranes, membrane proteins play a vital role in organisms. With the explosive growth in the number of protein sequences, determining membrane protein types by molecular biology experiments is time-consuming; moreover, such experiments can run into difficulties that cannot be solved at present. Feature extraction from membrane protein sequences is a basic problem in computational protein classification and a key factor that determines classification performance. This thesis studies feature extraction and classification algorithms for membrane protein sequences and applies them to predicting membrane protein types. The main work and innovations of this thesis are summarized as follows:

(1) Linear dimensionality reduction algorithms are introduced to predict membrane protein types. Among feature extraction methods for membrane proteins, dipeptide composition (DC) has gradually proven more effective than the conventional amino acid composition (AAC). Although the DC representation helps to increase prediction accuracy, it may also lead to the curse of dimensionality. Thus, a linear dimensionality reduction algorithm is introduced to extract the essential features from the high-dimensional DC space, and the types of membrane proteins are identified from the reduced low-dimensional features. Experimental results show that the proposed method is very effective for predicting membrane protein types.

(2) This thesis proposes five new combined feature extraction algorithms.
This thesis draws on the idea of linear dimensionality reduction to construct two combined feature extraction algorithms: DC_PCA, which combines dipeptide composition with principal component analysis, and DC_LDA, which combines dipeptide composition with linear discriminant analysis. Experimental results show that feature extraction based on linear dimensionality reduction predicts membrane protein types more accurately than the traditional dipeptide composition (DC) and amino acid composition (AAC) methods. To obtain better classification performance from the membrane protein classification model and to predict the structure and function information of membrane protein sequences, this thesis also constructs three combined feature extraction algorithms based on nonlinear dimensionality reduction: DC_KPCA, which combines dipeptide composition with kernel principal component analysis; DC_KLDA, which combines dipeptide composition with kernel linear discriminant analysis; and DC_NPE, which combines dipeptide composition with neighborhood preserving embedding.
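As a rough illustration of the two base representations that the combined algorithms start from, the sketch below computes AAC (20 residue frequencies) and DC (400 adjacent-pair frequencies) for one sequence. The function names, the residue ordering, and the toy sequence are assumptions made for this example and are not taken from the thesis; the dimensionality reduction step (PCA, LDA, and so on) would then be applied to a matrix of such DC vectors.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is arbitrary)
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 20 * 20 = 400 pairs


def aac(sequence):
    """Amino acid composition: frequency of each of the 20 residues."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Dipeptide composition: frequency of each of the 400 adjacent residue pairs."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs that contain non-standard symbols
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


toy = "MKTLLLTLVVVTIVCLDLGYT"  # hypothetical sequence, not from the thesis
aac_vec = aac(toy)
dc_vec = dipeptide_composition(toy)
print(len(aac_vec), len(dc_vec))  # prints: 20 400
```

The jump from 20 to 400 dimensions is what motivates reducing the DC space before classification.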
Experimental results show that feature extraction based on nonlinear dimensionality reduction predicts membrane protein types more accurately than the traditional dipeptide composition (DC) and amino acid composition (AAC) methods. The three nonlinear algorithms (DC_KPCA, DC_KLDA, and DC_NPE) were constructed to obtain the classification model with the best classification accuracy.

(3) DNA microarray technologies have changed the methods and efficiency of biological technologies and have had a significant impact on genomics and post-genomic research, but the large amount of information they produce presents new challenges for data analysis and information extraction. To address the difficulty of maintaining high classification accuracy and efficiency as the dimensionality of gene data increases sharply, this thesis proposes a dimension reduction proximal support vector machine (DRPSVM) for microarray data classification, built on the traditional proximal support vector machine (PSVM). DRPSVM uses a quadratic programming algorithm for dimensionality reduction. It reduces the classification of gene data to a quadratic programming problem with only linear equality constraints and, like the traditional PSVM, maintains good classification accuracy while reducing the time and space complexity of classification.
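To make the equality-constrained formulation more concrete, here is a minimal sketch of the commonly cited linear PSVM, which the abstract names as the basis of DRPSVM. It is not the thesis's DRPSVM, whose dimension reduction step is not specified in the abstract. Because the constraints are equalities, training reduces to one linear system, (I/nu + E^T E) z = E^T D e with E = [A, -e], where A holds the samples, D is the diagonal matrix of labels, and e is a vector of ones. The function names, the parameter `nu`, and the toy data are assumptions for this example.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def train_psvm(samples, labels, nu=1.0):
    """Fit a linear PSVM by solving (I/nu + E^T E) z = E^T D e, E = [A, -e]."""
    E = [list(x) + [-1.0] for x in samples]
    dim = len(E[0])
    lhs = [[sum(row[i] * row[j] for row in E) + (1.0 / nu if i == j else 0.0)
            for j in range(dim)] for i in range(dim)]
    # D e is just the label vector, so E^T D e is a label-weighted column sum.
    rhs = [sum(row[i] * y for row, y in zip(E, labels)) for i in range(dim)]
    z = solve(lhs, rhs)
    return z[:-1], z[-1]  # weight vector w and offset gamma


def predict(w, gamma, x):
    """Classify a point by the sign of x.w - gamma."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - gamma
    return 1 if score >= 0 else -1


# Toy two-class data (hypothetical, not microarray data from the thesis).
samples = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
labels = [1, 1, -1, -1]
w, gamma = train_psvm(samples, labels)
print([predict(w, gamma, x) for x in samples])  # prints: [1, 1, -1, -1]
```

The hand-written solver keeps the sketch dependency-free; in practice a numerical library routine would normally replace it. The system size equals the feature dimension plus one, which is why lowering the dimension of gene data directly lowers training cost.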
Keywords/Search Tags: Proteins, Bioinformatics, Gene, Membrane protein, Feature extraction, Linear dimensionality reduction algorithm, Dipeptide composition, DRPSVM, Microarray data classification