
The Study of Protein Classification Based on the New Feature Extraction Algorithm

Posted on: 2013-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Jian
GTID: 2230330374983371
Subject: Operational Research and Cybernetics
Abstract/Summary:
In the post-genome era, the volume of protein sequence data is enormous. Determining protein types through molecular biology experiments is time-consuming and costly, and some experimental difficulties cannot yet be overcome. It is therefore important to develop new bioinformatics tools and to explore efficient, reliable computational algorithms for protein classification. Feature extraction from protein sequences is a basic problem in computational protein classification and a key factor in classification performance. Focusing on feature extraction for protein classification, this thesis proposes two new protein feature vectors: one based on pK values and the other based on relative frequency. Both are tested on three basic types of protein classification problems across many datasets, and the results support the effectiveness and feasibility of the new methods. The main work and contributions of this thesis are as follows:

(1) Apoptosis proteins play an important role in the growth and homeostasis of organisms. Understanding their functions helps clarify the mechanism of programmed cell death, and knowing the subcellular location of an apoptosis protein is important for understanding its function. This thesis uses the new pK-value-based vector to predict subcellular locations on two classic datasets, with good results. For the dataset CL317, the overall accuracy in the jackknife test is 91.8%, about 0.7 percentage points higher than the best existing result. For the dataset ZD98, the overall accuracy of this model in the jackknife test is 94.9%, almost the same as the best existing result and about 2 to 10 percentage points higher than typical existing results.

(2) Outer membrane proteins (OMPs) are of primary research interest for antibiotic and vaccine design because they lie on the bacterial surface. Discriminating OMPs from other folding types, namely globular proteins and inner membrane proteins, is an important task, both for identifying OMPs in genomic sequences and for successfully predicting their secondary and tertiary structures. Several methods have recently been proposed for discriminating OMPs from protein sequences. In this thesis, the two new feature vectors are applied to three datasets, with good results: for the datasets GS1319, Y970 and P1087, the overall accuracy in the jackknife test is 95.6%, 96.1% and 94.2%, respectively. The results show that the proposed algorithm performs no worse than existing prediction methods while being simpler and easier to implement.

(3) Biomembranes occupy an important position in biological research, and membrane proteins carry out most biomembrane functions. Membrane proteins have a distinctive structure: they are embedded in the membrane lipids, which places them at the junction between the cell and its environment, and they can act as membrane receptors, carriers, enzymes and antigens. Their functions can therefore be inferred by predicting their types. This thesis predicts membrane protein types using the new methods based on pK values and relative frequency. For the dataset CS3249 constructed by Chou and Shen, the overall accuracy in the jackknife test is 76.6% and 71.8% for the two methods, respectively.
The overall prediction results are good, and the method based on pK values outperforms the method based on relative frequency on large datasets.

In general, the two new methods not only capture more information from protein sequences but also greatly reduce computational complexity. They address both the computational cost and the limited applicability of approaches built on the conventional amino acid composition.
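The abstract does not specify how the pK-based or relative-frequency vectors are built, or which classifier produces the reported jackknife accuracies. As a point of reference only, here is a minimal sketch of the conventional amino acid composition (the 20-dimensional relative frequency of each residue) together with a jackknife (leave-one-out) estimate of overall accuracy. The 1-nearest-neighbour classifier, the function names `composition_vector` and `jackknife_accuracy`, and the toy sequences are hypothetical stand-ins, not the thesis's method or data.

```python
from collections import Counter
from typing import List, Sequence

# The 20 standard amino acids in one-letter code; the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq: str) -> List[float]:
    """Return the relative frequency of each standard amino acid in `seq`.

    Non-standard letters (e.g. X, B, Z) are ignored; a sequence with no
    standard residues yields an all-zero vector.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jackknife_accuracy(features: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Overall accuracy of a 1-nearest-neighbour classifier under the jackknife test.

    Each sample is held out in turn and labelled by its nearest neighbour among
    the remaining samples; accuracy = correct predictions / number of samples.
    (The 1-NN classifier is a hypothetical stand-in, not the thesis's classifier.)
    """
    n = len(features)
    if n < 2:
        raise ValueError("the jackknife test needs at least two samples")
    correct = 0
    for i in range(n):
        # Search every sample except the held-out one.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: squared_distance(features[i], features[j]),
        )
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / n


if __name__ == "__main__":
    # Hypothetical toy data with two classes, not drawn from any dataset in the thesis.
    seqs = ["AAAAK", "AAAAR", "LLLLK", "LLLLR"]
    labels = ["x", "x", "y", "y"]
    feats = [composition_vector(s) for s in seqs]
    print(jackknife_accuracy(feats, labels))  # 1.0
```

Because every sample is held out exactly once, the jackknife overall accuracy is simply the number of correctly predicted samples divided by the dataset size. Swapping in a different feature vector (for example, one derived from pK values) only changes how `features` is computed; the evaluation loop stays the same.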
Keywords/Search Tags: protein classification, feature extraction, apoptosis protein, OMPs, membrane protein