Prediction Method Research Of Protein Structure And Function Based On Amino Acid Sequence

Posted on: 2012-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Sun
Full Text: PDF
GTID: 2210330338969289
Subject: Analytical Chemistry
Abstract/Summary:
In recent years, high-throughput technologies have caused protein data to grow exponentially. This vast and diverse body of data holds many new biological laws and concepts that remain to be discovered. With the completion of the Human Genome Project, a mass of data needs to be analyzed with theories and algorithms that are fast, accurate, and extensible, and the new discipline of bioinformatics has emerged to meet this need. The relationship between protein structure and function is an important question in bioinformatics and one of the core problems of the post-genome era. In view of the current state of research on protein structure and function, this thesis proposes multiple feature extraction methods and multiple classifiers for predicting protein structure and function. The main contents are as follows:

(1) A new model based on the discrete wavelet transform (DWT) and the support vector machine (SVM) was proposed to predict homo-oligomeric proteins. The DWT is used to represent the characteristic information of the protein sequences, and different classification algorithms are then applied to classify and predict the homo-oligomers. In the jackknife test, the overall accuracy of the DWT_SVM model was clearly higher than that of the other algorithms, which shows that the model is powerful and robust. On this basis, the influence of dataset size was further investigated. The results demonstrate that the algorithm needs sufficient training data for its prediction mechanism to work properly: as the training set shrinks, much sequence information is lost and the prediction mechanism degrades.

(2) A new method that couples the discrete wavelet transform with the support vector machine was proposed to predict homo-oligomeric and hetero-oligomeric proteins based on the physical and chemical properties of amino acids.
The sequences of homo-oligomeric and hetero-oligomeric proteins are decomposed into multiple levels, represented by the approximation coefficients and detail coefficients of the DWT. This maps a one-dimensional amino acid (AA) sequence in the space domain into a two-dimensional space-frequency representation. Finally, different classification algorithms are applied to classify and predict the homo-oligomers and hetero-oligomers. To investigate the impact of sequence identity on the estimated classification accuracy, Xiao's and Chou's datasets were used. The results indicate that our model not only significantly improves prediction accuracy on the datasets with high sequence identity, but is also robust to differences in sequence identity.

(3) A new method combining the discrete wavelet transform with a decision tree (DT) is introduced to predict protein quaternary structure and substructure. The hydrophobicity and polarity of amino acids are the main properties investigated. The results obtained with hydrophobicity values were significantly better than those obtained with polarity, which suggests that hydrophobicity is the more informative property for protein structure prediction. Because classification methods are sensitive to over-fitting, it was important to measure the significance of the obtained ROC and precision-recall (PR) graphs. Analysis of the PR graphs shows that our model effectively avoids serious over-fitting. Moreover, the web server provides user-friendly input and output interfaces, and users can download the predicted results for further analysis. The web server is available at http://bioinfo.ncu.edu.cn/Services.aspx.

(4) A new model, WSM-Plam, which fuses the weight of amino acid composition (WAAC), auto-correlation functions of amino acids (ACF), and accessible surface area of amino acids (ASA), has been developed to predict palmitoylation sites.
Compared with any single feature extraction method, the fused method (the WSM-Plam model) captures more feature information about cysteine palmitoylation and extracts classification information more effectively. Moreover, the WSM-Plam model has the following features: simple calculation, higher classification precision, and strong capabilities for self-adaptation, generalization, and application. The online service is available at http://bioinfo.ncu.edu.cn/services-ptm.aspx.

All of the above algorithms have complete processing programs and online services, which are easy to use and allow data to be downloaded.

This study is supported by the National Natural Science Foundation of China and the Natural Science Foundation of Jiangxi Province.
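
The wavelet-based feature extraction described in items (1) to (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract names neither the wavelet family, the number of decomposition levels, nor the hydrophobicity scale, so the Haar wavelet, the Kyte-Doolittle scale, the default of three levels, and the four summary statistics per coefficient band are all assumptions made for illustration. The function names `haar_step`, `summarize`, and `dwt_features` are likewise hypothetical.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydrophobicity values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last value.
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def summarize(coeffs):
    """Reduce one coefficient band to four statistics: max, min, mean, std."""
    return [max(coeffs), min(coeffs), mean(coeffs), pstdev(coeffs)]


def dwt_features(sequence, levels=3):
    """Map a protein sequence to a fixed-length vector of 4 * (levels + 1) values."""
    # Residues outside the 20 standard amino acids are skipped.
    signal = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if len(signal) < 2 ** levels:
        raise ValueError("sequence too short for the requested number of levels")
    features = []
    approx = signal
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.extend(summarize(detail))   # detail band at this level
    features.extend(summarize(approx))       # final approximation band
    return features


if __name__ == "__main__":
    vector = dwt_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", levels=3)
    print(len(vector), [round(v, 3) for v in vector])
```

Summarizing each band with a few statistics gives every sequence a vector of the same length regardless of how many residues it has, which is what a classifier such as an SVM or a decision tree requires as input. Replacing the scale with a polarity index would give the alternative encoding compared in item (3).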
Keywords/Search Tags:discrete wavelet transforms, protein quaternary structure, palmitoylation sites, classification machines, fusing multiple feature extraction approaches