
Prediction Of Glycosylation Sites In Protein Based On Principal Component Analysis And Neural Network

Posted on: 2011-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: W He
GTID: 2120330332481636
Subject: Computer application technology
Abstract/Summary:
Bioinformatics has become an important frontier of the life and natural sciences and, with the continuing development of computer science and biology, one of the core fields of natural science in the 21st century. Its research concentrates mainly on two areas: genomics and proteomics. Protein glycosylation is an important and common post-translational modification of proteins and a major topic in proteomics research. Because few glycoprotein structures are known and new ones keep emerging, predicting and analyzing glycosylation sites with computational intelligence techniques is of great significance to proteomics.

Principal component analysis (PCA) is a feature extraction technique that reduces data from its original high dimension to a lower dimension while preserving the key information, which makes the data easier to handle and improves the efficiency of the analysis. A conventional neural network can be applied to predict glycosylation sites in proteins, and its prediction accuracy depends mainly on the dimension of the feature vector (the length of the encoded protein sequence). As the window size increases, however, the structure of the neural network becomes more complex and the computation becomes time-consuming. To address this problem, this research proposes a new method based on PCA and a neural network for pattern analysis and prediction of O-linked glycosylation sites at different window sizes.
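As a rough illustration of why the input dimension grows with the window size, the sketch below one-hot encodes a sequence window centred on a candidate serine or threonine. It is not the thesis implementation: the 21-symbol alphabet (20 amino acids plus an "X" padding symbol), the helper names `extract_window`, `sparse_encode` and `candidate_sites`, and the example sequence are all assumptions made for illustration.

```python
# Assumed alphabet: 20 standard amino acids plus "X" for padding/unknown residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "X"


def extract_window(sequence, center, window_size):
    """Return the odd-length window centred on `center`, padded with 'X'."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the site sits in the centre")
    half = window_size // 2
    residues = []
    for pos in range(center - half, center + half + 1):
        # Positions that fall outside the sequence are filled with padding.
        residues.append(sequence[pos] if 0 <= pos < len(sequence) else "X")
    return "".join(residues)


def sparse_encode(window):
    """One-hot encode each residue; symbols outside the alphabet map to 'X'."""
    vector = []
    for residue in window:
        index = ALPHABET.find(residue)
        if index < 0:
            index = ALPHABET.index("X")
        one_hot = [0] * len(ALPHABET)
        one_hot[index] = 1
        vector.extend(one_hot)
    return vector


def candidate_sites(sequence):
    """Positions of serine (S) and threonine (T), the candidate O-linked sites."""
    return [i for i, residue in enumerate(sequence) if residue in "ST"]


# Hypothetical example sequence, used only to show how the dimension scales.
example = "MKTSAPLT"
for size in (7, 11, 21):
    features = sparse_encode(extract_window(example, candidate_sites(example)[0], size))
    print(size, len(features))  # feature length is window size times alphabet size
```

Under these assumptions the feature length is the window size multiplied by 21, so every extra residue in the window adds 21 inputs to the network.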
First, PCA is applied to extract features and reduce the dimension; then the neural network is used to predict whether a particular serine or threonine site is glycosylated. The research covers the following aspects: (1) The background knowledge of protein glycosylation is presented, with a focus on sparse coding, which is used in this study to encode protein sequences. (2) PCA is used for data preprocessing: the sample data are reduced from a high dimension to a low dimension while making full use of the original information, so that the neural network prediction in the next step is more efficient. (3) A novel algorithm for predicting protein O-glycosylation sites is proposed based on PCA and a BP neural network, and the algorithm is analyzed and designed in detail. To verify the validity of the algorithm, its results are compared with the experimental results of the traditional BP algorithm.

The experimental results show that: (1) the convergence of the network is accelerated significantly and the computation time is reduced; (2) the prediction accuracy for protein glycosylation sites increases markedly, which indicates that a BP neural network improved with principal component analysis has a considerable advantage in predicting protein glycosylation sites.
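The two-stage idea described above (PCA first, then a BP network on the reduced features) can be sketched as follows. This is a minimal sketch assuming NumPy is available, not the algorithm from the thesis: the SVD-based PCA, the single hidden layer, the cross-entropy loss, the hyperparameters, the rescaling step and the synthetic data are all assumptions chosen for illustration, and no glycosylation data are used.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and the top principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of vt are principal axes sorted by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, n_hidden=8, epochs=1000, lr=0.3, seed=0):
    """Train a one-hidden-layer network by full-batch backpropagation.

    Returns the weights and the cross-entropy loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    eps = 1e-12
    losses = []
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)              # forward pass: hidden layer
        p = sigmoid(h @ W2 + b2)              # forward pass: output probability
        losses.append(float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))
        d_out = (p - y) / len(X)              # loss gradient at the output pre-activation
        d_hid = (d_out @ W2.T) * h * (1 - h)  # error propagated back to the hidden layer
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(X, params):
    """Threshold the output probability at 0.5 to get a 0/1 label per sample."""
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(int).ravel()


# Synthetic stand-in for encoded windows: 147 correlated inputs driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 147)) + 0.1 * rng.normal(size=(200, 147))
y = (latent[:, 0] > 0).astype(float)

mean, components = pca_fit(X, n_components=5)
Z = pca_transform(X, mean, components)
Z = Z / Z.std(axis=0)  # rescaling is an added assumption to keep training stable
params, losses = train_bp(Z, y)
accuracy = float((predict(Z, params) == y).mean())
print(f"input dim {X.shape[1]} -> {Z.shape[1]}, final loss {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

In this sketch the network sees 5 inputs instead of 147, so the first weight matrix shrinks accordingly; whether accuracy improves on real data depends on how much of the discriminative information the retained components preserve.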
Keywords/Search Tags: Principal component analysis, BP neural network, O-linked glycosylation, Protein coding