
Research on Several Problems of Protein Structure Prediction and Its Software Realization

Posted on: 2010-12-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Gu
GTID: 1100360302478533
Subject: Biological information
Abstract/Summary:
Protein structure prediction is an active research topic in protein structure analysis. In this dissertation, protein structure was predicted from the amino acid sequence, since the sequence is considered the basic and decisive factor in determining protein structure. The main work and conclusions can be summarized as follows:

(1) Extraction of characteristic information from protein sequences
To predict protein structure more accurately, a comprehensive and representative set of characteristic information (features) was proposed for protein sequences. It comprises three kinds of information: sequence statistical properties, physicochemical properties and sequence signal properties. Together, these features reflect short-, medium- and long-range relationships within the protein sequence.

(2) Protein structural class prediction
Structural class prediction is an active area within protein structure prediction. Traditional algorithms neglect long-range relationships in the sequence; to overcome this defect, the new characteristic information proposed in (1) was used in this dissertation.
In the training step, the direct average and neural network methods were used; in the classification step, the neural network, nearest neighbor, Bayes and maximum information content methods were used. The nearest neighbor method served to compare the results of this dissertation with those reported in other papers, while the neural network and maximum information content methods were used to improve the accuracy of structural class prediction.
The dataset and the evaluation system are important factors in structural class prediction. A new dataset with low sequence complexity was constructed from SCOP; it is comprehensive, effective and reliable.
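The pipeline in (1) and (2), in which features are extracted from the sequence and a classifier is then evaluated by the jack-knife test, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: plain amino-acid composition stands in for the much richer characteristic information, "direct average" is read as a nearest-class-mean classifier, and all function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Amino-acid composition: fraction of each of the 20 residues (a sequence statistical feature)."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


def nearest_neighbor(train_X, train_y, x):
    """1-NN: label of the training vector closest to x in Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(dists))]


def nearest_centroid(train_X, train_y, x):
    """Assumed reading of 'direct average': average each class's vectors, pick the closest mean."""
    classes = np.unique(train_y)
    means = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    return classes[int(np.argmin(np.linalg.norm(means - x, axis=1)))]


def jackknife_accuracy(X, y, classify=nearest_neighbor):
    """Jack-knife (leave-one-out) test: predict each sample from all the others."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx = np.arange(len(y))
    correct = 0
    for i in idx:
        keep = idx != i  # hold out sample i, train on the rest
        correct += int(classify(X[keep], y[keep], X[i]) == y[i])
    return correct / len(y)


if __name__ == "__main__":
    # Toy, invented sequences and labels; not real proteins or real structural classes.
    seqs = ["AELLKKAEEL", "ALEEKLKAAL", "VTVKVYVTVS", "TVYVSVKVTV"]
    labels = ["class-1", "class-1", "class-2", "class-2"]
    X = np.array([composition(s) for s in seqs])
    print(jackknife_accuracy(X, labels))
    print(jackknife_accuracy(X, labels, classify=nearest_centroid))
```

Swapping the `classify` argument is how the same jack-knife harness can compare several classifiers on identical features, which is the role the nearest neighbor method plays in (2).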
Several evaluation indices were used to assess the predictions. Using the characteristic information and methods proposed in the dissertation, structural class prediction reached an accuracy of 74.3% in the jack-knife test, which is 2% to 20% higher than traditional methods.

(3) Protein secondary structure prediction
Sequence alignment information was not used in this dissertation, so the only information considered comes from the protein sequence itself. This allows a more objective assessment of the characteristic information and the algorithms.
The key problems in secondary structure prediction are locating secondary structure fragments and predicting the type of each fragment. Here, the local hydrophobicity of the protein was used to locate the fragments. The characteristic information of known fragments was used in training to calculate a representative feature vector for each of the three secondary structure types (helix, strand and coil), and these vectors served as parameters in the prediction process. Specifically, the wavelet transform was used to locate the fragments, and the direct average method was used to determine their type. This secondary structure prediction model is novel and biologically meaningful.
The CB396 dataset was used for secondary structure prediction. It has low sequence similarity and low sequence complexity, and many methods have been tested on it, which makes it possible to compare the results of this dissertation with others. The prediction accuracy was 70.21% for Q3 and 67.14% for SOV.
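The fragment-location idea in (3) and the Q3 measure can be illustrated with a short sketch. It assumes the Kyte-Doolittle hydropathy scale, a Haar wavelet in which only the coarse approximation is kept, and a hypothetical rule that places a fragment boundary wherever the smoothed profile crosses its mean; the abstract does not specify the dissertation's actual wavelet, scale or boundary rule. Fragment-type assignment and SOV are not implemented here.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale; an assumed choice of hydrophobicity scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_smooth(signal, levels=2):
    """Haar DWT keeping only the level-`levels` approximation, then inverse transform.

    With detail coefficients set to zero, the result equals the mean over
    consecutive blocks of 2**levels values. The signal is edge-padded to a
    multiple of the block size and trimmed back afterwards.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    pad = (-n) % (2 ** levels)
    approx = np.pad(x, (0, pad), mode="edge")
    for _ in range(levels):
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)  # forward Haar step
    for _ in range(levels):
        approx = np.repeat(approx, 2) / np.sqrt(2)  # inverse step with zero details
    return approx[:n]


def fragment_boundaries(seq, levels=2):
    """Hypothetical rule: start a new fragment wherever the smoothed hydropathy
    profile crosses its own mean."""
    profile = np.array([KD[a] for a in seq])
    smooth = haar_smooth(profile, levels)
    above = smooth > smooth.mean()
    return [i for i in range(1, len(seq)) if above[i] != above[i - 1]]


def q3(predicted, observed):
    """Q3: fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


print(fragment_boundaries("LLLLDDDDLLLL"))  # → [4, 8]
print(q3("HHHC", "HHEC"))  # → 0.75
```

Because only approximation coefficients are kept, boundaries fall on multiples of `2**levels`; a larger `levels` gives a smoother profile and fewer, coarser fragments.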
The Q3 and SOV accuracies obtained on CB396 are comparable to those of current popular methods. Finally, structural class information was used to improve the secondary structure propensity factors.

(4) Software for protein structure prediction
Software was developed implementing the characteristic information, prediction methods and models for structural class and secondary structure prediction proposed in this dissertation. Its main advantages are support for multiple parameters and a range of methods. Users can also customize the characteristic information and the machine learning methods, which makes the software convenient and user-friendly. By default, the software uses the parameters and methods proposed in this dissertation, so users can apply the dissertation's results directly.
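One way to organize the customizable design described in (4) is a registry of named feature extractors and classifiers plus a default configuration. The sketch below is hypothetical: `FEATURES`, `CLASSIFIERS`, `register`, `Config` and `predict` are illustrative names, not the dissertation software's API, and the defaults are placeholders rather than the dissertation's actual parameters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical plug-in registries: name -> function.
FEATURES: Dict[str, Callable] = {}
CLASSIFIERS: Dict[str, Callable] = {}


def register(registry: Dict[str, Callable], name: str):
    """Decorator that stores the decorated function in `registry` under `name`."""
    def wrap(fn: Callable) -> Callable:
        registry[name] = fn
        return fn
    return wrap


@register(FEATURES, "composition")
def composition(seq: str) -> List[float]:
    """Amino-acid composition (20 values)."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


@register(FEATURES, "length")
def length(seq: str) -> List[float]:
    """Sequence length as a single feature; an example of a user-added extractor."""
    return [float(len(seq))]


@register(CLASSIFIERS, "nearest_neighbor")
def nearest_neighbor(train_X: List[List[float]], train_y: List[str], x: List[float]) -> str:
    """1-NN using squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


@dataclass
class Config:
    # Placeholder defaults standing in for "the parameters and methods proposed
    # in the dissertation"; the real defaults are not given in the abstract.
    features: List[str] = field(default_factory=lambda: ["composition"])
    classifier: str = "nearest_neighbor"


def featurize(seq: str, cfg: Config) -> List[float]:
    """Concatenate the outputs of every feature extractor named in the config."""
    vec: List[float] = []
    for name in cfg.features:
        vec.extend(FEATURES[name](seq))
    return vec


def predict(train_seqs: List[str], train_labels: List[str], query: str,
            cfg: Optional[Config] = None) -> str:
    """Featurize training and query sequences, then apply the configured classifier."""
    if cfg is None:
        cfg = Config()  # fall back to the default pipeline
    X = [featurize(s, cfg) for s in train_seqs]
    return CLASSIFIERS[cfg.classifier](X, train_labels, featurize(query, cfg))
```

Calling `predict` without a config runs the default pipeline; passing, for example, `Config(features=["length"])` swaps in a different feature set without changing the prediction code, and a new classifier only needs a `@register(CLASSIFIERS, ...)` decorator.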
Keywords: protein structure prediction, characteristic information, structural class, secondary structure, information content, wavelet transformation, dataset, cross-validation