
The Application Of The Protein Sequence Analysis Model In The Prediction Of Transmembrane Domains

Posted on: 2016-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Bai
GTID: 2310330488498830
Subject: Operational Research and Cybernetics
Abstract/Summary:
With the rapid growth of protein sequence databases, analysis models for protein sequences have become increasingly important in bioinformatics. Graphical representation of protein sequences is one such class of methods: it gives an intuitive visualization and yields easy-to-use numerical characterizations for analyzing biological sequences, and it has attracted growing attention. Based on three physicochemical properties of amino acids, this thesis introduces a new non-degenerate 3D graphical representation of protein sequences, called the Gs graph. To illustrate the efficiency of the approach, a phylogenetic tree of nine ND5 proteins was constructed, and it is consistent with the result obtained by ClustalW. The method was then used to construct phylogenetic trees for nine kinds of protein sequences of the H7N9 avian influenza virus. Finally, we analyze and infer the evolutionary path of the H7N9 avian influenza virus.

At present, because the crystal structures of membrane proteins are difficult to obtain experimentally, predicting the spatial structure of membrane proteins is a significant task for studying their function. In this thesis, we apply principal component analysis to 554 physicochemical indexes of amino acids and introduce a method that transforms each membrane protein sequence into a numerical matrix. Then, based on singular value decomposition, each protein segment is transformed into a feature vector. Using an artificial neural network and a support vector machine, we extract the numerical characteristics of transmembrane regions and obtain the probability that a segment belongs to a transmembrane region. Finally, we propose models that predict transmembrane regions from this probability model together with hydrophobicity and the isoelectric point index. Compared with the TMpred and ALOM methods, the results show that our model can effectively distinguish the transmembrane domains of membrane protein sequences.
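The abstract does not spell out how the Gs graph is built. As a hedged illustration of the general idea behind 3D graphical representations, the sketch below maps each amino acid to a 3-vector of property values and accumulates them into a walk, then uses the centroid of the walk as a numerical descriptor and compares sequences by Euclidean distance. The property table (only five residues, with hypothetical placeholder values), the centroid descriptor, and the function names are all assumptions, not the thesis's definitions. Note also that a plain cumulative walk like this is not guaranteed to be non-degenerate, which is one of the properties the Gs graph is designed to provide.

```python
import math

# Hypothetical property table: each residue maps to three placeholder values
# (standing in for three physicochemical properties). These numbers are
# illustrative only and are NOT the properties used by the Gs graph.
PROPS = {
    "A": (0.62, -0.5, 0.3),
    "G": (0.48, 0.0, 0.1),
    "L": (1.06, -1.8, 0.9),
    "K": (-1.50, 3.0, 0.8),
    "S": (-0.18, 0.3, 0.2),
}

def walk_3d(seq):
    """Cumulative 3D walk: point i is the sum of the property vectors of residues 0..i."""
    x = y = z = 0.0
    points = []
    for aa in seq:
        dx, dy, dz = PROPS[aa]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points

def descriptor(seq):
    """Numerical characterization of the walk: its centroid (an assumed choice)."""
    pts = walk_3d(seq)
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))

def distance(s1, s2):
    """Euclidean distance between descriptors; smaller means more similar."""
    return math.dist(descriptor(s1), descriptor(s2))
```

Pairwise distances computed this way can be collected into a matrix and passed to any distance-based tree-building method.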
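The abstract says phylogenetic trees were built from the graphical representation but does not name the clustering method. A common way to turn a pairwise distance matrix into a rooted tree is UPGMA (average linkage). The sketch below implements it in plain Python and returns a Newick-style string without branch lengths; using UPGMA here is an assumption, and the function name is hypothetical.

```python
def upgma(labels, dist):
    """Build a rooted tree with UPGMA (unweighted pair group, average linkage).

    labels: list of names; dist: symmetric matrix (list of lists) of distances.
    Returns a Newick-style string without branch lengths.
    """
    # Active clusters: id -> (newick_string, number_of_leaves).
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(d, key=d.get)
        name_a, size_a = clusters.pop(a)
        name_b, size_b = clusters.pop(b)
        new = next_id
        next_id += 1
        # Size-weighted average distance from the merged cluster to each other cluster.
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, new)] = (dac * size_a + dbc * size_b) / (size_a + size_b)
        del d[(a, b)]
        clusters[new] = (f"({name_a},{name_b})", size_a + size_b)
    (name, _), = clusters.values()
    return name + ";"
```

For example, with the made-up matrix `[[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 8], [10, 10, 8, 0]]` and labels `A` to `D`, the call returns `"(D,(C,(A,B)));"`. Because new cluster ids are always larger than existing ones, every distance key stays in `(smaller, larger)` order.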
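The encoding pipeline in the second paragraph can be sketched roughly as follows: standardize a residue-by-index table of physicochemical properties, keep the top k principal components so that every amino acid becomes a k-dimensional vector, stack these rows to turn a segment into an L × k matrix, and summarize that matrix by its singular values to get a fixed-length vector. The thesis uses 554 real indexes; here a random 20 × 554 matrix stands in for them, so no real index values are implied. The choice of k, the use of singular values as the segment vector, and all function names are assumptions. With only 20 residues, at most 19 components carry variance after centering, so k should stay below 20.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pca_residue_codes(index_table, k):
    """Project a (20, n_indexes) property table onto its top-k principal components.

    Returns a dict mapping each amino acid to a length-k vector of PC scores.
    """
    X = np.asarray(index_table, dtype=float)
    # Standardize each index (column) so indexes on different scales are comparable.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # PCA via SVD of the standardized matrix: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T  # (20, k) principal-component scores
    return {aa: scores[i] for i, aa in enumerate(AMINO_ACIDS)}

def segment_matrix(segment, codes):
    """Turn a protein segment into an (L, k) numerical matrix, one row per residue."""
    return np.vstack([codes[aa] for aa in segment])

def segment_vector(segment, codes):
    """Summarize the segment matrix by its singular values (descending order).

    Gives exactly k values when len(segment) >= k, otherwise len(segment) values.
    """
    return np.linalg.svd(segment_matrix(segment, codes), compute_uv=False)

# Usage with a placeholder table (random numbers, NOT real physicochemical indexes).
rng = np.random.default_rng(0)
table = rng.normal(size=(20, 554))
codes = pca_residue_codes(table, k=5)
v = segment_vector("LLAVIGLFLAVLIAG", codes)  # fixed-length vector of 5 values
```

Singular values are invariant to reordering the rows of the segment matrix, which makes them a compact but order-insensitive summary; a real model might prefer features that keep positional information.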
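The final predictor combines the learned probability with hydrophobicity and the isoelectric point index. The hydrophobicity cue on its own can be illustrated with a classic Kyte-Doolittle sliding-window profile: average the hydropathy values over a window of 19 residues and flag windows whose mean exceeds 1.6, the cutoff Kyte and Doolittle suggested for membrane-spanning segments. This is only the hydropathy component, not the thesis's combined model; the window size and threshold are the standard Kyte-Doolittle choices rather than values taken from the thesis, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window
            for i in range(len(vals) - window + 1)]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into (start, end) spans.

    Positions are 0-based and end is exclusive.
    """
    spans = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            start, end = i, i + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

For example, a toy sequence of 10 lysines, 25 leucines, and 10 lysines yields the single span `(5, 40)`, which covers the leucine stretch plus some flanking residues; this blurring at the edges is one reason hydropathy alone is usually combined with other evidence.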
Keywords/Search Tags: graphical representation of protein sequences, physicochemical properties of amino acids, principal component analysis, artificial neural network, support vector machine