Prediction Of Protein Folding Shape Code Based On Artificial Neural Network And Profile

Posted on: 2013-10-08 | Degree: Master | Type: Thesis
Country: China | Candidate: C Yu | Full Text: PDF
GTID: 2230330371983830 | Subject: Bioinformatics
Abstract/Summary:
Determining protein structure is a key step in the life sciences. A protein with biological function must have a 3D structure, and that function is closely related to the 3D structure. Currently, the methods most widely used to determine protein structure are X-ray crystallography and NMR (Nuclear Magnetic Resonance). These methods are very time consuming and cannot meet the needs of modern biology. To meet this demand, determining protein structure by theoretical methods has attracted wide attention.

To understand and describe 3D structure more precisely, biologists and biochemists have carried out years of research. A more detailed description of protein folding shape is very important for further research.

Protein Folding Shape Code (PFSC) is a symbolic definition of protein structure that describes structural detail at a level between secondary structure and tertiary structure. In this work, we build a two-stage neural network model based on PSI-BLAST profiles to predict Protein Folding Shape Code. First, we use PSI-BLAST, run for 5 iterations, to generate position-specific scoring matrices (PSSMs). We then encode each matrix with a sliding window to obtain the classifier input; we found that a window length of 15 amino acid residues gives the best prediction results. The input layer has 315 nodes, divided into 15 groups of 21 units each. These 21 units represent the 20 kinds of amino acids and a tag bit. The tag bit marks amino acid residues at the N-terminal or C-terminal of the protein chain. The network has two hidden layers: the first contains 10 nodes and the second contains 5 nodes. The output layer contains 27 nodes, with 27 orthogonal unit vectors representing the 27 kinds of protein folding shape.
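
The window encoding and layer sizes described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names (encode_windows, init_layers, forward), the zero padding beyond the chain ends, the way the tag bit is set, the sigmoid activation, and the random untrained weights are all assumptions, and the toy PSSM values are made up.

```python
import math
import random

WINDOW = 15                          # window length reported in the abstract
N_AMINO = 20                         # one PSSM score per amino acid type
UNITS = N_AMINO + 1                  # 20 scores plus one tag bit per position
LAYER_SIZES = [WINDOW * UNITS, 10, 5, 27]   # 315 -> 10 -> 5 -> 27


def encode_windows(pssm):
    """Return one 315-value input vector per residue of the chain.

    pssm is a list of rows, one per residue, each holding 20 scores.
    Assumption: window positions beyond either chain end get all-zero
    scores and tag bit 1; positions inside the chain get tag bit 0.
    """
    half = WINDOW // 2
    length = len(pssm)
    vectors = []
    for center in range(length):
        vec = []
        for pos in range(center - half, center + half + 1):
            if 0 <= pos < length:
                vec.extend(float(s) for s in pssm[pos])
                vec.append(0.0)      # inside the chain
            else:
                vec.extend([0.0] * N_AMINO)
                vec.append(1.0)      # beyond the N- or C-terminus
        vectors.append(vec)
    return vectors


def init_layers(sizes, seed=0):
    """Random, untrained weights; the last entry of each row is the bias."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(layers, x):
    """Feed-forward pass with sigmoid units (activation is an assumption)."""
    for weights in layers:
        x = [
            1.0 / (1.0 + math.exp(-(row[-1] + sum(w * v for w, v in zip(row, x)))))
            for row in weights
        ]
    return x


# Toy PSSM for a 4-residue chain (made-up scores, not real PSI-BLAST output).
toy_pssm = [[i + 1] * N_AMINO for i in range(4)]
inputs = encode_windows(toy_pssm)
layers = init_layers(LAYER_SIZES)
scores = forward(layers, inputs[0])
print(len(inputs), len(inputs[0]), len(scores))
```

Each residue yields one input vector, so a chain of N residues produces N training or prediction examples. The abstract does not say whether the PSSM scores are rescaled before being fed to the network, so the sketch passes them through unchanged.
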
For each sliding window of 15 amino acids, the PFSC code of the middle amino acid is used as the corresponding output encoding. Owing to the characteristics of PFSC, selecting the three largest output values effectively avoids the prediction errors caused by structurally similar folding shapes.

The data for the training and testing sets were all selected from the CATH database. CATH classifies protein sequences by domain; 128 unique protein folds were selected for testing and training. (No similar folds were present in both the testing and training sets.) These folds were chosen by structural similarity criteria rather than sequence similarity criteria. Evaluated by three-fold cross-validation, our model reaches an accuracy of about 65% when considering the top 3 predicted PFSCs.

Protein structure determination depends greatly on the accuracy of secondary structure, and PFSC can provide a more accurate description than traditional secondary structure. However, the prediction of secondary structure is already mature and difficult to improve on, whereas the prediction of PFSC has not been proposed before. Here we provide, for the first time, a new perspective based on PFSC, offering a starting point and a fresh view for the prediction of protein 3D structure.
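
The top 3 criterion can be illustrated with a short sketch. It assumes the 27 folding shapes are indexed 0 to 26 and that a prediction counts as correct when the true class of the middle residue is among the three largest network outputs; the helper names (one_hot, top_k, top_k_accuracy) and the example outputs are hypothetical and are not results from the thesis.

```python
N_CLASSES = 27  # number of folding shape classes given in the abstract


def one_hot(label):
    """Orthogonal unit vector for a class index in 0..26."""
    return [1.0 if i == label else 0.0 for i in range(N_CLASSES)]


def top_k(outputs, k=3):
    """Indices of the k largest output values, largest first."""
    order = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    return order[:k]


def top_k_accuracy(all_outputs, labels, k=3):
    """Fraction of residues whose true class is among the top k outputs."""
    hits = sum(1 for out, lab in zip(all_outputs, labels) if lab in top_k(out, k))
    return hits / len(labels)


# Made-up network outputs for three residues.
outs = [one_hot(4), one_hot(9), one_hot(0)]
outs[0][7] = 2.0                                      # class 7 outranks true class 4
outs[1][1], outs[1][2], outs[1][3] = 3.0, 2.5, 2.0    # true class 9 drops to rank 4
labels = [4, 9, 0]

print(top_k_accuracy(outs, labels, k=1))
print(top_k_accuracy(outs, labels, k=3))
```

In this toy case the first residue is a miss under a top 1 criterion but a hit under top 3, which is the effect the abstract relies on when similar shapes receive close scores.
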
Keywords/Search Tags: Protein Folding Shape Code, Artificial Neural Network, Profile