
Prediction Of Protein Residue Contact And Its Application To Protein Structure Modeling

Posted on: 2019-12-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yang
GTID: 1360330590470369
Subject: Control Science and Engineering
Abstract/Summary:
In the post-genome era, proteomics is becoming one of the most important research areas. Proteins play essential functional roles in living organisms, and knowing their three-dimensional (3D) structures is valuable for analyzing their functions, which in turn helps us better understand the essence of life. Protein structures are crucial for drug design and protein design; however, relatively few protein structures are available compared with the number of known protein sequences, and the gap between the number of structures and the number of sequences keeps growing. Fortunately, with biological big data and advanced machine learning and data mining techniques, it is possible to develop powerful algorithms that quickly and accurately predict protein structures from primary sequences. Because protein structure prediction is fast and inexpensive, it has become a supplement to solving protein structures via X-ray crystallography or NMR.

Spatial restraints, such as angular restraints and distance restraints, are very important for protein 3D structure prediction because they can reduce the entropy of unfolded states and improve the predicted structures. Residue-residue contacts from the protein residue network, especially long-range contacts, can help structure modeling algorithms generate high-quality models. The disulfide bond is a special type of residue contact, and it has been demonstrated to be crucial for protein folding. In recent years, many residue contact prediction methods have been proposed based on pattern recognition theory and related technologies, and their predictions have been successfully converted into spatial distance restraints and applied to protein structure modeling. Nevertheless, proteins with few homologous sequences remain difficult to handle because the features derived for them are inaccurate. In addition, most current algorithms were developed for soluble proteins, whereas inter-helix residue contacts in transmembrane proteins have received little attention. One likely reason is that the limited number of membrane protein structures hinders the development of high-quality contact prediction models for membrane proteins.

In this study, we focused on residue contact prediction from the primary sequence, including contact prediction for both soluble and membrane proteins as well as disulfide connectivity prediction. In addition, we applied spatial distance restraints to protein structure prediction to assess the usefulness of this kind of restraint. The main contents and contributions of this work are listed below:

1. Proposing a residue contact predictor called R2C for soluble proteins. It uses a dynamic fusion strategy that takes full advantage of both machine learning (ML)-based methods and correlated mutation analysis (CMA)-based approaches. It assigns different fusion weights to different targets and thus improves the prediction accuracy for all contact ranges (short-, medium- and long-range). CMA-based approaches can remove false positives caused by transitive effects; however, Gaussian noise can still be observed in their predictions. Therefore, we used a noise filter to further remove the Gaussian noise from the original outputs and improve the prediction performance for long-range contacts.

2. Building an inter-helix residue contact predictor called MemBrain based on a convolutional neural network. For membrane proteins, inter-helix contacts are more important because they can guide helix packing. Previous methods used only inter-helix residue pairs to train the ML model, whereas in this work we used all residue pairs with a sequence separation of no less than 6. On the one hand, more training samples are available for model training. On the other hand, MemBrain is capable of predicting the contact map for the entire sequence. Since residue contacts are densely distributed in native structures, MemBrain uses a two-stage architecture: the first stage generates the contact potential of all residue pairs, which is fed into a convolutional neural network in the second stage together with the predictions from three CMA-based approaches. This framework can mine latent structural features that exist in the original feature space and therefore improves the prediction accuracy noticeably.

3. Developing the prediction model Cyscon to predict disulfide connectivity patterns. Because the number of possible disulfide connectivity patterns grows combinatorially with the number of disulfide bonds, the accuracy of predicting the entire pattern is very low for sequences with more than 5 disulfide bonds. To address this problem, Cyscon introduces the idea of order reduction: it first finds the most confident disulfide bonds through sequence alignment, which reduces the problem to finding the correct combination among the remaining bonds of the protein sequence. Under this framework, Cyscon can process sequences with more disulfide bonds (typically more than 5), and the prediction accuracy is also improved. In this work, we also systematically validated the usefulness of the predicted disulfide bonds for protein 3D structure modeling.

4. Designing the algorithm ExSSO to aid NMR-based structure determination of symmetric transmembrane oligomers. Unlike predicted residue contacts, NMR-derived NOE restraints are very accurate but have a two-fold directional ambiguity for oligomers higher than a dimer. Given the protomer structure and the number of protomers in the oligomer, ExSSO can find the proper structures of the symmetric oligomer guided by ambiguous inter-protomer NOEs. It is an exhaustive and fast conformational space search algorithm whose complexity and running time are unaffected by the amount or form of the restraints. By uniformly sampling three Euler angles, we can ensure a near-complete search of the protomer orientation. Finally, the oligomer structures are selected by a clustering algorithm. In this way, ExSSO effectively avoids having to resolve the direction of each NOE restraint.
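
To illustrate why the order reduction described for Cyscon matters, the following minimal Python sketch counts the possible connectivity patterns. It assumes that every cysteine takes part in exactly one disulfide bond, in which case the number of ways to pair 2n cysteines into n bonds is (2n)! / (2^n n!). The function names are hypothetical and chosen for this illustration only; the sketch shows the size of the search space, not Cyscon's actual implementation.

```python
from math import factorial


def count_connectivity_patterns(n_bonds: int) -> int:
    """Count the ways to pair 2 * n_bonds cysteines into n_bonds disulfide bonds.

    Assumption for this sketch: every cysteine is bonded exactly once.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


def patterns_after_fixing(n_bonds: int, n_fixed: int) -> int:
    """Count the patterns left once n_fixed bonds are taken as already known.

    Hypothetical helper: fixing a bond removes its two cysteines from the
    pairing problem, so only the remaining bonds need to be combined.
    """
    if not 0 <= n_fixed <= n_bonds:
        raise ValueError("n_fixed must be between 0 and n_bonds")
    return count_connectivity_patterns(n_bonds - n_fixed)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} bonds: {count_connectivity_patterns(n)} patterns")
    print("6 bonds, 2 fixed:", patterns_after_fixing(6, 2), "patterns left")
```

Under this assumption, 5 bonds already allow 945 patterns and 6 bonds allow 10395, while fixing two confident bonds in a 6-bond sequence leaves 105 candidates to rank.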
Keywords/Search Tags: Machine Learning, Convolutional Neural Network, Protein Residue Network, Disulfide Connectivity Pattern, Protein Structure Modeling, Exhaustive Constrained Search