Research on Inter-Protein Residue Contact Prediction Based on Deep Learning

Posted on: 2022-09-20  Degree: Master  Type: Thesis
Country: China  Candidate: W Zhang  Full Text: PDF
GTID: 2530307154479344  Subject: Engineering
Abstract/Summary:
Proteins carry out their activities and biological functions, and thereby maintain the functional order of the cell, by interacting with other proteins. The protein interface is the spatial structure formed by the interaction sites. Because the structure of a complex determines its function, determining and understanding the protein interface is necessary for correctly understanding molecular mechanisms and related biological processes at the structural level. Such understanding can give biological insight into research on diseases and drugs. Traditionally, interfaces have been determined experimentally. However, experiments are difficult and costly, so the number of sequences with resolved structures is much smaller than the number of sequences obtained by genome sequencing. This gap prompted the emergence of computational methods, which have gradually been developed to make up for the shortcomings of experimental methods. Recently, a breakthrough has been made in intra-protein residue contact prediction. However, because the number of known structures and homologous sequences of complexes is limited, predicting inter-protein residue contacts remains a challenge.

In this study, we developed HDIContact, a deep learning framework for inferring inter-protein residue contacts from sequential information. First, we constructed a concatenated Multiple Sequence Alignment (MSA) for each complex, based on genome distance or species, to enrich the homologous sequences of the complex. Then, we used a pre-trained protein language model to produce two-dimensional (2D) MSA embeddings, which can reduce the influence of noise in the MSA caused by mismatched sequences or low homology. Finally, we applied multiple Bi-directional Long Short-Term Memory networks to the 2D MSA embeddings to capture the 2D context of each residue pair along the two directions of receptor and ligand, and to directly predict the residue contact map at the hetero-dimer interface.

Our comprehensive assessment on the Escherichia coli test dataset showed that HDIContact outperformed other state-of-the-art methods, with a top precision of 65.96%, an AUROC of 0.8308, and an AUPR of 0.2502. We also demonstrated the potential of HDIContact for human-virus protein-protein complexes by achieving a top-5 precision of 80% on the O75475-P04584 pair, which is related to Human Immunodeficiency Virus. Our method is a valuable technical tool for predicting inter-protein residue contacts and will be helpful for understanding protein-protein interaction mechanisms. It will also further the understanding of the pathogenesis of related diseases, such as AIDS and pneumonia. In addition, it can give some insight into the design and development of drug target proteins for these diseases.
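The top-k precision figures quoted above can be read as the fraction of the k highest-scoring receptor-ligand residue pairs that are true contacts. The sketch below illustrates that metric only. It is not taken from the thesis: the function name `top_k_precision`, the nested-list representation of the contact map, and the row-major tie-breaking rule are all assumptions made for illustration, and the abstract does not state how k or the contact distance threshold were chosen.

```python
from typing import Sequence


def top_k_precision(
    scores: Sequence[Sequence[float]],
    contacts: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores[i][j]   : predicted contact score for receptor residue i, ligand residue j
    contacts[i][j] : 1 if the pair is a true inter-protein contact, else 0
    (hypothetical layout, assumed for this sketch)
    """
    if k <= 0:
        raise ValueError("k must be positive")

    # Flatten the receptor x ligand map into (score, label) pairs.
    pairs = [
        (scores[i][j], contacts[i][j])
        for i in range(len(scores))
        for j in range(len(scores[i]))
    ]
    if not pairs:
        raise ValueError("contact map is empty")

    # Never ask for more pairs than the map contains.
    k = min(k, len(pairs))

    # Highest score first; the sort is stable, so ties keep row-major order.
    top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
    return sum(label for _, label in top) / k
```

For a toy 2 x 3 map with scores `[[0.9, 0.1, 0.4], [0.2, 0.8, 0.3]]` and true contacts `[[1, 0, 0], [0, 0, 1]]`, the two highest-scoring pairs contain one true contact, so `top_k_precision(scores, contacts, 2)` gives 0.5.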
Keywords/Search Tags: Inter-protein residue contact prediction, Hetero-dimer interfaces, MSA embeddings, Deep learning, Sequential information