Prediction Of Hot Spots And Feature Analysis Of Hot Regions At Protein-DNA Binding Interfaces

Posted on: 2022-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: L S Yao
GTID: 2480306542467364
Subject: Biology
Abstract/Summary:
Interactions between proteins and DNA play a critical role in biological processes. A protein-DNA interaction can be changed by a small subset of interface residues, namely hot spot residues. Hot spot residues account for most of the binding free energy in protein-DNA interactions and are therefore important for investigating the underlying molecular mechanism and the stability of these interactions. A few tools are available for identifying hot spot residues in protein-DNA complexes based on three-dimensional protein structures. However, three-dimensional structures are unavailable for most proteins, so there is a need for a sequence-based method to predict hot spots. Meanwhile, recent research shows that hot spot residues are not randomly distributed at protein-protein interfaces but are clustered. These clusters of hot spot residues at the interface are called hot regions. To date, there has been no statistical analysis of hot regions in protein-DNA complexes. To address these shortcomings, this thesis presents two studies.

1. We proposed a method called SPDH for predicting hot spot residues at protein-DNA binding interfaces from protein sequences. First, we obtained 214 samples in 64 complexes (88 hot spots and 126 non-hot spots) from the dbAMEPNI hot spot database and the dataset of the SAMPDI method, and prepared them for further analysis. Second, we extracted 133 features covering physicochemical properties, conservation, predicted solvent-accessible surface area, and secondary structure. We then screened these features with three feature selection methods to obtain the optimal feature subset and compared models built with four classical machine learning algorithms on the training dataset. Finally, we combined the 17-dimensional feature set obtained by forward feature selection with an SVM to build the prediction model. On the test set, our method achieved F1 = 0.700, MCC = 0.458, ACC = 0.719, and AUC = 0.760, which is better than other prediction methods based on three-dimensional protein structure. Through an ablation study of the SPDH model, we found that the differences in physicochemical property features between the wild type and the mutant contributed most to the performance of the prediction model.

2. We analyzed the feature bias of hot spot residues and hot regions at protein-DNA binding interfaces. First, we collected a total of 5,035 protein-DNA complexes from the PDB database, containing 12,663 protein chains and 2,460,707 residues. Next, we used the tool InpPDH (http://bioinfo.ahu.edu.cn/inpPDH/) to identify the hot spots in these complexes, and obtained 141,318 hot spot residues and 224,905 non-hot spot residues. We defined two hot spots whose Cα atoms are less than 6.5 Å apart as sharing one contact space. A set of more than two hot spots in which any two hot spots share a contact space was defined as a hot region. A total of 16,404 hot regions were found in these protein-DNA complexes. For the hot spots, we found that arginine and lysine were abundant at the protein-DNA interaction interface; arginine was more likely to be a hot spot residue, while lysine was more likely to be a non-hot spot residue. Furthermore, threonine, isoleucine, and serine tended to be hot spot residues, whereas non-hot spot residues included more acidic amino acids, such as aspartic acid and glutamic acid. For hot regions, the amino acid distribution was similar to that of hot spot residues. The difference was that glycine tended to form hot regions. Possibly because of its short side chain and small steric hindrance, glycine can bind with various types of residues to form a hot region. The proportion of polar residues in hot regions was significantly lower than that among hot spot residues and interface residues. Moreover, we found that singlet hot spot residues were more conserved than the hot spots in hot regions.

In summary, this thesis proposes a sequence-based method for predicting hot spots at the protein-DNA interface. The method can effectively predict hot spot residues in protein-DNA complexes whose structures have not been solved, with good prediction performance. We also analyzed the feature bias of hot regions and constructed a database. This work aims to help researchers understand the interaction mechanism at the protein-DNA interface and to support drug development.
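The evaluation metrics reported for the test set (F1, MCC, ACC) all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions only; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the thesis. AUC is omitted because it needs the classifier's continuous scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, F1 and MCC computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC stays informative when the two classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}


# Hypothetical counts for illustration only; NOT the thesis test set.
metrics = binary_metrics(tp=8, fp=2, tn=7, fn=3)
```

Reporting MCC alongside ACC is a reasonable choice for this kind of data, since hot spots and non-hot spots are not equally frequent (88 versus 126 samples in the dataset described above).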
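The hot region definition in the abstract (hot spots whose Cα atoms lie within 6.5 Å share a contact space, and more than two linked hot spots form a hot region) can be read as a clustering over a contact graph. The sketch below is one possible reading, not the author's implementation. It assumes that hot spots only need to be linked through a chain of contacts, so a hot region is a connected component with at least three members; the abstract could also be read as requiring every pair to be in direct contact. The function name, residue labels, and coordinates are hypothetical.

```python
import math
from collections import defaultdict

CONTACT_CUTOFF = 6.5  # Cα-Cα distance threshold in Å, as stated in the abstract


def find_hot_regions(ca_coords, cutoff=CONTACT_CUTOFF, min_size=3):
    """Group hot spots into hot regions.

    ca_coords maps a residue label to its Cα (x, y, z) coordinates.
    Assumption: a hot region is a connected component of the contact
    graph that contains at least `min_size` hot spots ("more than two").
    """
    labels = list(ca_coords)
    neighbors = defaultdict(set)
    # Build the contact graph: an edge joins two hot spots closer than cutoff.
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if math.dist(ca_coords[a], ca_coords[b]) < cutoff:
                neighbors[a].add(b)
                neighbors[b].add(a)

    seen, regions = set(), []
    for start in labels:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            regions.append(sorted(component))
    return regions


# Hypothetical hot spots and coordinates, for illustration only.
hot_spots = {
    "R10": (0.0, 0.0, 0.0),
    "T11": (5.0, 0.0, 0.0),
    "S12": (10.0, 0.0, 0.0),
    "I40": (30.0, 0.0, 0.0),
    "R41": (34.0, 0.0, 0.0),
}
regions = find_hot_regions(hot_spots)
```

In this toy input, R10, T11, and S12 are chained by contacts and form one hot region, while I40 and R41 touch only each other and are too few to qualify. The pairwise distance loop is quadratic in the number of hot spots, which is acceptable per complex; a spatial index would be the natural refinement for very large interfaces.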
Keywords/Search Tags:hot spot residues, hot region, protein-DNA interactions, protein sequences, machine learning