
The Construction of a Disease Protein-Ligand Database and the Prediction of Drug-Target Interactions

Posted on: 2019-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: M C Zhu
GTID: 2334330545961708
Subject: Biology
Abstract/Summary:
With advances in molecular pathology, our understanding of disease pathogenesis and of the molecular mechanisms of drug action has deepened. With advances in structural biology, more and more protein crystal structures are being solved, revealing the three-dimensional structure of interactions between ligand molecules and their targets. Crystal structures of disease-related protein-ligand complexes can deepen our understanding of disease mechanisms, and crystal structure data for drug-target complexes are of great value for drug target discovery, drug design, and drug repositioning. As biological data continue to accumulate, drug design and drug repositioning based on drug-target interactions are receiving increasing attention, which makes target-ligand interaction data especially important. Such data commonly serve as the data source for computational studies of drug-target interaction identification and form the basis of research in this field. Protein-ligand interactions underlie almost all biological processes and play an important role in cellular activity. Over the past few decades, a large amount of protein-ligand interaction data has accumulated. Building on these data, this thesis collects crystal structure data for disease-related proteins and their ligand complexes, and uses computational methods to identify interactions between drug molecules and their target proteins. The main contents are summarized as follows:

1. Construction of the disease protein-ligand structure database. To provide a complete resource for exploring disease-related protein-ligand interactions with crystal structure information, we first collected data from five sources: PDB, UniProt, DrugBank, PDBbind, and Binding MOAD. A total of 8,833 protein-ligand complexes (covering 1,010 proteins and 4,508 ligand molecules) were collected. We then used text mining to annotate each structure in the database with protein-ligand crystal structure information, ligand physicochemical information, drug information, protein disease information, and protein-ligand interaction information. On this basis, we built the disease protein-ligand structure database website dbHDPLS (http://www2.ahu.edu.cn/pchen/web/dbHDPLS/index.php), where users can conveniently query, browse, and download the data. Finally, to better understand the data, we carried out a brief statistical analysis of the more than 8,000 entries in the database, classified the proteins by function, compared dbHDPLS with other protein-ligand interaction databases, and performed a simple network analysis with Cytoscape. dbHDPLS is a comprehensive database of human disease protein-ligand complexes that provides researchers with a free, easily accessible data resource.

2. Prediction of drug-target interactions. Identifying the target proteins of a drug is a key problem in drug discovery. Experimental methods are inefficient and costly, whereas computational methods can greatly accelerate research and development and reduce costs. We therefore propose a model for predicting drug-target interactions based on the random forest method. First, drawing on the relevant literature, we constructed a standard dataset covering four target classes: enzymes, ion channels, G protein-coupled receptors, and nuclear receptors. In each class, the ratio of positive to negative samples is 1:2. We then propose an encoding method based on the amino acid composition of the target protein, and characterize each target protein sequence with 544 amino acid physicochemical properties from the AAindex1 database. The PaDEL-Descriptor software is used to generate 1,444 1D and 2D descriptors that characterize each drug molecule. Each drug-target pair is represented by combining the target protein feature vector with the drug molecule feature vector. Finally, the amino acid physicochemical properties and the 1D and 2D drug descriptors are each filtered by correlation coefficient to extract the features most representative of the interaction between a drug and its target. The random forest algorithm in the Weka package is used to build the drug-target interaction classifier, and the final prediction model is obtained by optimizing the classifier parameters. The results show that the model performs well on three classes of data (enzymes, ion channels, and G protein-coupled receptors), but its performance on nuclear receptors is not ideal.
Keywords/Search Tags: Protein-ligand complex structure, Database, Disease, Drug-target interaction, Random forest
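
The feature construction summarized in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the thesis implementation. The Kyte-Doolittle hydropathy scale stands in for a single one of the 544 AAindex1 properties. The composition-weighted mean is one plausible reading of the encoding. The greedy correlation filter and its 0.9 threshold are also assumptions, since the abstract does not give the exact procedure. All function names are hypothetical. The classifier step (random forest in Weka) is not shown.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as a stand-in for ONE amino acid
# property; the thesis uses 544 properties from AAindex1.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in the sequence."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def property_feature(sequence, scale):
    """Composition-weighted mean of one property scale (assumed encoding)."""
    composition = amino_acid_composition(sequence)
    return sum(composition[aa] * scale[aa] for aa in AMINO_ACIDS)


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter (assumed): keep a feature column only if its absolute
    correlation with every already-kept column is below the threshold.
    Returns the indices of the kept columns."""
    kept = []
    for j, column in enumerate(columns):
        if all(abs(pearson(column, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def pair_vector(protein_features, drug_descriptors):
    """Represent a drug-target pair by concatenating both feature vectors."""
    return list(protein_features) + list(drug_descriptors)


if __name__ == "__main__":
    protein = [property_feature("MKTAYIAKQR", KYTE_DOOLITTLE)]
    drug = [1.0, 2.0]  # placeholder values, not real PaDEL-Descriptor output
    print(pair_vector(protein, drug))
```

In this sketch the filter would be run separately on the protein feature columns and on the drug descriptor columns, matching the abstract's statement that the two feature sets are processed independently before the pair vectors are passed to a classifier.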