
Study On Drug-Target Interaction Prediction Method Based On Cluster Analysis

Posted on: 2021-04-24 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: D H Yu
GTID: 1361330614950801 | Subject: Computer Science and Technology
Abstract/Summary:
Drug discovery and design is a systems-engineering undertaking with high cost, long cycles, high risk, a low success rate, and low efficiency. Statistically, a new drug takes an average of 10 to 15 years and costs an average of 0.8 to 1.5 billion to go from initial concept to market. Nevertheless, only about 10% of drugs are approved by the Food and Drug Administration (FDA) each year. Identifying drug-target interactions (DTIs) is a key step in drug discovery and design: it provides insight into complex biological interactions and key biological processes, accelerates new drug discovery, reduces research and development costs, and raises the standard of medical care. With the accumulation of drug- and target-related data and the development of machine learning, data mining, and network pharmacology, it has become possible to predict drug-target interactions computationally. This thesis therefore studies in depth the querying and validation of drug-target interactions, the determination of the number of clusters, cluster analysis, and the prediction of drug-target interactions. Using only single-source data, it proposes a fusion method that combines super-cluster prediction with feature projection fuzzy classification to predict drug-target interactions on the basis of cluster analysis of drug and target data. The work is divided into four parts:

(1) To address the querying and validation of drug-target interactions, an efficient query and validation method (DTcheck) is proposed. At present, researchers mainly query and validate large numbers of drug-target interactions manually, one database at a time, and must repeat the same work, which is inefficient and prone to omissions and errors. The proposed method exploits a web crawler's ability to collect data automatically in place of direct manual querying and validation, which improves the efficiency of querying and validation and reduces the probability of error. All unknown drug-target interactions in the four standard datasets, enzyme, ion channel (IC), G-protein-coupled receptor (GPCR), and nuclear receptor (NR), are queried and validated, and the newly found interactions are collected to augment the drug-target interaction data of these four datasets.

(2) To determine a suitable number of clusters for drug and target data without prior label information, a density peak method based on a weighted local density sequence and nearest-neighbor assignment is proposed. In its local density calculation, a fixed set of k nearest neighbors and the remaining points make different contributions to the local density, which overcomes the density peak algorithm's dependence on a predefined parameter. Adding a nearest-neighbor assignment step to the original procedure reduces the propagation of erroneous labels during the assignment stage of the density peak algorithm. Using the decision graph of the improved density peak algorithm, the numbers of clusters of the drug and target data in the enzyme, IC, GPCR, and NR datasets are successfully determined.

(3) To address the cluster analysis of drug and target data, an improved K-medoids algorithm is proposed. A candidate center subset prevents inappropriate points from becoming candidate centers, and incrementally optimizing the center points helps the algorithm escape local optima as far as possible. The improved method inherits the robustness of the K-medoids algorithm. Using the numbers of clusters obtained from the DPCSA decision graph, and compared with other clustering methods, the improved K-medoids algorithm markedly reduces the isolated-cluster phenomenon and yields more reasonable clustering results on the enzyme, IC, GPCR, and NR datasets.

(4) To address drug-target interaction prediction, a fusion method combining super-cluster prediction and feature projection fuzzy classification is proposed on the basis of cluster analysis of drug and target data. Because known drug-target interactions (positive examples) are few and no strict negative examples exist, unknown drug-target interactions are treated as negative examples, which causes a severe imbalance between positive and negative examples. The "optimistic" super-cluster prediction method, which fuses super-targets and super-drugs, reduces the impact of sparse drug-target interaction data: based on the cluster analysis results, it increases the number of drug-target interactions and reduces sparsity. The "pessimistic" feature projection fuzzy classification method, which preserves negative constraints, strikes a balance between the "optimistic" super-cluster method and the negative effect of treating unknown drug-target interactions as negative examples. Developed from matrix factorization, this method overcomes the dimensional constraint on drugs and targets in latent feature decomposition and retains all feature information. A fuzzy membership function maps the outputs of super-cluster prediction and feature projection fuzzy classification onto the same range so that they can be fused. Experimental results on the enzyme, IC, GPCR, and NR datasets show that the proposed fusion method improves drug-target interaction prediction performance and is more robust than other methods: differences between datasets have little impact on the fusion method, whereas other methods produce markedly different prediction results. In addition, based on the INCK cluster analysis results and the drug-target interaction data, the super-cluster hypothesis is consistent with the cluster analysis results, which further supports the reliability of the fusion method for predicting drug-target interactions.

In conclusion, this thesis focuses on drug-target interaction prediction and proposes a fusion method combining super-cluster prediction and feature projection fuzzy classification based on single-source data. Experimental results on the standard datasets show that, compared with other methods, the fusion method improves the accuracy of drug-target interaction prediction and is more robust. In addition, the thesis uses the developed efficient query and validation method to augment the drug-target interaction data, which helps to evaluate other drug-target interaction prediction methods more reliably.
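Part (2) describes a density peak method whose local density weights the k nearest neighbors and the remaining points differently, followed by decision-graph selection of cluster centers. The Python sketch below illustrates that general procedure under stated assumptions: the weighting (a full Gaussian term from each of the k nearest neighbors, the same term scaled by a hypothetical factor `alpha` from every other point) is an illustrative choice rather than the thesis's formula, and the thesis's additional nearest-neighbor assignment step is not reproduced. Points are assigned with the standard density peak rule instead (each point takes the label of its nearest denser neighbor). The function name `density_peaks` and its parameters are hypothetical.

```python
import math


def density_peaks(points, k=3, alpha=0.1, n_clusters=2):
    """Density peak clustering with a kNN-weighted local density (sketch).

    ASSUMPTION: the k nearest neighbors contribute a full Gaussian term and
    every other point contributes the same term scaled by the hypothetical
    factor `alpha`; this only illustrates "differentiated contributions" and
    is not the thesis's exact formula.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i.
    rho = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        knn = {j for _, j in others[:k]}
        rho.append(sum((1.0 if j in knn else alpha) * math.exp(-d * d)
                       for d, j in others))

    # delta_i: distance to the nearest point ranked denser than i.
    # The densest point gets its largest distance to any point.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j

    # Decision graph: centers are the points with the largest rho * delta.
    # The densest point always has the largest product, so it is always a center.
    top = order[0]
    rest = sorted((i for i in range(n) if i != top),
                  key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [top] + rest[:n_clusters - 1]

    # Standard density peak assignment in descending density order:
    # each non-center inherits the label of its nearest denser neighbor.
    labels = [-1] * n
    for c_idx, c in enumerate(centers):
        labels[c] = c_idx
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, rho, delta = density_peaks(pts, k=3, n_clusters=2)
    print(labels)
```

Here `rho` and `delta` are the two axes of the decision graph; the number of clusters is normally read off as the number of points that stand out with both a large `rho` and a large `delta`, and `n_clusters` is then set to that count.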
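Part (3) restricts K-medoids centers to a candidate subset and optimizes the centers incrementally. One way to sketch that idea in Python is a PAM-style K-medoids (greedy one-at-a-time initialization followed by swap improvement) in which only candidate points may become medoids. The candidate rule used here (keep points whose mean distance to their k nearest neighbors is at most the median) is a hypothetical stand-in for the thesis's criterion, and `total_cost`, `candidate_subset`, and `k_medoids` are illustrative names, not the thesis's code.

```python
import math
from statistics import median


def total_cost(points, medoids):
    """Sum of distances from every point to its closest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def candidate_subset(points, k=3):
    """HYPOTHETICAL candidate rule: keep points whose mean distance to their
    k nearest neighbors is at most the median, i.e. points in dense regions."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    cut = median(scores)
    return [i for i, s in enumerate(scores) if s <= cut]


def k_medoids(points, n_clusters, k=3, max_iter=100):
    cands = candidate_subset(points, k)
    if len(cands) < n_clusters:  # fall back to all points if too few candidates
        cands = list(range(len(points)))

    # Incremental initialization: add one medoid at a time, each time picking
    # the candidate that lowers the total cost the most.
    medoids = []
    for _ in range(n_clusters):
        best = min((c for c in cands if c not in medoids),
                   key=lambda c: total_cost(points, medoids + [c]))
        medoids.append(best)

    # PAM-style swap phase, restricted to the candidate subset.
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        improved = False
        for idx in range(n_clusters):
            for c in cands:
                if c in medoids:
                    continue
                trial = medoids[:idx] + [c] + medoids[idx + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break

    # Final assignment: each point joins its closest medoid.
    labels = [min(range(n_clusters), key=lambda m: math.dist(p, points[medoids[m]]))
              for p in points]
    return medoids, labels
```

Because an isolated outlier falls outside the candidate subset, it cannot become a medoid of its own and is absorbed into the nearest cluster, which is one plausible way such a restriction could reduce isolated clusters.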
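Part (4) combines an "optimistic" super-cluster score with a "pessimistic" feature projection score after mapping both onto a common range with a fuzzy membership function. The Python sketch below shows only that fusion skeleton, under explicit assumptions: the super-cluster score of a drug-target pair is taken (hypothetically) as the fraction of known interactions between the drug's cluster and the target's cluster, the membership function is a simple min-max mapping onto [0, 1], and the matrix-factorization-based feature projection scores are not reproduced but passed in as a precomputed matrix `other_scores`. The weight `w` and all function names are hypothetical.

```python
def super_cluster_scores(Y, drug_labels, target_labels):
    """HYPOTHETICAL super-cluster score: for drug i and target j, the fraction
    of known interactions (Y[i][j] == 1) between i's drug cluster
    (super-drug) and j's target cluster (super-target)."""
    hits, sizes = {}, {}
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            key = (drug_labels[i], target_labels[j])
            hits[key] = hits.get(key, 0) + y
            sizes[key] = sizes.get(key, 0) + 1
    return [[hits[(drug_labels[i], target_labels[j])]
             / sizes[(drug_labels[i], target_labels[j])]
             for j in range(len(Y[0]))] for i in range(len(Y))]


def membership(M):
    """ASSUMED membership function: min-max mapping of all scores onto [0, 1]."""
    flat = [v for row in M for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in M]
    return [[(v - lo) / (hi - lo) for v in row] for row in M]


def fuse(super_scores, other_scores, w=0.5):
    """Weighted average of the two membership matrices (w is hypothetical)."""
    ms, mo = membership(super_scores), membership(other_scores)
    return [[w * a + (1 - w) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(ms, mo)]
```

In this toy scheme an unknown pair whose drug shares a cluster with a drug that has a known interaction receives a nonzero super-cluster score, which mirrors the idea of reducing sparsity through cluster analysis; the fused matrix can then be ranked to propose candidate interactions.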
Keywords/Search Tags: Drug-target interaction, Drug repositioning, Drug-target network, Feature projection classification, Super cluster prediction, Fusion method