
Prediction Of Drug-Target Interactions Based On Matrix Completion

Posted on: 2021-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Xu
Full Text: PDF
GTID: 2504306458477794
Subject: Computer Science and Technology
Abstract/Summary:
Predictions of interactions between drugs and disease targets made with traditional methods are often unsatisfactory because of the limitations of those methods: insufficient training samples, few known 3D crystal structures of protein targets, and low computational efficiency. With the completion of the Human Genome Project, more data on the chemical structures and genomes associated with drugs and diseases have become available. For prediction tasks, the data sources therefore include not only interaction data but also a large amount of auxiliary information, so integrating this auxiliary information efficiently is particularly important. Arbitrarily integrating auxiliary information from multiple data sources can be counterproductive, because it introduces a large amount of heterogeneous noise and bias and reduces the accuracy of drug-target interaction (DTI) prediction. Existing heterogeneous data pose several problems: (1) the data usually contain noise and missing values, and there is a large amount of redundancy between information sources; (2) the data are large and high-dimensional, which makes their characteristics difficult to represent efficiently, while the known data are often sparse. How to effectively integrate multiple kinds of similarity information from multiple data sources is therefore worth studying. This thesis addresses these issues, and its main work is summarized as follows.

First, a machine learning method based on multiple-similarity selection is proposed. It constructs a heterogeneous drug-disease graph that contains the known interactions as well as the similarities between drugs, between diseases, and between target proteins obtained from different data sources. For the multiple similarity data sets, the information redundancy of each one is measured using its entropy value and the correlation coefficient, and a subset of similarity data with less redundant information is selected. The nonlinear similarity fusion method SNF is then used to fuse the different similarities between drugs and between target proteins. Using the fused similarity data and the graph features derived from the heterogeneous graph, a random forest classifier predicts the unknown relationships. The experimental results show that the proposed method improves prediction accuracy to some extent.

Second, concentrating multiple similarity data sets into a single similarity matrix can easily cause information overload, which reduces efficiency. To address this, a matrix factorization model based on multiple similarities is proposed. Using the best similarity subset selected above, the multiple similarity matrices and the interaction matrix are jointly factorized, and the latent factors of each similarity are used for the prediction task. Cross-validation experiments show that the model is reasonable and effective in predicting the interactions.
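The similarity selection step described above can be illustrated with a minimal sketch. The abstract does not give the exact criteria, so everything concrete here is an assumption made for illustration: entropy is computed from a histogram of matrix entries, matrices above a hypothetical `max_entropy` threshold are dropped, and the rest are kept greedily as long as their absolute Pearson correlation with every already-kept matrix stays below a hypothetical `max_corr`. The function names, thresholds, and toy matrices are not taken from the thesis.

```python
import numpy as np


def matrix_entropy(sim, bins=10):
    """Shannon entropy (in bits) of the histogram of entries, assumed to lie in [0, 1]."""
    counts, _ = np.histogram(sim, bins=bins, range=(0.0, 1.0))
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


def select_similarities(sims, max_entropy=3.0, max_corr=0.9, bins=10):
    """Pick a low-redundancy subset of similarity matrices.

    sims: dict mapping a name to a similarity matrix (all of the same shape).
    Assumed rule (not from the thesis): drop matrices whose entropy exceeds
    max_entropy, then visit the rest in order of increasing entropy and keep
    one only if it is not strongly correlated with a matrix already kept.
    """
    entropies = {name: matrix_entropy(m, bins) for name, m in sims.items()}
    candidates = sorted(
        (name for name in sims if entropies[name] <= max_entropy),
        key=entropies.get,
    )
    kept = []
    for name in candidates:
        flat = sims[name].ravel()
        redundant = any(
            abs(np.corrcoef(flat, sims[other].ravel())[0, 1]) > max_corr
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data with hypothetical names; real inputs would be drug or target similarities.
rng = np.random.default_rng(0)
base = rng.random((6, 6))
sims = {
    "chem": base,
    "chem_rescaled": 0.5 * base + 0.25,  # linear copy of "chem", so fully redundant
    "side_effect": rng.random((6, 6)),   # independent of "chem"
}
kept = select_similarities(sims, max_entropy=3.4)
print(kept)
```

In this toy run only one of `chem` and `chem_rescaled` survives, because they are perfectly correlated, while the independent `side_effect` matrix is retained. The selected subset would then be passed on to the fusion or joint factorization stage.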
Keywords/Search Tags: Drug-Target Interaction, Multiple Similarity Fusion, Heterogeneous Graph, Random Forest, Joint Matrix Factorization