
Research On Robust Matrix Factorization Method And Its Application In Disease-related Data

Posted on: 2021-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Cui
GTID: 2430330605960017
Subject: Computer application technology
Abstract/Summary:
In recent years, more and more complex diseases, such as cancer, diabetes, and cardiovascular and cerebrovascular diseases, have become diseases with extremely high mortality. According to biological and medical research, these complex diseases are often associated with multiple types of biomolecules, and the relevant relationships are recorded as drug-target interactions (DTI), drug-disease interactions (DDI), miRNA-disease associations (MDA), and lncRNA-disease associations (LDA). These disease-related data contain key biological information for understanding complex diseases. However, the existing data contain a lot of noise, which interferes with mining potential disease-related information.

In bioinformatics, matrix factorization is a widely used class of prediction models; examples include Graph Regularized Matrix Factorization (GRMF) and Collaborative Matrix Factorization (CMF). On disease-related data, however, traditional matrix factorization models have several disadvantages: noise in the dataset reduces the accuracy of the algorithm; the squared error term makes the algorithm more sensitive to outliers and lowers its prediction accuracy; traditional algorithms use only a single disease semantic similarity and ignore network similarity; and the intrinsic geometric structure of the data is not taken into account. In response to these problems, corresponding improvements are made to the traditional GRMF and CMF algorithms. Compared with other existing advanced algorithms, the improved models achieve higher prediction accuracy. The research covers the following four aspects, one for each type of disease-related data.

(1) For the drug-target interaction dataset, a sparse graph regularized matrix factorization method (L2,1-GRMF) is proposed. Because identifying DTIs is time-consuming and expensive, it is important to improve the accuracy of computational prediction methods. Many algorithms can predict global interactions, and some of them use drug-target networks to make predictions. Since the data usually lie on a low-dimensional nonlinear manifold, the L2,1-norm is introduced into GRMF to induce row sparsity in the matrix and learn this manifold structure. In experiments on different DTI datasets, L2,1-GRMF outperforms other methods in most cases.

(2) For the miRNA-disease association dataset, a robust collaborative matrix factorization (RCMF) method is proposed. Identifying potential MDAs is time-consuming and expensive, so a new computational model that predicts MDAs more accurately is needed. Although some existing methods can effectively predict potential MDAs, they still have shortcomings. In particular, the sparseness of the disease matrix is an important factor affecting the final results. The L2,1-norm is introduced into CMF to enforce sparsity. Experiments show that the algorithm is robust and obtains a higher AUC value than other advanced methods.

(3) For the drug-disease interaction dataset, a dual-network sparse collaborative matrix factorization method (DN L2,1-CMF) is proposed. Developing a new drug is extremely difficult and takes a lot of time and money. At present, the common approach is to predict unknown DDIs from known DDIs, so an effective data mining method is critical. Gaussian Interaction Profile (GIP) kernel functions are used to calculate the drug network similarity and the disease network similarity. The drug network similarity matrix is then combined with the existing drug similarity matrix, and the disease network similarity matrix is combined with the existing disease similarity matrix. Finally, to increase the sparsity of the disease matrix, an L2,1-norm constraint is introduced on the disease submatrix. Experimental results show that the proposed method has better prediction performance and can effectively predict potential DDIs.

(4) For the lncRNA-disease association dataset, a weighted graph regularized collaborative matrix factorization method (WGRCMF) is proposed. With the development of biology and medicine, more and more studies show that lncRNAs are related to diseases, so it is important to predict novel LDAs. Moreover, potential LDAs are beneficial for the treatment and prevention of diseases. Because manifold learning can recover low-dimensional manifold structures from high-dimensional sampled data, graph regularization constraints are introduced into CMF. In addition, a weight matrix is introduced into the method; its role is to prevent unknown associations from contributing to the final prediction matrix. The prediction accuracy of this method is higher than that of other advanced methods.
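The L2,1-norm used in methods (1) to (3) is the sum of the Euclidean norms of a matrix's rows. The following is a minimal sketch in plain Python rather than the thesis's implementation; the function names `l21_norm` and `squared_frobenius` and the example matrix are illustrative assumptions. It shows why the L2,1 penalty grows only linearly with a large row, whereas a squared error term grows quadratically.

```python
import math

def l21_norm(M):
    """L2,1-norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)

def squared_frobenius(M):
    """Squared Frobenius norm: sum of all squared entries."""
    return sum(v * v for row in M for v in row)

# Hypothetical example matrix (not data from the thesis).
M = [[3.0, 4.0],   # row norm 5
     [0.0, 0.0],   # an all-zero row contributes nothing
     [1.0, 0.0]]   # row norm 1

print(l21_norm(M))           # 5 + 0 + 1 = 6.0
print(squared_frobenius(M))  # 9 + 16 + 1 = 26.0
```

Because whole rows are penalized together, minimizing this norm tends to drive entire rows to zero, which is the row sparsity the abstract refers to.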
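The Gaussian Interaction Profile kernel mentioned in method (3) can be sketched as follows. This uses the commonly cited form of the kernel, K(i, j) = exp(-gamma * ||y_i - y_j||^2), where y_i is row i of the interaction matrix and gamma is a bandwidth parameter divided by the mean squared norm of the profiles. The abstract does not give the exact settings, so the bandwidth default `gamma_prime=1.0`, the function name `gip_kernel`, and the toy matrix are assumptions.

```python
import math

def gip_kernel(Y, gamma_prime=1.0):
    """GIP kernel similarity between the rows (interaction profiles) of Y.

    gamma_prime is an assumed bandwidth setting, normalized by the
    mean squared norm of the profiles.
    """
    n = len(Y)
    mean_sq_norm = sum(sum(v * v for v in row) for row in Y) / n
    if mean_sq_norm == 0:
        raise ValueError("Y has no known interactions")
    gamma = gamma_prime / mean_sq_norm
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical drug-by-disease interaction matrix (1 = known interaction).
Y = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]

drug_sim = gip_kernel(Y)                                  # rows are drug profiles
disease_sim = gip_kernel([list(c) for c in zip(*Y)])      # columns are disease profiles
```

Drugs with similar interaction profiles get a similarity close to 1, and the same function applied to the transposed matrix gives the disease network similarity.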
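The role of the weight matrix in method (4) can be illustrated with a weighted reconstruction error. This is only a sketch under the assumption that the weight matrix W is a binary mask (1 for entries that should count, 0 for entries treated as unknown); the abstract does not specify the weights, and the names `weighted_sq_error`, `A`, and `B` (latent factor matrices) are hypothetical.

```python
def weighted_sq_error(Y, A, B, W):
    """Sum over i, j of W[i][j] * (Y[i][j] - A[i] . B[j])^2."""
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            pred = sum(a * b for a, b in zip(A[i], B[j]))
            total += W[i][j] * (y - pred) ** 2
    return total

# Hypothetical 2x2 example (not data from the thesis).
Y = [[1, 0], [0, 1]]          # association matrix
A = [[1.0, 0.0], [0.0, 1.0]]  # latent factors for rows
B = [[1.0, 0.0], [1.0, 1.0]]  # latent factors for columns

W_all = [[1, 1], [1, 1]]      # every entry counts
W_mask = [[1, 0], [1, 1]]     # entry (0, 1) is treated as unknown

print(weighted_sq_error(Y, A, B, W_all))   # 1.0: the mismatch at (0, 1) is penalized
print(weighted_sq_error(Y, A, B, W_mask))  # 0.0: the masked entry is ignored
```

With the mask applied, the mismatch at the unknown entry no longer influences the objective, so the factorization is free to predict an association there.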
Keywords/Search Tags:Disease association prediction, Graph regularized matrix factorization, Collaborative matrix factorization, Graph regularization, L2,1-norm constraint