
Robust Collaborative Matrix Factorization Method And Prediction On RNA-disease Association Data

Posted on: 2022-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: M M Gao
GTID: 2514306323486324
Subject: Computer application technology
Abstract/Summary:
In recent years, statistical analyses of large amounts of data have led researchers to discover that the pathogenesis of diseases such as cancer, Alzheimer's disease and diabetes is related to lncRNAs and miRNAs. Developing effective methods for predicting lncRNA-disease associations (LDA) and miRNA-disease associations (MDA) is therefore of great benefit to the prevention, diagnosis and treatment of some complex diseases. In current research, collaborative matrix factorization (CMF) and graph regularized matrix factorization (GRMF) are widely used to predict RNA-disease associations. Although these methods perform well, they have shortcomings. First, the traditional CMF method ignores noise in the similarity matrices, which degrades prediction performance and leaves the method insufficiently robust. Second, using only two single similarity matrices cannot fully mine the latent information in the dataset, and the traditional CMF method ignores the intrinsic geometric structure of the data space, which lowers the accuracy of the algorithm. To address these problems, this thesis improves the collaborative matrix factorization method and applies the improved methods to LDA and MDA datasets to verify their performance. The specific improvements are as follows:

(1) To address the susceptibility of the traditional CMF method to noise, a dual sparse collaborative matrix factorization (DSCMF) method is proposed. Introducing the L2,1 norm into the traditional CMF method yields a row-sparse matrix, which eliminates redundant data and reduces the influence of noisy values, thereby enhancing the robustness of the algorithm. In addition, the Gaussian interaction profile (GIP) kernel is used to compute an lncRNA network similarity matrix and a disease network similarity matrix, which adds network topology information and allows more latent information to be mined. Finally, the method is applied to LDA prediction, and many new associations are successfully predicted.

(2) To address the reliance of the traditional CMF method on a single similarity matrix, a multi-label fusion collaborative matrix factorization (MLFCMF) method is proposed. First, multiple labels are used to optimize the lncRNA space and the disease space, reducing the influence of noise in the original single similarity matrix. Second, a non-linear fusion method is used to combine the multiple labels. The fusion process includes normalization, iterative integration and the addition of a weight matrix, which avoids introducing noise during fusion. At the same time, weighing the effects of different labels yields more comprehensive information, which prevents the loss of label information, eliminates noise within the labels and ultimately enhances the robustness of the algorithm. Finally, the method is applied to LDA prediction, and the experimental results show that it has good prediction performance.

(3) To address the fact that the traditional CMF method ignores the intrinsic geometric structure of the original data space, a dual network sparse graph regularized matrix factorization (DNSGRMF) method is proposed. First, a graph regularization term is introduced so that the method fully accounts for the manifold structure of the original data and learns its internal geometric information. Second, the L2,1 norm is introduced to eliminate some unassociated disease pairs, imposing row-sparsity constraints and enhancing the robustness of the algorithm. In addition, the GIP kernel is used to compute a miRNA network similarity matrix and a disease network similarity matrix, which helps mine more latent information from the original data. Finally, the method is applied to MDA prediction, and the results show that it achieves high prediction performance.

The experimental results show that the proposed methods effectively reduce the noise in the original data, and that they have good robustness and high prediction accuracy.
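The GIP kernel used in (1) and (3) is commonly defined as K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is row i of the binary association matrix and gamma = gamma' / ((1/n) * sum_i ||IP(i)||^2). A minimal NumPy sketch follows, assuming this common definition with gamma' = 1; the function name gip_kernel and the toy matrix are hypothetical and not taken from the thesis.

```python
import numpy as np


def gip_kernel(Y: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile (GIP) kernel between the rows of Y (sketch).

    Y is a binary association matrix (e.g. lncRNAs x diseases); row i is the
    interaction profile IP(i). Pass Y.T to get disease-disease similarity.
    The name and the default gamma' = 1 are illustrative assumptions.
    """
    Y = np.asarray(Y, dtype=float)
    sq_norms = np.sum(Y ** 2, axis=1)               # ||IP(i)||^2 for every row
    mean_sq = sq_norms.mean()
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    # Pairwise squared Euclidean distances ||IP(i) - IP(j)||^2.
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    dist2 = np.maximum(dist2, 0.0)                  # clip tiny negative round-off
    return np.exp(-gamma * dist2)


# Hypothetical toy data: 3 RNAs x 3 diseases.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
S_rna = gip_kernel(Y)     # 3 x 3 RNA-RNA similarity
S_dis = gip_kernel(Y.T)   # 3 x 3 disease-disease similarity
```

The same function yields both network similarity matrices, because a disease's interaction profile is simply a column of Y, i.e. a row of Y.T.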
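Why the L2,1 norm produces row sparsity can be seen from its definition, ||A||_{2,1} = sum_i ||a_i||_2 (the sum of the Euclidean norms of the rows), and from its proximal operator, which shrinks each whole row at once and sets rows with small norm exactly to zero. The sketch below is a generic illustration of that property; the function names and the toy matrix are hypothetical, and the thesis may optimize the L2,1 term differently (for example by iterative reweighting).

```python
import numpy as np


def l21_norm(A: np.ndarray) -> float:
    """||A||_{2,1}: the sum of the Euclidean norms of the rows of A."""
    return float(np.sum(np.linalg.norm(A, axis=1)))


def prox_l21(A: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_{2,1} (row-wise group soft-thresholding).

    Each row is shrunk towards zero by tau in Euclidean length; rows whose norm
    is at most tau become exactly zero, which is what produces row sparsity.
    Illustrative sketch with hypothetical names.
    """
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zeros.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(row_norms, 1e-12))
    return scale * A


A = np.array([[3.0, 4.0],    # row norm 5.0
              [0.5, 0.0],    # row norm 0.5 (small, noise-like row)
              [0.0, 2.0]])   # row norm 2.0
print(l21_norm(A))           # 7.5
print(prox_l21(A, tau=1.0))  # second row is zeroed, the others are shrunk
```

Unlike an entrywise L1 penalty, which zeroes individual entries, this operator keeps or discards each row as a unit, which matches the row-sparsity role the abstract assigns to the L2,1 norm.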
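The non-linear fusion in (2) is described as normalization followed by iterative integration. One well-known scheme of that shape is similarity network fusion (SNF), which is swapped in here purely as an illustration: each similarity matrix is normalized, sparsified to a k-nearest-neighbour kernel, and repeatedly diffused through the average of the other matrices. This is a simplified SNF-style sketch under stated assumptions; the function names, k and the iteration count are hypothetical, and the thesis's MLFCMF fusion (including its weight matrix) may differ.

```python
import numpy as np


def full_kernel(W: np.ndarray) -> np.ndarray:
    """SNF-style normalisation: P(i, i) = 1/2, each row's off-diagonal entries sum to 1/2."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))
    row_sums = off.sum(axis=1, keepdims=True)
    P = off / (2.0 * np.maximum(row_sums, 1e-12))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W: np.ndarray, k: int) -> np.ndarray:
    """Keep each row's k largest similarities and rescale them to sum to 1 (sparse kNN kernel)."""
    W = np.asarray(W, dtype=float)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nn = np.argsort(W[i])[::-1][:k]          # indices of the k largest entries in row i
        S[i, nn] = W[i, nn] / max(W[i, nn].sum(), 1e-12)
    return S


def fuse_similarities(mats, k: int = 3, iters: int = 20) -> np.ndarray:
    """Non-linearly fuse two or more n x n similarity matrices (hypothetical SNF-style sketch)."""
    if len(mats) < 2:
        raise ValueError("need at least two similarity matrices to fuse")
    Ps = [full_kernel(W) for W in mats]      # dense, normalised view of each matrix
    Ss = [local_kernel(W, k) for W in mats]  # sparse kNN view of each matrix
    m = len(mats)
    for _ in range(iters):
        new_Ps = []
        for v in range(m):
            # Diffuse the average of the other views through view v's kNN graph.
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            P = full_kernel(Ss[v] @ others @ Ss[v].T)
            new_Ps.append((P + P.T) / 2.0)   # keep each view symmetric
        Ps = new_Ps                          # update all views simultaneously
    fused = sum(Ps) / m
    return (fused + fused.T) / 2.0
```

In use, the inputs would be several n x n lncRNA (or disease) similarity matrices computed from different sources, for instance a GIP similarity and a second similarity measure; keeping only each item's k strongest neighbours during diffusion is what lets weak, possibly noisy similarities fade out of the fused result.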
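Graph regularization and an L2,1 penalty, as combined in (3), can be illustrated on a factorization Y ≈ A B^T. The sketch below assumes the objective ||Y - A B^T||_F^2 + lam_l (||A||_F^2 + ||B||_F^2) + lam_g [tr(A^T L_r A) + tr(B^T L_d B)] + mu (||A||_{2,1} + ||B||_{2,1}), where L = D - S is the graph Laplacian of each similarity matrix and the L2,1 norm is smoothed by a small eps. It is minimized by alternating least squares: each step fixes one factor, replaces the L2,1 term by the usual iterative-reweighting surrogate, and solves the resulting Sylvester equation exactly. The objective, parameter names, defaults and the function grmf_l21 are assumptions; this is a generic GRMF-style sketch, not the DNSGRMF update rules from the thesis.

```python
import numpy as np


def laplacian(S: np.ndarray) -> np.ndarray:
    """Unnormalised graph Laplacian L = D - S of a symmetric, non-negative similarity matrix."""
    return np.diag(S.sum(axis=1)) - S


def solve_sym_sylvester(M: np.ndarray, N: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Solve M X + X N = Q when M is symmetric positive definite and N is symmetric PSD."""
    lam, U = np.linalg.eigh(M)
    sig, V = np.linalg.eigh(N)
    # In the eigenbases the equation decouples entrywise: (lam_i + sig_j) X~_ij = Q~_ij.
    Xt = (U.T @ Q @ V) / (lam[:, None] + sig[None, :])
    return U @ Xt @ V.T


def l21_smooth(F: np.ndarray, eps: float) -> float:
    """Smoothed L2,1 norm: sum_i sqrt(||f_i||^2 + eps)."""
    return float(np.sqrt(np.sum(F ** 2, axis=1) + eps).sum())


def reweight(F: np.ndarray, eps: float) -> np.ndarray:
    """Diagonal D with D_ii = 1 / (2 sqrt(||f_i||^2 + eps)), the L2,1 reweighting matrix."""
    return np.diag(1.0 / (2.0 * np.sqrt(np.sum(F ** 2, axis=1) + eps)))


def objective(Y, A, B, Lr, Ld, lam_l, lam_g, mu, eps):
    fit = np.linalg.norm(Y - A @ B.T) ** 2
    ridge = lam_l * (np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2)
    graph = lam_g * (np.trace(A.T @ Lr @ A) + np.trace(B.T @ Ld @ B))
    return fit + ridge + graph + mu * (l21_smooth(A, eps) + l21_smooth(B, eps))


def grmf_l21(Y, S_r, S_d, k=5, lam_l=0.1, lam_g=0.1, mu=0.1, iters=50, eps=1e-8, seed=0):
    """ALS for graph-regularised MF with an L2,1 penalty (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A, B = rng.random((n, k)), rng.random((m, k))
    Lr, Ld = laplacian(S_r), laplacian(S_d)
    history = [objective(Y, A, B, Lr, Ld, lam_l, lam_g, mu, eps)]
    for _ in range(iters):
        # A-step: (lam_l I + lam_g Lr + mu D_A) A + A (B^T B) = Y B
        M = lam_l * np.eye(n) + lam_g * Lr + mu * reweight(A, eps)
        A = solve_sym_sylvester(M, B.T @ B, Y @ B)
        # B-step: (lam_l I + lam_g Ld + mu D_B) B + B (A^T A) = Y^T A
        M = lam_l * np.eye(m) + lam_g * Ld + mu * reweight(B, eps)
        B = solve_sym_sylvester(M, A.T @ A, Y.T @ A)
        history.append(objective(Y, A, B, Lr, Ld, lam_l, lam_g, mu, eps))
    return A @ B.T, history
```

Each step minimizes a quadratic surrogate that upper-bounds the (smoothed) objective and touches it at the current iterate, so the returned history should be non-increasing. Under these assumptions, the highest-scoring entries of A @ B.T at pairs with no known association would be the candidate new miRNA-disease associations.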
Keywords/Search Tags: Collaborative matrix factorization, RNA-disease association prediction, L2,1 norm, Gaussian interaction profile kernel, Multi-label learning