| Growing evidence indicates that long non-coding RNAs (lncRNAs) are closely associated with a variety of diseases. Identifying meaningful lncRNA-disease associations helps us better understand the molecular mechanisms of these diseases. However, because traditional biological experiments are time-consuming and labor-intensive, they can link only a limited number of lncRNAs to diseases. Developing computational methods to infer potential lncRNA-disease associations can therefore reduce the time and cost of research and accelerate the diagnosis and treatment of diseases. At present, most lncRNA-disease association prediction methods treat the link between an lncRNA and a disease as a simple, direct binary relationship and fail to explore the implicit, indirect high-order relationships between biological entities, which limits their predictive ability. To overcome this problem, many researchers have introduced other data sources, such as lncRNA-miRNA and miRNA-disease associations. However, these heterogeneous data sources often introduce noise, which limits the gain in prediction performance. Instead of introducing heterogeneous data sources, this thesis therefore improves prediction performance by mining high-order relationships. The high-order relationship between lncRNAs and diseases is modeled with high-order proximity and with hypergraphs, and two prediction methods that exploit this relationship are proposed: one based on matrix completion and one based on a hypergraph double random walk. The content is as follows: (1) Matrix completion based on high-order proximity (HOPMCLDA). In biology, lncRNAs associated with the same disease are directly or indirectly related, and complications of one disease may affect the prevalence of other diseases. Inspired by these biological observations, this thesis proposes HOPMCLDA. Step 1: the high-order proximity networks of lncRNAs and of diseases are calculated from the lncRNA expression similarity network and the disease semantic similarity network. Step 2: singular value decomposition (SVD) is used to reconstruct the lncRNA and disease high-order proximity networks and to extract the main feature vectors. Step 3: a heterogeneous lncRNA-disease network is constructed by integrating the disease network, the disease-lncRNA association network and the lncRNA network. Step 4: a matrix completion algorithm is applied to the heterogeneous lncRNA-disease network to calculate the prediction scores. HOPMCLDA is compared with five other classical and advanced computational methods (GMCLDA, SIMCLDA, DSCMF, BRWLDA and RWRlncD). It achieves AUC values of 0.8755 in leave-one-out cross-validation (LOOCV) and 0.8353±0.0045 in 5-fold cross-validation (5-fold CV), indicating better predictive performance than the compared methods. Three case studies, on gastric cancer (GC), osteosarcoma and hepatocellular carcinoma (HCC), demonstrate HOPMCLDA's practical predictive power. (2) Hypergraph-based double random walk (HBRWRLDA). In biology, the same lncRNA plays different roles in different diseases, and the regulatory effects of different lncRNAs on the same disease may not be similar. Inspired by these biological observations, the high-order relationship between lncRNAs and diseases is mined through hypergraphs, and HBRWRLDA is proposed. Step 1: the known lncRNA expression similarity and disease semantic similarity are used to calculate the interaction probability matrix between lncRNAs and diseases. Step 2: the lncRNA and disease hypergraphs are obtained from this interaction probability matrix. Step 3: according to the different topologies of the lncRNA and disease hypergraphs, the double random walk algorithm is used to calculate the potential relationship between lncRNAs and diseases. Under the LOOCV and 5-fold CV test frameworks, HBRWRLDA obtains AUC values of 0.8792 and 0.8688±0.0037, respectively, showing better prediction performance than GMCLDA, LRWHNLDA, PMFILDA, DSCMF and BRWLDA. Case studies on renal cell carcinoma (RCC), gastric cancer (GC) and osteosarcoma further verify that HBRWRLDA can effectively predict lncRNAs associated with these diseases. In conclusion, the two prediction methods proposed in this thesis, both based on high-order relationships, achieve better prediction ability than the compared methods and obtain good results in the practical prediction of lncRNA-disease associations. |
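
Steps 1 and 2 of HOPMCLDA in the abstract above can be illustrated with a small sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: high-order proximity is taken to be a decayed sum of powers of the row-normalised similarity matrix, the function names `high_order_proximity` and `svd_features`, the `order` and `decay` parameters, and the toy similarity matrix are all hypothetical, and the thesis may define these quantities differently.

```python
import numpy as np


def high_order_proximity(sim, order=3, decay=0.5):
    """Decayed sum of k-step transition matrices (assumed definition)."""
    # Row-normalise so each row is a probability distribution over neighbours.
    P = sim / sim.sum(axis=1, keepdims=True)
    hop = np.zeros_like(P)
    power = np.eye(P.shape[0])
    for k in range(1, order + 1):
        power = power @ P                  # P^k: k-step transition probabilities
        hop += decay ** (k - 1) * power    # longer paths contribute less
    return hop


def svd_features(mat, rank):
    """Rank-`rank` SVD reconstruction plus the leading feature vectors."""
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    recon = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    feats = U[:, :rank] * np.sqrt(s[:rank])
    return recon, feats


# Toy similarity matrix for four hypothetical lncRNAs (not real data).
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

hop = high_order_proximity(toy_sim, order=3, decay=0.5)
recon, feats = svd_features(hop, rank=2)
```

In the toy matrix the first and last lncRNAs have zero direct similarity, yet their high-order proximity is positive because they are connected through intermediate lncRNAs. This is the kind of indirect relationship the abstract refers to.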