Research On LncRNA Disease Association Prediction Based On Heterogeneous Graph Neural Network

Posted on: 2022-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhang
GTID: 2480306785959849
Subject: Automation Technology
Abstract/Summary:
Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides (nt). A growing body of research shows that lncRNAs play crucial roles in many biological processes. Predicting lncRNA-disease associations can help biologists understand the molecular mechanisms of human diseases, which benefits disease diagnosis, treatment and prevention. Validation by biological experiments is expensive and time-consuming, so computational methods have gradually become a meaningful way to predict disease-related lncRNAs. Existing lncRNA-disease association prediction methods fall into two categories: bioinformatics-based methods and machine learning-based methods. Most of these models ignore the heterogeneity of biological data and do not fully use the available information. As a result, they handle isolated lncRNAs and diseases (those with no known associations) poorly, and the models have low interpretability.
To address these problems, we propose HGNNLDA, a lncRNA-disease association prediction model based on a heterogeneous graph neural network. We studied the theory, mechanisms and implementation of modeling the lncRNA-disease association prediction task. From this work, we propose a heterogeneous graph construction mechanism that fuses multi-source biological data, together with a prediction model built on a heterogeneous graph neural network.
Heterogeneous information has been under-used in earlier models. To address this, we collected lncRNA- and disease-related data and constructed a lncRNA-disease network from the known multi-association data among lncRNAs, diseases, genes and miRNAs. Making full use of this multi-association information can, to a certain extent, solve the problem of isolated lncRNAs and diseases. In addition, each node in the heterogeneous graph carries information from multiple modalities. We use the k-mer algorithm and the Word2vec model to extract multimodal node features. We also combine multiple data sources efficiently by introducing various types of linked data.
The model must also perform link prediction with heterogeneous information fusion and remain interpretable. For this, HGNNLDA first samples neighbor nodes using random walk with restart (RWR). Based on the types and numbers of the sampled nodes, it then builds a heterogeneous graph neural network architecture from three neural network modules. These modules respectively encode the content of each node, aggregate neighbors of the same type, and generate the final node embeddings. Each module uses a Bi-LSTM as its node feature aggregator to capture deep feature interactions and fuse heterogeneous data efficiently. During embedding generation, an attention mechanism learns how strongly each type of neighbor influences a node's embedding, which improves the model's interpretability. An average pooling layer over all hidden states yields the content embedding of each node. A binary logistic classification function then decides whether two nodes are associated.
Finally, we evaluate the model through comparative experiments and case studies. In cross-validation, HGNNLDA achieves an AUC of 0.8971, higher than HGLDA (0.8324) and DMHLDA (0.7640). In case studies of gastric cancer and breast cancer, literature confirmed 17 and 16, respectively, of the top 20 disease-related lncRNAs predicted by HGNNLDA. Both the comparative experiments and the case studies demonstrate the effectiveness of HGNNLDA.
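The k-mer feature extraction named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes RNA sequences over the alphabet A/C/G/U, a default of k = 3, and a normalized k-mer frequency vector as the node feature. The function names are hypothetical, and the Word2vec part of the feature pipeline is not reproduced here.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers (sliding window, stride 1).

    Assumption (not from the thesis): DNA-style input is converted to RNA
    by mapping T to U before tokenizing.
    """
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequency_vector(seq, k=3, alphabet="ACGU"):
    """Normalized counts of all len(alphabet)**k k-mers, in lexicographic order.

    Hypothetical helper: a fixed-length numeric feature for a sequence node.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for tok in kmer_tokens(seq, k):
        if tok in index:  # skip k-mers containing ambiguous bases such as N
            counts[index[tok]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

For example, `kmer_tokens("ACGUAC")` returns `["ACG", "CGU", "GUA", "UAC"]`, and `kmer_frequency_vector("ACGUAC")` is a 64-dimensional vector whose entries sum to 1.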
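RWR-based neighbor sampling can be sketched as follows. The sketch assumes the heterogeneous graph is stored as an adjacency-list dict plus a node-to-type map. The abstract does not give the restart probability (0.5 here), the walk length, or the optional per-type top-k cutoff, so these are illustrative assumptions, and `rwr_sample_neighbors` is a hypothetical name. The idea is that keeping the most frequently visited nodes of each type gives every node a type-grouped neighbor set. A later module can then aggregate the neighbors of each type separately.

```python
import random
from collections import Counter, defaultdict


def rwr_sample_neighbors(graph, node_type, start, restart_prob=0.5,
                         walk_length=100, top_k=None, seed=None):
    """Random walk with restart from `start` (illustrative parameters).

    graph: dict mapping node -> list of neighbor nodes (assumed format)
    node_type: dict mapping node -> type label, e.g. "lncRNA", "disease"
    top_k: optional dict mapping type -> max neighbors kept for that type
    Returns {type: [nodes ordered by visit count]}, excluding `start`.
    """
    rng = random.Random(seed)
    visits = defaultdict(Counter)
    current = start
    for _ in range(walk_length):
        neighbors = graph.get(current, [])
        # Jump back to the start node with probability restart_prob,
        # or whenever the walk reaches a node with no neighbors.
        if not neighbors or rng.random() < restart_prob:
            current = start
            continue
        current = rng.choice(neighbors)
        if current != start:
            visits[node_type[current]][current] += 1
    limits = top_k or {}
    # most_common(None) returns every visited node of that type.
    return {t: [n for n, _ in counter.most_common(limits.get(t))]
            for t, counter in visits.items()}
```

A toy graph such as `{"l1": ["d1", "m1"], "d1": ["l1", "g1"], "m1": ["l1"], "g1": ["d1"]}` with types lncRNA/disease/miRNA/gene returns neighbors grouped under each type label. Passing `top_k={"disease": 1}` keeps only the most visited disease.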
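The pooling, type-level attention and logistic scoring steps can be illustrated with a small pure-Python sketch. It assumes a Bi-LSTM (not reproduced here) has already produced a sequence of hidden states for a node's content and one aggregated vector per neighbor type. The attention score LeakyReLU(a^T [candidate || self]) with slope 0.01 and the inner-product sigmoid decoder are assumed choices in the style of HetGNN-like models, not details confirmed by the abstract. All function names are hypothetical. The returned per-type weights make the combination inspectable, because they show how much each neighbor type contributed to the final embedding.

```python
import math


def mean_pool(hidden_states):
    """Average a non-empty sequence of equal-length hidden-state vectors."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def type_attention(self_emb, type_embs, attn_vec):
    """Fuse a node's own embedding with one aggregated vector per neighbor type.

    attn_vec is an (assumed) learned vector of length 2 * len(self_emb).
    Returns (fused embedding, {"self" or type name: attention weight}).
    """
    names = ["self"] + list(type_embs)
    candidates = [list(self_emb)] + [list(v) for v in type_embs.values()]
    # Unnormalized score for each candidate: LeakyReLU(a^T [candidate || self]).
    scores = [leaky_relu(sum(a * x for a, x in zip(attn_vec, c + list(self_emb))))
              for c in candidates]
    # Numerically stable softmax over the candidates.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(self_emb)
    fused = [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]
    return fused, dict(zip(names, weights))


def link_probability(emb_u, emb_v):
    """Binary logistic score: sigmoid of the inner product of two embeddings."""
    dot = sum(a * b for a, b in zip(emb_u, emb_v))
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    e = math.exp(dot)  # stable branch for large negative inner products
    return e / (1.0 + e)
```

With an all-zero attention vector, every candidate scores 0 and receives weight 1/3, so the fused embedding is the plain average of the three vectors. A learned vector shifts weight toward the more informative neighbor types.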
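The AUC reported in the abstract can be computed from predicted scores with the rank-based definition sketched here. AUC is the fraction of (positive, negative) pairs in which the positive pair receives the higher score, with ties counted as one half. This is the standard Mann-Whitney formulation of the area under the ROC curve, not the thesis's evaluation code. The cross-validation folds and the choice of negative samples are not shown, and `roc_auc` is a hypothetical name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: iterable of 1 (known association) or 0 (negative sample)
    scores: predicted association probabilities, same order as labels
    O(P * N) pairwise loop: fine for a sketch, slow for very large sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1])` returns 0.75, because 3 of the 4 positive/negative pairs are ordered correctly.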
Keywords/Search Tags: biological information, lncRNA-disease association prediction, heterogeneous graph neural network, deep learning