
Study on Disease-Related Gene Recognition Based on Heterogeneous Networks

Posted on: 2022-07-08  Degree: Master  Type: Thesis
Country: China  Candidate: N R Zhang
GTID: 2480306539498224  Subject: Engineering
Abstract/Summary:
Diseases have long had a major impact on daily life, ranging from common colds and fevers to cancers that remain difficult to treat with modern medicine, such as breast cancer and pancreatic cancer. Identifying disease-related genes has therefore attracted the interest of many researchers, as it is of great significance for early diagnosis and later treatment. Traditional identification methods (such as genome-wide association studies and linkage analysis) require a great deal of time and money, so new methods for identifying disease-related genes are needed. With the rapid development of modern science and technology, more and more biological data on diseases and genes have become available, and many computational methods have been proposed for disease-related gene recognition based on biological heterogeneous networks. However, how to make full use of multi-source information (such as disease-phenotype associations and protein-protein interactions) to improve the performance of disease-gene prediction is still an open question.

This thesis proposes PrGeFNE, a new method that predicts disease-related genes using fast network embedding on biological heterogeneous networks. Specifically, five heterogeneous networks containing connections between different biological entities are constructed from disease-gene, disease-phenotype, protein-protein and gene-GO associations. The association data come from the DisGeNET database, the Human Phenotype Ontology database and the Gene Ontology database. A fast network embedding algorithm is then used to extract low-dimensional vector representations (LVRs) of nodes from the heterogeneous networks, and 5-fold cross-validation is used to test the performance of disease gene recognition based on LVR similarity in each of the five networks. The experimental results show that the five heterogeneous networks, which directly integrate the interactions of various biological entities, differ little in their ability to recognize disease-related genes: the network with the least information performs best, and the networks with more information show no significant improvement.

We therefore combine the LVRs of disease nodes and gene nodes with the kNN algorithm to construct a disease-disease network and a gene-gene network, respectively, and integrate them with the disease-gene association network to reconstruct a new two-layer heterogeneous network that contains more concentrated and relevant information from different sources. Finally, we apply random walk with restart on heterogeneous networks as a network propagation algorithm on the two-layer network to identify disease-related genes and obtain a list of genes related to each disease.

We use AUROC, AUPRC, top-k recall and top-k precision as evaluation metrics to assess the ability of PrGeFNE to identify disease-related genes, and carry out experimental analysis through 5-fold cross-validation and verification on new associations. In these verification experiments, PrGeFNE shows clear advantages over the other methods. Comprehensive experimental analysis shows that different types of association data play an important role in improving disease-related gene recognition, and comparison with several classical algorithms demonstrates the strong performance of PrGeFNE.
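The last two steps described in the abstract (building kNN networks from node embeddings, then propagating over a two-layer disease-gene network) can be illustrated with a minimal sketch. It is not the PrGeFNE implementation. It assumes cosine similarity for the kNN step, takes the embeddings as given rather than computing them, and uses a plain random walk with restart on the combined adjacency matrix instead of the heterogeneous variant with a separate inter-layer jump probability. The function names (`knn_graph`, `rwr`, `rank_genes`) and the default values of `k` and `restart` are hypothetical.

```python
import numpy as np

def knn_graph(emb, k):
    """Symmetric kNN adjacency from cosine similarity of row embeddings.

    Assumes every row of `emb` is non-zero and k < number of rows.
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)               # exclude self-matches
    n = emb.shape[0]
    nearest = np.argsort(-sim, axis=1)[:, :k]    # k most similar nodes per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)                # keep an edge if either side chose it

def column_normalize(m):
    """Scale each non-empty column so that it sums to 1."""
    sums = m.sum(axis=0, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m, dtype=float), where=sums > 0)

def rwr(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p = (1 - r) * W p + r * p0."""
    w = column_normalize(np.asarray(adj, dtype=float))
    p0 = np.zeros(w.shape[0])
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1 - restart) * (w @ p) + restart * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

def rank_genes(gene_emb, dis_emb, assoc, disease, k=2, restart=0.5):
    """Rank genes for one disease on a two-layer network.

    `assoc` is a genes x diseases 0/1 matrix of known associations.
    Simplification: the two layers are stacked into one block matrix and
    the walk is seeded at the query disease node only.
    """
    n_genes = gene_emb.shape[0]
    gene_net = knn_graph(gene_emb, k)
    dis_net = knn_graph(dis_emb, k)
    net = np.block([[gene_net, assoc], [assoc.T, dis_net]])
    scores = rwr(net, [n_genes + disease], restart)[:n_genes]
    return np.argsort(-scores), scores
```

Calling `rank_genes(gene_emb, dis_emb, assoc, disease=0)` returns gene indices ordered by descending propagation score together with the scores themselves. In a cross-validation setting, the held-out associations would be zeroed in `assoc` before ranking so that they can be scored with metrics such as top-k recall.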
Keywords/Search Tags: Disease-gene prediction, Multi-source biological data, Heterogeneous network, Network embedding, Network propagation