Research And Design Of Gene Similarity Search Method Based On Heterogeneous Network

Posted on: 2020-11-19 | Degree: Master | Type: Thesis
Country: China | Candidate: K M Yang | Full Text: PDF
GTID: 2370330623456467 | Subject: Computer technology
Abstract/Summary:
With the development of gene sequencing technology, the volume of gene data is growing rapidly. Under these circumstances, identifying similar genes through biological experiments is relatively inefficient. Gene similarity search algorithms based on heterogeneous network structure have therefore become a hotspot in bioinformatics research. In a gene-disease-phenotype heterogeneous network, two factors play a crucial role in computing gene similarity with the pathSim algorithm: exploring the potential associations among diseases and among phenotypes, and quantifying link weights reasonably. However, current pathSim-based research on gene similarity seldom takes these two factors into account, which leads to the sparse-link problem and makes the accuracy of the similarity calculation relatively low. To solve these problems, an improved weighted-metapath gene similarity search algorithm, gSim-Search, is proposed. The research contents of this paper include:

(1) Because current research does not comprehensively consider the potential associations of diseases and phenotypes, this paper studies the self-correlation of diseases and of phenotypes from two aspects: semantic association and topological association. For the semantic association, a semantic contribution graph method is used: within the directed acyclic graphs formed by diseases and by phenotypes respectively, semantic similarity is measured by calculating the maximum semantic contribution of the different ancestor nodes to a specific disease or phenotype. For the topological association, disease-disease and phenotype-phenotype similarity is calculated with Gaussian kernel similarity, based on the gene-disease network and the disease-phenotype network respectively. The semantic correlation matrix and the topological similarity matrix are then fused to obtain the correlation matrices of diseases and phenotypes.

(2) To address the sparse links in the gene-disease-phenotype heterogeneous network and the unreasonable quantification of link association degree, this paper uses a bipartite graph algorithm to explore the link association degree. First, the fused semantic and topological association networks of diseases and phenotypes are integrated into the gene-disease-phenotype heterogeneous network by constructing a resource diffusion matrix. Second, based on the resource diffusion matrix, the bipartite graph method is used to realize the unequal diffusion of resources. To ensure that the original gene-disease and disease-phenotype topological relationships are not destroyed, the association degree of potential links is quantified without weakening the association degree of existing links.

(3) Based on the work above, a weighted heterogeneous network rich in biological information is constructed, and gene similarity is calculated with the pathSim algorithm based on the weights of path instances. To verify the effectiveness of the gSim-Search algorithm, the direct neighbor method is chosen as the evaluation criterion. The comparative experiments show that the algorithm greatly improves the accuracy of predicting gene similarity for breast cancer and obesity. For example, among the top 20 ranked results, the accuracy of gene similarity for breast cancer is increased by 10%, and the accuracy for obesity is increased by 20%. Moreover, in predicting virulence gene similarity, the accuracy of this algorithm is generally higher than that of the other algorithms. This verifies the effectiveness of the algorithm.
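To make the idea of pathSim over weighted path instances concrete, here is a minimal sketch in plain Python. It is not the thesis's gSim-Search implementation. It assumes that the weight of a path instance is the product of its link weights and that the symmetric metapath is gene-disease-phenotype-disease-gene, so the commuting matrix is M = (W_gd W_dp)(W_gd W_dp)^T and the similarity is s(x, y) = 2 M[x][y] / (M[x][x] + M[y][y]). The function names and the toy matrices `w_gd` and `w_dp` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a as a new list of rows."""
    return [list(col) for col in zip(*a)]


def weighted_pathsim(w_gd: Matrix, w_dp: Matrix) -> Matrix:
    """PathSim over the metapath gene-disease-phenotype-disease-gene.

    Assumption (not taken from the thesis): a path instance's weight is
    the product of the weights of its links.
    """
    half = matmul(w_gd, w_dp)          # gene -> phenotype half-path weights
    m = matmul(half, transpose(half))  # commuting matrix of the full metapath
    n = len(m)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = m[i][i] + m[j][j]
            # Genes with no path instances at all get similarity 0.
            sim[i][j] = 2.0 * m[i][j] / denom if denom > 0 else 0.0
    return sim


# Hypothetical toy network: 3 genes, 2 diseases, 3 phenotypes.
w_gd = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]  # gene-disease link weights
w_dp = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]    # disease-phenotype link weights

if __name__ == "__main__":
    for row in weighted_pathsim(w_gd, w_dp):
        print([round(v, 3) for v in row])
```

In this toy network, gene 0 comes out more similar to gene 1 (which is linked mostly to the same disease) than to gene 2 (which is linked only to the other disease). In the setting described above, the link weights would come from the resource diffusion step rather than being set by hand.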
Keywords/Search Tags:heterogeneous network, gene similarity search, bipartite graph, pathSim, gSim-Search