
Prioritization Of Candidate Disease Genes By Combining Topological Similarity With Semantic Similarity

Posted on: 2016-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2404330473964834
Subject: Software engineering
Abstract/Summary:
Exploring the relationship between diseases and their causative genes is an emerging topic in systems biology. With the rapid accumulation of different types of genomic data, many computational methods for prioritizing disease genes have been proposed. One notable advantage of these methods is that they save manpower and material resources. Disease gene prediction commonly relies on the modularity of the protein-protein interaction (PPI) network. Because the data sources behind the PPI network are incomplete, some interactions between the protein encoded by a candidate gene and the protein encoded by a disease gene are very weak, so some candidate genes cannot be identified well. This work therefore incorporates additional data sources to identify causative genes more accurately.

First, disease genes are predicted using the PPI network. The first pair of methods is RWRAHRSS and RWRMHRSS. They use the semantic similarity between a candidate gene and the disease genes to set the initial probability vector of the random walk with restart (RWR) algorithm on the PPI network, and then rank candidate genes by the final result of the walk. The two methods differ in whether the maximum or the average of the semantic similarities between a candidate gene and the disease genes is used to set the initial probability vector. Under corresponding parameter settings, both RWRAHRSS and RWRMHRSS achieve higher AUC values than RWR and DP_LCC.

Second, building on the first pair of methods, the PPI network is replaced by a heterogeneous network, and linear correlation is used to measure topological similarity. This yields a second pair of methods, AHRWR and MHRWRL. They use the semantic similarity between a candidate gene and the disease genes to set the initial probability vector of the RWR algorithm on the heterogeneous network, and then use the linear correlation between the disease diffusion profile and the candidate diffusion profile to measure topological similarity on that network. Finally, the two results are combined to predict disease genes. Under corresponding parameter settings, AHRWR and MHRWRL achieve higher AUC values than DP_LCC and RWRH, and also higher AUC values than RWRAHRSS and RWRMHRSS.

New causative genes are predicted for multifactorial diseases, including Alzheimer's disease, breast cancer and diabetes mellitus. The good consistency between the top predictions and literature reports further supports the effectiveness of the proposed methods.
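The core idea described above can be illustrated with a minimal sketch. It is not the implementation from the thesis. It assumes a small undirected network given as an adjacency matrix, a column-normalized transition matrix, a restart probability of 0.7, and Pearson correlation as the reading of "linear correlation". All function names (`seed_vector`, `random_walk_with_restart`, `linear_correlation`) and all numbers are hypothetical and chosen only for illustration.

```python
import math

def column_normalize(adj):
    """Column-normalize an adjacency matrix so each non-empty column sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]

def seed_vector(similarities, mode="max"):
    """Build the initial probability vector from semantic similarities.

    similarities[i] lists the semantic similarity of gene i to each known
    disease gene; mode selects the maximum or the average of that list.
    """
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    raw = []
    for sims in similarities:
        if not sims:
            raw.append(0.0)
        elif mode == "max":
            raw.append(max(sims))
        else:
            raw.append(sum(sims) / len(sims))
    total = sum(raw)
    if total == 0:
        raise ValueError("all similarities are zero")
    return [v / total for v in raw]

def random_walk_with_restart(adj, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol."""
    w = column_normalize(adj)
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

def linear_correlation(x, y):
    """Pearson correlation of two profiles; 0.0 if either has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 4-gene path network 0-1-2-3 with two known disease genes.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
sims = [[1.0, 0.9], [0.8, 0.2], [0.3, 0.1], [0.0, 0.0]]

# Disease diffusion profile, seeded by maximum semantic similarity.
disease_profile = random_walk_with_restart(adj, seed_vector(sims, "max"))
ranking = sorted(range(len(adj)), key=lambda i: disease_profile[i], reverse=True)

# Candidate diffusion profile for gene 2 (walk restarted at gene 2 only),
# compared with the disease profile as a topological similarity score.
candidate_profile = random_walk_with_restart(adj, [0.0, 0.0, 1.0, 0.0])
topological_similarity = linear_correlation(disease_profile, candidate_profile)
```

Switching `mode` between `"max"` and `"average"` mirrors the distinction between the two seeding variants in the abstract. How the thesis combines the walk result with the correlation score is not specified in the abstract, so the sketch leaves the two values separate.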
Keywords/Search Tags: protein-protein interaction network, heterogeneous network, initial probability vector, random walk with restart algorithm, linear correlation