
Research Of Disease Gene Prediction Based On Ontology And Gene Network

Posted on: 2017-09-09    Degree: Master    Type: Thesis
Country: China    Candidate: K Bai
GTID: 2310330503486895    Subject: Computer Science and Technology
Abstract/Summary:
Although the Human Genome Project has been completed with great success, and high-throughput methods for verifying gene function are now in use, identifying the genetic causes of disease remains one of the major challenges facing humanity. Identifying disease-associated genes through biological experiments requires substantial human and financial resources. Candidate-gene prioritization, however, works well when it starts from a set of known disease genes, because in most cases new disease genes cluster with known disease genes in the protein-protein interaction network. Many computational methods exploit this property: they compute the functional similarity between each candidate gene and the known disease genes, then rank the candidates by these similarity scores. Guided by the resulting ranked list, biological experiments can achieve a higher yield at lower cost.

In this work, we construct a two-layer heterogeneous network from biomedical ontologies and six other databases. The network has 78,786 vertices representing biological concepts, 105,875 directed edges among ontology terms, and 398,642 undirected edges between ontology terms and genes and between pairs of genes. To reduce redundancy in the network, we use mapping tools such as a super thesaurus to unify the different identifiers used by different databases. Because most edges are supported by several evidence codes, we design a scoring method that combines these evidence codes into a single edge score. The result is a weighted two-layer heterogeneous network for predicting disease genes. Since edge endpoints are of different types, we manually classify the edges into seven types. Changing the weight of each edge type directly affects the final ranking list. Using supervised random walk (SRW), a weight is learned for each edge type, which improves the gene prediction results.
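The ranking step described above can be sketched as a random walk with restart over the weighted network. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy graph, the three edge-type names, the per-type weights, the evidence scores, and the restart probability of 0.3 are all hypothetical, and the edge-type weights are passed in as fixed values rather than learned by SRW. Each edge's weight is its type weight multiplied by its evidence score, undirected edges are added in both directions, known disease genes act as restart seeds, and candidates are ranked by their steady-state visiting probability.

```python
import numpy as np

def build_transition(n, edges, type_weights):
    """Row-stochastic transition matrix from typed, evidence-scored edges.

    edges: list of (src, dst, edge_type, evidence_score, directed) tuples.
    type_weights: dict mapping each (hypothetical) edge type to its weight.
    """
    W = np.zeros((n, n))
    for src, dst, edge_type, score, directed in edges:
        w = type_weights[edge_type] * score
        W[src, dst] += w
        if not directed:
            W[dst, src] += w  # undirected edges are walkable both ways
    out = W.sum(axis=1, keepdims=True)
    # Rows with no outgoing weight (dangling nodes) are left as all zeros.
    return np.divide(W, out, out=np.zeros_like(W), where=out > 0)

def random_walk_with_restart(P, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * P^T p + r * p0 until the L1 change is below tol."""
    n = P.shape[0]
    p0 = np.zeros(n)
    p0[seeds] = 1.0 / len(seeds)
    dangling = P.sum(axis=1) == 0
    p = p0.copy()
    for _ in range(max_iter):
        # Mass stranded on dangling nodes is sent back to the seeds.
        stranded = p[dangling].sum()
        p_next = (1 - restart) * (P.T @ p + stranded * p0) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def rank_candidates(scores, candidates):
    """Sort candidate node ids by descending walk probability."""
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

# Hypothetical toy network: genes 0-3, ontology terms 4 and 5.
edges = [
    (0, 1, "gene-gene", 1.0, False),
    (1, 2, "gene-gene", 1.0, False),
    (2, 3, "gene-gene", 1.0, False),
    (4, 0, "term-gene", 1.0, False),  # term 4 annotates gene 0
    (4, 5, "term-term", 1.0, True),   # term 4 is_a term 5 (directed)
]
type_weights = {"gene-gene": 1.0, "term-gene": 0.5, "term-term": 0.5}
P = build_transition(6, edges, type_weights)
scores = random_walk_with_restart(P, seeds=[0])
print(rank_candidates(scores, [1, 2, 3]))
```

Probability mass that reaches a node with no outgoing edges (here the parent term 5) is returned to the seeds, so the scores remain a probability distribution. In an SRW setting, the values in `type_weights` would be the parameters that training adjusts.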
We then modify supervised random walk with Laplacian normalization (LN-SRW). Because LN-SRW has a long execution time, we also propose a simplified Laplacian-normalized supervised random walk (SLN-SRW). On synthetic scale-free networks, both LN-SRW and SLN-SRW outperform SRW, with smaller absolute errors. We then compare the original random walk with SRW and SLN-SRW on the heterogeneous network: the original random walk achieves an AUC (area under the ROC curve) of 76.9%, while SRW increases the AUC by 0.8% and SLN-SRW by 2.3% relative to the original random walk.
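The abstract does not give the LN-SRW or SLN-SRW update rules, so the following sketch only illustrates the general idea under stated assumptions. It applies the symmetric normalization S = D^{-1/2} W D^{-1/2} (the matrix behind the normalized graph Laplacian L = I - S) to a weight matrix, propagates scores from seed genes with restart, and computes AUC as the probability that a positive (disease) gene scores above a negative candidate. The function names, the restart probability, and the toy path graph are hypothetical, and the learned edge-type weights of SRW are omitted.

```python
import numpy as np

def symmetric_normalize(W):
    """Return D^{-1/2} W D^{-1/2} for a symmetric, non-negative weight matrix W."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])  # isolated nodes keep a zero factor
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate(W, seeds, restart=0.3):
    """Solve f = (1 - r) S f + r y in closed form, with S the normalized matrix."""
    n = W.shape[0]
    S = symmetric_normalize(W)
    y = np.zeros(n)
    y[seeds] = 1.0
    return restart * np.linalg.solve(np.eye(n) - (1 - restart) * S, y)

def auc(pos_scores, neg_scores):
    """AUC as P(positive outscores negative), counting ties as 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Hypothetical toy gene network (a path 0-1-2-3-4) with gene 0 as the seed.
W = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    W[a, b] = W[b, a] = 1.0
f = propagate(W, seeds=[0])
print(np.round(f, 3))
print(auc([0.9, 0.8], [0.1, 0.8]))
```

Because the eigenvalues of S lie in [-1, 1], the spectral radius of (1 - r)S is at most 1 - r < 1, so the linear system in `propagate` always has a unique solution. The pairwise `auc` helper is quadratic in the number of scores, which is fine for illustration but not for large candidate sets.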
Keywords/Search Tags: ontology, data fusion, disease gene prediction, random walk, supervised random walk, Laplacian normalization