
An RDF Model Of Gene Ontology And Its Associations

Posted on: 2009-12-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q W Xu
GTID: 1100360275471028
Subject: Biomedical engineering
Abstract/Summary:
Gene Ontology (GO, http://www.geneontology.org) is by far the most widely used bio-ontology. As of August 2007, it contained approximately 23,700 terms, linked to a database of more than 16 million annotations of genes and gene products originating from about 20 organisms. For Semantic Web applications, the Gene Ontology Consortium provides an RDF-XML data file (http://archive.geneontology.org/latest-full/go_200708-assocdb.rdf-xml.gz). It is an export of the database, containing both the GO vocabulary and the associations between GO terms and gene products. However, this file has drawbacks that make it unsuitable for complex semantic query and inference services.
The first drawback is the lack of relationships between concepts in different GO sub-ontologies, which limits the power of inference based on them. The second drawback is that the RDF-XML data file is organized around a term-centric view of GO annotation data. The third drawback is the lack of support for GOSlim.
In this paper, we present GORouter, an RDF model that demonstrates how to use multiple Semantic Web tools and techniques to integrate heterogeneous resources and to provide a mixture of semantic query and inference solutions for GO and its associations. Most of the original files come from the Gene Ontology Consortium. We encoded these heterogeneous resources in a uniform RDF format and created a set of RDF datasets. Each dataset consists of two RDF files, one for metadata and one for data. The metadata RDF files are encoded with RSS 1.0, and each metadata RDF file has one data RDF file associated with it. We assign exactly one unique LSID to the URL of each data RDF file.
Using the GLUE system, we create ontology mappings between pairs of terms from the three independent GO sub-ontologies. To improve match accuracy, the GLUE system uses a Relaxation Labeler, which searches for the match configuration that best satisfies the given domain constraints and heuristic knowledge.
We use the Oracle Network Data Model (NDM) as the native RDF data repository and the table function SDO_RDF_MATCH to seamlessly combine the results of RDF queries with traditional relational data. As a result, the scale of GORouter is minimized: information not directly involved in semantic inference is stored in relational tables. We believe that this is an effective way to partly overcome the bottleneck of conventional Semantic Web applications.
GORouter is licensed under the Apache License, Version 2.0, and is accessible at http://www.scbit.org/gorouter/.
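The idea of keeping only inference-relevant statements as RDF triples and joining query results with relational data can be illustrated with a minimal Python sketch. This is not Oracle's SDO_RDF_MATCH interface or the GORouter implementation; the triple store, the predicate names (`ex:annotatedTo`, `ex:isA`), the identifiers, and the helper functions `match` and `annotated_products` are all hypothetical and exist only to show the pattern of matching a triple pattern and then looking up descriptive attributes in a separate table.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Toy RDF graph. All identifiers are hypothetical, not real GO data.
TRIPLES: List[Triple] = [
    ("gp:P1", "ex:annotatedTo", "go:T1"),
    ("gp:P2", "ex:annotatedTo", "go:T2"),
    ("gp:P3", "ex:annotatedTo", "go:T1"),
    ("go:T1", "ex:isA", "go:T0"),
]

# "Relational" side: descriptive attributes that play no role in inference,
# keyed by gene product identifier (a stand-in for an ordinary table).
GENE_PRODUCT_TABLE: Dict[str, Dict[str, str]] = {
    "gp:P1": {"symbol": "geneA", "organism": "organism X"},
    "gp:P2": {"symbol": "geneB", "organism": "organism Y"},
    "gp:P3": {"symbol": "geneC", "organism": "organism X"},
}


def match(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for a single triple pattern.

    Pattern terms starting with '?' are variables; other terms must match
    the triple exactly. A variable used twice must bind to the same value.
    """
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # constant term does not match
        else:
            yield binding


def annotated_products(
    go_term: str,
    triples: List[Triple],
    table: Dict[str, Dict[str, str]],
) -> List[Dict[str, str]]:
    """Join RDF matches for a GO term with the relational attributes."""
    rows = []
    for binding in match(("?gp", "ex:annotatedTo", go_term), triples):
        gp = binding["?gp"]
        rows.append({"gene_product": gp, **table.get(gp, {})})
    return rows


if __name__ == "__main__":
    for row in annotated_products("go:T1", TRIPLES, GENE_PRODUCT_TABLE):
        print(row)
```

In this sketch the triple list stays small because symbols and organism names never enter it; they are attached only after the pattern match, which mirrors the division of labor the abstract describes between the RDF repository and the relational tables.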
Keywords/Search Tags: Semantic Web, Gene Ontology, GLUE System, Oracle NDM, Ontology Mapping