Composing and conveying lineage metadata for environmental science research computing

Posted on: 2005-02-01
Degree: Ph.D.
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Bose, Rajendra Kumar
GTID: 1455390008981404
Subject: Computer Science
Abstract/Summary:
Although environmental and Earth science data are increasingly disseminated online, the record of their origins and processing history (their lineage) is often absent, inadequate, or irretrievable by potential users. This work addresses the following research question: How can organizations that perform data processing to support environmental and Earth science research attain the ability to compose (arrange in proper form) and convey (communicate to others) lineage metadata for the data products they create?

Lineage retrieval requires the capability to assemble a retrospective view of a workflow from extant metadata. A review of lineage-related research from the past two decades provides a framework that clarifies the architecture of previous prototypes and guides the architectural design of new systems. Through experience with example workflows for calculating oceanic primary production, we explore workflow and metadata for script-based data processing. We investigate a prototype lineage server that introduces a level of indirection for the metadata objects presented in lineage graphs.

A significant problem facing potential data consumers is retrieving lineage for the results of data processing that spans multiple research groups or organizations. The linchpin data products, which connect workflow invocations from different organizations as the links of a data processing chain, are the key to maintaining the continuity of the complementary, retrospective lineage. We propose using the Resource Description Framework (RDF) as a standard, portable format for summarizing the lineage metadata of workflow invocations.

This work investigates alternatives to the Earth System Science Workbench proposals for composing and conveying the lineage of satellite-derived data products.
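To make the idea of a linchpin data product concrete, here is a minimal sketch, not the dissertation's implementation, of assembling a retrospective lineage view from per-organization lineage summaries expressed as subject/predicate/object triples. The predicate names `generatedBy` and `used`, the `urn:` identifiers (a chlorophyll product from organization A feeding a primary-production run at organization B), and the `lineage` function are all hypothetical. Walking backward from a product only reaches upstream processing if the summary that generated the shared product is available.

```python
from collections import deque

# Hypothetical lineage summaries as (subject, predicate, object) triples.
# Predicate names and URNs are illustrative assumptions, not a published vocabulary.
ORG_A = [
    ("urn:a:chl_l3", "generatedBy", "urn:a:run42"),
    ("urn:a:run42", "used", "urn:a:chl_l2"),
]
ORG_B = [
    ("urn:b:npp_map", "generatedBy", "urn:b:run7"),
    ("urn:b:run7", "used", "urn:a:chl_l3"),  # linchpin product created by org A
    ("urn:b:run7", "used", "urn:b:par"),
]


def lineage(product, triples):
    """Breadth-first walk backward from a product through generatedBy/used edges."""
    edges = {}
    for subject, predicate, obj in triples:
        if predicate in ("generatedBy", "used"):
            edges.setdefault(subject, []).append(obj)
    seen, order, queue = {product}, [], deque([product])
    while queue:
        node = queue.popleft()
        order.append(node)
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


# With only org B's summary, the chain stops at the linchpin product;
# merging org A's summary extends the retrospective lineage upstream.
print(lineage("urn:b:npp_map", ORG_B))
print(lineage("urn:b:npp_map", ORG_A + ORG_B))
```

The triples are merged by plain list concatenation here; a real system would need shared, resolvable identifiers for the linchpin products so that summaries from different organizations join correctly.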
Research contributions include: developing a workflow and metadata model to facilitate composing both fundamental and lineage metadata for script-based data processing; proposing the concept of a standalone lineage server to provide additional flexibility for delivering the metadata of objects in the lineage; and investigating the use of embedding or linking RDF/XML lineage metadata within the fundamental metadata for a data product to connect the links of the "lineage chain," that is, a chain of workflow invocations, across organizations.
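The last contribution, embedding or linking RDF/XML lineage within a product's fundamental metadata, can be sketched with Python's standard `xml.etree.ElementTree`. Only the RDF namespace URI is standard; the `lin:` vocabulary namespace, the `metadata`/`title`/`lineage` element names, and the `href` linking attribute are hypothetical stand-ins for whatever metadata schema an organization actually uses.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
LIN = "http://example.org/lineage#"  # hypothetical lineage vocabulary
ET.register_namespace("rdf", RDF)
ET.register_namespace("lin", LIN)


def lineage_rdf(product_uri, run_uri, inputs):
    """Build a minimal RDF/XML summary of one workflow invocation."""
    root = ET.Element(f"{{{RDF}}}RDF")
    prod = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": product_uri})
    ET.SubElement(prod, f"{{{LIN}}}generatedBy", {f"{{{RDF}}}resource": run_uri})
    run = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": run_uri})
    for uri in inputs:
        ET.SubElement(run, f"{{{LIN}}}used", {f"{{{RDF}}}resource": uri})
    return root


def metadata_record(title, rdf=None, lineage_url=None):
    """Fundamental metadata (hypothetical schema) with lineage embedded or linked."""
    rec = ET.Element("metadata")
    ET.SubElement(rec, "title").text = title
    lineage = ET.SubElement(rec, "lineage")
    if rdf is not None:
        lineage.append(rdf)  # embed the RDF/XML summary inline
    elif lineage_url is not None:
        lineage.set("href", lineage_url)  # point to an externally hosted summary
    return rec


embedded = metadata_record(
    "NPP map", rdf=lineage_rdf("urn:b:npp_map", "urn:b:run7", ["urn:a:chl_l3"])
)
linked = metadata_record("NPP map", lineage_url="http://example.org/npp_map.rdf")
print(ET.tostring(embedded, encoding="unicode"))
```

Embedding keeps the lineage with the product when the record is copied, while linking keeps the record small but depends on the linked summary staying retrievable.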
Keywords/Search Tags: Data, Lineage, Science, Environmental, Workflow invocations, Processing, Composing, Organizations