
Retro: A Methodology For Retrospection Everywhere

Posted on: 2014-04-03
Degree: Ph.D
Type: Thesis
University: Brandeis University
Candidate: Shaull, Ross
Full Text: PDF
GTID: 2455390008451172
Subject: Computer Science
Abstract/Summary:
Changes in data over time are increasingly important in business intelligence and auditing applications. Having access to past states enables users to detect trends and anomalies, verify assumptions based on previous calculations, recover from input errors, test the efficacy of decisions, and make predictions about the future.

Retro is a new methodology and portable design for systematically adding the ability to save and access past states to an existing transactional data store. We developed efficient algorithms for in-memory and on-disk indexing of past states, including Skippy, an index for past state metadata. We designed novel protocol extensions to leverage existing data store multi-version concurrency control and recovery. We prototyped Retro in Berkeley DB.

A key performance metric is non-disruptiveness: whether saving snapshots with Retro interferes with database performance. Retro can non-disruptively save snapshots with high frequency (after every transaction) with minimal impact on update throughput and an average increase in checkpoint length of about 15%.

Retro provides retrospection, a novel and simple interface that allows any read-only query to execute "as of" a snapshot. Existing query languages and programming APIs can be used with retrospection, simplifying the use of Retro for query programmers. In our experiments using selected TPC-H queries and custom SQL queries, running an I/O-bound query with retrospection is anywhere from 2.5x to 19x slower than running the same query without Retro. This slowdown is due to declustered I/O from copy-on-write (COW) snapshots, a cost comparable to other state-of-the-art approaches that keep the current state intact and copy snapshots out using COW. New storage technologies such as flash memory offer an opportunity to avoid the main source of the slowdown (the cost of seeks due to declustering) while still taking advantage of the benefits of Retro.

This thesis presents the design, implementation, and evaluation of a solution to the problem of integrating a snapshot system into an existing transactional data store without disrupting performance. A data store with Retro offers snapshots that stay "out of the way" when they aren't needed, and retrospection at any time on any snapshot using any query code.
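
To make the copy-on-write idea concrete, the following is a minimal in-memory sketch of the general technique the abstract describes: the current state stays intact, pre-states are copied out on the first overwrite after a snapshot, and a read can run "as of" a snapshot. It is not Retro's implementation and does not reproduce Skippy or the Berkeley DB integration. The class `CowStore`, its methods, and the `_MISSING` sentinel are hypothetical names introduced only for illustration, and the sketch ignores transactions, recovery, and deletes.

```python
# Illustrative sketch only: a toy key-value store with copy-on-write
# snapshots. All names here are hypothetical, not taken from the thesis.

_MISSING = object()  # marks "key did not exist when the snapshot was declared"


class CowStore:
    def __init__(self):
        self.current = {}   # current state, always kept intact and up to date
        self.log = []       # append-only pre-states: (snapshot_id, key, old_value)
        self.latest = 0     # id of the most recently declared snapshot (0 = none)
        self.saved = set()  # keys already copied out for the latest snapshot

    def declare_snapshot(self):
        """Declare a snapshot of the current state; copies nothing eagerly."""
        self.latest += 1
        self.saved = set()
        return self.latest

    def put(self, key, value):
        """Update a key, copying its pre-state out on the first overwrite
        since the latest snapshot (the copy-on-write step)."""
        if self.latest and key not in self.saved:
            old = self.current.get(key, _MISSING)
            self.log.append((self.latest, key, old))
            self.saved.add(key)
        self.current[key] = value

    def get(self, key, as_of=None):
        """Read the current value, or the value as of a declared snapshot."""
        if as_of is not None:
            # The first pre-state copied out at or after `as_of` is the value
            # the key had when that snapshot was declared. This linear scan is
            # the naive part that an index over snapshot metadata would replace.
            for snap_id, k, old in self.log:
                if k == key and snap_id >= as_of:
                    if old is _MISSING:
                        raise KeyError(key)
                    return old
        # Not overwritten since the snapshot: shared with the current state.
        return self.current[key]
```

In this sketch a reader calls the same `get` with or without `as_of`, which mirrors the idea that existing query code can run unchanged against a snapshot. Because pre-states are appended in the order they are overwritten rather than in key order, reading many keys as of a snapshot touches scattered log positions, which is the in-memory analogue of the declustered I/O the abstract identifies as the source of the slowdown.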
Keywords/Search Tags:Retro, Past states, Query, Data