Free factories: From the quantum coreworld to the Personal Genome Project

Posted on: 2010-03-26
Degree: Ph.D.
Type: Dissertation
University: Harvard University
Candidate: Zaranek, Alexander Wait
Full Text: PDF
GTID: 1440390002974813
Subject: Biology
Abstract/Summary:
This dissertation develops technical and governance infrastructure for a "free factory" by building on parallels with free and open source software and related communities. By viewing varied technologies and people as comprising free factories (or a federation of co-operating and competing factories with certain common ideals and infrastructure), I argue many scientific questions become easier to answer.
In the first chapter, I briefly summarize the dissertation. I then describe the hardware, staff and other resources required to implement the computational aspects of a free factory with reasonable economies of scale. In the next chapter, I use the infrastructure to search for DNA and RNA editing events in more than 600 million genomic traces from ten organisms at NCBI. I find numerous examples of traces that support the existence of these phenomena and set the stage for a more comprehensive investigation. The subsequent chapter uses the same tools to analyze four individual human genomes for variants of clinical interest. This work demonstrates such analyses need not lead to costly or harmful medical workup. In the last chapter, I describe the initial data release of the Personal Genome Project. The release is derived from two gigabases of targeted sequence data from ten individuals. I investigate the quality of the data by comparison with Affymetrix 500K SNPs and discuss one variant of clinical interest. This data release, linking scientists, physicians and members of the general public, demonstrates the utility of free factories for advancing the state-of-the-art in personalized, genomic medicine.
In Appendix A, I indicate how the Quantum Coreworld (earlier work on a digital evolution system consistent with the rules of quantum information processing) could efficiently use free factories. Such projects could allow free factories to fully utilize idle resources.
Finally, in Appendix B, a novel, open-source primary data analysis pipeline is used to reprocess 100 gigabytes of image data derived from the exome of a Personal Genome Project participant. This approach demonstrates a 14% increase in placeable reads, on the PGP sample, over the vendor's pipeline.
Keywords/Search Tags: Free, Personal genome, Quantum