A Virtual And Dynamic Genome Database Of The Chinese Population

Posted on: 2015-08-28  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y C Ling  Full Text: PDF
GTID: 1220330467480041  Subject: Bioinformatics
Abstract/Summary:
Data released by the 1000 Genomes Project provide a large number of personal genomes, sequenced from different nations and populations and carrying a large number of genetic variations. Such large-scale biological sequencing data raise urgent problems for scientists: how to transfer, process, and store data at this scale, and ultimately how to discover the knowledge and patterns hidden in it. Genomic research is shifting from a single, static human genome to complex, dynamic personal genomes. However, the current human reference genome is based on sequencing a limited number of samples. A linear reference genome is not adequate for genomics, transcriptomics, epigenetics, and genome-wide association studies.

Based on whole-genome sequencing of 194 individuals from the 1000 Genomes Project data, this project constructed a virtual and dynamic Chinese genome database (VCGDB). VCGDB provides dynamic genomic information comprising 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (InDels), and 29 million rare variations, together with the corresponding genomic annotation. From this dynamic genomic information, we built a consensus genome sequence of the Chinese population and validated it against the existing human reference genome using real sequencing data. The mapping results show that the Chinese reference genome built from the dynamic genomic information is better suited to representing the genomic features of the Chinese population.

VCGDB is a "virtual" database because its reference genome does not belong to or represent any actual person; it is a statistical result derived from tera-base-scale sequencing data from hundreds of Chinese individuals, and is therefore adequate for describing the genetic variation features and preferences of the Chinese population. VCGDB is also a "dynamic" database. We use methods such as information entropy to analyze and evaluate the variation rate and probability of all genetic variations, including SNVs, InDels, and structural variations, across Chinese individuals at several levels, such as sample or population. VCGDB integrates the dynamic variations of individual characteristics with genomic annotation, such as reference genes, genomic duplications, and GWAS clinical traits. Together these make up the dynamic information related to the Chinese population.

VCGDB also provides a highly interactive, user-friendly virtual Chinese genome browser (VCGBrowser) with several significant functions. It is both a web-based applet and a client-based, cross-platform application, so it can be used as an online browser or downloaded as local software, and it is highly compatible. VCGBrowser presents a multi-dimensional view that directly displays and compares dynamic variations on a consensus coordinate system along the human genome, together with a canvas that marks all dynamic variations, whether within or between populations. The browser is highly flexible, supporting real-time seamless zooming to any resolution, from the genomic level, which shows the dynamic distribution of a region of interest, to the nucleotide level, at which every residue and its detailed information can be read clearly. Taking advantage of the highly structured and indexed VCGDB, VCGBrowser implements a real-time query service: a click in the browser triggers an instant query to the database, which returns the detailed information.

In sum, VCGDB makes efficient use of, and successfully presents, the large amount of data released by the 1000 Genomes Project. VCGDB offers a feasible strategy for big data processing and aims to provide a robust resource for genomics and disease-related research, especially personal genome studies.
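As an illustration only, the sketch below shows how an entropy measure and a consensus sequence could be derived from per-site allele counts across individuals. It is not the dissertation's implementation: the function names, the toy data, and the simplification to one haploid base per individual per site are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def site_entropy(alleles):
    """Shannon entropy (in bits) of the allele distribution at one site."""
    counts = Counter(alleles)
    total = sum(counts.values())
    # Sum p * log2(1/p) over observed alleles; a site with one allele gives 0.
    return sum((c / total) * log2(total / c) for c in counts.values())


def consensus_base(alleles):
    """Most frequent allele at a site (ties go to the allele seen first)."""
    return Counter(alleles).most_common(1)[0][0]


# Hypothetical toy data: one aligned sequence per individual (assumption).
samples = ["ACGT", "ACGA", "ATGA", "ACGA"]

# Transpose so that each tuple holds every individual's base at one site.
columns = list(zip(*samples))

consensus = "".join(consensus_base(col) for col in columns)
entropies = [site_entropy(col) for col in columns]

for position, (base, h) in enumerate(zip(consensus, entropies), start=1):
    print(f"site {position}: consensus={base} entropy={h:.4f} bits")
```

Sites where every individual carries the same base have zero entropy, while sites with mixed alleles score higher, which gives one simple way to rank positions by how variable they are within a population.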
Keywords/Search Tags: Chinese population, Dynamic genome, Database, 1000 Genomes Project, Big data