Combining Variants Data For Genome Indexing

Posted on: 2013-12-13 | Degree: Master | Type: Thesis
Country: China | Candidate: H Z Guo
GTID: 2180330392967959 | Subject: Computer Science and Technology
Abstract/Summary:
Genome mapping is the process of aligning the reads produced by high-throughput sequencing technologies to the human reference genome. The mapping system is the foundation for processing and analyzing biological data and is important for expression analysis and SNP site prediction. The index structure is a key component of the mapping system and underpins large-scale sequence alignment.

The main research goal is to improve the index-building module of an existing mapping system: to design and implement a new index-generation algorithm that constructs a genome index structure combined with variation data.

This thesis first presents the basic concepts of sequence alignment and of the Burrows-Wheeler Transform (BWT) data structure. It describes the contents of a BWT-based index, analyzes and discusses the exact-match algorithm, and illustrates several important sorting algorithms involved in building the index. It then puts forward the concept of an absolute axis on the genome, which introduces mutation data from the HapMap database. The principles for creating the index files of the public part and of the variation part are presented, including the specific data storage format of each part. The conversion of the public part into a BWT sequence is described in detail, and the index data structure of the variation part, together with the contents of each of its index files, is analyzed. Finally, a corresponding verification method is introduced for each index structure, and the validation results and their validity analysis are presented. An alignment strategy for each part is given as technical support for the subsequent large-scale sequence alignment process.

In conclusion, this thesis effectively combines the original reference genome data with variation data, establishes a new genome index structure that incorporates variation data, explains the principles of BWT index construction, and offers a new perspective on designing the index of a mapping system.
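The abstract refers to the BWT data structure, the exact-match algorithm, and the sorting step used when forming the index. As an illustration only, the Python sketch below shows the textbook FM-index backward search over a BWT built from a naively sorted suffix array. It is not the thesis's implementation: the names build_bwt and FMIndex are hypothetical, the text is assumed to end with a unique '$' sentinel, and the sketch does not model the variation part or the absolute axis.

```python
from __future__ import annotations


def build_bwt(text: str) -> tuple[str, list[int]]:
    """Return (BWT, suffix array) for `text`, which must end with a unique '$'."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' sentinel")
    # Naive suffix sorting: fine for a demo, far too slow for a whole genome,
    # where dedicated suffix-sorting algorithms are used instead.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[k] is the character just before the k-th smallest suffix;
    # text[-1] ('$') is used for the suffix that starts at position 0.
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


class FMIndex:
    """Exact matching by backward search over the BWT (hypothetical helper)."""

    def __init__(self, text: str) -> None:
        self.bwt, self.sa = build_bwt(text)
        n = len(self.bwt)
        alphabet = sorted(set(self.bwt))
        # C[c]: number of characters in the text that are smaller than c.
        self.c_table: dict[str, int] = {}
        total = 0
        for ch in alphabet:
            self.c_table[ch] = total
            total += self.bwt.count(ch)
        # occ[c][k]: occurrences of c in bwt[:k] (full table, no sampling).
        self.occ: dict[str, list[int]] = {ch: [0] * (n + 1) for ch in alphabet}
        for k, b in enumerate(self.bwt):
            for ch in alphabet:
                self.occ[ch][k + 1] = self.occ[ch][k] + (b == ch)

    def interval(self, pattern: str) -> tuple[int, int]:
        """Half-open suffix-array interval [lo, hi) of suffixes prefixed by pattern."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):  # extend the match one character leftwards
            if ch not in self.c_table:
                return 0, 0
            lo = self.c_table[ch] + self.occ[ch][lo]
            hi = self.c_table[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def locate(self, pattern: str) -> list[int]:
        """Sorted start positions of every exact occurrence of pattern."""
        lo, hi = self.interval(pattern)
        return sorted(self.sa[lo:hi])


index = FMIndex("ACGTACGT$")
print(index.bwt)             # TT$AACCGG
print(index.locate("ACG"))   # [0, 4]
print(index.locate("TT"))    # []
```

Each backward-search step narrows the suffix-array interval using only the C table and the Occ counts, so matching costs one step per pattern character regardless of genome length. Practical indexers usually sample the Occ table and the suffix array to save memory; the full tables here trade memory for simplicity.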
Keywords/Search Tags: Genome Mapping, Genome Index, BWT, absolute axis