Design And Implementation Of De Novo Genome Assembly

Posted on: 2015-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: X F Sun
Full Text: PDF
GTID: 2180330422992277
Subject: Computer technology
Abstract/Summary:
Genome assembly is a major part of bioinformatics. Its purpose is to assemble the short DNA sequences (reads) produced by current sequencing technology into complete DNA fragments. The Human Genome Project, which took ten years, used first-generation DNA sequencing technology to complete a working draft of the genome. In recent years, advances in sequencing have produced second-generation technologies, which rapidly deliver ultra-high throughput at a lower cost per unit of data. However, the reads they produce are short and have a high error rate. To make assembly fast, memory-efficient, and accurate, we have developed a new method that deals effectively with second-generation sequencing data.

By analyzing the features of DNA sequences, this paper introduces GSnake, a new assembly system based on a Markov model. The system first builds a Markov model that fits the characteristics of the genome from vast amounts of sequence data. It then constructs the state transition probability matrix from the short reads and stores it in a hash table. A fragment of bases is treated as a state of the model. Based on this model, the paper proposes a de novo assembly method. The system chooses an appropriate initial state according to the initial state probabilities and then repeatedly selects the best next state from the state transition probability matrix, generating a series of contigs. Because the genome contains many repeats and the reads contain sequencing errors, the paper also applies a series of heuristic rules to guide the choice of the best next state and so obtain the best assembly result.

Finally, this paper compares the results with those of SOAPdenovo and Velvet using the evaluation software GAGE. The comparison shows that the system achieves better results in the number of contigs, the length of contigs, coverage, and precision.
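
The approach described in the abstract can be illustrated with a minimal sketch. It is not the GSnake implementation: the function names, the choice of a fixed-length k-mer as the state, the use of raw counts in place of normalized probabilities, and the stop-on-revisit rule are all assumptions made for illustration. The sketch stores transition counts in a hash table, picks the most frequent state as the initial state, and extends a contig greedily by the most frequent next base.

```python
from collections import defaultdict


def build_transitions(reads, k):
    """Count, for every k-mer state, how often each base follows it.

    The nested dict plays the role of a sparse transition matrix
    kept in a hash table (illustrative assumption).
    """
    transitions = defaultdict(lambda: defaultdict(int))
    state_counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k):
            state = read[i:i + k]
            transitions[state][read[i + k]] += 1
            state_counts[state] += 1
    return transitions, state_counts


def pick_initial_state(state_counts):
    """Return the most frequent state; ties go to the alphabetically first."""
    return max(sorted(state_counts), key=state_counts.get)


def extend_contig(start, transitions, max_len=10_000):
    """Greedily extend a contig to the right from the state `start`."""
    contig = start
    state = start
    visited = {start}
    while len(contig) < max_len:
        successors = transitions.get(state)
        if not successors:
            break  # dead end: no read continues past this state
        # Most frequent next base; ties broken alphabetically.
        base = max(sorted(successors), key=successors.get)
        state = (state + base)[1:]
        if state in visited:
            break  # hypothetical heuristic: stop rather than loop in a repeat
        visited.add(state)
        contig += base
    return contig
```

A full assembler would also extend to the left, mask states already used in earlier contigs, and apply error- and repeat-handling heuristics such as those the abstract mentions; none of that is shown here.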
Keywords/Search Tags: second-generation sequencing, de novo assembly, Markov Model, parallel assembly