
Comparison Of Transcriptome Assembly Software For Next-Generation Sequencing Technologies

Posted on: 2014-08-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Lu
Full Text: PDF
GTID: 1260330425467541
Subject: Ecology
Abstract/Summary:
The high throughput and sensitivity of next-generation sequencing (NGS) have brought unprecedented opportunities for transcriptomic study. Compared with microarray methods and Sanger sequencing of EST libraries, RNA sequencing (RNA-Seq) using NGS has many advantages in the characterization and quantification of transcriptomes. However, transcriptome assembly from billions of short reads poses a significant informatics challenge and is also the bottleneck for the accuracy of the final result. Many strategies and software packages for transcriptome assembly are currently available. Because users are often unfamiliar with the capabilities and performance of this software, selecting a suitable assembler remains a difficult task. Here, we sequenced six samples from Bos grunniens (yak) and two samples from Populus euphratica (poplar) and obtained 14 gigabases and 7 gigabases of filtered reads, respectively. We assembled these reads with four de novo assemblers and two reference-based assemblers. Our aim was to evaluate which software best assembles the transcriptome of a given species.
After establishing the six assembly platforms, we assembled the six yak samples and the two poplar samples separately and drew the following three conclusions:
(1) Among the four de novo assemblers, SOAPdenovo-Trans and Trinity, which use a single k-mer, outperformed the two multi-k-mer assemblers, Rnnotator and Oases, by 5% in generating contigs that cover the known genes. The continuity of individual sequences assembled with a single k-mer was also 12% better than that of the others. Although the multi-k-mer assemblers could produce several times as many transcripts as the single-k-mer assemblers, most of those transcripts were redundant sequences generated when the results of the different k-mers were merged.
(2) The gap-filling strategy of the de novo assembler SOAPdenovo-Trans lowered the quality of the assembly result. Compared with the other three assemblers, SOAPdenovo-Trans lost 6% in accuracy without producing longer transcripts or better quality. The correlation between gene coverage and completeness or continuity suggests that the gap-filling strategy may have no effect on low-coverage genes.
(3) The comparison of the two reference-based assemblers shows that Cufflinks, which adopts a more conservative strategy, has higher sensitivity and specificity than Scripture when detecting previously annotated genes.
Our study indicates that de novo assembly is well suited to non-model organisms that lack a genome sequence or have only a low-quality one. Trinity, which has higher accuracy and completeness, is the best software for de novo assembly. For organisms that have a high-quality genome, reference-based assembly is more appropriate, and Cufflinks is superior in transcript variant detection. Our study will help researchers choose a well-suited assembler and offers essential information for improving existing assemblers or developing novel ones.
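As an illustration of the sensitivity and specificity comparison in conclusion (3), the following is a minimal Python sketch, not the method used in the dissertation. It assumes that each assembler's output has already been reduced to a set of detected gene identifiers, and that "specificity" means the fraction of detected genes that match the annotation (often called precision in assembler evaluations). The function name and the gene identifiers are hypothetical.

```python
def sensitivity_specificity(annotated, predicted):
    """Compare detected genes against an annotation.

    annotated: iterable of gene IDs present in the reference annotation.
    predicted: iterable of gene IDs reported by an assembler.
    Returns (sensitivity, specificity), where specificity is taken here
    to mean the fraction of predicted genes that are annotated
    (an assumption, not a definition from the dissertation).
    """
    annotated = set(annotated)
    predicted = set(predicted)
    true_positives = annotated & predicted

    # Fraction of annotated genes that the assembler recovered.
    sensitivity = len(true_positives) / len(annotated) if annotated else 0.0
    # Fraction of the assembler's predictions that match the annotation.
    specificity = len(true_positives) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical example: four annotated genes, three predicted genes.
sens, spec = sensitivity_specificity(
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g5"},
)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under these assumptions, an assembler that reports fewer but better-supported genes can score higher on specificity without losing sensitivity, which is one way a more conservative strategy could come out ahead on both measures.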
Keywords/Search Tags: Next-generation sequencing, transcriptome assembly, de novo assembly, reference-based assembly, assembly strategies