Unknown Pathogen Detection System Based On High-throughput Sequencing Platform

Posted on: 2017-03-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D C Li
GTID: 1224330488955777
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
We currently face a worsening disease situation caused by unidentified and newly emerging pathogens, which seriously threaten human health and safety. In recent years, several epidemics caused by newly emerged pathogenic microorganisms have occurred in cities in China. Examples that remain unforgotten include the Severe Acute Respiratory Syndrome (SARS) outbreak in 2003, the avian influenza virus outbreak in 2006, and the H1N1 virus outbreak in 2009. Worldwide, the E. coli O104:H4 outbreak in Germany in 2011, the epidemic of super-resistant Klebsiella pneumoniae Oxa4 in the Netherlands, and the Ebola virus outbreak in West Africa in 2014 shocked the whole world.
Accurate detection of unknown pathogens is an essential requirement for controlling major epidemics and for routine biosecurity oversight. Similarly, accurate identification of pathogenic microorganisms in clinical practice is crucial to diagnosis and to effective prevention and control. Traditional methods of pathogen identification fall into two main categories: cell culture-based methods, such as morphology, physiological and biochemical characteristics of cells, bacterial culture genotyping, gene chips, and automated microbial analysis systems; and methods based on specific primers, probes, or antibodies, such as antigen-antibody reactions, PCR, and various rapid detection systems for specific pathogenic microorganisms.
These technologies play an important role in the routine confirmation of pathogenic microorganisms, but they have shortcomings. The former rely on cell culture and suffer from long turnaround times and low identification accuracy, while the latter require prior knowledge of microbial sequences and cannot deal with unknown pathogens or mutations.
With its advent and rapid development, next-generation sequencing (NGS) technology has begun to play a more important role in various fields of biological and medical research and has entered front-line clinical work. In molecular diagnostics, many studies have reported that NGS makes it possible to sequence and identify pathogens directly in uncultured samples. For unknown-pathogen outbreaks and clinical diagnosis, NGS has advantages over traditional methods: it requires neither culture nor prior knowledge of the pathogen genome sequence, and it can provide more information about pathogen genomes.
Establishing a technology platform to respond to new outbreaks of unknown pathogenic microorganisms therefore facilitates rapid medical response and the implementation of prevention and control measures against major public health and safety incidents. We designed and established an unknown pathogen detection system based on a high-throughput sequencing platform and a high-performance computing platform, and achieved direct high-throughput sequencing of uncultured samples for pathogen diagnosis. The system addresses difficulties that traditional methods cannot solve in identifying clinical pathogenic microorganisms. Its advantages include no need for culture, no need for prior knowledge of the pathogen genome, a small initial amount of nucleic acids required, high resolution and accuracy, and more genomic information.
The high-throughput sequencing platform comprised a HiSeq 2500 sequencer and a MiSeq sequencer, which constituted the laboratory pathogen sequencing platform, and an Ion Torrent PGM sequencer, which served as an on-site sequencing platform. The platform has completed the sequencing of more than 1000 samples. The high-performance computing platform consisted of a computing system with a peak speed of no less than 20 trillion operations per second and a highly reliable storage system with a capacity of no less than 500 TB. We also deployed important pathogen and host nucleic acid databases on the platform to provide a basis for alignment, genome assembly, genotyping, and resistance analysis, and we developed bioinformatics analysis pipelines for rapid confirmation of unknown pathogens, genome assembly, and other analyses. Finally, we carried out performance testing and found that the platform's computing capability was outstanding and that the acceleration was significant.
How to pretreat clinical samples scientifically to improve pathogen detection efficiency is not clear. We therefore studied how two pretreatments, background depletion (BD, the removal of rRNA and mRNA) and whole transcriptome (cDNA) amplification (WTA), influence subsequent sequencing, particularly the identification of pathogens from uncultured clinical specimens. Using mixtures of human and influenza A virus (H1N1) RNA as a model, we applied NGS to these simulated samples and used bioinformatics analysis to compare pathogen genome recovery efficiency under different pretreatment methods. We found that direct sequencing of uncultured samples without pretreatment favors recovery of the pathogen genome. It not only enables accurate identification and classification but also yields the largest genome coverage, the smallest coverage bias, and the most efficient genome recovery.
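
The abstract does not define how genome coverage and coverage bias were measured. As an illustration only, the sketch below computes two commonly used summaries from read alignments: breadth of coverage (the fraction of genome positions covered by at least one read) and the coefficient of variation of per-base depth as a stand-in for coverage bias. The function names, the interval format (0-based, half-open start and end positions), and the choice of coefficient of variation are assumptions made here, not the author's method.

```python
from statistics import mean, pstdev


def per_base_depth(genome_length, alignments):
    """Return per-base read depth for a genome of the given length.

    `alignments` is an iterable of (start, end) pairs, 0-based and
    half-open. This input format is an assumption for illustration.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_summary(depth, min_depth=1):
    """Summarize breadth, mean depth, and depth variability.

    The coefficient of variation (cv) is used here as one possible
    proxy for coverage bias; the thesis may use a different measure.
    """
    if not depth:
        raise ValueError("depth must contain at least one position")
    covered = sum(1 for d in depth if d >= min_depth)
    avg = mean(depth)
    cv = pstdev(depth) / avg if avg > 0 else float("inf")
    return {"breadth": covered / len(depth), "mean_depth": avg, "cv": cv}


if __name__ == "__main__":
    # Toy example: two overlapping reads on a 10-base genome.
    toy_depth = per_base_depth(10, [(0, 5), (3, 8)])
    print(toy_depth)
    print(coverage_summary(toy_depth))
```

Under these assumptions, a pretreatment that recovers the pathogen genome well would show a breadth close to 1 and a low cv, while uneven amplification would raise the cv.
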
Compared with background nucleic acid depletion, cDNA amplification, and many other methods, direct sequencing is recommended for detecting unknown pathogens in uncultured clinical samples because its experimental operation is relatively simple and requires no special pretreatment, which means fewer experimental steps, lower cost, a much lower technical error rate, and a shorter turnaround time.
As for bioinformatics analysis, although many software tools and workflows offer similar functions, a comprehensive comparative evaluation of existing tools and pipelines is currently lacking. We studied the performance of different software tools when NGS is applied to identifying unknown pathogens in uncultured samples, made an overall evaluation of each, and identified the best options for specific application conditions. In general cases with good sequencing quality, pipelines with assembly performed better than pipelines without assembly. When a sample contains a new sequence or a new pathogen, a de novo assembly pipeline is optimal because it is independent of any pathogen genomic sequence. When contigs cannot be obtained by assembly, a pipeline that aligns reads to microorganism sequence databases is recommended because it can enrich reads for identifying unknown pathogens. When host sequences dominate, a step of host sequence alignment and depletion is necessary.
Finally, we also carried out dual RNA-seq by high-throughput sequencing on 18 paired cases of head and neck cancer (HNC) tissue samples. Bioinformatics analysis showed that Treponema denticola was differentially expressed between tumor and para-tumor (adjacent) tissues, suggesting that it may be a microorganism or marker potentially relevant to HNC; more experiments are required to verify this.
As a test of our detection system on the identification of an unknown pathogen, this study showed that the unknown pathogen detection system based on a high-throughput sequencing platform is effective.
Our research and evaluation results will help guide future identification and analysis of clinical pathogenic microorganisms and improve the accuracy and scientific rigor of identification. They will also help us respond swiftly to outbreaks and handle difficult clinical diagnoses in the future, and contribute to better service for human health and safety.
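
The pipeline recommendations given in the abstract (assembly when sequencing quality is good, de novo assembly for new pathogens, read alignment to microbial databases when no contigs are obtained, and host depletion when host sequences dominate) can be sketched as a small rule set. This is a minimal illustration, not the author's software: the function name, parameter names, and step labels are hypothetical, and the fallback to read alignment when quality is poor and no new pathogen is suspected is an assumption, since the abstract does not cover that case.

```python
def recommend_steps(host_dominant, good_quality,
                    novel_pathogen_suspected, contigs_obtained=True):
    """Return an ordered list of suggested analysis steps.

    All names and labels are hypothetical. `contigs_obtained` reflects
    the outcome of an assembly attempt, if one has been made.
    """
    steps = []

    # Host-dominant samples need host alignment and depletion first.
    if host_dominant:
        steps.append("host alignment and depletion")

    wants_assembly = good_quality or novel_pathogen_suspected
    if wants_assembly and contigs_obtained:
        if novel_pathogen_suspected:
            # Independent of any known pathogen genome sequence.
            steps.append("de novo assembly")
        else:
            steps.append("assembly-based pipeline")
    else:
        # No contigs (or, by assumption here, poor quality with no
        # suspected new pathogen): align reads to microbial databases.
        steps.append("read alignment to microbial sequence databases")

    return steps


if __name__ == "__main__":
    # A host-dominant clinical sample with a suspected new pathogen.
    print(recommend_steps(host_dominant=True, good_quality=True,
                          novel_pathogen_suspected=True))
```

Encoding the guidance as explicit rules makes the conditions easy to audit and adjust as further evaluations of tools and pipelines become available.
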
Keywords/Search Tags: next-generation sequencing, high-performance computing platform, pathogenic microorganism detection, uncultured, background depletion, whole transcriptome amplification, bioinformatics analysis pipeline, head and neck cancer