Re-annotation And Construction Of Trans-omics Database System For Yersinia Pestis

Posted on: 2017-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Mao
GTID: 1224330488955801
Subject: Military Preventive Medicine
Abstract/Summary:
Yersinia pestis is a Gram-negative bacterium and the causative agent of bubonic, pneumonic and septicemic plague, which are systemic, invasive diseases. This notorious pathogen caused hundreds of millions of deaths in three major plague pandemics in human history and remains active today. According to the World Health Organization, 18 plague outbreaks occurred from 2001 to 2015. In China, there are 12 natural plague foci distributed across 15 provinces, accounting for 15% of the country’s total land area.

Since the first complete genome, that of strain CO92, was published by the Wellcome Trust Sanger Institute in 2001, 12 complete, annotated genomes of Y. pestis have been deciphered. With the rapid development of experimental methods and techniques, especially advances in next-generation sequencing technology, a large amount of data from extensive studies of Y. pestis has accumulated. When the earlier genome annotations are re-examined against this evidence, contradictions and even errors are found, reflecting the limited knowledge available at the time. These inaccuracies can be propagated and amplified in subsequent annotation work. Researchers have used comparative genomics, transcriptomics and proteogenomics to re-annotate an individual strain of Y. pestis, but these efforts focused on discovering novel gene elements and correcting gene functions, and their coverage was not comprehensive. To improve current annotations and to understand the function, biological behavior and pathogenesis of Y. pestis systematically, the knowledge base of Y. pestis needs to be consolidated, integrated and improved by incorporating recently published experimental results and by re-analyzing the genomes with updated algorithms.

Data sharing is important for advancing science. Apart from several large-scale public databases, few trans-omics databases of prokaryotic organisms have been established.
Although many automatic and semi-automatic annotation platforms provide information of interest to researchers in specific areas, a database system containing complete, accurate annotations and research knowledge of Y. pestis is needed by researchers who want comprehensive information on the species in a user-friendly form.

The data used in this work include:
(1) 12 complete genome maps of Y. pestis from NCBI, which are the basis for re-annotation;
(2) proteomic results for strain 91001 from mass spectrometry (MS), which are formatted, standardized and easy to process;
(3) RNA-seq data for strain 91001;
(4) gene expression profiles from microarray experiments on strain 91001 in a variety of environments.
Experimental data from the relevant literature are also included.

The re-annotation can be divided into two major steps. The first step is data pre-processing. In this step, coding sequences (CDSs) and the reliable gene set were determined from the gene prediction results, and the allele genotype set was built after homology and functional analyses. Translational initiation sites (TISs) and pseudogenes were then re-annotated according to the screened results obtained from the homologous gene data, the MS data and the reliable gene prediction sets. In addition, repeat sequences, mobile elements, prophages and genomic islands (GIs) were re-annotated across the whole genome, and ncRNAs were identified in non-coding regions. The second step is to organize and analyze the re-annotation results. Because a variety of computational tools and public databases are involved in this process, the pre-processed data must be screened, reduced, corrected, classified and standardized. The processed, structured data are then integrated to generate the final re-annotation results. The homology analysis and the allele analysis are completed simultaneously.
More than 30 software tools and databases were installed locally and used in this study.

The trans-omics database is composed of omics-based tables, with close interrelationships among the different types of data. Considering the diversity and complexity of omics data, we used an information systems approach, combined with the biological characteristics of Y. pestis, to design and build the database system. After the research objectives are determined, demand analysis and feasibility assessment are carried out. Data standards are developed and the data model is built according to the functional and service requirements, followed by the structural and functional design of the database. The web service system is then implemented on a MySQL relational database and the Python Django framework.

In this study, the complete genome of Y. pestis strain 91001 was re-annotated using genomics and proteogenomics data. One hundred and thirty-seven unreliable coding sequences were removed, the translational initiation sites of 41 homologous genes were relocated, and the functions of seven pseudogenes and 392 hypothetical genes were revised. Annotations of non-coding RNAs, repeat sequences, transposable elements and allele diversity were also incorporated. We also built a semi-automatic re-annotation pipeline suitable for Y. pestis and used it to re-annotate the other 11 complete genomes of Y. pestis. Finally, based on a relational database and a web framework, we built the trans-omics database of Y. pestis (TODY, http://tody.bmi.ac.cn/) to store the re-annotation results and other related information.
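
To make the idea of interrelated omics-based tables concrete, the following is a minimal sketch of how re-annotated genes might be linked to proteomic evidence in a relational schema. Everything in it is an assumption made for illustration: the table and column names, the `DEMO_` locus tags and the placeholder peptide strings are hypothetical, and the standard-library `sqlite3` module stands in for the MySQL and Django stack named in the abstract. It does not reproduce the actual TODY schema.

```python
import sqlite3

# Hypothetical schema: one table per data type, linked by foreign keys.
SCHEMA = """
CREATE TABLE strain (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE gene (
    id        INTEGER PRIMARY KEY,
    strain_id INTEGER NOT NULL REFERENCES strain(id),
    locus_tag TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT CHECK (strand IN ('+', '-')),
    product   TEXT
);
CREATE TABLE peptide_evidence (
    id      INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    peptide TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO strain (id, name) VALUES (1, '91001')")
    conn.executemany(
        "INSERT INTO gene (id, strain_id, locus_tag, start_pos, end_pos, strand, product) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "DEMO_0001", 100, 400, "+", "hypothetical protein"),
            (2, 1, "DEMO_0002", 500, 950, "-", "hypothetical protein"),
        ],
    )
    conn.executemany(
        "INSERT INTO peptide_evidence (gene_id, peptide) VALUES (?, ?)",
        [(1, "PEPTIDEA"), (1, "PEPTIDEB")],
    )
    conn.commit()
    return conn


def genes_with_peptide_support(conn, strain_name):
    """Return (locus_tag, peptide_count) for every gene of one strain."""
    query = """
        SELECT g.locus_tag, COUNT(p.id)
        FROM gene AS g
        JOIN strain AS s ON s.id = g.strain_id
        LEFT JOIN peptide_evidence AS p ON p.gene_id = g.id
        WHERE s.name = ?
        GROUP BY g.id
        ORDER BY g.locus_tag
    """
    return conn.execute(query, (strain_name,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for locus_tag, n_peptides in genes_with_peptide_support(demo, "91001"):
        print(locus_tag, n_peptides)
```

The `LEFT JOIN` keeps genes that have no peptide evidence, so a count of zero flags a coding sequence that lacks proteomic support in this toy data.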
In the allele diversity analysis and the implementation of the web service system, parallel computing and a distributed scheduling system are used to reduce computing time and cost, which should facilitate further research involving larger-scale information processing.

This study combines biological experimental results, bioinformatics analysis and computer technology to provide further understanding of the structure and function of Y. pestis genomes. In future work, we will continue to enrich the database system with more relevant literature and experimental data, and will verify the re-annotation results by molecular biology experiments. Appropriate data mining models will be applied for in-depth analysis, and the web service system will be migrated to a cloud-computing platform to provide large-scale data processing services in the big data era.
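
The abstract mentions that parallel computing is used for the allele diversity analysis. As a rough sketch of why this task parallelizes well, the example below treats each gene family independently and counts its distinct alleles. The input layout (a dictionary mapping a gene family name to the sequences seen in different strains), the function names and the `map_fn` parameter are all assumptions made for illustration; the dissertation's actual method and scheduling system are not described in the abstract.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def summarize_alleles(item):
    """Count distinct alleles (unique sequences) for one gene family.

    `item` is a (family_name, sequences) pair. Returns the family name,
    the number of distinct alleles, and the size of the most common allele.
    """
    family, sequences = item
    counts = Counter(seq.upper() for seq in sequences)
    most_common = counts.most_common(1)[0][1] if counts else 0
    return family, len(counts), most_common


def allele_diversity(families, map_fn=map):
    """Summarize every gene family, optionally through a parallel map.

    `map_fn` defaults to the built-in serial `map`; any function with the
    same call shape (for example an executor's `.map` method) can be used,
    because each family is processed independently of the others.
    """
    return {
        family: (n_alleles, most_common)
        for family, n_alleles, most_common in map_fn(summarize_alleles, families.items())
    }


if __name__ == "__main__":
    # Hypothetical toy input: sequences for two gene families across strains.
    toy = {"familyA": ["ATG", "atg", "ATA"], "familyB": ["GGG"]}
    print(allele_diversity(toy))
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(allele_diversity(toy, pool.map))
```

Because `summarize_alleles` is a top-level function with no shared state, the same call should also work with the `.map` method of a `concurrent.futures.ProcessPoolExecutor`, which is the more suitable choice for CPU-bound sequence comparison.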
Keywords/Search Tags: Yersinia pestis, re-annotation, trans-omics database, web service system