Building Whole Genome-wide Cotton Gene Annotation System Through Omics Data Integration And Functional Module Analysis

Posted on: 2018-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q You
GTID: 1313330515982248
Subject: Bioinformatics
Abstract/Summary:
Cotton, an economically important crop worldwide, is essential to the agriculture and textile industries. With the release of the whole-genome sequences of diploid and allotetraploid cotton, demand for refined genome-wide annotation of cotton genes has grown, because the functions of most cotton genes remain unknown, including those involved in fibre development, quality improvement, disease resistance, drought resistance and salinity resistance. Given the low proportion of annotated genes in the cotton genome, large-scale data mining is urgently needed to yield novel insights into cotton development and stress response.

By integrating transcriptomic data, we found that multi-dimensional co-expression network analysis is a powerful approach for predicting cotton gene functions and functional modules. Recently available transcriptome data on Gossypium arboreum and Gossypium hirsutum, covering tissues at multiple growth stages and stress-treated samples, were used to construct whole-genome co-expression networks that explore multi-dimensional expression (development and stress) through multi-layered approaches. Based on differential gene expression and network analyses, functional modules associated with important agronomic traits, such as fibre elongation and water stress response, were identified. Meanwhile, a data mining system was combined with several functional analysis tools, including orthologue annotation, gene family classification, cis-element analysis and gene ontology analysis, to evaluate the reliability of the predictions. In addition, the Clique Percolation Method was used to predict possible functional modules. As a result, 1,155 and 1,884 co-expression modules and 213 and 135 miRNA target modules were identified in G. arboreum and G. hirsutum, respectively, covering multiple biological processes such as metabolism, pathogen and stress responses, hormone regulation and development.

We also provided network comparison analysis for orthologous genes across diploid and allotetraploid Gossypium. In total, 96,466 orthologue pairs in 16,142 homologous groups were identified, including 32,417 (78.4% coverage) and 62,050 (88.0% coverage) genes in G. arboreum and G. hirsutum, respectively. Four aspects of the co-expression networks of homologous genes were compared: module sizes and components, orthologous pairs, gene expression profiles, and cis-regulatory elements. Furthermore, in-house H3K4me3 ChIP-seq data were integrated with RNA-seq data to explore the conservation and variation of genome structures and gene functions between diploid and allotetraploid cotton. As a result, 6,773 and 12,773 new transcripts, both coding and non-coding, were discovered in G. arboreum and G. hirsutum, respectively. qRT-PCR, ESTs and synteny comparison improved the accuracy of the predicted results. Meanwhile, the co-expression networks were linked with histone modification data in an attempt to explain how differential H3K4me3 modifications correlate with changes in gene transcription during cotton development and evolution.

Finally, we constructed an online ccNET database (http://structuralbiology.cau.edu.cn/gossypium/) for comparative gene functional analyses at the multi-dimensional network and epigenomic levels across diploid and polyploid Gossypium species. In addition, we established the Setaria italica Functional Genomics Database (SIFGD, http://structuralbiology.cau.edu.cn/SIFGD/) for bioinformatics analyses of gene functions and regulatory modules. We also determined rules for identifying the primary transcription start sites of miRNAs by integrating public omics data in the model plant Arabidopsis, and constructed the PTSmiRNA platform (http://structuralbiology.cau.edu.cn/PTSmiRNA/) for result visualization.

Taken together, this study used omics data integration and functional module analysis to efficiently refine the annotation of cotton genome structure and gene function, which helps to yield novel insights into gene and module functions related to cotton development and stress response and might benefit cotton molecular breeding. Our analysis strategy, which combines conditional dissection, functional module classification and module comparison for functional module prediction and refined gene function annotation, proved effective here and might be useful for studying conservation and diversity in other polyploid plants, such as Triticum aestivum and Brassica napus.
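The general idea of predicting functional modules from a co-expression network with the Clique Percolation Method can be illustrated with a minimal Python sketch. This is not the pipeline used in the dissertation: the Pearson correlation measure, the 0.9 threshold, the clique size k = 3, the gene names and the expression values are all assumptions chosen for illustration, and the brute-force clique enumeration is only practical for toy inputs.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, threshold=0.9):
    """Link two genes when their profiles correlate at or above the threshold.

    The correlation measure and the threshold are illustrative assumptions.
    """
    return {
        frozenset((g1, g2))
        for g1, g2 in combinations(sorted(expr), 2)
        if pearson(expr[g1], expr[g2]) >= threshold
    }


def clique_percolation(nodes, edges, k=3):
    """Clique Percolation: merge k-cliques that share k-1 nodes into modules."""
    # Brute-force enumeration of k-cliques (toy-sized inputs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(nodes), k)
        if all(frozenset(pair) in edges for pair in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two k-cliques are adjacent when they overlap in exactly k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each connected group of adjacent cliques forms one module.
    grouped = {}
    for i, clique in enumerate(cliques):
        grouped.setdefault(find(i), set()).update(clique)
    return sorted(sorted(members) for members in grouped.values())


# Hypothetical expression profiles (5 samples each); not real cotton data.
toy_expr = {
    "geneA1": [1, 2, 3, 4, 5],
    "geneA2": [2, 4, 6, 8, 10],
    "geneA3": [3, 4, 5, 6, 7],
    "geneA4": [0, 1, 2, 3, 4],
    "geneB1": [5, 1, 4, 2, 3],
    "geneB2": [10, 2, 8, 4, 6],
    "geneB3": [6, 2, 5, 3, 4],
    "geneC1": [3, 3, 1, 5, 3],
}

modules = clique_percolation(toy_expr, coexpression_edges(toy_expr), k=3)
print(modules)
```

On this toy input the four geneA profiles form one module and the three geneB profiles another, while geneC1 correlates with neither group and is left unassigned. On genome-scale networks a dedicated implementation would be needed in place of the brute-force enumeration shown here.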
Keywords/Search Tags:cotton, omics data integration, diploid and polyploid, comparative analysis with functional modules, construction of gene function annotation platform