The Establishment Of Conventional Genome Analysis Pipeline For Plants With Complex Polyploid Genome

Posted on: 2016-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Zhang
GTID: 2180330467477703
Subject: Crop Science
Abstract/Summary:
With the rapid development of high-throughput sequencing technologies, large volumes of genomic data have become available, including data for plants with complex polyploid genomes, which pose major challenges for bioinformatics analysis. Polyploidization produces many redundant segments and a larger genome, which complicates analysis and integration at both the genomic and transcriptomic levels. In this study, using E. crus-galli as a case study, we compared different bioinformatics software tools, integrated the most applicable methods with Perl, and established a genome analysis pipeline for polyploid plants. The pipeline provides three functions, implemented as three software modules:

(1) Assessment of genome assembly quality. The assessment includes an independent estimate of genome size and a comparison of the assembly with segments obtained by fosmid clone sequencing.

(2) Genome annotation. Repeat sequences were predicted by integrating de novo prediction, based on sequence features and structures, with similarity searches against known repeat sequences in databases. A hybrid gene prediction strategy combined ab initio prediction with RNA-seq support: two ab initio gene finders were run on the repeat-masked genome, and all gene structures predicted by these procedures were integrated into consensus gene models. The predicted genes were aligned against non-redundant plant protein sequences and pathway databases for functional annotation.

(3) Comparative genomics. Paralogs and orthologs were identified with a Markov cluster algorithm that groups proteins from different species by protein sequence similarity. For genomic synteny, the predicted protein set was aligned both to related species and to itself. Syntenic blocks and large evolutionarily conserved regions between two genomes were detected from the alignments and the genomic locations of the aligned gene pairs.
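The block-detection step can be illustrated with a minimal Python sketch. It is not the thesis pipeline, which was written in Perl and whose synteny tool is not named in the abstract. It assumes each aligned gene pair has already been reduced to a pair of gene ranks (the gene's order along a chromosome in each genome). The function name `collinear_blocks` and the parameters `max_gap` and `min_pairs` are hypothetical. Dedicated synteny tools typically use scoring-based chaining rather than this greedy pass.

```python
from typing import List, Tuple

# Hypothetical representation: (gene rank on chromosome A, gene rank on chromosome B)
Pair = Tuple[int, int]


def collinear_blocks(pairs: List[Pair], max_gap: int = 5, min_pairs: int = 3) -> List[List[Pair]]:
    """Greedily chain aligned gene pairs into collinear runs (simplified synteny)."""
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # 0 = undecided, +1 = same orientation, -1 = inverted
    for pair in sorted(set(pairs)):
        if current:
            da = pair[0] - current[-1][0]
            db = pair[1] - current[-1][1]
            step = 1 if db > 0 else -1
            # A pair extends the run if it is near the previous pair in both genomes
            close = 0 < da <= max_gap and 0 < abs(db) <= max_gap
            if close and direction in (0, step):
                current.append(pair)
                direction = step
                continue
            # Run is broken: keep it only if it has enough anchoring gene pairs
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [pair]
        direction = 0
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    demo = [(1, 10), (2, 11), (3, 12), (4, 30), (10, 50), (11, 49), (12, 48), (40, 5)]
    for block in collinear_blocks(demo):
        print(block)
```

A single spurious pair ends a run in this greedy version, so it is suitable only for illustrating how alignments and gene locations combine into blocks.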
Moreover, the transversion rate at fourfold degenerate sites (4DTv) of each syntenic block was calculated by an in-house Perl script. Divergence times between species and within the same species were estimated from the transversion rates of orthologous and paralogous gene pairs, respectively, located in syntenic blocks.

Finally, we provide a pipeline for genome annotation and comparative genomic analysis of de novo assemblies of complex polyploid genomes. The pipeline, automated in Perl, incorporates the latest available toolsets and best-practice parameters, and provides a valuable bioinformatics tool for genomic studies.
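The 4DTv calculation mentioned above can be sketched as follows. The thesis used an in-house Perl script that is not reproduced in the abstract, so this Python version is an illustration under stated assumptions. It assumes the two coding sequences are already codon-aligned and of equal length, and that the standard genetic code applies. It pools sites across all gene pairs in a block, which is only one possible way to summarize a block. The function names `count_4d_sites`, `block_4dtv` and `corrected_4dtv` are hypothetical. The correction shown is the transversion-only form, -1/2 ln(1 - 2p), which may differ from the model used in the thesis.

```python
import math
from typing import Iterable, Optional, Tuple

# Codon prefixes whose third position is fourfold degenerate (standard genetic code)
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def count_4d_sites(cds_a: str, cds_b: str) -> Tuple[int, int]:
    """Return (transversions, fourfold degenerate sites) for two codon-aligned CDS."""
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be codon-aligned and of equal length")
    transversions = 0
    sites = 0
    for i in range(0, len(cds_a) - 2, 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Both codons must share the same fourfold degenerate prefix
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        x, y = codon_a[2], codon_b[2]
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa
        if (x in PURINES) != (y in PURINES):
            transversions += 1
    return transversions, sites


def block_4dtv(gene_pairs: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Raw 4DTv pooled over all gene pairs of one syntenic block (an assumption)."""
    total_tv = 0
    total_sites = 0
    for cds_a, cds_b in gene_pairs:
        tv, sites = count_4d_sites(cds_a, cds_b)
        total_tv += tv
        total_sites += sites
    return total_tv / total_sites if total_sites else None


def corrected_4dtv(raw: float) -> Optional[float]:
    """Transversion-only correction for multiple hits; undefined when raw >= 0.5."""
    if raw >= 0.5:
        return None
    return -0.5 * math.log(1.0 - 2.0 * raw)


if __name__ == "__main__":
    raw = block_4dtv([("CTAGGTACCGCA", "CTCGGTACCGCG")])
    print(raw, corrected_4dtv(raw))
```

Converting a 4DTv value into an absolute divergence time additionally requires a substitution rate calibration, which the abstract does not specify.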
Keywords/Search Tags: polyploid plant genomes, bioinformatic methods, genome analysis, bioinformatic pipeline and software, E. crus-galli genome