
The Analysis Methods Of Gene Prediction And Long Noncoding RNA Identification With RNA-Seq

Posted on: 2015-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: L P Li
Full Text: PDF
GTID: 2180330434458822
Subject: Crop Science
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, a new approach termed RNA-Seq (RNA sequencing) has become widely used in life science research. It is mainly concerned with mapping and quantitative analysis of the transcriptome. Compared with currently available methods, it shows clear advantages in discovering large numbers of novel genes and novel lncRNAs and in improving the efficiency of gene prediction.

Before RNA-Seq data are analyzed, the raw reads must be quality-controlled with software such as FASTX-Toolkit and Trim Galore!, which improves the accuracy of gene prediction and lncRNA identification. The reads are then mapped onto the reference genome with TopHat.

Gene prediction is an indispensable step in genome annotation. We propose a four-step model for gene prediction in corn and rat. Firstly, protein-coding genes are predicted with AUGUSTUS, with EST and RNA-Seq data integrated into the prediction model to improve accuracy. Secondly, redundant genes are removed by comparison with homologous genes from other species using liftOver or BLAST. Finally, we annotate 202,048 transcripts in corn and 32,197 transcripts in rat, including 165,820 and 4,802 novel genes, respectively. Analysis of gene expression levels in different tissues shows that the new prediction method can detect more genes with low expression levels.

Long noncoding RNA identification is another active topic in the study of biological regulation. We employ Cufflinks to assemble the reads into transcripts. Based on the structure, function and location of lncRNAs, many novel lncRNAs are identified using PhyloCSF and BLAST, with the help of annotated genes from other species. Finally, we identify 2,761 and 40,626 lncRNAs in the rat and human genomes, respectively, and analyze the expression levels and distribution of lncRNAs in different tissues. This new method could improve current lncRNA databases.

These studies provide effective methods for biologists to study the genetic mechanisms of complex diseases or traits through genetic analysis of genes and lncRNAs.
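The read-processing steps named in the abstract (quality trimming, mapping with TopHat, assembly with Cufflinks) can be sketched as a chain of command lines. This is only an illustrative outline, not the thesis's actual pipeline: the function name `build_commands`, all file and directory names, the thread count, and the assumption of paired-end reads are hypothetical. The commands are built as argument lists and are not executed here.

```python
from typing import List


def build_commands(
    raw_r1: str,
    raw_r2: str,
    trimmed_r1: str,
    trimmed_r2: str,
    bowtie_index: str,
    out_dir: str,
    threads: int = 4,
) -> List[List[str]]:
    """Return trim -> map -> assemble commands as argument lists.

    All paths are hypothetical placeholders supplied by the caller;
    trimmed_r1/trimmed_r2 are whatever files the trimming step produces.
    """
    trim_dir = f"{out_dir}/trimmed"
    tophat_dir = f"{out_dir}/tophat"
    cufflinks_dir = f"{out_dir}/cufflinks"

    # Step 1: adapter and quality trimming of paired-end reads.
    trim = ["trim_galore", "--paired", "-o", trim_dir, raw_r1, raw_r2]

    # Step 2: spliced alignment of the trimmed reads to the reference genome.
    tophat = [
        "tophat", "-p", str(threads), "-o", tophat_dir,
        bowtie_index, trimmed_r1, trimmed_r2,
    ]

    # Step 3: transcript assembly from TopHat's alignment file.
    cufflinks = [
        "cufflinks", "-p", str(threads), "-o", cufflinks_dir,
        f"{tophat_dir}/accepted_hits.bam",
    ]

    return [trim, tophat, cufflinks]


if __name__ == "__main__":
    for cmd in build_commands(
        "sample_1.fq", "sample_2.fq",
        "trimmed_1.fq", "trimmed_2.fq",
        "genome_index", "results",
    ):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed. The downstream steps (AUGUSTUS prediction, liftOver or BLAST filtering, PhyloCSF scoring) are left out because the abstract does not give enough detail to sketch them.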
Keywords/Search Tags: RNA-Seq data, TopHat, Cufflinks, Gene prediction, AUGUSTUS, Long noncoding RNA, PhyloCSF, BLAST