Identification Of LncRNAs And Platform Design For Analyses Of LncRNAs In Domestic Animal

Posted on: 2016-01-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A M Li
GTID: 1220330488973903
Subject: Computer application technology
Abstract/Summary:
In molecular biology, research on non-coding RNAs continues to advance. Among these, the most important classes are microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). The pace of miRNA studies is slowing, whereas research on a large number of important lncRNAs is still in its infancy. Identifying lncRNAs is the gateway to the lncRNA field and is significant, fundamental groundwork: once lncRNAs have been identified, their molecular mechanisms and functions can be investigated further. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. All of this research depends on bioinformatics algorithms, software tools and analysis platforms. We therefore carried out lncRNA-related research based on bioinformatics pipelines, in three parts:

(1) We developed an algorithm that is especially useful for identifying lncRNAs in de novo assembled transcriptomes.

High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, in particular lncRNAs from de novo sequencing data. This requires tools that do not depend on prior gene annotations, genomic sequences or high-quality sequencing. We present an alignment-free tool called PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), which uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs) in the absence of genomic sequences or annotations.

Accuracy: The performance of PLEK was evaluated on well-annotated mRNA and lncRNA transcripts. 10-fold cross-validation tests on human RefSeq mRNAs and GENCODE lncRNAs indicated that the tool could achieve an accuracy of up to 95.6%. We demonstrated the utility of PLEK on transcripts from other vertebrates using the model built from the human datasets; PLEK attained >90% accuracy on most of these datasets.

Robustness: PLEK also performed well on a simulated dataset and on two real de novo assembled transcriptome datasets (sequenced on the PacBio and 454 platforms) with relatively high rates of indel sequencing errors.

Computational complexity: When run single-threaded, PLEK is approximately eightfold faster than a newly developed alignment-free tool, the Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, the Coding Potential Calculator (CPC).

PLEK is an efficient alignment-free computational tool for distinguishing lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes. It is especially suitable for PacBio or 454 sequencing data and for large-scale transcriptome data. Its open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.

(2) We annotated the lncRNAs of domestic animals and built a domestic-animal lncRNA database.

LncRNAs have attracted significant attention in recent years because of their important roles in many biological processes. With advancing sequencing technologies, numerous domestic-animal lncRNAs are now available. There is therefore an immediate need for a database resource that helps researchers store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database focused on domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to lincRNA annotations, it offers related data not yet available in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functions, helps users effectively explore, analyze and download data related to domestic-animal lncRNAs.

ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. Its user-friendly web interface, integrated information and tools make it valuable to researchers. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.

(3) We investigated the interactions between miRNAs and mRNAs, and between miRNAs and lncRNAs, on a genome-wide scale and analyzed the effect of SNPs on these interactions.

Domestic animals show considerable genetic diversity. Previous studies suggested that animal phenotypes are affected by miRNA-mRNA interplay, but those studies focused mainly on one or a few miRNA-mRNA interactions. In this study, by contrast, we present miRNA-mRNA and miRNA-lncRNA interactions on a genome-wide scale, predicted with the miRanda and TargetScan algorithms. Strong directional artificial selection was practiced during the domestication of animals. We therefore investigated the SNPs located in miRNAs and in miRNA binding sites, and found that several SNPs located in the 3' UTRs of mRNAs have the potential to affect miRNA-mRNA interactions. In addition, a database named miRBond was developed to provide visualization, analysis and downloading of the resulting datasets.

Our results open the way to further experimental verification of miRNA-mRNA and miRNA-lncRNA interactions and of the influence of SNPs on such interplay.
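
As a rough illustration of the alignment-free idea behind part (1), the sketch below turns a transcript into a fixed-length vector of k-mer frequencies using only the sequence itself. This is an assumption-laden example, not PLEK's method: the abstract does not give the details of the improved k-mer scheme, so plain sliding-window frequencies are used instead, and the function name `kmer_frequencies` and the parameter `max_k` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, max_k=3):
    """Return a fixed-order vector of k-mer frequencies for k = 1..max_k.

    Illustrative only: plain sliding-window frequencies, not PLEK's
    improved k-mer scheme, whose details the abstract does not give.
    """
    # Normalize case and treat RNA (U) and DNA (T) notation alike.
    seq = sequence.upper().replace("U", "T")
    features = []
    for k in range(1, max_k + 1):
        # Enumerate all 4**k possible k-mers in a fixed order.
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            window = seq[i:i + k]
            if window in counts:  # skip windows containing N or other symbols
                counts[window] += 1
        total = sum(counts.values())
        # Frequencies for each k sum to 1 (or are all 0 for an empty sequence).
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


vector = kmer_frequencies("ATGGCGTACGTTAGC", max_k=3)
print(len(vector))  # 4 + 16 + 64 = 84 features
```

A vector of this kind needs no genome alignment or annotation, and could serve as input to a classifier such as an SVM trained on known mRNA and lncRNA transcripts.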
Keywords/Search Tags: long non-coding RNA, domestic animal, coding capacity judgment, identification, RNA-RNA interaction