Construction And Functional Annotation Of Bacterial sRNA Target Database

Posted on: 2017-04-20  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Wang  Full Text: PDF
GTID: 1220330488955798  Subject: Bioinformatics
Abstract/Summary:
Bacterial sRNAs have emerged as pervasive regulators of diverse cellular processes, with established roles in metabolism, quorum sensing, biofilm formation, iron regulation, and virulence. They exert their functions primarily by binding to target mRNAs or proteins. Therefore, systematic collection of experimentally verified bacterial sRNA targets and development of a corresponding database would not only help in understanding sRNA function, but also support the development of prediction models for bacterial sRNA targets.

To date, there are seven databases associated with bacterial sRNAs: sRNAMap, sRNAdb, Rfam, RegulonDB, NPInter, BSRD and sRNATarBase. These databases emphasize different aspects of data collection and annotation. For example, sRNAMap is a database of sRNAs from gram-negative bacteria, including 397 sRNAs, 62 regulators/sRNAs and 60 sRNAs/targets in 70 microbial genomes. Additional information about the sRNAs, such as their secondary structures, expression conditions, expression profiles, transcriptional start sites and cross-links to other biological databases, is provided for further investigation.

sRNAdb is a database platform that collects sRNAs from gram-positive bacteria, and contains 671 experimentally verified sRNAs and 9993 automatically predicted sRNAs from 558 gram-positive genomes and plasmids. The local version, which includes analysis and visualization tools, facilitates complex bioinformatics analyses for users.

Rfam is a database of functional RNA families, each of which is represented by a multiple sequence alignment and a covariance model.
A total of 3115 bacterial sRNAs are included in this database, but no information regarding sRNA-target interactions is available.

RegulonDB is a comprehensive database of transcriptional regulation in E. coli, covering elements such as transcription units (TUs), promoters and transcriptional regulators (TRs), and contains 110 sRNAs and 227 sRNA-target interactions. Among these interactions, 53 binding regions of target mRNAs are known and 50 sRNAs are involved, but no detailed information is given regarding the binding regions of sRNAs and related mutation experiments.

NPInter v2.0 mainly provides interactions between non-coding RNAs and other biomolecules in Homo sapiens and Mus musculus. It includes only 107 bacterial sRNA-target interactions from four organisms: B. subtilis, E. coli, S. typhimurium and S. aureus. Among them, 32 sRNAs are involved, but no binding regions are provided.

The comprehensive sRNA database BSRD contains 897 validated sRNAs, 8248 sRNA homologs and 507 candidate sRNAs from high-throughput datasets. BSRD includes 203 sRNA-target interactions, some of which were derived from sRNATarBase 2.0. However, no binding regions for the sRNA-target mRNA interactions are provided. A total of 57 sRNAs are involved in these interactions.

sRNATarBase is a database of experimentally verified sRNA targets, developed by our group in 2010. It contains 138 sRNA-target interactions and 252 non-interaction molecules. To date, the database has been used in a variety of studies.

In summary, apart from sRNATarBase 2.0, the other six databases do not provide comprehensive information about sRNA-target interactions, their binding regions and related mutation experiments. Therefore, they cannot be used to develop prediction models for the binding regions of sRNA–mRNA interactions. Additionally, sRNATarBase has not been updated for a long time.
To provide comprehensive and timely support to the sRNA research community, we recently updated the database to a new version and conducted related functional annotation research.

To provide a comprehensive bacterial sRNA target database, we employed three strategies for data collection: (1) We rechecked the 392 entries from sRNATarBase 2.0, for example by updating the sequences of sRNAs and targets. (2) We queried PubMed using keywords such as ‘bacterial sRNA function’, ‘bacterial sRNA target’ or ‘bacterial small regulatory RNA target’, and found 4524 publications. Considering that sRNATarBase 2.0 covered articles published before May 1, 2010, we mainly focused on those published between January 1, 2010 and June 1, 2015, i.e. 3124 papers. Their abstracts were carefully reviewed, and 120 full papers associated with sRNA-target interactions were selected. From these papers, information about sRNA-target entries was extracted. (3) To avoid missing sRNA targets, we extracted sRNA-target datasets from the literature on bacterial sRNA target prediction models, and compared them with the data in our database. As of June 1, 2015, we had obtained 771 sRNA-target entries, including 492 with validated interactions and 279 with no reported interactions. For each interaction, as much relevant information as possible was recorded: sRNA sequences and genomic positions; target mRNA sequences and genomic positions; binding regions of sRNA–mRNA interactions and related mutation experiments; and validation methods such as ‘Reporter assay’, ‘Mutation’, ‘Knock out’, ‘sRNA deletion’ and ‘Footprinting’.

To provide better service, we built a new database server.
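As an illustration of the kind of record described above, the following is a minimal Python sketch of an sRNA-target entry and a tally of interactions versus non-interactions. The class name, field names and example values are hypothetical placeholders chosen for this sketch; they are not taken from the sRNATarBase schema, which is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SrnaTargetEntry:
    """One sRNA-target entry (hypothetical layout, not the real schema)."""
    srna_name: str
    target_name: str
    organism: str
    interacts: bool  # True: validated interaction; False: no reported interaction
    # Assumed representation: (start, end) of the binding region on the target mRNA.
    binding_region: Optional[Tuple[int, int]] = None
    validation_methods: List[str] = field(default_factory=list)


def summarize(entries):
    """Count total entries, interactions, non-interactions and known binding regions."""
    summary = {"total": 0, "interactions": 0,
               "non_interactions": 0, "with_binding_region": 0}
    for entry in entries:
        summary["total"] += 1
        if entry.interacts:
            summary["interactions"] += 1
            if entry.binding_region is not None:
                summary["with_binding_region"] += 1
        else:
            summary["non_interactions"] += 1
    return summary


# Placeholder entries for illustration only; these are not real database records.
entries = [
    SrnaTargetEntry("sRNA_A", "gene_X", "organism_1", True, (10, 25),
                    ["Reporter assay", "Mutation"]),
    SrnaTargetEntry("sRNA_A", "gene_Y", "organism_1", True),
    SrnaTargetEntry("sRNA_B", "gene_X", "organism_1", False),
]
summary = summarize(entries)
print(summary)
```

Applied to the counts reported above, the same consistency check is that validated interactions plus entries with no reported interaction add up to the total: 492 + 279 = 771.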
The database web site (http://ccb1.bmi.ac.cn/srnatarbase/) includes six main functions: (1) Users can query the database through common information, including sRNA information, target information, sRNA-target interaction information and experimental evidence, as well as BLAST comparison and literature. (2) Dynamic display of RNA secondary structure. (3) NCBI sequence viewer for sRNA-target interactions. (4) Display of the sRNA-target regulatory network. (5) Target prediction using sRNATarget and sTarPicker, and functional enrichment analysis using DAVID, GOEAST and PANTHER. (6) Phylogenetic analysis.

On the basis of the database, we found that some sRNAs have multiple targets and some targets are regulated by multiple sRNAs. To investigate the regulatory relationship between an sRNA (or a target) and a group of targets (or sRNAs), we developed a web server, CosTar, with which an experimentalist can gain biological insights into sets of molecular entities. Such sets are the product of large-scale experiments such as sRNA profiling, mass spectrometry proteomics and gene expression studies. To this end, we first extracted 897 sRNA sequences from the BSRD database and bacterial genome sequences from NCBI, and then predicted their targets using sRNATarget and sTarPicker. For a set of sRNAs, CosTar outputs a list of gene targets ranked by their likelihood of being targeted by the sRNA ensemble, and vice versa for a set of regulated genes.

In summary, this work consists of two parts: (1) We have updated the sRNA target database sRNATarBase. The new version holds 771 entries manually collected from 213 articles. The numbers of validated sRNA-target interactions and binding regions reached 492 and 316, respectively.
In comparison with related databases such as RegulonDB, BSRD and NPInter v2.0, sRNATarBase 3.0 not only provides the largest number of bacterial interactions, but also includes 279 non-interactions, as well as detailed information about 316 sRNA-target mRNA binding regions and related mutation experiments. The data update, along with new features including the NCBI sequence viewer, the sRNA regulatory network, predicted target-based GO and pathway annotations and other functions, makes this database a useful resource for developing prediction models for binding regions of sRNA–mRNA interactions and for related sRNA functional annotation. (2) We present CosTar as a novel integration scheme for a dozen sRNA-target prediction resources. CosTar provides a double-sided view for ensembles of sRNAs or genes. As some sRNAs are predicted to pair with hundreds of targets, a reduction to a small but significant set of targets is valuable for the experimentalist. Applying CosTar to a set of genes as input is powerful for proposing a potential sRNA ensemble. Thus, the online tool can be used to generate hypotheses on the role of specific genes or a specific minimal set of sRNAs in any cellular setting.

There are three main features and innovations in this work: (1) The construction of the bacterial sRNA target database 3.0 provides comprehensive, accurate data for relevant research, such as the development of bacterial sRNA target prediction models. (2) The database web site offers a variety of tools, such as the NCBI sequence viewer, the sRNA regulatory network and GO analysis, which assist researchers in the field. (3) CosTar can be used to generate hypotheses on the role of specific genes or a specific minimal set of sRNAs in any cellular setting.
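The double-sided ranking described for CosTar can be illustrated with a minimal Python sketch. The abstract does not specify CosTar's scoring method, so the scheme below is an assumption made purely for illustration: candidates are ranked by how many members of the input set are predicted to pair with them, with ties broken by the summed prediction score. The function names, the prediction table and its scores are all hypothetical.

```python
from collections import defaultdict


def rank_partners(query_set, predictions):
    """Rank partners of a query set by hit count, then by summed score.

    predictions maps each query item to {partner: score}. This scoring is an
    illustrative assumption, not CosTar's actual method.
    """
    hits = defaultdict(int)
    score_sum = defaultdict(float)
    for item in query_set:
        for partner, score in predictions.get(item, {}).items():
            hits[partner] += 1
            score_sum[partner] += score
    # More hits first, then higher total score, then name for a stable order.
    return sorted(hits, key=lambda p: (-hits[p], -score_sum[p], p))


def invert(predictions):
    """Turn an sRNA -> {gene: score} table into gene -> {sRNA: score}."""
    inverted = defaultdict(dict)
    for srna, targets in predictions.items():
        for gene, score in targets.items():
            inverted[gene][srna] = score
    return dict(inverted)


# Hypothetical prediction scores, for illustration only.
predictions = {
    "sRNA_A": {"gene1": 0.9, "gene2": 0.6},
    "sRNA_B": {"gene1": 0.7, "gene3": 0.8},
    "sRNA_C": {"gene2": 0.5},
}

# A set of sRNAs as input: rank candidate target genes.
ranked_genes = rank_partners(["sRNA_A", "sRNA_B"], predictions)
# A set of genes as input: rank candidate regulating sRNAs.
ranked_srnas = rank_partners(["gene1", "gene2"], invert(predictions))
print(ranked_genes)
print(ranked_srnas)
```

Inverting the prediction table lets the same ranking function serve both directions, which mirrors the "vice versa" use described above; a production tool would more likely use a statistical enrichment score than raw counts.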
Keywords/Search Tags:bacterial sRNA, target mRNA, Database