Large Data Comparative Analysis And Database Construction Of Plant G-quadruplex

Posted on: 2020-09-30 | Degree: Master | Type: Thesis
Country: China | Candidate: F F Ge
GTID: 2370330575972081 | Subject: Plant protection
Abstract/Summary:
G-quadruplex (G-Q) is a higher-order DNA or RNA secondary structure formed by the folding of G-rich sequences; its structural unit is the G-quartet, a square planar arrangement of four guanine bases held together by hydrogen bonds. G-Qs are widely found in plant genomes and are involved in important biological processes such as transcription, translation and telomere maintenance. Although some databases and tools have been developed for predicting and studying G-Qs, none of them targets plants. With the development of next-generation sequencing technology, a large number of plant genomes have been assembled and annotated. To better promote the mining and analysis of large-scale G-Q data in plants, this study uses public plant genome data platforms together with bioinformatics and structure-based methods to identify G-Q structures in plant genomes. Using all the G-Q information mined in this study, a comprehensive and user-friendly database was built. The results of this study are as follows:
(1) Construction of a plant genome information repository. By reading and collecting information from plant genome articles, 195 plant genomes and their annotation files were obtained. The Latin name, English name, publication time, version number, reference name and its access address, genome access address, genome size and other details of each species were extracted, and a plant genome information repository was constructed from this information.
(2) Identification and annotation of plant G-Qs. Whole-genome scanning of the 195 plant species identified a total of 626,341,645 G-Qs. The numbers of G-Qs with two, three and four G-tracts were 610,897,949, 14,326,347 and 1,117,349, respectively. Most were of the two-G-tract type, accounting for 97.43%; three-G-tract G-Qs were fewer, accounting for 2.38%. Four-G-tract G-Qs are the most stable but the least numerous, accounting for 0.19%. Based on the annotation information of the 195 plant species, each G-Q was annotated as located either in an intergenic region or within a gene.
(3) Large-scale G-Q data analysis and application. Plants were grouped by family, and the frequency and positional distribution of G-Q structures were analyzed and compared across 13 families with a large number of species. Across the genomes, two-G-tract G-Qs were the most frequent and four-G-tract G-Qs the least frequent. Statistical analysis of G-Q distribution frequency in the genomes of 18 species of Gramineae, Cruciferae and Solanaceae showed the same pattern: two-G-tract G-Qs were the most frequent, three-G-tract G-Qs the second most frequent, and four-G-tract G-Qs the least frequent. Analysis of the positional distribution of G-Qs in the 13 families indicated that in most species more G-Qs are located in intergenic regions than in genes, although in some species more G-Qs are located in genes, suggesting that the distribution of G-Qs may be involved in some important physiological functions.
(4) The plant G-Q database and analysis tools. To facilitate the querying of G-Q information, Plant-GQ provides a query page. Query results include the Latin name of the species, chromosome number, G-Q type, start position, end position and G-Q sequence. The database also offers an online tool that predicts G-Q structures from genome sequences in FASTA format. In addition, the genome browser JBrowse displays the genome sequences, annotation information and G-Q information of the 195 plants. Finally, all G-Q data, as well as the Perl script, are available on the download page.
(5) Construction of the plant G-Q platform. This research used LAMP (Linux + Apache + PHP + MySQL/ThinkPHP) to construct the functional interface. In addition, a user-friendly database, Plant-GQ (http://biodb.sdau.edu.cn/plantgq/index.php), was constructed for querying, browsing and downloading G-Q information. This database will play an important role in the study of the biological relevance of G-Qs in plants. Comparative analysis of large-scale plant G-Q data and construction of a G-Q data platform will not only accelerate the study of the various regulatory effects of G-Q structures, but also fill the current gap in G-Q databases for plants.
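The whole-genome scan in result (2) and the FASTA-based prediction tool in result (4) can be illustrated with a minimal sketch. This is not the thesis's Perl script, whose search rules are not given in the abstract. It assumes the commonly used motif of four G-tracts separated by loops of 1 to 7 bases, assumes that the "two, three and four G-tracts" types correspond to tract lengths of 2, 3 and 4 guanines, and searches the forward strand only. The function names `build_gq_regex`, `read_fasta` and `scan_gq` are hypothetical.

```python
import re


def build_gq_regex(tract_len, min_loop=1, max_loop=7):
    """Compile a pattern for four G-tracts of `tract_len` guanines
    separated by three loops of min_loop..max_loop bases.
    The loop bounds are assumptions, not values taken from the thesis."""
    tract = "G{%d}" % tract_len
    loop = "[ACGT]{%d,%d}" % (min_loop, max_loop)
    return re.compile("%s(?:%s%s){3}" % (tract, loop, tract))


def read_fasta(text):
    """Parse FASTA text into {sequence_name: uppercase_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def scan_gq(seqs, tract_len):
    """Return non-overlapping forward-strand hits as
    (name, start, end, sequence) with 1-based inclusive coordinates."""
    pattern = build_gq_regex(tract_len)
    hits = []
    for name, seq in seqs.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.end(), match.group()))
    return hits


if __name__ == "__main__":
    demo = ">chr1 demo\nATGGGTAGGGCTGGGAAGGGTT\n"
    print(scan_gq(read_fasta(demo), 3))
    # prints [('chr1', 3, 20, 'GGGTAGGGCTGGGAAGGG')]
```

Each hit carries the same kind of fields the abstract lists for query results (sequence name, start position, end position and G-Q sequence). A fuller scan would also search the reverse strand, for example by matching the complementary C-rich pattern.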
Keywords/Search Tags: Plant, G-quadruplex, Genome, Large datasets, Database