
Research On Intelligent Computation Models And Algorithms Of Gene Expression Regulation Mechanism

Posted on: 2022-05-24  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W Y He  Full Text: PDF
GTID: 1520307034462574  Subject: Computer Science and Technology
Abstract/Summary:
With the completion of many whole-genome sequencing projects, genomics research is moving from genome sequencing to genome synthesis. Emerging technologies such as DNA-based molecular assembly, genome editing, directed evolution and DNA storage will greatly advance research on the precise artificial regulation of synthetic biological products. Precise regulation of gene expression is essential to synthetic biology, but the mechanism of that regulation is still not fully understood. In particular, the relevant regulatory elements must be annotated, and the correspondence between genes and their actual functions must be established. In the post-genome era we therefore need to focus on how to identify the various regulatory elements and how to explore and annotate sequences and sites of unknown function. Meanwhile, high-throughput sequencing has generated a large number of multi-omics datasets. These datasets provide extensive information about biological systems and complex diseases, and their potential needs to be mined further with more efficient analysis methods such as machine learning. Machine learning, which builds prediction models from large-scale, multi-dimensional datasets, has become an important part of modern biotechnology, and the predictions of such models support the high-level analysis of increasingly complex datasets. Based on machine learning, this thesis first studies the prediction of regulatory elements and modification sites related to gene expression, and then systematically explores algorithms for reconstructing gene regulatory networks. The specific contributions of this thesis are as follows:

(1) Identification of non-coding DNA sequences based on multiple features. Non-coding DNA (ncDNA) sequences are the most important part of an organism's genome. We propose a computational model that can accurately and automatically identify ncDNA sequences. Using a collected ncDNA benchmark dataset for Saccharomyces cerevisiae, we first design an optimal feature extraction strategy based on mononucleotide, dimer, trimer, tetramer, pentamer and hexamer composition. We then construct a support vector machine (SVM) classifier, Sc-ncDNAPred, to predict ncDNA. This method avoids the high cost of genome-wide experimental detection and achieves an accuracy of 0.98 for ncDNA prediction.

(2) Identification of σ70 promoters in prokaryotes based on position-specific differences. Promoters regulate the transcription of many genes in prokaryotes, and promoter recognition is a crucial part of recognizing gene structure. We develop a promoter recognition method called 70ProPred, which combines the position-specific trinucleotide propensity based on a single-stranded characteristic (PSTNPSS) with the electron-ion interaction potential values for trinucleotides (PseEIIP). The proposed method significantly outperforms the state of the art, in both accuracy and stability, when predicting σ70 promoters in prokaryotes. Moreover, it can also aid promoter prediction in other species.

(3) Identification of DNA cytosine methylation sites based on position-specific differences. N4-methylcytosine (4mC) plays a critical role in DNA repair, expression and replication, so identifying 4mC sites helps in understanding its biological functions and mechanisms. We implement a new tool, 4mCPred, which detects 4mC sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii. Extensive experiments, including independent testing and cross-species validation, verify the effectiveness of 4mCPred in predicting 4mC sites. In addition, we discuss the importance of each feature to the prediction results.

(4) Research on gene regulatory network reconstruction algorithms based on multi-source expression data. Many biological processes, such as cell growth and differentiation and the occurrence and development of diseases, are controlled by gene regulatory networks (GRNs). In studying GRNs, it is important to determine the relationships between genes by mining gene expression data. We therefore build a multi-source, multi-model fusion method, MMFGRN, which reconstructs the GRN and discovers potential regulatory relationships between genes. Experimental results confirm the robustness of MMFGRN on networks of different scales. The proposed strategy, which includes both integrated model construction and a weighted fusion method, also opens a path toward rebuilding biological network models without prior knowledge.
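As an illustration of the k-mer composition features mentioned in contribution (1), the following is a minimal sketch of how mononucleotide through trimer frequencies could be turned into a fixed-length feature vector. It is not the thesis's actual feature extraction strategy: the function names `kmer_frequencies` and `multi_k_features`, the normalization by window count, and the choice of k = 1, 2, 3 are assumptions made for this example only.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Return normalized counts of every k-mer over ACGT, in lexicographic order.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    # Normalize so the features do not depend on sequence length.
    return [counts[km] / total if total else 0.0 for km in kmers]


def multi_k_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several values of k."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


vec = multi_k_features("ACGTACGTAC")
print(len(vec))  # 4 + 16 + 64 = 84 features
```

A vector of this kind could then be passed to any standard classifier, such as an SVM. Extending `ks` up to 6 would cover the hexamer features named in the abstract, at the cost of 4096 additional dimensions for k = 6 alone.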
Keywords/Search Tags:Regulation of gene expression, Non-coding DNA, Promoter, Methylation, Gene regulatory network, Post-translational modification, Machine learning, Information theory