Research On The Construction Of Transcriptional Control Algorithm Based On Deep Auto Encoder And XGBoost

Posted on: 2018-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: C C Zhang
GTID: 2310330536981936
Subject: Software engineering
Abstract/Summary:
A gene regulatory network is a complex network composed of multiple genes, and the transcriptional regulatory network is an important component of it. Transcriptional regulation is the main process of gene regulation: transcription factors control the expression of target genes through binding sites. In recent years, the prediction and reconstruction of transcriptional regulatory networks has been a focus of research in systems biology and bioinformatics, and constructing these networks provides an effective means of revealing reaction mechanisms within cells. The main research problem is to reconstruct the network from expression profiles and other omics data. With the development of high-throughput sequencing and gene chip (microarray) technology, gene expression data, gene sequence data and gene annotation data have become easier to acquire. In addition, continued progress in machine learning has brought the construction of transcriptional regulatory networks to a new stage, making high-precision networks a realistic goal.

Traditional probabilistic inference methods for transcriptional regulation are simple and fast, but their prediction accuracy is not high, and their complexity becomes too high when the network structure is complex. In recent years, machine learning based methods have become the mainstream in transcriptional regulatory network reconstruction. These methods fall into two types. The first uses regression models based on tree ensembles; a large number of regression models must be built to obtain the network, so the computational cost is high. The second recasts network construction as a classification problem and uses SVM for classification. Both types have problems: they make inadequate use of the available biological data, and the accuracy of the constructed networks is low, especially on larger data sets.

To solve these problems, this thesis integrates gene expression data, gene sequence data and gene annotation data, and presents the DAXL model (a combined model of XGBoost and logistic regression based on a deep autoencoder) for constructing transcriptional regulatory networks. To evaluate DAXL, experiments were carried out on Arabidopsis data. First, the Arabidopsis data set is collected. Second, a set of trusted negative samples is constructed based on biological background knowledge. Third, a deep autoencoder is applied to the high-dimensional, sparse gene annotation data to learn a compact high-level representation. Finally, XGBoost and logistic regression models are trained to obtain the transcriptional regulatory network. Comparison experiments are also carried out against current methods: logistic regression, support vector machines, neural networks, XGBoost and GENIE3. The F1 score is used as the evaluation metric in all experiments. In these comparisons, the DAXL method achieves higher accuracy than the other methods, which indicates that it is better suited to constructing transcriptional regulatory networks.
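The abstract names the F1 score as the evaluation metric but does not show how it applies to a predicted network. The following is a minimal sketch, assuming a network is represented as a set of (transcription factor, target gene) edges and that F1 is computed on the positive (regulating) class. The function name `f1_score`, the edge representation and the gene identifiers are illustrative assumptions, not taken from the thesis.

```python
from typing import Set, Tuple

# Assumption: a regulatory edge is a (transcription factor, target gene) pair.
Edge = Tuple[str, str]


def f1_score(predicted: Set[Edge], known: Set[Edge]) -> float:
    """Harmonic mean of precision and recall over predicted edges."""
    true_positives = len(predicted & known)
    if true_positives == 0:
        # Covers empty inputs and the case of no correct prediction.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(known)
    return 2 * precision * recall / (precision + recall)


# Hypothetical identifiers, for illustration only.
known = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G4")}
predicted = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G5")}

# Two of three predictions are correct (precision 2/3),
# and two of four known edges are recovered (recall 1/2).
score = f1_score(predicted, known)
print(round(score, 4))
```

Because F1 combines precision and recall, a method cannot score well simply by predicting very few edges or very many, which matters when true regulatory pairs are rare relative to all candidate pairs.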
Keywords/Search Tags: transcriptional regulatory network, deep autoencoder, XGBoost, logistic regression model