Research on Deep Learning-Based Transcription Factor Target Gene Prediction

Posted on: 2021-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Mei
GTID: 2370330611998848
Subject: Computer Science and Technology
Abstract/Summary:
Transcription factors (TFs) are key regulators of gene transcription: they regulate transcription levels in cells and thereby affect gene expression. TF target gene prediction, which aims to identify the genes that may be regulated by a TF from interaction data between TFs and the genome, is an important problem in research on gene expression mechanisms. Existing TF target gene prediction methods can be grouped into rule-based methods, statistical methods, and machine learning-based methods. These methods still have shortcomings. Rule-based methods normally ignore TFs' binding properties and preferences in particular cell types, while statistical and machine learning-based methods cannot effectively model the regulatory relationships between genes and their surrounding TF binding sites (TFBSs). In particular, most existing methods cannot detect regulation from long-range TFBSs. Furthermore, these methods rely on data diversity and feature selection for prediction, so their application scenarios are limited by data availability. To address these problems, this study investigates TF target gene prediction based on TF-binding ChIP-seq data from the perspectives of sample representation and prediction algorithm design. For sample representation, this study proposes a target gene representation method based on histone modifications, which uses the histone representation of TFBSs to construct the gene representation. For algorithm design, this study applies a bidirectional Long Short-Term Memory network (BiLSTM) to model the dependencies between TFBSs around gene transcription start sites (TSSs), so as to capture the regulatory relationships between multiple TFBSs and genes. On this basis, a self-attention mechanism is introduced to address the detection of long-range regulation in TF target gene prediction. Experimental results on 24 TFs in GM12878 show that the proposed method outperforms the state-of-the-art method, with improvements of 2.94% to 9.24% in F1 score and 1.7% to 7.55% in AUC. This indicates that dependencies between different TFBSs around TSSs, and regulation from long-range TFBSs, play an important role in TF target gene prediction.

Considering that different TFs in the same cell type show similar binding patterns on the genome, a TF target gene prediction method that combines transfer learning and adversarial training is further proposed on the basis of the above method. Using these techniques, the method aims to learn TF-type-independent, transferable features from the target gene data of other TFs, in order to improve target gene prediction for the TF of interest in the same cell type. Experimental results show that the proposed method improves the F1 score by 1.89% to 9.35% and the AUC by 1.65% to 8.14% compared with existing state-of-the-art methods, which demonstrates the feasibility of cross-TF-type target gene prediction.
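The role of self-attention in relating distant TFBSs can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the attention form, so the code assumes plain scaled dot-product self-attention with identity projections (no learned query/key/value weights), applies it directly to hypothetical per-TFBS histone feature vectors rather than to BiLSTM hidden states, and uses mean pooling to form a gene-level vector. The names `self_attention`, `tfbs_features`, and `gene_repr` and all numeric values are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    Assumption: a trained model would use learned query/key/value
    projections and BiLSTM hidden states as input; both are omitted here.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in features:
        # Each site attends to every site, regardless of position.
        row = softmax([dot(q, k) / scale for k in features])
        weights.append(row)
        outputs.append(
            [sum(w * v[j] for w, v in zip(row, features)) for j in range(d)]
        )
    return outputs, weights


# Hypothetical toy input: 4 TFBSs ordered by distance from the TSS,
# each described by 3 made-up histone-signal features.
tfbs_features = [
    [0.9, 0.1, 0.3],  # proximal site
    [0.2, 0.8, 0.5],
    [0.1, 0.2, 0.1],
    [0.8, 0.2, 0.4],  # distal site with a profile similar to site 0
]

contextual, attn = self_attention(tfbs_features)

# Mean-pool the context-aware site vectors into one gene-level vector
# (an assumed pooling choice, not taken from the thesis).
gene_repr = [sum(col) / len(contextual) for col in zip(*contextual)]
```

Because the attention weights depend only on feature similarity and not on position, the proximal site in this toy input gives more weight to the distal site with a similar profile than to the nearer but dissimilar one. This is the property that makes attention a natural fit for long-range regulation.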
Keywords/Search Tags: transcription factor, target gene prediction, deep learning, attention mechanism, transfer learning