Research And Implementation Of Enhancer Prediction Algorithms Based On Ensemble Learning

Posted on: 2024-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: M D Liu
GTID: 2530307121459424
Subject: Engineering
Abstract/Summary:
Enhancers are cis-regulatory elements in the non-coding regions of DNA that increase gene expression by interacting with the promoters of their target genes. Super enhancers are clusters of multiple ordinary enhancers that activate gene expression to a greater extent. Both have become active research topics in gene expression regulation because they help maintain cell identity, determine cell fate, and drive transcription in cancer cells. Identifying enhancers and their strength from DNA sequence is difficult because enhancers lack sequence specificity and are highly tissue- and cell-specific, while super enhancers, as higher-order structures built from enhancers, are of even greater value in cancer research. Traditional methods for predicting enhancers and super enhancers are costly and time-consuming, mine the available information insufficiently, and have low accuracy, which makes them poorly suited to complex prediction problems with large amounts of data. Ensemble learning combines the decisions of different base classifiers; it offers high error tolerance, good generalization, and robustness, and can usually outperform traditional methods in complex, data-rich scenarios. Based on ensemble learning, this thesis carries out a series of studies on enhancer prediction algorithms and proposes ensemble learning models suited to the specific application scenarios of the enhancer and super enhancer prediction tasks, respectively. The main research work is as follows:

(1) Research on an algorithm for predicting enhancers and their strength based on stacking ensemble learning. To address the insufficient feature mining of sequence data, low classifier fault tolerance, and low prediction accuracy of traditional algorithms for predicting enhancers and their strength, this study encodes features from the base frequencies, physicochemical properties, and spatial properties of DNA sequence data, integrates multiple high-quality base classifiers, and constructs iEnhancer-SKNN, a method based on multi-feature fusion and stacking ensemble learning that predicts enhancers and their strength. The experimental results show that this model outperforms not only single-feature models and single machine learning models but also existing ensemble learning methods. In addition, this study analyzes the motifs of potential transcription factor binding sites in enhancer regions, confirming that enhancers have a key biological function in transcriptional regulation.

(2) Research on a super enhancer prediction algorithm based on stacking ensemble learning. Traditional super enhancer prediction algorithms make little use of DNA sequence data, mine sequence features insufficiently, have low fault tolerance, and generalize weakly in cross-cell-line prediction scenarios. This research proposes Stack SE, a super enhancer prediction algorithm based on stacking ensemble learning. The algorithm fuses multiple traditional sequence features to represent DNA sequence data and, using multiple base classifiers with good classification performance, accurately predicts super enhancers from DNA sequence data alone. Given the high tissue specificity of super enhancers, a cell-specific prediction model, Stack SE specific, is built on the Stack SE model and achieves both cell-specific and cross-cell-line prediction of super enhancers. Also building on Stack SE, a general multi-cell-line prediction model with good transferability, Stack SE integrated, is proposed; it accurately predicts super enhancers in different cell lines and outperforms existing computational models. The results suggest that super enhancers may share hidden sequence patterns across cell lines.

(3) Design and implementation of an enhancer prediction system based on ensemble learning. This part combines the enhancer and super enhancer prediction algorithms proposed in this thesis and designs an interactive enhancer and super enhancer prediction system. The system consists of a login interface module, a scenario selection module for enhancer prediction tasks, a DNA sequence data display module, and a prediction result statistics module. Through these modules, users can interactively run ensemble-learning-based enhancer and super enhancer prediction experiments and compile statistics on the results.
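The multi-feature fusion and stacking idea described in parts (1) and (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the base classifiers or the exact feature encodings, so this sketch assumes k-mer frequencies as the only features, a toy nearest-centroid scorer as each base learner, and a small logistic regression as the meta-learner. The names kmer_freqs, CentroidScorer, fit_logistic and StackingEnsemble are hypothetical, and the sequences are made-up toy strings (GC-rich versus AT-rich), not real enhancers.

```python
from itertools import product
from math import dist, exp

def kmer_freqs(seq, k):
    """Normalized frequency of every k-mer over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]

class CentroidScorer:
    """Toy base learner (assumption): nearest-centroid score on one k-mer view."""
    def __init__(self, k):
        self.k = k

    def fit(self, seqs, labels):
        feats = [kmer_freqs(s, self.k) for s in seqs]
        self.centroids = {}
        for c in (0, 1):
            rows = [f for f, y in zip(feats, labels) if y == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, seq):
        f = kmer_freqs(seq, self.k)
        # Positive when the sequence is closer to the class-1 centroid.
        return dist(f, self.centroids[0]) - dist(f, self.centroids[1])

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Meta-learner (assumption): logistic regression trained by plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            g = 1 / (1 + exp(-z)) - yi  # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

class StackingEnsemble:
    """Stacking: out-of-fold base scores become the meta-learner's inputs."""
    def __init__(self, ks=(1, 2, 3), n_folds=3):
        self.ks, self.n_folds = ks, n_folds

    def fit(self, seqs, labels):
        n = len(seqs)
        meta_X = [None] * n
        for fold in range(self.n_folds):
            train = [i for i in range(n) if i % self.n_folds != fold]
            held = [i for i in range(n) if i % self.n_folds == fold]
            models = [CentroidScorer(k).fit([seqs[i] for i in train],
                                            [labels[i] for i in train])
                      for k in self.ks]
            for i in held:  # score only sequences the base models never saw
                meta_X[i] = [m.score(seqs[i]) for m in models]
        self.w, self.b = fit_logistic(meta_X, labels)
        # Refit the base learners on all data for use at prediction time.
        self.models = [CentroidScorer(k).fit(seqs, labels) for k in self.ks]
        return self

    def predict_proba(self, seq):
        z = self.b + sum(w * m.score(seq) for w, m in zip(self.w, self.models))
        return 1 / (1 + exp(-z))

# Made-up toy data: label 1 = GC-rich, label 0 = AT-rich (not real enhancers).
TOY_SEQS = ["GCGCGGCCGCGG", "CCGGCGCGGCCG", "GGCCGCGCCGGC",
            "CGCGCCGGGCGC", "GCCGGCGCGCGG", "CGGCGCCGCGCC",
            "ATATTAATATTA", "TTAATATATAAT", "AATTATATTAAT",
            "TATAATTATATA", "ATTATAATATTA", "TAATATTATAAT"]
TOY_LABELS = [1] * 6 + [0] * 6

model = StackingEnsemble().fit(TOY_SEQS, TOY_LABELS)
print(model.predict_proba("GGCGCCGCGGCG"), model.predict_proba("TTATAATATATT"))
```

Training the meta-learner on out-of-fold scores, rather than on scores from base learners that have already seen the same sequences, is what distinguishes stacking from simply averaging classifiers. In practice a library implementation with stronger base classifiers and richer features would replace these toy components.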
Keywords/Search Tags:DNA sequence analysis, Ensemble learning, Prediction, Enhancers, Super enhancer