
Research on and Application of a DNA Methylation Prediction Method Based on Multi-Algorithm Fusion

Posted on: 2024-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: C D Wu
GTID: 2530307103990159
Subject: Mechanics (Professional Degree)
Abstract/Summary:
DNA methylation is a crucial epigenetic mechanism that regulates gene expression by modifying chromatin structure, DNA stability, and DNA-protein interactions. Changes in DNA methylation status can inactivate tumor suppressor genes and activate oncogenes, resulting in cancer. Investigating DNA methylation is therefore highly significant for understanding the relationship between health and disease. Although biological experiments can detect DNA methylation with high accuracy, their cost and time requirements make them impractical for large-scale methylation analysis. Traditional machine learning algorithms, in turn, require features to be defined manually, which depends on personal expertise and can lead to lower accuracy. Deep learning extracts features automatically, removing the need for manual feature definition, and it has become an important complement to experimental methods for predicting methylation. Nevertheless, because DNA methylation data vary between samples, a single model may generalize poorly when predicting methylation for different samples. Moreover, the DNA sequence features extracted by deep learning algorithms may not be sufficiently distinctive, which lowers the accuracy of DNA methylation prediction. This thesis addresses these issues as follows.

(1) An improved feature fusion method is proposed to address the difficulty deep learning models have in extracting distinctive DNA sequence features. The method builds a feature splicing (concatenation) model called CRNN: a convolutional neural network (CNN) extracts global features of the DNA sequence, and a recurrent neural network (RNN) captures its temporal features. The global and temporal features are then fused by feature splicing, which improves the feature representation of DNA sequences during training and thus the accuracy of DNA methylation prediction. A model called DNAVGG was developed by modifying the VGG16 model. Trained on the feature-fused DNA sequence data, DNAVGG predicted DNA methylation with an accuracy of 92.67%, which is 2.25% higher than when it was trained on DNA sequence data without feature fusion. This result indicates that the proposed feature fusion method addresses the problem of insufficiently distinctive DNA sequence features.

(2) Because DNA methylation data differ among samples, a single model generalizes poorly when predicting DNA methylation for different samples. To address this issue, this thesis proposes CpGspr, a DNA methylation prediction model based on multi-algorithm fusion that improves generalization. CpGspr consists of three main components: the CRNN feature splicing model, a feature extraction model, and an output part. The feature extraction model comprises a CNN and a bidirectional long short-term memory (BiLSTM) network. An attention mechanism increases the weights of sequence-order features in the vicinity of CpG sites, strengthening the representation of DNA sequence features. CpGspr thus combines the strength of CNNs in feature extraction with that of RNNs in extracting temporal features. Trained on human embryonic stem cell methylation data (GSM432685), CpGspr predicted DNA methylation with an accuracy of 93.96% and an AUC of 97.27%; compared with the MRCNN model, its accuracy and AUC were higher by 0.72% and 0.61%, respectively. To verify its generalization ability, CpGspr and MRCNN were both used to predict four DNA methylation datasets from different samples of the same individual. CpGspr achieved an accuracy above 90% and an AUC above 95% on all four datasets, with an accuracy error of only 1.13%, indicating that it generalizes well.

(3) A DNA methylation prediction system based on the CpGspr model is presented. It predicts DNA methylation efficiently and mitigates the loss of detection accuracy caused by insufficient human expertise. The system enables intelligent DNA methylation prediction and provides room for developing intelligent medical robots that predict DNA methylation.
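The feature splicing idea in (1) can be illustrated with a minimal, framework-free sketch. It is not the thesis's CRNN: it assumes one-hot encoding of the DNA window, uses a single 1D convolution filter with global max pooling as a stand-in for the CNN's "global" feature, and uses a single-unit Elman RNN whose final hidden state stands in for the "temporal" feature. The helper names (`one_hot`, `conv_global_feature`, `rnn_temporal_feature`, `splice_features`) and all weights are hypothetical; a real model would learn many filters and hidden units.

```python
import math
from typing import List, Sequence

BASES = "ACGT"
Vector = List[float]

def one_hot(seq: str) -> List[Vector]:
    """Encode a DNA string as 4-dim one-hot rows in A, C, G, T order (assumed encoding)."""
    rows = []
    for base in seq.upper():
        vec = [0.0] * 4
        if base in BASES:  # ambiguous bases such as N stay all-zero
            vec[BASES.index(base)] = 1.0
        rows.append(vec)
    return rows

def conv_global_feature(x: Sequence[Vector], kernel: Sequence[Vector], bias: float = 0.0) -> float:
    """One 1D convolution filter (valid padding, ReLU) followed by global max pooling."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("sequence shorter than the convolution kernel")
    activations = []
    for start in range(len(x) - k + 1):
        s = bias
        for offset in range(k):
            s += sum(w * v for w, v in zip(kernel[offset], x[start + offset]))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # strongest response anywhere in the window

def rnn_temporal_feature(x: Sequence[Vector], w_in: Vector, w_rec: float, bias: float = 0.0) -> float:
    """Single-unit Elman RNN read left to right; returns the final hidden state."""
    h = 0.0
    for vec in x:
        h = math.tanh(sum(w * v for w, v in zip(w_in, vec)) + w_rec * h + bias)
    return h

def splice_features(seq: str, kernel: Sequence[Vector], w_in: Vector, w_rec: float) -> Vector:
    """Feature splicing: concatenate the CNN (global) and RNN (temporal) features."""
    x = one_hot(seq)
    cnn_part = [conv_global_feature(x, kernel)]
    rnn_part = [rnn_temporal_feature(x, w_in, w_rec)]
    return cnn_part + rnn_part

# Toy filter that fires on a "CG" dinucleotide (C then G); weights are illustrative only.
cg_kernel = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
features = splice_features("AACGTT", cg_kernel, [0.5, -0.5, 0.5, -0.5], 0.5)
print(features[0])  # 2.0: the filter matches the CpG dinucleotide exactly once
```

Concatenation keeps both views of the sequence side by side, so a downstream classifier can weigh the convolutional and recurrent evidence independently rather than forcing one to summarize the other.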
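The BiLSTM-plus-attention stage of CpGspr described in (2) can be sketched in the same spirit. This is a simplified illustration under stated assumptions, not the thesis's implementation: a single-unit tanh RNN replaces the LSTM cell, both directions share weights for brevity, and attention is plain dot-product pooling with a fixed query vector. In a trained model the query would be learned, and the thesis reports that attention raises the weights of features near CpG sites; the toy weights here are hypothetical and do not encode that behaviour.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

def rnn_states(x: Sequence[Vector], w_in: Vector, w_rec: float) -> List[float]:
    """Hidden state at every position of a single-unit tanh RNN (stand-in for an LSTM cell)."""
    h, states = 0.0, []
    for vec in x:
        h = math.tanh(sum(w * v for w, v in zip(w_in, vec)) + w_rec * h)
        states.append(h)
    return states

def bidirectional_states(x: Sequence[Vector], w_in: Vector, w_rec: float) -> List[Vector]:
    """Concatenate forward and backward states per position, as a bidirectional RNN does.
    Both directions share weights here only to keep the sketch short."""
    fwd = rnn_states(x, w_in, w_rec)
    bwd = rnn_states(list(x)[::-1], w_in, w_rec)[::-1]  # run right-to-left, then realign
    return [[f, b] for f, b in zip(fwd, bwd)]

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Dot-product attention pooling: score each position, softmax, weighted sum."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return context, weights

x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # "ACG" one-hot
states = bidirectional_states(x, [0.5, -0.5, 0.5, -0.5], 0.3)
context, weights = attention_pool(states, query=[1.0, -1.0])
```

The attention weights sum to one, so positions that score higher (for example, those around the central CpG after training) contribute proportionally more to the pooled `context` vector passed to the output layer.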
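The accuracy and AUC figures reported in (2) are standard binary-classification metrics. The thesis does not state how they were computed, so the following minimal sketch assumes a 0.5 threshold on predicted methylation probabilities for accuracy and uses the pairwise (Mann-Whitney) formulation of ROC AUC; the function names and toy data are hypothetical.

```python
from typing import Sequence

def accuracy(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of sites whose thresholded prediction matches the label (1 = methylated)."""
    correct = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return correct / len(labels)

def auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """ROC AUC as the probability that a random methylated site outscores a random
    unmethylated one, with ties counted as half; O(n_pos * n_neg)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, probs))  # 0.5
print(auc(labels, probs))       # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```

The toy example shows why the two metrics can diverge: AUC measures ranking quality independently of any threshold, whereas accuracy depends on where the decision threshold is placed.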
Keywords/Search Tags: DNA methylation, bidirectional long short-term memory, feature fusion, attention mechanism, deep learning