SNP Site-Drug Association Prediction Based On Machine Learning

Posted on:2023-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:X B Feng

Full Text:PDF

GTID:2544306848477454

Subject:Computer application technology

Abstract/Summary:

Single nucleotide polymorphism (SNP) sites are an important basis for studying genetic variation in humans, animals, and plants. They are therefore widely used in population genetics and in research on disease-related genes, and they play an important role in pharmacogenomics, diagnostics, and biomedicine. In pharmacogenomics research, identifying SNP site-drug associations is key to clinical precision medicine. However, traditional biological experiments are costly and inefficient, and they are largely untargeted when verifying associations between large numbers of SNP sites and drugs, which prevents their wide use in practical applications. In recent years, bioinformatics techniques such as machine learning and data mining have opened up new, efficient strategies for predicting SNP site-drug associations. This thesis therefore proposes a machine learning-based SNP site-drug association prediction algorithm. The main research contents are as follows.

First, SNP sites and drug molecules are numerically characterized. Because SNP sites and drug molecules are stored in databases as strings, they cannot be fed directly into a classifier as feature vectors. This thesis therefore proposes a k-mer-based numerical characterization method for SNP sites and adopts a molecular fingerprint-based numerical characterization method for drug molecules. These methods capture the essential properties of the SNP site and drug data and provide the input data for the subsequent feature extraction algorithm.

Second, features are extracted from the SNP site-drug data. After numerical characterization, the SNP site features are high-dimensional and contain noise. This thesis therefore proposes a SNP site-drug feature extraction algorithm based on a denoising variational auto-encoder (DVAE), which produces efficient features without losing biological information.

Next, the extracted effective SNP site features and the drug molecule features are fused into SNP site-drug fusion features, which are input into a random forest classifier for training, validation, and testing. To evaluate how well the DVAE extracts features, five-fold cross-validation was performed on the model, and good results were obtained. Comparisons with other feature extraction algorithms and with other classifiers show that the proposed feature extraction algorithm extracts SNP site features accurately and efficiently and improves the accuracy of SNP site-drug association prediction.

Finally, a SNP site-drug association prediction model is constructed. To further improve prediction accuracy, this thesis proposes a prediction model based on Stacking ensemble learning. The first layer uses four base classifiers (support vector machine, decision tree, random forest, and XGBoost) to make predictions; the second layer uses logistic regression as a meta-classifier trained on the predictions of the first layer, forming the Stacking ensemble model. The results show that, compared with the five single classifiers mentioned above, the Stacking model effectively improves the prediction accuracy of SNP site-drug associations and has greater reference value for practical applications.
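The k-mer characterization of SNP sites described above can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each SNP site is represented by a short DNA window around the variant, k = 3 by default, and the feature vector holds the normalized counts of all 4^k possible k-mers in lexicographic order. The helper names `kmer_vector` and `fuse`, the example window, and fusion by simple concatenation with a drug fingerprint bit vector are all hypothetical and may differ from the thesis's actual method.

```python
from __future__ import annotations

from itertools import product


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Hypothetical helper: normalized k-mer frequencies over the alphabet ACGT.

    k-mers containing any other symbol (e.g. 'N') are skipped.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible k-mers in a fixed lexicographic order.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    # Slide a window of width k along the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # Normalize to frequencies so windows of different lengths are comparable.
    return [c / total if total else 0.0 for c in counts]


def fuse(snp_features: list[float], drug_bits: list[int]) -> list[float]:
    """Hypothetical fusion: concatenate SNP features and drug fingerprint bits."""
    return list(snp_features) + [float(b) for b in drug_bits]


if __name__ == "__main__":
    window = "ACGTACGA"  # hypothetical sequence window around a SNP site
    v = kmer_vector(window, k=3)
    print(len(v))  # one entry per possible 3-mer
```

Because the vector length depends only on k (4^k entries), every SNP window maps to the same feature space, which is what lets the encoded sites be stacked into a matrix for the downstream feature extractor and classifier.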
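The abstract does not give the DVAE training objective. A common formulation of a denoising variational auto-encoder is shown below as an assumption, not as the thesis's exact loss: a clean feature vector $x$ is corrupted by a noise process $p(\tilde{x}\mid x)$, the encoder $q_\phi(z\mid\tilde{x})$ sees only the corrupted input, and the decoder $p_\theta(x\mid z)$ is trained to reconstruct the clean input, with a standard normal prior $p(z)=\mathcal{N}(0,I)$ and a diagonal Gaussian encoder with mean $\mu_\phi(\tilde{x})$ and variances $\sigma^2_\phi(\tilde{x})$.

```latex
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{\tilde{x}\sim p(\tilde{x}\mid x)}
    \Big[\,
      \mathbb{E}_{z\sim q_\phi(z\mid\tilde{x})}\big[\log p_\theta(x\mid z)\big]
      - D_{\mathrm{KL}}\big(q_\phi(z\mid\tilde{x})\,\|\,p(z)\big)
    \Big],
\qquad
z = \mu_\phi(\tilde{x}) + \sigma_\phi(\tilde{x}) \odot \varepsilon,\;
\varepsilon \sim \mathcal{N}(0, I),

D_{\mathrm{KL}}\big(q_\phi(z\mid\tilde{x})\,\|\,\mathcal{N}(0,I)\big)
  = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1\right).
```

Training maximizes $\mathcal{L}$; the reconstruction term pushes the latent code to recover the clean input from a noisy one, and the KL term keeps the $d$-dimensional latent space regular. Under this reading, the encoder mean $\mu_\phi(x)$ would serve as the lower-dimensional, denoised SNP site feature passed on for fusion.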
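The two-layer Stacking model and the five-fold evaluation described above map naturally onto scikit-learn's `StackingClassifier` and `cross_val_score`. The sketch below is an illustration under stated assumptions, not the thesis's code: synthetic data from `make_classification` stands in for the real SNP site-drug fusion features, scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so that no extra package is needed, and the hyperparameters and the helper name `build_stacking_model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_model(seed: int = 0) -> StackingClassifier:
    """Hypothetical helper: four base learners plus a logistic-regression meta-learner."""
    base_learners = [
        # SVMs are scale-sensitive, so standardize features first.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed))),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        # Assumption: gradient boosting stands in for XGBoost.
        ("gb", GradientBoostingClassifier(random_state=seed)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base-learner predictions train the meta-learner
        stack_method="predict_proba",
    )


if __name__ == "__main__":
    # Synthetic placeholder for fused SNP site-drug feature vectors and labels.
    X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=0)
    rf_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
    stack_scores = cross_val_score(build_stacking_model(), X, y, cv=5)
    print(f"random forest  5-fold accuracy: {rf_scores.mean():.3f}")
    print(f"stacking model 5-fold accuracy: {stack_scores.mean():.3f}")
```

Training the meta-classifier on out-of-fold predictions (the `cv=5` argument) rather than on predictions for the same rows the base learners were fitted on keeps the logistic regression from simply trusting whichever base learner overfits the training data most.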

Keywords/Search Tags:

SNP site-drug association prediction, K-mer, Molecular fingerprints, Denoising variational auto-encoder, Random forest, Stacking ensemble learning

Related items

1	Study On Drug Recommendation Based On Improved Random Forest Model
2	Research On MiRNA-disease Association Model Based On Autoencoder
3	Research On EEG Signal Recognition Method Based On Deep Stacking Network With Adaptive Learning Rate
4	ECG Signal De-noising And T Wave Detection Based On Deep Learning
5	Research On LncRNA-disease Association Prediction Method Based On Random Forest
6	Research And Implementation Of Ensemble Learning Methods In Cytotoxicity Prediction
7	Localization And Segmentation Of Two Kinds Of Medical Imaging Lesions Based On Variation Model And Supervised Learning
8	Research On ECG Signal Denoising And Heartbeat Retrieval Algorithm Based On Deep Learning
9	Research On The Application Value Of Stacking Architecture And Transfer Learning In The Prediction Model Of Infectious Diseases
10	Prediction Of Disease Indices Based On Ensemble Learning