Prediction Of Autism Spectrum Disorders Risk Gene Based On Convolutional Neural Network

Posted on: 2024-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Xiong
GTID: 2544307157483324
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Autism spectrum disorder is a neurodevelopmental disorder. Genetics has a strong influence on autism spectrum disorder, and the disease is characterized by early onset and obvious symptoms. Therefore, the earlier the diagnosis and the sooner the treatment, the better. The currently known risk genes for autism spectrum disorder can encode proteins. Traditional wet-lab experiments rely on sequencing technology to identify risk genes, but they are time-consuming, labor-intensive, and expensive. Therefore, based on genetic information, including gene expression values and RNA transcript sequences, we study the prediction of risk genes related to autism spectrum disorder with convolutional neural networks. The main research contents are as follows:

(1) Gene expression values are rich in genetic information and can characterize an individual's traits or the extent of a gene mutation, and thus are often used in disease-related prediction studies. In this study, gene expression values from the human brain development transcriptome are used as the benchmark dataset. Four methods (autoencoder, principal component analysis, singular value decomposition, and non-negative matrix factorization) are then used to extract features from the gene expression values, and the Boruta method is used for feature selection. After the optimal feature set is obtained, a convolutional neural network model is trained on the feature data and compared with various traditional machine learning algorithms. On this basis, we propose a new method, MCASDPred, for predicting autism spectrum disorder risk genes based on multiple unsupervised feature extraction methods and a convolutional neural network. The model is evaluated using 50 repetitions of ten-fold cross-validation and achieves an accuracy of 0.856. Comparison with existing methods indicates that MCASDPred is an improvement in the task of predicting risk genes associated with autism spectrum disorder.

(2) RNA plays an important role in the translation of biological genetic information, and RNA modifications are associated with a variety of diseases in organisms. In this study, RNA transcript sequences are combined with the gene expression values to form a benchmark dataset, and four feature extraction methods (one-hot encoding, K-mer, pseudo nucleotide composition, and Hilbert curve encoding) are tried for the RNA sequences. Subsequently, the chi-square test combined with logistic regression is adopted to select the effective features and obtain the best feature subset. The deep learning model is compared with three traditional machine learning algorithms when trained on the optimal feature subset. Finally, we propose a new method called Deep ASDPred for predicting autism spectrum disorder risk genes, which is based on a convolutional neural network and a long short-term memory network. After 50 repetitions of ten-fold cross-validation, the accuracy of this model on the benchmark dataset is 0.937. The results show that Deep ASDPred is effective in predicting autism spectrum disorder risk genes and significantly outperforms existing methods.
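As a rough illustration of two of the RNA sequence encodings named in part (2), the following is a minimal Python sketch of one-hot encoding and K-mer frequency features. It is not the thesis's implementation: the function names `one_hot` and `kmer_frequencies`, the `ACGU` alphabet, the default `k=2`, and the choice to map unknown symbols to all-zero vectors are assumptions made for this example.

```python
from itertools import product

# Assumed RNA alphabet; the thesis does not state how ambiguous bases are handled.
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode each nucleotide as a 4-element indicator vector.

    Symbols outside ALPHABET (for example 'N') become all-zero vectors,
    which is an assumption of this sketch.
    """
    return [[1 if base == a else 0 for a in ALPHABET] for base in seq.upper()]


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic order over ALPHABET.

    The output always has len(ALPHABET) ** k entries, so sequences of
    different lengths map to feature vectors of the same size.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence and count valid k-mers.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

One-hot encoding preserves the order of nucleotides and yields a matrix whose size depends on sequence length, whereas K-mer frequencies discard position but give a fixed-length vector; fixed-length vectors are the kind of input that a feature selection step such as the chi-square test can work on directly.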
Keywords/Search Tags:autism spectrum disorder, autoencoder, Boruta feature selection, convolutional neural network, long short-term memory