Deep Learning For Identification Of Pathogenic Genes In Complex Diseases

Posted on: 2021-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q X Yue
GTID: 2370330614958613
Subject: Biomedical engineering
Abstract/Summary:
Complex diseases are caused by multiple genes acting through complicated genetic mechanisms. They are also influenced by environmental factors, so they result from the combined action of genes and environment. Identifying biomarkers is central to studying the pathogenesis, diagnosis, and treatment of complex diseases. In recent years, deep learning has shown excellent performance on complex data sets, which opens up the possibility of new methods for finding biomarkers of complex diseases in biomedicine. This study explores a deep-learning-based method for screening disease-causing genes in complex diseases, using data from two complex diseases: bipolar disorder and type 2 diabetes.

The first part is a disease classification study based on convolutional neural networks (CNNs). We applied for and downloaded the SNP data needed for this study. The raw data were filtered using GWAS and converted into BMP-format images that meet the input requirements of a CNN. We then organized the processed images into case-control data sets, sample by sample. We built a separate CNN for each of the two diseases and trained each one multiple times to obtain a better model for its disease classification task. The final bipolar disorder classification model reached an accuracy of 94.5%, and the type 2 diabetes classification model reached an accuracy of 97.81%.

The second part uses Gradient-weighted Class Activation Mapping (Grad-CAM) to interpret the trained models and select risk genes. We applied this interpretation to the classification models of both diseases to obtain the contribution of each SNP to each model's classification task. We aggregated these results, set thresholds to select risk SNPs, and matched the selected SNPs to risk genes. For the bipolar disorder model, this screening yielded 3372 SNPs, which matched 962 risk genes. For the type 2 diabetes model, it yielded 3782 SNPs, which matched 1473 risk genes. We analyzed both sets of risk genes with GO and KEGG. Searching the OMIM database for type 2 diabetes returned disease-related entries containing 31 related genes, six of which were among the type 2 diabetes risk genes we screened.

In this work, deep learning is the tool and screening disease-causing genes is the goal: we build disease classification models and then interpret the trained models to find risk genes, which offers a new approach for further research on disease biomarkers.
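
The interpretation step summarized above (a Grad-CAM score per SNP, a threshold, then SNP-to-gene matching) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes the feature maps and gradients of one convolutional layer have already been extracted from a trained model, that the heat map has the same resolution as the input image, and that each pixel corresponds to one SNP in row-major order. The function names, the threshold value, and the SNP and gene labels are all hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map for one sample.

    activations: array of shape (K, H, W), feature maps A_k of one conv layer.
    gradients:   array of shape (K, H, W), d(class score)/dA_k.
    Returns an (H, W) map scaled to [0, 1] (all zeros if nothing is positive).
    """
    # Channel weights alpha_k: global average of the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels: sum_k alpha_k * A_k -> shape (H, W).
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only locations that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def screen_snps(cams, snp_ids, threshold):
    """Average heat maps over samples and keep SNPs scoring >= threshold.

    Assumption: one pixel per SNP, flattened in row-major order; trailing
    padding pixels (if any) are ignored.
    """
    mean_scores = np.mean(cams, axis=0).ravel()[: len(snp_ids)]
    return [snp for snp, score in zip(snp_ids, mean_scores) if score >= threshold]


def match_genes(risk_snps, snp_to_gene):
    """Map risk SNPs to genes using a hypothetical annotation dictionary."""
    return sorted({snp_to_gene[s] for s in risk_snps if s in snp_to_gene})
```

In practice the layer outputs and gradients would come from the deep learning framework used for training, and the SNP-to-gene annotation would come from a genome annotation resource; neither is specified in the abstract, so both are left as inputs here.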
Keywords/Search Tags: complex disease, SNP, convolutional neural network, Grad-CAM