Research On SNP-based Feature Selection And Diagnosis Model For Schizophrenia

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: B Xing
GTID: 2404330623479532
Subject: Computer Science and Technology
Abstract/Summary:
Schizophrenia, a group of severe mental disorders, seriously affects personal social behavior and the perception of reality, and places a burden on socio-economic development. Uncertainty about the causes of this complex disease has greatly hindered research on it. A single nucleotide polymorphism (SNP) is a DNA sequence variation caused by a single-base difference between individuals, and SNPs play an important role in identifying schizophrenia susceptibility sites and in constructing disease diagnosis models. As machine learning and deep learning mature, more and more researchers are trying to mine genetic information from SNP data and build disease diagnosis models. This thesis takes schizophrenia-related SNP data as its object of research. First, the newly proposed K-MIGS/BH-PSO feature selection algorithm is used to generate an informative SNP subset from the original SNP data set. Then, on this subset, a model based on a ternary partitioned deep belief network is designed for the clinical diagnosis of schizophrenia. The specific work is as follows:

(1) To address the inability of traditional clustering algorithms to effectively mine the strong correlations among multiple SNP sites, a new clustering algorithm, K-MIGS, is proposed. On the one hand, the algorithm introduces mutual information and information entropy into the original K-Means clustering and proposes a new similarity measure, MIGS, to measure the similarity between multiple SNP sites. On the other hand, it uses the newly proposed SNP neighbor number to initialize the cluster centers, which addresses the inefficiency of random initialization in traditional K-Means. K-MIGS thus overcomes two problems of traditional clustering, namely that the Euclidean metric cannot capture correlations between multiple SNP sites and that efficiency is too low, and it greatly reduces the dimension of the SNP data set. Finally, the particle swarm optimization (PSO) algorithm is applied to the selection of informative SNPs: according to certain principles, appropriate SNPs are selected from each cluster to generate the final informative SNP subset. The clustering and informative SNP generation experiments show that K-MIGS achieves a better clustering effect and higher SNP reconstruction accuracy than other methods. In classification experiments on the resulting SNP subset using support vector machines, random forests and neural networks, K-MIGS improves classification accuracy by 3.25% to 6.35% compared with feature selection methods of the same type (K-Means/PSO, KCenter/PSO) and of different types (ReliefF, MCMR), which shows that the K-MIGS clustering algorithm is more effective at mining deep information from SNPs.

(2) To address the problem that the traditional PSO algorithm selects a large number of SNPs and converges too slowly when generating the informative SNP subset, an improved PSO algorithm, BH-PSO, is proposed. It uses a new hybrid initialization for the particle swarm, which selects fewer informative SNPs and converges faster. In addition, because the traditional particle update does not consider the number of features, a new particle update strategy is proposed that takes into account both classification accuracy and the number of selected informative SNPs. The experimental results show that, compared with the feature selection methods K-MIGS/PSO, K-MIGS/ACO, K-MIGS/GA, ReliefF and MCMR, the informative SNP subset generated by K-MIGS/BH-PSO has higher SNP reconstruction accuracy and contains fewer SNPs. In the final classification experiments using support vector machines, random forests and neural networks on the generated subsets, accuracy improved by 2.18% to 5.01%, which again indicates that the K-MIGS/BH-PSO feature selection algorithm is more effective at generating informative SNPs.

(3) Two problems are addressed here: the traditional SNP encoding cannot effectively use the upstream and downstream biological information in SNP data, and when traditional deep learning classification models are applied to the diagnosis of schizophrenia, their accuracy does not meet the requirements of auxiliary medical treatment. The following improvements are therefore made to the original deep belief network (DBN). On the one hand, the traditional 0-1-2 encoding of SNP data is replaced by the new Triad-SNP encoding, which combines the SNP to be encoded with its upstream and downstream SNPs to form a ternary SNP code, so that the SNP sequence can be used directly as input to the subsequent deep model. On the other hand, making full use of the SNP cluster label information, the hidden layer of the original restricted Boltzmann machine is divided into multiple regions, and a partitioned DBN is established in which each region learns from the SNP data belonging to a different cluster. The final two experiments show that the ternary partitioned deep belief network significantly improves the diagnostic metrics for schizophrenia compared with other models: accuracy and F1 score improve by 6.30% to 7.49% and 7.27% to 12.51%, respectively.
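The abstract does not give the formula for the MIGS similarity or the exact layout of the Triad-SNP code, so the following is only a minimal sketch of the two underlying ideas, not the thesis's method. It assumes genotypes are already in the 0-1-2 encoding, computes the standard mutual information between two SNP columns from their entropies, and builds a hypothetical triad code as the base-3 index of the (upstream, current, downstream) genotypes. The function names and the padding value at the sequence ends are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete genotype codes."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(snp_a, snp_b):
    """Standard mutual information I(A;B) = H(A) + H(B) - H(A,B).

    snp_a and snp_b are equally long genotype columns (one entry per sample).
    This is the textbook quantity, not the MIGS measure itself.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("SNP columns must cover the same samples")
    joint = list(zip(snp_a, snp_b))
    return entropy(snp_a) + entropy(snp_b) - entropy(joint)


def triad_encode(genotypes, pad=0):
    """Hypothetical triad code for one sample's SNP sequence.

    Each position becomes 9*upstream + 3*current + downstream, a value in
    0..26. The ends are padded with `pad` (an assumption of this sketch).
    """
    padded = [pad] + list(genotypes) + [pad]
    return [
        9 * padded[i - 1] + 3 * padded[i] + padded[i + 1]
        for i in range(1, len(padded) - 1)
    ]


# Two identical SNP columns share all their information; two independent
# columns share none.
identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
codes = triad_encode([0, 1, 2])
```

In this sketch a higher mutual information marks two SNP sites as more strongly correlated, which is the kind of signal a Euclidean distance on 0-1-2 codes does not capture directly. The triad code gives each site a value that depends on its neighbors, so the same genotype can be represented differently in different upstream and downstream contexts.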
Keywords/Search Tags:Schizophrenia, SNP, Feature selection, Clustering, Particle swarm optimization, Restricted Boltzmann machine, Deep belief network