| With the deepening of informatization in brain science, multi-modal neuroimaging data and genetic data have increased dramatically. Integrating multi-modal data to analyze the pathogenic mechanisms of mental illness and applying the results in precision medicine has become a focus of the global scientific, health, and industrial communities. Imaging genetics mainly uses the function or structure of the human brain as an endophenotype to evaluate the impact of gene expression on individuals, so that the impact of gene expression on human behavior or mental illness can be explored more objectively at the macro level of the brain. Machine learning algorithms are among the common methods for analyzing imaging genetics data. By establishing a sparse model, features are extracted from multi-modal imaging genetics data. The brain regions and genes corresponding to these features can be used to assist the clinical diagnosis and treatment of diseases. However, it is still challenging to discover risk genes and abnormal brain regions related to mental illness (such as schizophrenia) from imaging genetics data characterized by "high dimensionality, small sample size". Therefore, it is very important to find a correlation analysis method that can extract significant features from multi-modal imaging genetics data. This thesis mainly uses functional magnetic resonance imaging (fMRI) data from the Mind Clinical Imaging Consortium (MCIC) database as imaging phenotype data, and single nucleotide polymorphism (SNP) data and DNA methylation data as genotype data, to carry out imaging genetics research on schizophrenia. We have built three mathematical models around the problem of "extracting risk genes, epigenetic factors and abnormal brain regions related to schizophrenia". The specific work and innovations are as follows. (1) Current developments in neuroimaging and genetics promote an integrative and comprehensive study of schizophrenia. However, it is still difficult to explore how gene mutations 
are related to brain abnormalities because of the high dimensionality but small sample size of these data. Conventional approaches reduce the dimension of each dataset separately and then calculate the correlation, but ignore the effects of the response variables and the structure of the data. To improve the identification of risk genes and abnormal brain regions in schizophrenia, we propose a novel method called Independence and Structural sparsity Canonical Correlation Analysis (ISCCA). ISCCA combines independent component analysis (ICA) and canonical correlation analysis (CCA) to reduce collinearity effects, and also incorporates the graph structure of the data into the model to improve the accuracy of feature selection. The results of simulation studies demonstrate its higher accuracy in discovering correlations compared with competing methods. Moreover, applying ISCCA to a real imaging genetics dataset collected by MCIC identifies a set of distinct gene-ROI (region of interest) interactions, which are verified to be both statistically and biologically significant. (2) Despite the development of multi-modal neuroimaging technology and gene detection technology, efforts to integrate multi-modal imaging genetics data to explore the pathogenic factors of schizophrenia are still limited. To address this issue, we propose a novel algorithm called group sparse joint non-negative matrix factorization on orthogonal subspace (GJNMFO). Our algorithm fuses SNP data, fMRI data and epigenetic factors (DNA methylation) by projecting the three modalities onto a common basis matrix and three different coefficient matrices to identify risk genes, epigenetic factors and abnormal brain regions associated with schizophrenia. Specifically, we introduce orthogonal constraints on the basis matrix to discard unimportant features in the rows of the coefficient matrices. Since imaging genetics data have rich group information, we impose group sparsity on the three coefficient matrices to make the extracted features more accurate. Both the 
simulated and real MCIC datasets are used to validate our approach. Simulation results show that our algorithm works better than competing methods. In experiments on the MCIC datasets, GJNMFO reveals a set of risk genes, epigenetic factors and abnormal functional brain regions, which have been verified to be both statistically and biologically significant. (3) Schizophrenia is a complex mental illness whose mechanism is currently unclear. Using a sparse representation and dictionary learning (SDL) model to analyze fMRI datasets of schizophrenia is currently a popular method for exploring the mechanism of the disease. The SDL method decomposes the fMRI data into a sparse coding matrix X and a dictionary matrix D. However, these traditional methods overlook the group structure information in X and the coherence between the atoms in D. To address this problem, we propose a new SDL model incorporating group sparsity and incoherence, namely GS2ISDL, to detect abnormal brain regions. Specifically, GS2ISDL uses the group structure information defined by the Automated Anatomical Labeling (AAL) template on the fMRI dataset as a prior to achieve inter-group sparsity in X. At the same time, the L1 norm is enforced on X to achieve intra-group sparsity. In addition, our algorithm imposes an incoherence constraint on the dictionary matrix D to reduce the coherence between the atoms in D, which can ensure the uniqueness of X and the discriminability of the atoms. To validate the proposed GS2ISDL model, we compared it with both the IK-SVD and SDL algorithms on the fMRI dataset collected by MCIC. The results show that the accuracy, sensitivity, recall and Matthews correlation coefficient (MCC) values of GS2ISDL are 93.75%, 94.23%, 80.50% and 88.19%, respectively, which outperform both IK-SVD and SDL. Compared with the results obtained by the IK-SVD algorithm, the accuracy, sensitivity, recall and MCC values obtained by the GS2ISDL algorithm are improved by 5.5%, 8.51%, 5.28% and 9.06%, respectively. Compared with the results obtained by the SDL algorithm, the 
accuracy, sensitivity, recall and MCC values obtained by the GS2ISDL algorithm are improved by 6.24%, 13.52%, 7.65% and 10.73%, respectively. Moreover, the ROIs extracted by the GS2ISDL model (such as the precentral gyrus, hippocampus and caudate nucleus) are further supported by the literature on schizophrenia, indicating that they are biologically significant. In this thesis, three mathematical models are constructed to study information extraction from multi-modal imaging genetics data of schizophrenia. Some risk genes, epigenetic factors and abnormal brain regions related to schizophrenia have been identified, which provide a new theoretical basis for the prevention, diagnosis and treatment of mental illness. |
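
The penalties described for GS2ISDL in item (3) can be illustrated with a small sketch. The abstract does not give the objective function, so everything here is an assumption made for illustration: the data matrix `Y` is taken to be time points by voxels, the atoms are the columns of `D`, the groups are lists of voxel (column) indices of `X` such as AAL regions, and the weights `lam1`, `lam2` and `gamma` as well as the function name `sdl_penalized_objective` are hypothetical. The sketch evaluates one plausible objective that combines reconstruction error, an L1 term (intra-group sparsity), a group-wise Frobenius term (inter-group sparsity) and an incoherence term on the Gram matrix of `D`; it is not the thesis's actual formulation or solver.

```python
import numpy as np


def sdl_penalized_objective(Y, D, X, groups, lam1=1.0, lam2=1.0, gamma=1.0):
    """Evaluate an assumed group-sparse, incoherent SDL objective.

    Y: (t, v) data matrix, D: (t, k) dictionary with atoms as columns,
    X: (k, v) sparse codes, groups: list of lists of column indices of X.
    All shapes and weights are illustrative assumptions.
    """
    # Reconstruction error: 0.5 * ||Y - D X||_F^2
    recon = 0.5 * np.linalg.norm(Y - D @ X, "fro") ** 2
    # L1 norm of X: encourages sparsity within each group
    l1 = np.abs(X).sum()
    # Sum of per-group Frobenius norms: encourages whole groups to vanish
    group = sum(np.linalg.norm(X[:, idx]) for idx in groups)
    # Incoherence: distance of the Gram matrix D^T D from the identity
    gram = D.T @ D
    incoherence = np.linalg.norm(gram - np.eye(D.shape[1]), "fro") ** 2
    return recon + lam1 * l1 + lam2 * group + 0.5 * gamma * incoherence


# Toy usage: orthonormal atoms, exact reconstruction, one active group.
D = np.eye(2)
X = np.array([[3.0, 0.0], [4.0, 0.0]])
Y = D @ X
value = sdl_penalized_objective(Y, D, X, groups=[[0], [1]])
print(value)  # 12.0 (L1 term 7 + group term 5; other terms are zero)
```

In the toy case the second group contributes nothing to the group term, which is the behaviour the inter-group penalty rewards, while the identity dictionary has zero incoherence penalty because its atoms are mutually orthogonal.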