| Genome-Wide Association Studies (GWAS) have become an essential means of studying complex traits and diseases in biological science, mainly using Single Nucleotide Polymorphisms (SNPs) as markers of complex diseases. However, SNP statistics and linkage disequilibrium can easily disclose individual private information, because SNPs are closely related to sensitive information such as individual identity, phenotype, and kinship. Existing privacy protection studies mainly focus on single-order SNP detection. They adopt a single method such as anonymization, perturbation, or complex encryption, which cannot solve the complex and diverse privacy leakage problems of high-order SNP detection. Moreover, they cannot balance the multi-dimensional performance requirements of high-order gene interaction detection, including data utility, efficiency, and security. Single-order SNP privacy protection methods are therefore no longer applicable to high-order GWAS security detection, and effective methods that systematically solve the security detection problems of GWAS at different orders are urgently needed. Traditional high-order GWAS methods mainly identify SNP-SNP pairs closely related to phenotype (2-order), with low association complexity and a clear privacy disclosure path. However, GWAS methods of order 3 and above must comprehensively consider marginal effects, the random distribution characteristics of SNPs, time cost, and other factors to determine candidate sets of multiple SNPs. As a result, the association complexity is high, and the risks of privacy disclosure are more complex and diverse. Furthermore, especially in the distributed scenario, the interactive process of dynamic parameter transmission in federated detection further increases the uncertainty of privacy disclosure points, which is an NP-hard problem. Therefore, to systematically solve the problem of high-order GWAS security detection, this thesis carries 
out GWAS privacy protection research on four progressive problems: 2-order, high-order, distributed high-order, and decentralized distributed high-order gene interaction detection. It proposes optimal adaptive GWAS security detection methods for the different scenarios and constructs a GWAS security detection prototype system based on these methods. The main research contents are as follows: (1) To address the insufficient statistical accuracy and privacy protection of 2-order gene interaction detection in GWAS, caused by a single fitness function and the lack of a privacy protection mechanism, a secure detection model for 2-order gene interaction based on multi-objective dynamic optimization (IPP) is proposed. This model adopts a multi-fitness linear combination function to optimize the multi-objective dynamic detection process and constructs an adaptive differential privacy perturbation algorithm according to the multi-objective optimization characteristics, which effectively improves the security of the dynamic detection process. A global path selection probability function is designed to keep the detection process from falling into a local optimum and to improve detection accuracy. Finally, a mathematical differential privacy proof verifies the security of the IPP model. The experimental results show that the IPP model balances detection accuracy and security, and its comprehensive performance is better than that of other 2-order gene interaction models. (2) To address the low accuracy caused by the uneven distribution of SNPs and complex marginal effects in high-order GWAS, we propose a deep layer-by-layer adaptive privacy protection model (Deep-DPGI) that is more consistent with the characteristics of high-order gene interaction. This model designs composite loss functions to optimize the deep learning model of high-order gene interaction so that it better explains disease pathogenesis. It uses the forward and backward propagation algorithm to learn the weight parameters of the vectors in the hidden layer; based on 
measuring the difference between the weight parameters and their outputs, it adopts an adaptive correlated-parameter perturbation method to ensure the privacy of the detection process and results. Finally, a mathematical differential privacy proof verifies the security of the Deep-DPGI model. The experimental results show that the comprehensive performance of the Deep-DPGI model is much better than that of other high-order gene interaction detection models. (3) To address the difficulty of balancing data utility and privacy caused by random perturbation of uploads and downloads in distributed high-order GWAS, we extend the Deep-DPGI model to distributed scenarios and propose a federated high-order gene interaction security detection model (Fed GI) based on Nash equilibrium and an adaptive differential privacy mechanism. This model combines the limited competition mechanism of Nash equilibrium with differential privacy to construct an adaptive differential perturbation algorithm that effectively balances utility and security. Furthermore, it transforms the sub-model filtering task into a multi-objective optimization task to reduce communication costs. The mathematical security proof of Fed GI is consistent with that of the Deep-DPGI model and satisfies the definition of differential privacy. The experimental results show that the comprehensive performance of the Fed GI model is also much better than that of other high-order gene interaction detection models. (4) To address the multiple privacy risks and inefficiency caused by the absence of a fixed trusted central node in decentralized distributed high-order GWAS, we propose a decentralized federated high-order gene interaction security detection model based on multiple joint verifications (Fed-MA). The model adopts a storage verification method based on automatic quality control and blockchain technology to improve the efficiency of distributed quality control and ensure the traceability of distributed terminal 
computing. Furthermore, a hierarchical multiple verification algorithm is designed for client verification, parameter verification, and aggregation verification. Moreover, a differential privacy protection method based on sequence perturbation is used to ensure the credibility of the joint detection terminals and the security of the transmitted parameters. Finally, the security of the Fed-MA model is proved theoretically for three cases: interception of a single parameter sequence, interception of all parameter sequences, and brute-force cracking. The experimental results show that the comprehensive performance of the Fed-MA model is consistent with that of the Deep-DPGI model and much better than that of other high-order gene interaction models. (5) A secure GWAS detection system based on multi-index quantitative evaluation (Vis DP) is constructed. A Multi-index Quantitative Evaluation (MQE) method is proposed to adaptively perturb the identification results and ensure their security and usability. A mathematical differential privacy proof verifies the security of the MQE method. The GWAS security detection methods of this thesis are built into the platform through serial call interface schemes. The platform has been running successfully at a medical university for more than one year, and the results show that the Vis DP platform provides more secure and practical functions. The research results of this thesis are expected to provide an essential theoretical and scientific basis for the in-depth development of high-order gene interaction security detection in multiple scenarios, and they have essential academic value and application prospects. This thesis contains 38 figures, 13 tables, and 166 references. |