Complex diseases seriously endanger human health, and identifying the root causes of their pathogenesis is a pressing goal. Although different diseases differ in their gene mutation forms and pathogenic mechanisms, genetic changes directly drive changes in disease traits. Therefore, identifying genes related to disease traits and measuring the relationship between genes and disease traits is key to understanding disease pathogenesis, and it is also a major research focus in bioinformatics. Taking into account the biological characteristics of genetic data, this paper proposes a method for constructing gene latent variables. It combines structural equation modeling (SEM) with pathogenic gene identification in a novel way to quantify gene interactions and identify genes related to disease traits. Building on this, we perform a pan-cancer analysis to overcome the limitations of single-cancer analysis. The specific studies in this paper are as follows:

1. Most previous gene-disease association studies consider only single genes and ignore the interactions among multiple genes. To address this, this paper proposes an SEM-based method for analyzing the association between multiple genes and disease traits. The method quantifies the combined effects of multiple genes through gene latent variables, builds an association model linking multiple genes to disease traits, and fits its parameters by maximum likelihood estimation. It then verifies the gene interactions with several analyses, including correlation analysis and Cox regression analysis, and identifies the genes most strongly associated with disease traits.

2. Gene data are large in volume and computationally expensive to model, and without a priori knowledge it is difficult to specify the latent variables of an SEM. To address these problems, this paper proposes a method for constructing gene latent variables. The method screens differentially expressed genes by differential expression analysis, clusters them by hierarchical clustering, constructs gene latent variables for each clustering result by factor analysis, and finally selects the best model using the Bayesian information criterion (BIC). The method constructs gene latent variables effectively and provides a principled basis for building models of genes and disease traits.

3. Current cancer research still focuses mainly on single cancers and overlooks both the specificity and the commonality of interacting genes across cancers. To address this, this paper performs a pan-cancer analysis on data from multiple cancers. Gene Ontology (GO) analysis and KEGG pathway analysis show that genes from different cancers participate in the same biological processes. Cancer metastasis analysis shows that multiple genes are associated with cancer metastasis. Protein-protein interaction network analysis and survival analysis demonstrate that gene functional groups associated with disease are shared across different cancers.

Using real cancer gene data, we identify 44 genes associated with cancer traits and verify the gene interactions by comparing multiple sets of experimental analyses. We also identify key gene pathways and two important gene functional sub-modules that are relevant to the treatment of complex diseases. The experimental results show that the SEM approach has distinct advantages in quantifying gene interactions. It can accurately measure relationships both among genes and between genes and disease traits, which helps in understanding the pathogenesis of complex diseases and, in turn, in diagnosing and treating them.
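The latent-variable construction in study 2 has four steps: differential expression screening, hierarchical clustering of genes, factor analysis per cluster, and BIC-based model selection. The following is a minimal Python sketch of those steps under stated assumptions, not the authors' implementation. A Welch t-test stands in for the paper's differential expression method, and genes are clustered by average linkage on correlation distance. Each cluster gets a single-factor model fitted by EM. Clusters are treated as independent blocks, so the BIC of a clustering is the sum of its per-cluster fits. The function names (`screen_de_genes`, `fit_one_factor`, `build_latent_variables`) and the parameters `alpha` and `k_range` are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage


def screen_de_genes(expr, is_tumor, alpha=0.01):
    """Indices of genes with Welch t-test p-value < alpha.

    ASSUMPTION: stand-in for the paper's differential expression analysis.
    expr: samples x genes matrix; is_tumor: boolean mask over samples.
    """
    _, pvals = stats.ttest_ind(expr[is_tumor], expr[~is_tumor], axis=0, equal_var=False)
    return np.flatnonzero(pvals < alpha)


def fit_one_factor(block, n_iter=300):
    """EM fit of x = lam * f + eps, f ~ N(0, 1), eps ~ N(0, diag(psi)).

    Returns the maximised log-likelihood and latent scores E[f | x].
    """
    n, p = block.shape
    S = np.cov(block, rowvar=False, bias=True).reshape(p, p)  # MLE covariance
    lam = np.sqrt(np.diag(S) / 2.0)  # initial loadings
    psi = np.diag(S) / 2.0           # initial unique variances
    for _ in range(n_iter):
        sigma = np.outer(lam, lam) + np.diag(psi)
        beta = np.linalg.solve(sigma, lam)         # E-step: Sigma^{-1} lam
        ezz = 1.0 - beta @ lam + beta @ S @ beta  # average E[f^2 | x]
        lam = S @ beta / ezz                       # M-step: loadings
        psi = np.maximum(np.diag(S) - lam * (S @ beta), 1e-6)  # M-step: uniquenesses
    sigma = np.outer(lam, lam) + np.diag(psi)
    _, logdet = np.linalg.slogdet(sigma)
    loglik = -0.5 * n * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(sigma, S)))
    scores = (block - block.mean(axis=0)) @ np.linalg.solve(sigma, lam)
    return loglik, scores


def build_latent_variables(expr, is_tumor, k_range=range(1, 6), alpha=0.01):
    """Pick the gene clustering whose one-factor-per-cluster model has the lowest BIC."""
    genes = screen_de_genes(expr, is_tumor, alpha)
    if len(genes) < 2:
        raise ValueError("need at least two differentially expressed genes")
    X = stats.zscore(expr[:, genes], axis=0)
    n = X.shape[0]
    tree = linkage(X.T, method="average", metric="correlation")  # cluster genes, not samples
    best = None
    for k in k_range:
        assign = fcluster(tree, t=k, criterion="maxclust")
        loglik, n_params, scores = 0.0, 0, []
        for c in np.unique(assign):
            block = X[:, assign == c]
            p = block.shape[1]
            ll, s = fit_one_factor(block)
            loglik += ll  # ASSUMPTION: clusters independent, so log-likelihoods add
            n_params += p + min(2 * p, p * (p + 1) // 2)  # means + identifiable covariance params
            scores.append(s)
        bic = -2.0 * loglik + n_params * np.log(n)
        if best is None or bic < best["bic"]:
            best = {"bic": bic, "k": len(np.unique(assign)), "genes": genes,
                    "assign": assign, "scores": np.column_stack(scores)}
    return best
```

Each column of `scores` is one gene latent variable, which could then enter an SEM as an exogenous factor. A full SEM would also model correlations between latent variables. The independent-block BIC used here ignores them, which keeps the sketch short.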