| Genome-wide association studies (GWAS) can detect genes related to complex diseases and are currently an important research topic. With advances in science and technology, GWAS often involve large amounts of genetic data, and the number of genetic variables is frequently much larger than the sample size; for such ultrahigh-dimensional data, the performance of traditional penalized variable selection methods deteriorates considerably. Large and complex genetic data are also often heterogeneous, coming from different types or groups, so finite mixture of regression models are considered to handle them. In addition, the genetic model is usually unknown, and a wrong assumption about the genetic model may lead to failure to identify the true pathogenic genes. For ultrahigh-dimensional data, existing methods include sure independence screening (SIS), conditional sure independence screening (CSIS), and other feature screening methods, but few articles have applied them to finite mixtures of semi-parametric regression models. Motivated by the characteristics of genetic data described above, we establish a finite mixture of semi-parametric regression models for ultrahigh-dimensional data that accounts for the uncertainty of the genetic model and uses non-parametric components to describe the effects of non-genetic variables, such as height and weight, on traits. We extend conditional sure independence screening to this setting: we first apply CSIS under the finite mixture of regression models to reduce the dimension, and then use penalized variable selection methods to select the final pathogenic genes. Under certain conditions, the parameter estimators have good statistical properties, and proofs of the relevant theorems are given. Numerical simulation results also confirm the advantages of the proposed method in selecting important variables. |