| Most important traits of animals and plants are quantitative traits, and genome-wide association studies (GWAS) are the main approach for dissecting their genetic basis. However, population structure in the association population can produce false associations between markers and complex traits. The population structure matrices in common use today are the matrix of probabilities that each individual belongs to each subgroup (the Q matrix) and the principal component score matrix. The recently proposed evolutionary population structure matrix is based on a division into evolutionary types and assigns each individual unambiguously to one type. Cluster analysis, although a widely used classification method, has not yet been used to describe the population structure of an association population, so its use for population structure control in GWAS is worth exploring. Using the molecular marker information of the association population, this study computes population structure with hierarchical clustering, k-means clustering and sparse subspace clustering (EnSC and SSC_OMP), compares the results with the commonly used population structure methods, and concludes from real rice data analysis and simulation experiments that cluster analysis can be used to control population structure in association studies. The main results are as follows.
1) The optimal number of clusters was determined first. The real rice data set was analyzed with the FASTmrMLM and FASTmrEMMA methods. Each of the four clustering methods (EnSC, SSC_OMP, Hclust and Kmeans) divides the individuals into several clusters, and the resulting assignment, called the clustering population structure, is used for population structure control in the association study. For each clustering method, the number of significant quantitative trait nucleotides (QTNs) and the number of known trait-related genes near those QTNs were compared across different cluster numbers to obtain the optimal cluster number. To verify the effectiveness of the clustering population structure, FASTmrMLM was used to analyze rice seed width. Under no population structure (NULL), the principal component population structure (PC), the Q matrix population structure (Admixture) and the four clustering population structures (EnSC, SSC_OMP, Hclust and Kmeans), 26, 12, 12, 17, 22, 18 and 22 significant QTNs were detected, with 9, 6, 7, 11, 11, 11 and 10 known genes near the QTNs, respectively. With FASTmrEMMA, 10, 7, 4, 8, 11, 10 and 10 significant QTNs were detected under the same seven population structures, with 2, 2, 2, 6, 7, 7 and 7 known genes near them, respectively. Thus more QTNs and more known genes were detected with the clustering population structures than with the commonly used population structures (PC and Admixture). Although fewer QTNs were generally found than without population structure, more known genes were found. The clustering population structures therefore gave the best association results.
2) Monte Carlo simulation was carried out, and the simulated data sets were analyzed with FASTmrMLM and FASTmrEMMA. The effects of the different population structures on the association results were compared in three respects: QTN detection power, accuracy of QTN effect estimation, and false positive rate. With FASTmrMLM, the Hclust clustering population structure gave higher average detection power than the PC population structure, together with a lower mean squared error (MSE) of the estimated QTN effects and a lower false positive rate; the SSC_OMP clustering population structure gave higher detection power than Admixture, again with lower MSE and a lower false positive rate. Similar conclusions were obtained with FASTmrEMMA. The simulation study therefore shows that clustering population structure performs better in multi-locus association studies. This study provides a clustering-based way of calculating population structure for genome-wide association studies that can improve QTN detection power and better control the false positive rate. |
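
To make the idea of a clustering population structure concrete, the sketch below clusters individuals on their marker genotypes with a plain k-means and encodes the cluster labels as an indicator matrix that could enter an association model as covariates. It is only an illustration under assumptions not stated in the abstract: genotypes are coded numerically (0/1/2 here), the initial centers are chosen by a deterministic farthest-first rule, and the last indicator column is dropped to avoid collinearity with an intercept. The function names `kmeans_labels` and `membership_matrix` are hypothetical and are not part of FASTmrMLM or FASTmrEMMA.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(genotypes, k, n_iter=100):
    """Lloyd's k-means on rows of a marker matrix; returns one label per individual.

    Assumption: centers are initialised farthest-first, starting from the
    first individual, so that the result is deterministic.
    """
    centers = [list(genotypes[0])]
    while len(centers) < k:
        far = max(genotypes, key=lambda row: min(sq_dist(row, c) for c in centers))
        centers.append(list(far))

    labels = [None] * len(genotypes)
    for _ in range(n_iter):
        # Assign every individual to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centers[c]))
            for row in genotypes
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean genotype of its members.
        for c in range(k):
            members = [row for row, lab in zip(genotypes, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def membership_matrix(labels, k, drop_last=True):
    """0/1 indicator matrix (individuals x clusters) built from cluster labels.

    Assumption: with drop_last=True the last cluster column is omitted so the
    matrix is not collinear with an intercept term.
    """
    n_cols = k - 1 if drop_last else k
    return [[1 if lab == c else 0 for c in range(n_cols)] for lab in labels]


if __name__ == "__main__":
    # Toy marker matrix: six individuals, four markers, two obvious groups.
    toy = [
        [0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0],
        [2, 2, 2, 2], [2, 2, 2, 1], [2, 1, 2, 2],
    ]
    labels = kmeans_labels(toy, k=2)
    print(labels)
    print(membership_matrix(labels, k=2))
```

Unlike a Q matrix of membership probabilities, this matrix assigns each individual to exactly one cluster, which is the property the abstract attributes to clustering-based structure. Hierarchical or sparse subspace clustering could supply the labels instead; only `membership_matrix` would be reused.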
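
The three simulation criteria named above (detection power, MSE of the estimated QTN effect, and false positive rate) can be summarised over Monte Carlo replicates roughly as follows. The abstract does not give the exact definitions used in the study, so this sketch assumes one common convention: power is the share of replicates in which the simulated QTN is detected, MSE is averaged over the replicates where it was detected, and the false positive rate is the number of false hits divided by the total number of null markers tested. The function name `summarize_replicates` and its arguments are hypothetical.

```python
def summarize_replicates(true_effect, estimates, n_false_hits, n_null_markers):
    """Summarise one simulated QTN over Monte Carlo replicates.

    true_effect    : simulated effect size of the QTN
    estimates      : per-replicate effect estimate, or None if the QTN was missed
    n_false_hits   : per-replicate count of significant markers that are not QTNs
    n_null_markers : number of non-QTN markers tested in each replicate

    Returns (power, mse, false_positive_rate) under the assumed definitions.
    """
    n_rep = len(estimates)
    detected = [e for e in estimates if e is not None]

    # Power: proportion of replicates in which the QTN was detected.
    power = len(detected) / n_rep

    # MSE of the effect estimate, over replicates where the QTN was detected.
    if detected:
        mse = sum((e - true_effect) ** 2 for e in detected) / len(detected)
    else:
        mse = float("nan")

    # False positive rate: false hits over all null tests across replicates.
    fpr = sum(n_false_hits) / (n_rep * n_null_markers)
    return power, mse, fpr


if __name__ == "__main__":
    power, mse, fpr = summarize_replicates(
        true_effect=1.0,
        estimates=[1.5, None, 0.5, 1.0],
        n_false_hits=[1, 0, 2, 1],
        n_null_markers=100,
    )
    print(power, mse, fpr)
```

Running the same summary once per population structure (NULL, PC, Admixture and each clustering structure) would give the kind of side-by-side comparison the simulation study reports.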