Longitudinal data appear widely in many scientific fields, and many methods have been developed for their analysis. Such data are generated by repeatedly measuring the response variable of each subject at several time points, and they describe how individual responses change over time. For heterogeneous longitudinal data, cluster analysis is an effective tool for characterizing differences between individuals. This article first introduces the use of the Gaussian mixture model for cluster analysis of longitudinal data, without covariates, when there are few repeated measurement time points. When there are more repeated measurement time points, the dimensionality of the longitudinal data increases. Because measurements on the same individual at different time points are correlated, the dimension of the covariance matrix also grows and the number of parameters increases sharply, which poses a great challenge for cluster analysis. For this reason, we consider using the mixture of factor analyzers model for cluster analysis. In addition, covariates are very important in cluster analysis, since they can characterize the subgroup means. Therefore, this paper proposes the Mixture of Factor Analyzers Linear Model with Common Factor Loadings (MCFLM). This model combines a mixture of factor analyzers with a common factor loading matrix and a multivariate linear model. Under this framework, the high-dimensional repeated measurements are reduced to low-dimensional latent factors through the mixture of factor analyzers, and the multivariate linear model describes the relationship between the factors and the covariates in each subgroup. Furthermore, this paper applies the modified Cholesky decomposition to ensure the positive definiteness of the covariance matrix, and uses the EM algorithm to estimate the parameters. Finally, the Bayesian information criterion is used to select the most appropriate model. To demonstrate the effectiveness of the method, a numerical simulation study is carried out, and a set of yeast cell gene expression data is used to verify the feasibility of the proposed method.
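
The role of the modified Cholesky decomposition mentioned above can be illustrated with a minimal sketch. It assumes the standard form T Σ Tᵀ = D, where T is unit lower triangular and D is diagonal with positive entries; the abstract does not state the exact parameterization used in MCFLM, so the function names, the arguments `phi` and `log_d`, and the numerical values below are hypothetical and for illustration only.

```python
import math


def covariance_from_modified_cholesky(phi, log_d):
    """Build Sigma = T^{-1} D T^{-T} from unconstrained parameters.

    phi[j][k] (k < j) are unconstrained coefficients that fill the unit
    lower-triangular matrix T as T[j][k] = -phi[j][k].
    log_d[j] are unconstrained log variances, so D = diag(exp(log_d)) > 0.
    (Hypothetical parameter names; the paper's notation may differ.)
    """
    p = len(log_d)
    # L = T^{-1} is unit lower triangular; solve T L = I column by column.
    L = [[0.0] * p for _ in range(p)]
    for col in range(p):
        L[col][col] = 1.0
        for row in range(col + 1, p):
            L[row][col] = sum(phi[row][k] * L[k][col] for k in range(col, row))
    d = [math.exp(v) for v in log_d]
    # Sigma = L D L^T is symmetric and positive definite by construction.
    return [
        [sum(L[i][k] * d[k] * L[j][k] for k in range(p)) for j in range(p)]
        for i in range(p)
    ]


def is_positive_definite(a, tol=1e-12):
    """Return True if the ordinary Cholesky factorization of `a` succeeds."""
    p = len(a)
    c = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                c[i][i] = math.sqrt(s)
            else:
                c[i][j] = s / c[j][j]
    return True


# Illustrative values for three time points (not taken from the paper).
phi = [[], [0.5], [-0.3, 0.8]]
log_d = [0.0, -1.0, 0.5]
sigma = covariance_from_modified_cholesky(phi, log_d)
print(is_positive_definite(sigma))
```

Because `phi` and `log_d` may take any real values, an optimizer or an EM update can work on them without constraints while the implied covariance matrix remains positive definite, which is the property the abstract refers to.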