Background: Modeling longitudinal data in medicine is of great significance. Such models help describe how health and disease status change over time in a population and allow in-depth analysis of individual differences. This provides comprehensive information for exploring influencing factors and verifying intervention effects. Latent class growth mixture modeling (LCGMM), commonly known as growth mixture modeling (GMM), has received widespread attention for its strong performance in identifying latent heterogeneity in longitudinal trajectories. Compared with the traditional frequentist estimation framework, the Bayesian framework encounters fewer convergence problems when estimating complex models such as GMM and can yield better parameter estimates with small samples. However, Bayesian GMM is still rarely applied in practical research. GMM models trajectory heterogeneity by dividing the population into latent classes. Determining the number of latent trajectory classes, a process usually called class enumeration, is therefore the key step in GMM analysis. Enumeration accuracy depends mainly on the model selection method used and on whether the sample data meet the assumptions of GMM. At present, the optimal enumeration index and selection strategy for Bayesian GMM are unclear, which greatly hinders its adoption. Beyond the choice of enumeration method, the distribution of data within latent classes also affects the accuracy of GMM results. GMM assumes that intra-class data follow a multivariate normal (MN) distribution. This assumption is often violated in real-world data, resulting in over-extraction of latent classes and biased parameter estimates. Constructing Bayesian GMM with a more flexible multivariate distribution can help relieve the adverse impact of assumption
violation on GMM results and improve the robustness of model estimation. This helps ensure that conclusions about heterogeneous trajectories drawn under the Bayesian framework are accurate and sound.

Objective: (1) To identify the optimal enumeration index and model selection strategy for Bayesian GMM, providing a methodological reference for determining the number of latent trajectories in applied research. (2) To establish, under the Bayesian estimation framework, a GMM that does not rely on the assumption of intra-class multivariate normality, to examine whether its performance is robust when that assumption is violated, and thereby to promote the standardized application of trajectory analysis. (3) To apply the optimal enumeration method and the robust GMM in a case study, verifying their applicability and effectiveness and providing a reference for analytical approaches and modeling procedures.

Methods: (1) Monte Carlo simulation was used to generate data from a linear GMM with two latent classes. The simulation design factors were sample size (300 and 500), mixing proportion (50%/50% and 75%/25%), and class separation, defined by the Mahalanobis distance (MD) at two levels (2.73 and 3.16). Fully crossing these three two-level factors yielded 8 simulation scenarios. Bayesian GMMs with 1, 2, and 3 latent classes were fitted to each simulated data set. The number of trajectories was then determined by various class enumeration methods, and the accuracy of these methods was compared. Class enumeration in GMM generally involves three steps. First, a goodness-of-fit index is computed from the likelihood of each individual's data. Next, the index values of models with different numbers of trajectories are compared under a given model selection strategy. Finally, the appropriate number of classes is determined from this comparison. Accordingly, the simulation study involved four model selection strategies, namely the minimum value (MV) strategy and three absolute
difference (AD) strategies with different cutoff values for the difference (AD-3, AD-7, and AD-10). It also involved two ways of calculating the likelihood, namely marginal and conditional likelihood. Finally, it covered six specific indexes in three categories: two deviance information criteria (DIC) with different penalty terms, two Watanabe-Akaike information criteria (WAIC) with different penalty terms, and two approximate leave-one-out cross-validation (LOO-CV) criteria based on importance sampling (IS) and Pareto-smoothed importance sampling (PSIS), respectively. (2) The multivariate skew-normal (MSN) distribution was used to construct a Bayesian GMM (referred to as SN-GMM). Its performance was compared with that of the traditional Bayesian GMM based on the multivariate normal distribution (referred to as N-GMM). Monte Carlo simulation was used to generate data from a linear GMM with two latent trajectory classes. The simulation settings comprised 32 scenarios that varied in sample size (300 and 500), mixing proportion (50%/50% and 75%/25%), class separation (MD = 2.73 and MD = 3.16), and intra-class data distribution (normal, low skewness, and high skewness). Bayesian SN-GMMs and N-GMMs with 1, 2, and 3 latent classes were fitted to each simulated data set. Class enumeration was performed with the optimal model selection strategy and index identified in (1). The rates at which the two models correctly identified the number of trajectories were then compared. In addition, differences in goodness-of-fit to the simulated data between the two models with the same number of trajectories were examined. Furthermore, given correct class enumeration, several measures were calculated to evaluate the parameter estimation performance of the two models: the relative bias (RB), average posterior standard deviation (ASD), empirical standard deviation (ESD), root mean squared error (RMSE), and 95% credible interval coverage probability (CP). (3) Using longitudinal data from the China Health and
Retirement Longitudinal Study (CHARLS), a Bayesian SN-GMM was fitted to examine heterogeneity in depressive symptom trajectories among Chinese middle-aged and older adults, and the results were compared with those of a Bayesian N-GMM. Linear mixed-effects models were then used to explore associations between depressive symptom trajectories and the rate of decline in two cognitive dimensions, episodic memory and executive function.

Results: (1) Regardless of the model selection strategy, indexes based on the marginal likelihood showed higher class enumeration accuracy than their conditional counterparts. The AD strategies showed no clear advantage over the MV strategy for Bayesian GMM class enumeration. This was especially true in scenarios with lower class separation, where enumeration accuracy decreased progressively from AD-3 to AD-7 to AD-10 and the probability of selecting the single-class model increased. Among the marginal indexes under the MV strategy, the two categories of fully Bayesian indexes, WAIC and LOO-CV, performed almost identically. Both exceeded 80% enumeration accuracy, higher than that of the partially Bayesian index DIC. The enumeration accuracy of marginal WAIC and LOO-CV was jointly affected by sample size, mixing proportion, and class separation. In scenarios with lower class separation and smaller sample size, accuracy was relatively poor and the single-class model tended to be selected as optimal. Increasing the sample size or class separation improved enumeration accuracy. In addition, when class separation and sample size were low, enumeration accuracy was slightly higher with unbalanced than with balanced mixing proportions. This difference disappeared as class separation or sample size increased. (2) The Bayesian SN-GMM
was constructed by specifying the within-class residuals as MSN-distributed. When the data satisfied the intra-class multivariate normality assumption, the Bayesian SN-GMM performed comparably to the Bayesian N-GMM in class enumeration accuracy, goodness-of-fit, and parameter estimation. When intra-class skewness was present, the Bayesian SN-GMM maintained an enumeration accuracy of at least 80% across scenarios, much higher than that of the Bayesian N-GMM. The marginal WAIC and LOO-CV values of the Bayesian SN-GMM were lower than those of the Bayesian N-GMM with the same number of trajectories, indicating a better fit to the data. When both models correctly identified the number of trajectories, the Bayesian SN-GMM generally had lower RB, ASD, ESD, and RMSE than the Bayesian N-GMM, along with higher CP. When intra-class multivariate normality held, the effects of sample size, mixing proportion, and class separation on the enumeration accuracy of both models were consistent with the results of (1). For parameter estimation, larger sample sizes and greater class separation were associated with lower RB, ASD, ESD, and RMSE and higher CP. With unbalanced mixing proportions, parameter estimation was worse for the 25% minority class and better for the 75% majority class than in scenarios with balanced proportions. When the data violated intra-class multivariate normality, increasing the sample size improved the enumeration accuracy and parameter estimation of the Bayesian SN-GMM. The same increase worsened both aspects for the Bayesian N-GMM. In addition, the difference in goodness-of-fit between the two models with the same number of
trajectories widened, and the influence of mixing proportion and class separation on model performance diminished. (3) Depressive symptom trajectories among Chinese middle-aged and older adults were heterogeneous. The Bayesian N-GMM and Bayesian SN-GMM identified five and four latent trajectory classes, respectively. The four-class SN-GMM fitted the data better than the five-class N-GMM, and the trajectory classes it identified differed more clearly in population characteristics. The Bayesian SN-GMM identified four depressive symptom trajectory classes: a no depressive symptoms class (9.5%), a low depressive symptoms class (33.5%), a medium depressive symptoms class (42.9%), and a high and significant depressive symptoms class (14.1%). Compared with the no depressive symptoms class, the other trajectory classes showed a graded increase in the rate of episodic memory decline. By contrast, no significant difference in the rate of executive function decline was observed among the trajectory classes.

Conclusion: (1) The two categories of fully Bayesian indexes, WAIC and LOO-CV, under the MV strategy are recommended for class enumeration in Bayesian GMM. (2) Violation of intra-class multivariate normality in longitudinal data seriously affects the results of the Bayesian N-GMM. The Bayesian SN-GMM improves class enumeration accuracy and parameter estimation and achieves a better fit to the data. (3) The Bayesian SN-GMM can be effectively applied to real-world longitudinal data. The association between depressive symptom trajectories and the rate of cognitive decline among Chinese middle-aged and older adults differs across cognitive dimensions. Interventions aimed at alleviating cognitive decline should be prioritized for middle-aged and older adults with higher levels or
deterioration of depressive symptoms.
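Class separation in the simulations is defined by the Mahalanobis distance (MD). The sketch below computes MD for two classes under stated assumptions. It assumes the distance is taken between the class-specific growth-factor means (intercept, slope) under a shared 2 × 2 covariance matrix. The function name and the parameter values in the checks are hypothetical, not the settings that produce MD = 2.73 or 3.16 in the study.

```python
import math

def mahalanobis_2d(mu1, mu2, sigma):
    """MD = sqrt((mu1 - mu2)^T Sigma^{-1} (mu1 - mu2)) for 2-D growth-factor means.

    Assumes a shared, positive-definite 2x2 covariance matrix sigma.
    """
    d0, d1 = mu1[0] - mu2[0], mu1[1] - mu2[1]
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det <= 0:
        raise ValueError("sigma must be positive definite")
    # Closed-form inverse of a 2x2 matrix
    inv = ((d / det, -b / det), (-c / det, a / det))
    quad = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
    return math.sqrt(quad)
```

Scaling the mean difference in the slope relative to its variance is what lets a small slope difference contribute as much separation as a larger intercept difference.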
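The simulated data come from a linear two-class GMM whose within-class residuals are either normal or skewed. The sketch below generates such data under several assumptions. Skewed residuals use the additive stochastic representation δ|U0| + sqrt(1 − δ²)·U_t, with one half-normal term |U0| shared across an individual's time points. This makes each residual vector multivariate skew-normal. Residuals are centred to mean zero, and growth factors are independent normals within each class. Setting δ = 0 gives the normal case. The function names, this parameterization, and all numeric values are illustrative rather than the study's actual settings.

```python
import math
import random

def skew_normal_residuals(n_times, delta, scale, rng):
    """Mean-centred residual vector for one individual (assumed MSN construction).

    e_t = scale * (delta*|U0| + sqrt(1 - delta**2)*U_t - delta*sqrt(2/pi)),
    with U0, U_1..U_T independent N(0, 1). The shared |U0| induces skewness;
    delta = 0 reduces to independent normal errors with SD equal to scale.
    """
    if not -1.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between -1 and 1")
    u0 = abs(rng.gauss(0.0, 1.0))              # shared half-normal term
    shift = delta * math.sqrt(2.0 / math.pi)   # E[delta * |U0|], removed to centre
    noise_sd = math.sqrt(1.0 - delta ** 2)
    return [scale * (delta * u0 + noise_sd * rng.gauss(0.0, 1.0) - shift)
            for _ in range(n_times)]

def simulate_linear_gmm(n, pi1, class_means, times, delta, resid_scale, growth_sd, seed=1):
    """Hypothetical two-class linear GMM simulator returning (class, outcomes) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        k = 0 if rng.random() < pi1 else 1     # latent class membership
        intercept = class_means[k][0] + rng.gauss(0.0, growth_sd[0])
        slope = class_means[k][1] + rng.gauss(0.0, growth_sd[1])
        e = skew_normal_residuals(len(times), delta, resid_scale, rng)
        data.append((k, [intercept + slope * t + e_t for t, e_t in zip(times, e)]))
    return data
```

For example, `simulate_linear_gmm(500, 0.75, [(0.0, 0.0), (3.0, 1.0)], [0, 1, 2, 3], 0.9, 1.0, (1.0, 0.2))` draws an unbalanced 75%/25% sample with four measurement occasions and right-skewed residuals.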
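The Methods contrast indexes computed from the marginal likelihood with those computed from the conditional likelihood. The sketch below shows the difference for one individual at one posterior draw. It assumes one has the log class proportions and each class's log density for that individual's data. The marginal version integrates class membership out with a log-sum-exp. The conditional version plugs in the individual's sampled class indicator. Both function names are illustrative.

```python
import math

def marginal_loglik(log_pi, log_f):
    """log sum_k pi_k * f_k(y_i): class membership integrated out (log-sum-exp)."""
    terms = [a + b for a, b in zip(log_pi, log_f)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def conditional_loglik(log_f, assigned_class):
    """log f_{c_i}(y_i): likelihood given the sampled class indicator c_i."""
    return log_f[assigned_class]
```

Repeating either function over all posterior draws and individuals fills the draws-by-individuals log-likelihood matrix that WAIC and LOO-CV are computed from.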
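Given an S × N matrix of pointwise log-likelihoods (posterior draws by individuals, holding either marginal or conditional values), WAIC and importance-sampling LOO can be computed directly. The sketch below assumes that the two WAIC variants correspond to the two commonly used penalties: the mean-difference form p_WAIC1 and the variance form p_WAIC2. The abstract does not confirm that mapping. PSIS-LOO additionally smooths the largest importance weights with a fitted generalized Pareto distribution and is not reproduced here. The R package loo and the Python library ArviZ both implement it. All criteria are returned on the deviance scale, where lower is better.

```python
import math
from statistics import mean, variance

def log_mean_exp(values):
    """Numerically stable log((1/S) * sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def information_criteria(log_lik):
    """log_lik[s][i]: pointwise log-likelihood of individual i at posterior draw s (S >= 2)."""
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    lppd = p_waic1 = p_waic2 = elpd_is_loo = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        lppd_i = log_mean_exp(column)           # log pointwise predictive density
        lppd += lppd_i
        p_waic1 += 2.0 * (lppd_i - mean(column))
        p_waic2 += variance(column)             # sample variance across draws
        # Plain IS-LOO: raw weights 1 / p(y_i | theta_s), i.e. a harmonic mean
        elpd_is_loo += -log_mean_exp([-v for v in column])
    return {
        "p_waic1": p_waic1,
        "p_waic2": p_waic2,
        "waic1": -2.0 * (lppd - p_waic1),
        "waic2": -2.0 * (lppd - p_waic2),
        "looic_is": -2.0 * elpd_is_loo,
    }
```

Raw IS-LOO weights can have very heavy tails when one individual is influential, which is the motivation for the Pareto smoothing in PSIS-LOO.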
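The MV and AD strategies can be written as small selection rules over a mapping from class number to index value. The abstract does not define the AD rule precisely. The version below is therefore a hypothetical stepwise rule: K + 1 classes are accepted only if the index falls by more than the cutoff relative to the currently selected model. The index values in the checks are made up for illustration.

```python
def select_mv(index_by_k):
    """MV strategy: choose the class number with the smallest index value."""
    return min(index_by_k, key=index_by_k.get)

def select_ad(index_by_k, cutoff):
    """AD strategy (assumed stepwise form): move to more classes only while
    the index drops by more than `cutoff` relative to the current choice."""
    ks = sorted(index_by_k)
    chosen = ks[0]
    for k in ks[1:]:
        if index_by_k[chosen] - index_by_k[k] > cutoff:
            chosen = k
        else:
            break
    return chosen
```

With made-up values `{1: 1000.0, 2: 995.0, 3: 996.0}`, MV and AD-3 both select two classes, while AD-7 and AD-10 fall back to one class. This illustrates how larger cutoffs favour the single-class model when separation is modest.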
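The parameter-recovery summaries used to compare SN-GMM and N-GMM can be computed per parameter across Monte Carlo replications. The sketch below uses the usual definitions, which are assumed here rather than quoted from the study. RB is the mean estimate's deviation relative to the true value, ASD the mean posterior SD, and ESD the SD of posterior means across replications. RMSE is measured against the true value, and CP is the share of 95% credible intervals that contain it. RB is undefined when the true value is zero.

```python
import math
from statistics import mean, stdev

def estimation_metrics(true_value, post_means, post_sds, ci_lower, ci_upper):
    """Summaries over replications for one parameter (true_value must be non-zero)."""
    rb = (mean(post_means) - true_value) / true_value        # relative bias
    asd = mean(post_sds)                                       # average posterior SD
    esd = stdev(post_means)                                    # empirical SD of estimates
    rmse = math.sqrt(mean((m - true_value) ** 2 for m in post_means))
    cp = mean(1.0 if lo <= true_value <= hi else 0.0
              for lo, hi in zip(ci_lower, ci_upper))          # coverage probability
    return {"RB": rb, "ASD": asd, "ESD": esd, "RMSE": rmse, "CP": cp}
```

When ASD is close to ESD, the posterior SDs reflect the actual sampling variability well. In that case CP should sit near the nominal 95%.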