The Study And Application Of Multilevel Propensity Score Model Of Categorical Data In The Hierarchical Structure Data

Posted on: 2017-04-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: F F Yu
GTID: 1224330485982886  Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: The propensity score (PS) is the conditional probability that an individual is assigned to the treatment group given the observed covariates. In recent years, PS methods have become an effective way to control observed confounding and selection bias in observational studies. The approach has two steps. The first step models the treatment factor as a function of the covariates to estimate each individual's propensity score. The second step uses the estimated propensity scores in a model relating the treatment factor to the outcome variable in order to estimate the treatment effect. Although PS methods are used by more and more researchers, research on and applications of them in large-scale multilevel (hierarchical) data, such as national health services surveys, remain relatively rare; so far, propensity scores have been applied to multilevel data only in education and economics. In particular, existing studies do not cover the case where the treatment/exposure factor is an ordered or multicategory variable. Moreover, existing studies of binary treatment/exposure factors are limited to traditional logistic regression and similar methods for estimating the propensity score; use of the boosting algorithm to estimate propensity scores in multilevel data has rarely been reported.

Aim: This study aims to reduce confounding bias in multilevel data, such as health and medical big data, by exploring and improving the existing multilevel propensity score model for binary exposure factors and by constructing generalized multilevel propensity score models for unordered multicategory and ordered categorical exposure factors (mainly with three categories as the example).
After constructing the models, the study estimates the treatment effect with different propensity score estimation methods in different scenarios in order to identify the best-performing model under each condition. The models are then applied to data from the Fifth National Health Service Survey (Shanghai District).

Methods: (1) Simulation study. The simulation was divided into three parts according to the type of treatment/exposure factor: binary, unordered multicategory, and ordered categorical. Within each part, scenarios of differing data complexity were considered. Ten sample size settings were used: 3×30, 3×50, 3×100, 3×200, 3×600, 20×30, 20×50, 20×100, 20×200 and 200×30. Six level-one covariates and one continuous level-two covariate were simulated. (2) Model building. In each scenario, 13 models were constructed to estimate the treatment/exposure effect: the 12 combinations of six propensity score estimation methods with two treatment effect estimation methods, plus the traditional multilevel model. The six propensity score estimation methods were the logistic regression model, the logistic regression model including the level variable, the multilevel 1-level random effect regression model, the multilevel 2-level random effect regression model, the boosted regression model, and the boosted regression model including the level variable. The two treatment effect estimation methods were the multilevel propensity score weighted regression model and the multilevel propensity score covariate adjustment regression model. (3) Model evaluation. The standard error, absolute bias and 95% confidence interval coverage were used to evaluate the accuracy and precision of the proposed models. (4) Example analysis. The Fifth National Health Services Survey in the Shanghai district served as the data source for the example analysis.
For binary exposure/treatment factors, the analysis focused on smoking status and the risk of chronic disease in men older than 60 years; for unordered multicategory exposure/treatment factors, on the relationship between marital status and self-rated health in people aged 28-44 years (older youth); and for ordered categorical exposure/treatment factors, on the association between body mass index and the risk of hypertension in Shanghai residents.

Results: (1) Binary treatment/exposure factor. When the data structure contains only a random intercept or random coefficients, the multilevel boosting propensity score adjustment model incorporating the level variable gives reliable results. When interactions between first- and second-level covariates are present and the sample size is 3×30, the multilevel 2-level random effect propensity score adjustment model achieves the highest accuracy and precision; at 3×50, the multilevel logistic regression propensity score adjustment method that includes the level variable is more accurate; and at 3×100, 3×200 and 3×600, the multilevel logistic propensity score weighting method yields treatment effect estimates closer to the true value. When the size is fixed at 20, none of the methods is stable under complex data structures, especially at a sample size of 20×200. When the sample size is large enough (200×30), the various multilevel propensity score adjustment models are more stable and reliable. (2) Unordered multicategory treatment/exposure factor. As the data structure becomes more complex, the absolute bias of all methods increases. However, when the size is small (size=3) and the data structure contains only a random intercept, the multilevel propensity score weighting methods perform better than the traditional multilevel model and the multilevel propensity score adjustment models.
Among them, the multilevel 1-level random effect propensity score weighting model and the multilevel boosting propensity score weighted model that includes the level variable perform most stably. When the sample size is 3×30, none of the methods is very accurate; sometimes one coefficient is estimated accurately while the other has a large error. When the size is 20, regardless of whether the data structure is simple or complex and whether interactions between first- and second-level variables exist, the multilevel multinomial propensity score adjustment models give more accurate and precise results than the weighted models. (3) Ordered categorical treatment/exposure factor. When the sample size is small and each second-level unit contains 3 individuals, the multilevel generalized propensity score weighted models perform better, especially the multilevel multinomial boosting propensity score weighted model and the multilevel multinomial 1-level random effects propensity score weighting model. When the sample size increases to 2000 or more, the multilevel generalized propensity score adjustment model can be adopted, and the choice of propensity score estimation method makes little difference. However, with large samples and data structures containing a variety of interaction terms, the reliability of the exposure effects estimated by the weighted models, the adjusted models and the simple multilevel cumulative logistic model all needs improvement. (4) Example analysis. For the binary exposure example, the coefficient estimated by the traditional multilevel model is -0.1511, corresponding to an odds ratio (OR) of 0.86, which would imply that among people older than 60, smokers have a lower risk of chronic disease than non-smokers. However, the multilevel boosting propensity score model shows no statistically significant difference, a result the study considers closer to the truth.
For the unordered categorical exposure example, the results show that unmarried people are more likely to report better self-rated health (OR=1.60, p=0.0006), whereas for the divorced or widowed population neither the adjustment method nor the simple multilevel model finds a significant difference (p=0.7310). For the ordinal exposure example, BMI is positively associated with the risk of high blood pressure in residents (OR=3.00, p<0.0001).

Conclusions: Different models perform differently in different scenarios, and no method always gives the best results. 1) When the sample size is 3×30, 3×50 or 3×100 and there is no interaction between covariates, the multilevel propensity score weighting models are good choices. 2) When the sample size is 3×200, 3×600, 20×100 or 20×200 and the variable interaction effects are not obvious, multilevel propensity score adjustment methods are recommended. 3) When the sample size is 20×200, 200×30 or larger and the data structure is relatively simple, both the multilevel propensity score adjustment model and the traditional multilevel model give accurate results; but if complex interactions or random coefficients exist between variables, especially when the exposure factor is multinomial, the reliability of all methods is questionable. 4) The results for multiple-treatment data show that estimating the propensity score with the multilevel 2-level random effect model can fail to converge, and that the models sometimes estimate only one treatment effect accurately while the other is inaccurate. 5) The boosting algorithm has some advantages in estimating the multilevel propensity score model, but it is not applicable in every case. In practical applications, the method should be chosen according to the characteristics of the data.
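
The two-step propensity score procedure described in the Background can be illustrated with a minimal sketch. This is not the dissertation's method: it uses an ordinary single-level logistic regression for step one and a simple inverse-probability-weighted difference in means for step two, whereas the study uses multilevel and boosted models. The simulated data, the function names, and all parameter values (`n_clusters`, `cluster_size`, `true_effect`, the coefficients) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, t, n_iter=200, lr=1.0):
    """Step 1: fit P(T=1 | X) by gradient ascent on the mean log-likelihood.

    X is a list of covariate rows without an intercept column.
    Returns [intercept, slope_1, ...].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, ti in zip(X, t):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            resid = ti - sigmoid(eta)
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated probability of treatment for every individual."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            for row in X]


def ipw_effect(y, t, ps):
    """Step 2: normalized inverse-probability-weighted difference in means."""
    w1 = [ti / p for ti, p in zip(t, ps)]
    w0 = [(1 - ti) / (1 - p) for ti, p in zip(t, ps)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


def simulate(n_clusters=200, cluster_size=30, true_effect=1.0, seed=0):
    """Hypothetical clustered data: one confounder x and a cluster intercept u."""
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n_clusters):
        u = rng.gauss(0, 0.3)  # cluster-level random intercept in the outcome
        for _ in range(cluster_size):
            x = rng.gauss(0, 1)  # x raises both treatment odds and outcome
            ti = 1 if rng.random() < sigmoid(-0.2 + 0.8 * x) else 0
            yi = true_effect * ti + 1.5 * x + u + rng.gauss(0, 1)
            X.append([x])
            t.append(ti)
            y.append(yi)
    return X, t, y


X, t, y = simulate()
beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)

n_treated = sum(t)
naive = (sum(yi for yi, ti in zip(y, t) if ti) / n_treated
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - n_treated))
adjusted = ipw_effect(y, t, ps)
print(f"naive difference: {naive:.2f}, IPW estimate: {adjusted:.2f}")
```

In this toy setting the unadjusted difference is confounded by `x`, while the weighted estimate should fall much closer to the simulated `true_effect`. The sketch ignores the cluster structure when estimating the propensity score, which is exactly the kind of simplification the multilevel models in the study are meant to avoid.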
Keywords/Search Tags:multilevel model, propensity score, boosting algorithm, hierarchical data, categorical data