
Modeling Of Stabilized Weight Based Targeted Maximum Likelihood Estimation And Its Application In Real-World Studies

Posted on: 2021-05-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H D Han
GTID: 1364330602476664
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Randomized controlled trials (RCTs) are widely recognized as the gold standard for causal inference because randomization ensures comparability between groups. In recent years, real-world studies have received much attention. Together with RCTs, they aim to provide scientific evidence to guide clinical practice and medical decision-making. However, in real-world studies, baseline covariates are usually unbalanced between groups. This imbalance affects the propensity of assignment to a given treatment group and confounds the causal pathway between treatment/exposure and outcome. In January 2020, the National Medical Products Administration issued a trial version of a guideline on the use of real-world evidence to support drug discovery and evaluation. The guideline emphasized that causal inference methods are key to obtaining unbiased estimates in observational real-world studies. Methods based on the inverse probability of treatment weight (IPTW) form an important class of treatment effect estimators, which includes inverse probability of treatment weighting, weighted least squares (WLS), augmented inverse probability of treatment weighting (AIPTW) and targeted maximum likelihood estimation (TMLE). The latter three are double robust estimators. All of these methods are based on the Neyman-Rubin potential outcome model and rely on several basic assumptions: strongly ignorable treatment assignment (no unmeasured confounders), positivity, the stable unit treatment value assumption and correct model specification. Nevertheless, when the propensity score is too large or too small (close to 1 or 0), extreme IPTW values occur and the positivity assumption is violated or nearly violated. Extreme IPTW values directly affect the bias and variance of the effect estimates. IPTW is therefore an unstable propensity score weight. In recent years, several stable propensity score weights, including truncated IPTW, have been proposed and, along with IPTW, are
defined as a general class of weights, the balancing weights. In real-world studies, propensity score/treatment models are susceptible to misspecification because of complicated relationships among variables. Once a model is misspecified, the effect estimates may be seriously biased. For the same reason, outcome models such as G-computation are also at risk of misspecification. Double robust estimators provide a remedy for parametric model misspecification: if either the treatment model or the outcome model is correctly specified, the double robust estimator is consistent. TMLE is a semi-parametric, maximum-likelihood-based double robust estimator that optimizes the bias-variance tradeoff for the parameter of interest by updating the initial estimates using IPTW, at the expense of the bias and variance of nuisance parameters. TMLE has several advantages over existing methods. For example, as a substitution estimator, TMLE is more robust than other double robust estimators to certain outliers and sparsity. However, some researchers have raised the concern that extreme IPTW values might worsen the performance of TMLE and have suggested truncating the propensity score or IPTW as a solution. This approach changes the original data structure, however, and its validity needs to be assessed. Constructing more stable TMLEs may therefore increase the accuracy of effect estimation under extreme propensity scores. Further, it is of potential theoretical and practical value to evaluate whether stabilized TMLEs are double robust.

Objective:
1. Through Monte Carlo simulation and empirical studies, this part aimed to establish estimators based on nine propensity score weights, including propensity score weighting (PSW), WLS, augmented propensity score weighting (APSW) and TMLE, and to compare them with unadjusted and G-computation estimates under scenarios that varied in prevalence of treatment, sample size and degree of overlap, in order to evaluate the relative performance of stabilized
TMLEs. In addition, we explored the effects of various levels of weight truncation on the effect estimates of standard TMLE.
2. Through Monte Carlo simulation and empirical studies, this part aimed to establish estimators based on nine propensity score weights, including PSW, WLS, APSW and TMLE, and to compare them with unadjusted and G-computation estimates under scenarios that varied in prevalence of treatment, sample size, degree of overlap and four types of model specification (both models correctly specified; treatment model misspecified and outcome model correctly specified; treatment model correctly specified and outcome model misspecified; both models misspecified), in order to evaluate the double robustness properties of stabilized TMLEs.

Method: For the two objectives above, the research proceeded in four steps: data simulation, model construction, model evaluation and practical application.
1. The relative performance of stabilized TMLEs under different degrees of propensity score overlap
Monte Carlo simulation was performed to mimic observational real-world data with a binary treatment and a continuous outcome. The simulation settings considered scenarios that varied in prevalence of treatment (0.4 and 0.1), sample size (n = 250, 1000 and 2500) and degree of overlap (overlap parameter = 0.3, 0.5, 0.8, 1.0, 1.5 and 2.0). We established PSW, WLS, APSW and TMLE estimators based on IPTW, IPTW truncated at 1%, 5% and 10%, marginal probability adjusted IPTW (MPIPTW), normalized IPTW (NIPTW), shrunken IPTW (SHIPTW), overlap weight (OW) and matching weight (MW), and compared them with unadjusted and G-computation estimates to explore the relative performance of stabilized TMLEs under different degrees of propensity score overlap. The distribution of the nine balancing weights under these scenarios was examined. The effects of various levels of weight truncation on the effect estimates of standard TMLE were also explored. The nonparametric bootstrap was used to estimate the standard error (SE) of the APSW and
TMLE estimators. The overlapping coefficient (OVL), weighted average absolute standardized mean difference (WAASMD), absolute bias, relative bias, root mean squared error (RMSE), SE, standard deviation (SD) and 95% confidence interval (CI) coverage were used to assess the relative performance of stabilized TMLEs. Finally, using a hospitalization database, we compared the length of stay between robot-assisted laparoscopic radical prostatectomy (RALRP) and open radical prostatectomy (ORP) to assess the practicality of stabilized TMLEs.
2. Research on the double robustness properties of stabilized TMLEs
Monte Carlo simulation was performed to mimic observational real-world data with a binary treatment and a continuous outcome. The simulation settings considered scenarios that varied in prevalence of treatment (0.4 and 0.1), sample size (n = 250, 1000 and 2500) and degree of overlap (overlap parameter = 0.3 and 2.0). We established PSW, WLS, APSW and TMLE estimators based on IPTW, IPTW truncated at 1%, 5% and 10%, MPIPTW, NIPTW, SHIPTW, OW and MW, and compared them with unadjusted and G-computation estimates to explore the double robustness properties of stabilized TMLEs under four types of model specification: both models correctly specified (Qcgc), treatment model misspecified and outcome model correctly specified (Qcgw), treatment model correctly specified and outcome model misspecified (Qwgc), and both models misspecified (Qwgw). The distribution of the nine balancing weights under correctly specified and misspecified treatment models in these scenarios was examined. The nonparametric bootstrap was used to estimate the SE of the APSW and TMLE estimators. OVL, WAASMD, absolute bias, relative bias, RMSE, SE, SD and 95% CI coverage were used to assess the performance of stabilized TMLEs under model misspecification. Finally, using longitudinal follow-up data from the Chinese Longitudinal Healthy Longevity Survey, we explored the relationship between activities of daily living (ADL) disability and cognitive
function decline to assess the double robustness properties of stabilized TMLEs in an empirical study. All analyses were performed in R version 3.5.2.

Results:
1. The relative performance of stabilized TMLEs under different degrees of propensity score overlap
(1) Simulation study results
(a) As propensity score overlap between groups worsened, OVL gradually decreased, WAASMD gradually increased, the extreme values and variability of the IPTW-related weights increased, and the performance of all methods, including TMLE, deteriorated.
(b) Compared with a treatment prevalence of 0.4, a prevalence of 0.1 gave smaller OVL, larger WAASMD, more extreme weights and worse TMLE performance.
(c) With increasing levels of weight truncation in standard TMLE, the bias of the effect estimate tended to increase, the variance gradually decreased and overall performance improved. Specifically, SD, SE and RMSE gradually decreased.
(d) When the overlap was good, standard TMLE performed well; when the overlap was bad, standard TMLE performed the worst on all measures. In addition, none of the stabilized TMLE estimators based on IPTW-related weights performed well either.
(e) In terms of bias, when the overlap was good, standard TMLE performed as well as the stabilized TMLEs, and there were no significant differences between standard and stabilized TMLE on the other measures; when the overlap was bad, OW and MW based TMLE had the smallest bias among the eight stabilized TMLEs.
(f) MPIPTW and NIPTW based TMLE had the same bias, SD and RMSE as standard TMLE, but different SE and 95% CI coverage.
(g) Under all scenarios, stabilized TMLEs achieved smaller SD, SE and RMSE and better 95% CI coverage; among them, OW and MW based TMLE were superior to the other TMLEs.
(h) G-computation yielded good estimates on every measure under all scenarios because the outcome models were correctly specified. PSW estimators based on IPTW-related weights performed poorly when there was
a bad overlap. The estimates of TMLE and APSW based on all weights were superior to the crude, G-computation, PSW and WLS estimates on all measures under the same scenario.
(2) Empirical application results
The study included 3,916 patients, of whom 3,177 (81.13%) received RALRP and the rest (18.87%) received ORP. Propensity score overlap between the two groups was good (OVL = 0.8120). The propensity score ranged from 0.3928 to 0.9002. All weights were less than 10 and no extreme weights were observed. The WAASMD values for the nine weights were 0.0041, 0.0390, 0.0032, 0.0025, 0.0041, 0.0041, 0.0037, 0 and 0.0002, respectively. All the methods considered produced similar results: patients receiving RALRP had a shorter length of stay than those receiving ORP. After accounting for potential confounders, the effect estimate decreased while the SE increased. Compared with standard TMLE, stabilized TMLEs yielded both larger point estimates and larger SEs. Among the stabilized TMLEs, OW and MW based TMLE had smaller SEs and narrower 95% CIs.
2. Research on the double robustness properties of stabilized TMLEs
(1) Simulation study results
(a) When the treatment prevalence was 0.4, a misspecified treatment model resulted in smaller extreme values, smaller variability of weights and smaller WAASMD for the IPTW-related weights, and a larger mean weight, under all degrees of overlap; when the treatment prevalence was 0.1 and the overlap was good, the behavior of these measures after misspecification was similar to that at a treatment prevalence of 0.4; when the treatment prevalence was 0.1 and the overlap was bad, all of these measures increased after misspecification.
(b) In all scenarios, methods based on the treatment model, such as PSW, performed poorly when the treatment model was misspecified, while methods based on the outcome model, such as G-computation, performed poorly when the outcome model was misspecified.
(c) On the whole, stabilized TMLEs outperformed standard TMLE in RMSE under all four types of model specification (Qcgc, Qcgw, Qwgc and
Qwgw). Among the stabilized TMLEs, OW and MW based TMLE produced the smallest RMSE.
(d) APSW and TMLE based on IPTW-related weights (IPTW, IPTW truncated at (1, 99), (5, 95) and (10, 90), MPIPTW, NIPTW and SHIPTW) were more adversely affected by extreme values than by model misspecification.
(e) Under Qwgc and Qwgw, stabilized TMLEs performed better than standard TMLE on all selected measures (bias, SD, SE, RMSE and 95% CI coverage).
(f) In the same scenario, a misspecified treatment model produced less extreme propensity score values than a correctly specified treatment model, and stabilized TMLEs performed slightly better under Qcgw than under Qcgc.
(g) For TMLE and APSW, correct specification of the outcome model seemed more important than correct specification of the treatment model. When the outcome models were misspecified, OW and MW based APSW performed poorly.
(h) When the overlap was bad, especially at a treatment prevalence of 0.1, all methods performed badly except the OW and MW based estimators. Under Qwgw, APSW and TMLE based on all weights performed even worse than the crude estimates.
(i) Under all four types of model specification, stabilized TMLE and APSW achieved smaller SD, SE and RMSE and better 95% CI coverage.
(j) Under Qwgw, TMLE based on every weight performed better than the corresponding APSW in bias, SD, SE, RMSE and 95% CI coverage. Under Qcgw and good overlap, TMLE was superior to APSW. However, APSW was better under Qcgw and bad overlap (except for OW and MW based estimators).
(2) Empirical application results
Among 4,956 adults aged ≥
65 years, 12.83% (n = 636) had ADL disability. Propensity score overlap between groups was not good enough (OVL = 0.5955). The propensity score ranged from 0.00473 to 0.80024. Extreme weights were observed; for instance, IPTW reached 71.52. The WAASMD values for the nine weights were 0.0105, 0.0034, 0.0217, 0.0606, 0.0105, 0.0105, 0.0045, 0 and 0.0022, respectively. After accounting for potential confounders, the effect estimate decreased while the SE increased. The results were consistently statistically significant for all TMLEs except NIPTW based TMLE: cognitive function decline was larger in the ADL disability group than in the normal ADL group. Point estimates were similar among all IPTW based TMLEs. Among the stabilized TMLEs, OW and MW based TMLE had smaller estimates and SEs. G-computation yielded comparable results whether or not the outcome model was correctly specified, which indicated that omitting the interaction from the outcome model had little impact on the effect estimate. Overall, leaving the interaction out of the initial outcome model resulted in slightly larger estimates and SEs.

Conclusion: In observational real-world studies, both extreme weights and model misspecification can seriously affect the accuracy of treatment effect estimation. TMLE is more adversely affected by extreme values than by model misspecification. Correct specification of the outcome model seems more important than correct specification of the treatment model. Treatment model misspecification can affect the occurrence of extreme weights. When propensity score weight based methods are used to estimate the target parameter, propensity score overlap between groups and the weight distribution should be checked. When propensity score overlap is good or extreme weights are few, standard TMLE performs as well as stabilized TMLEs; when overlap is bad or extreme weights are many, truncation can improve the performance of standard TMLE. Stabilized TMLEs achieve smaller SD, SE and RMSE and
better 95% CI coverage; among them, OW and MW based TMLEs are superior to the other TMLEs and produce good estimates even under bad overlap. Stabilized TMLEs have advantageous double robustness properties, especially when both models are misspecified. Regardless of model specification, OW and MW based TMLEs always yield the best estimates. Overall, the study recommends OW and MW based TMLE to minimize the impact of near violation of positivity, especially in the following cases: bad overlap; many extreme weights; low treatment prevalence; potential risk of model misspecification.
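
Illustrative sketch (not taken from the dissertation): the abstract contrasts IPTW with more stable balancing weights such as truncated IPTW, the overlap weight (OW) and the matching weight (MW). The Python snippet below shows how such weights are commonly computed from a propensity score e and a binary treatment indicator a. The formulas follow the usual textbook definitions, and the percentile rule in `truncate` is an assumption; the dissertation's exact definitions may differ. All function names and the toy data are hypothetical.

```python
import math


def iptw(e, a):
    """Inverse probability of treatment weight: 1/e if treated, 1/(1-e) if control."""
    return 1.0 / e if a == 1 else 1.0 / (1.0 - e)


def overlap_weight(e, a):
    """Overlap weight (OW): 1-e if treated, e if control; always within (0, 1)."""
    return 1.0 - e if a == 1 else e


def matching_weight(e, a):
    """Matching weight (MW): min(e, 1-e) over the probability of the received arm."""
    return min(e, 1.0 - e) / (e if a == 1 else 1.0 - e)


def truncate(weights, lower_pct, upper_pct):
    """Clip weights to their empirical percentiles (nearest-rank rule, an assumption)."""
    ordered = sorted(weights)
    n = len(ordered)
    lo = ordered[max(0, math.ceil(lower_pct * n / 100) - 1)]
    hi = ordered[max(0, math.ceil(upper_pct * n / 100) - 1)]
    return [min(max(w, lo), hi) for w in weights]


def weighted_mean_difference(y, a, w):
    """Normalized (Hajek) weighted difference in mean outcome, treated minus control."""
    def arm_mean(arm):
        num = sum(wi * yi for yi, ai, wi in zip(y, a, w) if ai == arm)
        den = sum(wi for ai, wi in zip(a, w) if ai == arm)
        return num / den
    return arm_mean(1) - arm_mean(0)


# Toy data, made up for illustration: propensity scores, treatment, outcome.
ps = [0.02, 0.50, 0.90, 0.30]
treat = [1, 1, 0, 0]
outcome = [1.0, 3.0, 2.0, 4.0]

for name, fn in [("IPTW", iptw), ("OW", overlap_weight), ("MW", matching_weight)]:
    w = [fn(e, a) for e, a in zip(ps, treat)]
    diff = weighted_mean_difference(outcome, treat, w)
    print(name, [round(x, 3) for x in w], round(diff, 3))
```

In this toy example, the treated unit with e = 0.02 receives an IPTW of 50 but an OW of 0.98 and an MW of 1, which illustrates why bounded weights are less sensitive to near violations of positivity.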
Keywords/Search Tags:causal inference, extreme weight, stabilized weight, model misspecification, double robust estimators, targeted maximum likelihood estimation