Matching On Continuous Exposure In Multisite Studies And Its Application To Evaluate The Association Between Ambient Fine Particulate Matter And Insomnia

Posted on: 2022-08-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Xu
GTID: 1521306551490344
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Air pollution has become a global environmental and public health problem. Fine particulate matter (PM2.5) is recognized as harmful to human health and is currently one of the pollutants of greatest international concern, causing a large burden of disease worldwide. A growing body of evidence suggests that long-term exposure to ambient PM2.5 can adversely affect sleep health. Insomnia is one of the most common sleep disorders and can seriously affect an individual's physical and psychological health as well as his or her social functioning. China is among the countries with the most severe air pollution, and many studies of the health effects of air pollution have been conducted there in recent years, but none has examined the effect of long-term exposure to ambient PM2.5 on insomnia.

Mainstream statistical analyses in environmental epidemiology use regression methods that rely exclusively on model assumptions, yet proper statistical methodology is critical to epidemiological evidence on the health effects of ambient air pollution. Methods based on the causal inference framework for observational studies, which emphasize observational study design and clearly separate the design stage from the analysis stage, are increasingly favored. In the design stage, observational study design methods are used to balance the observed covariates that affect the exposure assignment mechanism across exposure groups, to disentangle the association between exposure and covariates, and to reconstruct a "randomized" assignment mechanism that mimics the gold standard for causal inference, the randomized controlled trial (RCT). All design-stage processes are independent of the outcome variable. Only after the design stage is complete does the analysis stage begin, in which the outcome data are analyzed.

Propensity score matching (PSM) is one of the most widely used observational study design methods for causal inference. Compared with regression models, PSM has several key advantages: (i) matching emphasizes the comparison of populations with similar covariate distributions between exposure and control groups, without extrapolation; (ii) covariate balance after matching can be evaluated more intuitively; and (iii) matching can be combined with many analysis methods, such as regression, so that they complement each other. Traditional PSM mainly addresses dichotomous exposure, but many exposures, including PM2.5 concentration, are continuous variables. In addition, to obtain a gradient of exposure concentrations, air pollution data generally come from multiple regions, i.e., multicenter studies. Pollution exposure is then clearly clustered at the level of the study center, and the data as a whole have a hierarchical structure of regional study centers→individuals. However, generalized propensity score matching (GPSM) for continuous variables is still in its infancy: only one study has proposed a feasible GPSM algorithm, and there is no research on multicenter matching algorithms for continuous variables.

For multilevel data from large multicenter epidemiological studies, this study extended the existing multicenter matching algorithms for dichotomous exposure to continuous variables, based on the newly proposed GPSM algorithm, and proposed a new multicenter matching algorithm that aims to improve balance at both the individual level and the study center level. The association between long-term exposure to PM2.5 and insomnia in southwest China was estimated using both traditional regression models and the GPSM algorithms, so that the results could be mutually validated and their sensitivity to model assumptions assessed, providing more robust scientific evidence for environmental standard setting and public health policy implementation. This study is divided into two main parts: part one, a study of GPSM algorithms for continuous variables in multicenter data; part two, a study of the association between long-term PM2.5 exposure and insomnia in southwest China.

Part one, a simulation study of GPSM algorithms for continuous variables with multisite data.

Objectives: (i) to extend the continuous-variable matching algorithm to multisite data; (ii) to propose a new, improved GPSM algorithm for continuous variables in multisite data; (iii) to systematically compare the matching algorithms.

Methods: This study builds on the newly proposed GPSM nearest neighbor matching with replacement for continuous variables. This algorithm, named naïve GPSM (NVM), fits a simple linear regression to estimate the GPS and performs pooled nearest neighbor matching without considering study site effects. This study extended two existing multisite matching algorithms for dichotomous exposure to continuous exposure. The first is within-site matching (WM), which searches for the nearest neighbor GPS strictly within the same high-level research site. The second is multilevel propensity score matching with GPS estimated by multilevel models (MLPSM), in which the sites are treated as high-level units and two-level regression models are fitted to estimate the GPS for pooled nearest neighbor matching. A novel matching algorithm, hierarchical preferential match (HPM), was proposed. The basic idea of HPM is to first set the site matching priority according to the "distance" among high-level research sites, and then to consider the individual GPS and the research site priority simultaneously in nearest neighbor matching. In other words, when searching for the nearest neighbor of an individual's GPS, nearer research sites are given higher priority. Two strategies for assessing high-level priority were considered: HPM based on Mahalanobis distance (HPM-M) and HPM based on hierarchical clustering (HPM-C).

Simulation study design: Three study center sample allocation schemes were set up (five study centers of equal sample size, ten study centers of equal sample size, and eight study centers of unequal sample size). Under each scheme, four scenarios were defined in increasing order of the intraclass correlation coefficient of the exposure T, giving 12 simulation scenarios in total, each with a sample size of 1000. The five algorithms, NVM, WM, MLPSM, HPM-M and HPM-C, were systematically compared in terms of covariate balance, matching success rate, and the absolute bias and mean square error (MSE) of the outcome estimate.

Simulation study results: The improvement in covariate balance was similar across the 12 simulation scenarios. NVM gave the best improvement in individual-level covariate balance, but the balance of high-level units was worse than before matching. MLPSM improved both individual covariate balance and high-level unit balance, but some individual-level covariate imbalance could arise during matching because high-level units were considered when estimating the GPS. WM gave the best improvement in high-level unit balance but almost no improvement in individual covariate balance. HPM-M showed some improvement in high-level unit balance but little improvement in individual covariate balance. HPM-C improved individual-level covariate balance and, to some extent, high-level unit balance. Matching success rates were also similar across the 12 simulation scenarios: WM had the lowest matching success rate, while the other algorithms had similar success rates. The effect estimation bias of HPM-C was the lowest of the five matching algorithms in all 12 simulated scenarios, while that of MLPSM was the highest. In the scenarios with only five balanced high-level units, the estimation biases of WM and NVM were similar; in the scenarios with ten balanced high-level units and with eight unbalanced high-level units, the estimation bias of WM was smaller than that of NVM. In the 12 simulated scenarios, the MSEs of the five matching algorithms were close to one another.

Conclusion: This study proposes the HPM algorithm. The simulation results show that HPM-C gives better overall improvement in covariate balance, has lower absolute bias in effect estimation, has an MSE comparable to the other matching algorithms, and retains most of the matched subjects. It is therefore more suitable for multilevel data matching with continuous exposure than the existing NVM, MLPSM, and WM.

Part two, application of matching algorithms on continuous exposure in multisite studies to evaluate the association between long-term PM2.5 exposure and insomnia.

Objective: Using southwest China as the study site, we analyzed the same data with traditional regression models and with causal inference methods, to demonstrate the proposed method empirically and to assess the sensitivity of the evidence to model assumptions, providing evidence on the association between long-term exposure to ambient PM2.5 and insomnia in China.

Methods: The analysis was based on the baseline data of the China Multi-Ethnic Cohort (CMEC) in southwest China. The city or state in which each included individual lived was the high-level unit, and the cohort members were the low-level units. First, we set the matching priority for the cities and states: hierarchical clustering was used to rank them on economic and health-related indicators, and the sequence was then adjusted according to the actual situation, such as geographical distribution and ethnic customs, to assign the matching priority of the high-level units. After the city-state priority was determined, the four aforementioned matching algorithms (NVM, MLPSM, HPM-C and WM) were used for matching. Three analytical frameworks were used in the association study: (i) a completely model-based approach, mainly regression modeling, including the generalized additive model (GAM) and two-level logistic regression; (ii) a completely design-based approach, mainly GPS matching followed by kernel smoothing estimation to fit the dose-response curve, where the GPS matching algorithms were the aforementioned NVM, MLPSM, HPM-C and WM; (iii) a combined design and model approach, which fits a model to the pseudo-population obtained after matching to estimate the dose-response relationship. Model a and model b were fitted separately. Model a included only individual PM2.5 concentrations; model b additionally accounted for city-state aggregation by fitting a two-level random intercept logistic model with city as the level-two unit.

Results: A total of 70,213 people from eight cities in four provinces were included in this study. Among the four matching algorithms, the HPM algorithm proposed in this study had the smallest AAC (HPM: 0.11; NVM: 0.13; MLPSM: 0.14; WM: 0.15). Although the exposure-response curves for long-term PM2.5 exposure and insomnia obtained with GAM and with the design-based approach differed somewhat in their details, the general trend was that the risk of insomnia increased with increasing PM2.5 concentration. This upward trend was sharp at first and then slow: at lower concentrations the slope of the curve was larger and the risk of insomnia increased faster, and once the PM2.5 concentration reached a certain level the slope became smaller and the upward trend slowed. For every 10 μg/m³ increase in long-term PM2.5 exposure concentration, the ORs for insomnia estimated by each model were as follows: 1.08 (95% CI: 1.05-1.12) for the two-level logistic regression; 1.03 (95% CI: 1.01-1.04) for model a and 1.05 (95% CI: 1.04-1.07) for model b of the WM algorithm; 1.03 (95% CI: 1.02-1.04) for model a and 1.20 (95% CI: 1.15-1.25) for model b of the HPM algorithm; 1.04 (95% CI: 1.02-1.06) for model a and 1.26 (95% CI: 1.21-1.31) for model b of the NVM algorithm; 1.04 (95% CI: 1.56-1.61) for model a and 1.26 (95% CI: 1.67-1.81) for model b of the MLPSM algorithm.

Conclusions: The HPM algorithm proposed in this study obtained the best-matched pseudo-population in the empirical study. The empirical study found an association between long-term PM2.5 exposure and the prevalence of insomnia. However, the effect values estimated by the different methods vary, so the magnitude of the effect of long-term PM2.5 exposure on insomnia needs further study.
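
The hierarchical preferential match (HPM) idea described in the abstract, in which nearer research sites get higher priority when searching for a GPS nearest neighbor, could be sketched roughly as below. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names `site_priority` and `prioritized_match`, the caliper rule, and the use of plain Euclidean distance between site-level feature vectors (as a stand-in for the Mahalanobis-distance and hierarchical-clustering strategies named in the abstract) are all hypothetical. The GPS values are taken as already estimated.

```python
import math

def site_priority(site_features):
    """Rank sites by distance between site-level feature vectors.

    ASSUMPTION: Euclidean distance is a stand-in; the abstract names
    Mahalanobis distance (HPM-M) and hierarchical clustering (HPM-C).
    Returns rank[a][b]: 0 for the nearest site to a (normally a itself),
    1 for the next nearest, and so on.
    """
    n = len(site_features)
    rank = []
    for a in range(n):
        dist = [math.dist(site_features[a], site_features[b]) for b in range(n)]
        order = sorted(range(n), key=lambda b: dist[b])  # nearest first
        row = [0] * n
        for r, b in enumerate(order):
            row[b] = r
        rank.append(row)
    return rank

def prioritized_match(gps_query, site_query, gps_pool, site_pool, rank, caliper):
    """Pick one pool index for a query unit.

    ASSUMPTION (hypothetical rule): among pool units whose GPS lies within
    `caliper` of the query, prefer the highest-priority site, then the
    smallest GPS gap. Returns None when nothing is within the caliper.
    The pool is never modified, so matching is with replacement.
    """
    candidates = [
        (rank[site_query][s], abs(g - gps_query), i)
        for i, (g, s) in enumerate(zip(gps_pool, site_pool))
        if abs(g - gps_query) <= caliper
    ]
    if not candidates:
        return None
    return min(candidates)[2]  # tuple order: site priority, GPS gap, index

# Toy usage: three sites described by one site-level indicator each.
rank = site_priority([[0.0], [1.0], [5.0]])
gps_pool = [0.30, 0.31, 0.50]
site_pool = [2, 1, 0]
match = prioritized_match(0.30, 0, gps_pool, site_pool, rank, caliper=0.05)
```

In this toy call the unit from site 1 is chosen over the unit from site 2 even though the latter has an identical GPS, because site 1 is nearer to the query's site. Shrinking the caliper makes the rule behave more like pooled nearest neighbor matching, while restricting candidates to rank 0 would correspond to within-site matching.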
Keywords/Search Tags: PM2.5, long-term exposure, insomnia, generalized propensity score match, multisite study