
Real-World Study Design And Analysis Strategies Under Big Data

Posted on: 2023-07-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S C Si
GTID: 1524306902982419
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background

With the accumulation of healthcare big data such as electronic medical records and electronic health records, Real-World Studies (RWS) based on existing health and medical data have become increasingly common. Using existing health and medical data resources, acquiring compliant real-world data through data processing, developing RWS design and analysis strategies, and conducting RWS on specific clinical questions have become pressing scientific issues in evidence-based medical research. Because healthcare big data are multi-source, heterogeneous, and lack uniform standards, converting them into data suitable for medical research is challenging. An effective solution is to construct a scientific data warehouse using a series of standard database designs and technical processes, and to connect it with the Common Data Model (CDM) established by the Observational Medical Outcomes Partnership (OMOP). In addition, conducting RWS with existing healthcare data requires not only effective control of numerous observed confounding factors, but also correction of biases caused by unknown confounding and systematic errors. Appropriate study designs and statistical methods for bias control must be selected at each stage and then optimally combined to form a real-world study design and analysis strategy for the above-mentioned scientific data warehouse. In terms of study design, the New User cohort design can emulate a target trial under the assumptions of causal inference theory. In terms of analysis methods, Propensity Score (PS), negative control design, positive control design, P value calibration, and confidence interval calibration can be used to control observed and unobserved confounding and bias in RWS. On this basis, a set of design and statistical analysis strategies applicable to RWS is expected to be formed. Using the above RWS design and analysis strategies based on
scientific data warehouses, RWS can be carried out for specific clinical problems in specific fields. Among drugs for diabetes, the cardiovascular risk of long-term insulin and metformin therapy remains controversial, while high-quality evidence on the risk of renal, cancer, and death outcomes is starkly lacking. Regarding combination therapy, whether retaining metformin in the prescription at all times benefits these outcomes has not been confirmed. In addition, based on the scientific data warehouse, an individual risk prediction model for patients with type 2 diabetes mellitus (T2DM) could be developed to screen high-risk individuals, which could help to optimize drug therapy and reduce the risk of long-term outcomes in T2DM. Therefore, creating a scientific data warehouse through database design and data processing based on existing healthcare big data, proposing a set of RWS design and analysis strategies, and conducting a study using T2DM drug treatment evaluation as an example can provide a complete case study in the context of big data and a reference solution to this scientific problem.

Aims

1. Based on regional diabetes inpatient healthcare big data, to develop a scientific data warehouse design and a standardized data processing workflow, and to create a Diabetes Science Data Warehouse from raw healthcare big data that can be interfaced with international common data models.

2. Based on the Diabetes Science Data Warehouse, to propose and create a set of RWS design and statistical analysis strategies applicable to the evaluation of drug therapy in the context of big data by selecting and optimally combining statistical methods for bias control at every stage of research.

3. Based on the scientific data warehouse and research strategy, to explore the potential impact of metformin and insulin on major adverse outcomes, obtain real-world evidence on whether metformin combination therapy could reduce the underlying risk of the drugs it is combined with, and answer the question of whether metformin
is required to be kept in prescriptions at all times. Finally, to develop a predictive model for the risk of adverse outcomes in patients with diabetes to provide a decision tool for selecting optimal treatment measures.

Materials and methods

The research data were acquired from the medical records of tertiary Grade A hospitals in the National Healthcare Big Data Research Institute-Weihai Regional Healthcare Big Data Platform. The study subjects were all-cause inpatients with diabetes in this regional platform from December 1, 2009, to December 31, 2021, a total of 45,318 individuals. The outcome data were obtained from the records of multiple visits within the regional platform, and outcomes outside the platform were supplemented by linking the province-wide electronic medical record front pages and the chronic disease, cancer, and death registration databases of the Shandong Center for Disease Control and Prevention. The end date for outcome tracking was April 1, 2022. All diagnoses, measurements, and medication use in patients with diabetes were included in this study. The basic information included age, gender, and date of birth. The personal and medical history included smoking status, drinking status, sleep quality, appetite status, mental status, weight changes, and history of diabetes. These data were transformed into a standard format through the design framework and standardized data processing workflow (including data collection, data processing, coding mapping, concept mapping, the extract-transform-load process, and quality control) proposed in this study to create a diabetes scientific data warehouse. Through further processing, the data were mapped to the standard concepts of OMOP-CDM (including 4635 diseases, 888 drugs, and 71 measures), and finally the diabetes OMOP-CDM data warehouse was created. For study design, the New User Cohort Design was used. A total of 15,650 patients receiving drug therapy were included in
the total cohort according to the inclusion and exclusion criteria. New user cohorts for specific drugs/prescriptions were created according to the study contents, and three separate cohorts were created for the target exposure (X1), the control group (X0), and the outcome (Y) in each analysis. The diabetic drugs in this study were divided into four categories: ① metformin, ② α-glucosidase inhibitors (AGI), ③ insulin secretagogues (sulfonylureas + glinides), and ④ insulin. Comparative analyses were performed by creating new user cohorts of various drugs/prescriptions from the total cohort. The analysis and comparison cohorts were divided into three groups: ① metformin user cohort and non-metformin user cohort, ② insulin user cohort and non-insulin user cohort, and ③ metformin combination user cohort and non-metformin combination user cohort. Each group was further divided into subcohorts according to specific drugs for subgroup analysis. Exposure was defined as more than half a year (>180 days) of use for each group. Nine target outcomes were defined: all-cause mortality, cardiovascular disease (CVD) mortality, cancer mortality, chronic kidney disease (CKD), cancer, myocardial infarction, heart failure, stroke, and 4-Point Major Adverse Cardiovascular Events (4P-MACE). The research followed the three components of RWS: characterization, population-level causal effect estimation, and individual-level risk prediction. Characterization covered three aspects: cohort characterization, exposure characterization, and follow-up and outcome characterization. Furthermore, the current status of drug treatment in diabetic patients was explored through treatment pathway analysis. For population-level causal effect estimation, a statistical analysis strategy for RWS was proposed. The steps are: ① cohort and covariate configuration, that is, combining the created cohorts (X1, X0, Y) according to the study purpose and configuring variable sets for
analysis; ② fitting a large-scale L1-regularized propensity score model and achieving quasi-randomization through PS matching and weighting; ③ diagnosing the quality of quasi-randomization by comparability between groups, overlap of the preference score distributions, and balance diagnostics; ④ selecting a statistical model to estimate the average treatment effect, with the Cox proportional hazards regression model and the Fine-Gray competing risk model based on 1:N matching used as the principal analysis and univariate and PS-weighted Cox regression used as sensitivity analyses; ⑤ creating 150 negative control (theoretical relative risk RR=1) outcome cohorts and synthesizing positive control outcomes (theoretical RR=1, 1.25, 1.5, 2, 4) based on the negative control outcomes, then using the negative and positive control outcomes to diagnose residual confounding and bias; ⑥ fitting the empirical null distribution from the negative control outcomes and performing empirical calibration of the P value, while using the negative and positive control outcomes jointly to fit a systematic error model that calibrates the point estimates and confidence intervals. Removing the bias caused by unobserved confounding and systematic errors in this way yields population-averaged causal effects. For individual-level risk prediction, regularized Cox regression was selected as the prediction method; models were built on different data domains and compared with previously published models. Model validation was performed by dividing the database into a training set and a validation set, together with five-fold cross-validation. The performance of the predictive models was evaluated through model validation, threshold evaluation, discrimination, calibration curves, and decision curves. Finally, a nomogram of the prediction model was created as a clinical application tool.

Results

The longest follow-up period was 15 years, and the median follow-up period was 5.6
years. During the entire observation period, insulin users accounted for the largest proportion (about 82%), followed by users of metformin (70%), AGI (47%), sulfonylureas (32%), and glinides (25%). The treatment pathway analysis showed that clinical therapy for diabetes is highly personalized, with a high proportion of insulin use and relatively insufficient use of metformin. In the population-level effect estimation, the large-scale regularized propensity score achieved good balance between the comparison groups on nearly all characteristics, especially severity of illness. Unknown bias and confounding were well calibrated using negative controls and synthetic positive controls. The results of the three comparison groups were as follows:

(1) Long-term effects of metformin: Compared with non-metformin users, the risk of 6 major outcomes was lower in metformin users. The adjusted hazard ratios were 0.44 (95% CI: 0.30-0.59, P<0.001) for all-cause mortality, 0.53 (95% CI: 0.36-0.72, P<0.001) for CVD mortality, 0.43 (95% CI: 0.28-0.59, P<0.001) for cancer mortality, 0.50 (95% CI: 0.33-0.68, P<0.001) for CKD, 0.68 (95% CI: 0.49-0.87, P<0.001) for cancer, and 0.72 (95% CI: 0.51-0.94, P<0.05) for myocardial infarction. In subgroup analyses, metformin was associated with a lower risk of several of these outcomes compared with other specific drugs. The estimates from the Cox and competing risk models were consistent.

(2) Long-term effects of insulin: Compared with non-insulin users, insulin users had a 1.45-fold (95% CI: 1.07-2.09, P<0.05) risk of all-cause mortality, a 1.68-fold (95% CI: 1.11-2.72, P<0.05) risk of cancer mortality, a 1.65-fold (95% CI: 1.05-2.77, P<0.05) risk of CKD, and a 1.87-fold (95% CI: 1.34-2.81, P<0.001) risk of cancer, but the effect on cardiovascular outcomes was not statistically significant. In subgroup analyses, insulin was associated with an increased risk of several of these outcomes compared with other drugs.

(3) Long-term effects of metformin combination therapy: Any other hypoglycemic drug combined
with metformin presented a protective effect on almost all outcomes. Combination use of metformin was associated with a 0.37-fold (95% CI: 0.25-0.50, P<0.001) risk of all-cause mortality, a 0.49-fold (95% CI: 0.33-0.67, P<0.001) risk of CVD mortality, a 0.31-fold (95% CI: 0.19-0.45, P<0.001) risk of cancer mortality, a 0.46-fold (95% CI: 0.30-0.63, P<0.001) risk of CKD, a 0.73-fold (95% CI: 0.54-0.93, P<0.05) risk of cancer, a 0.79-fold (95% CI: 0.61-0.98, P<0.05) risk of 4P-MACE, a 0.65-fold (95% CI: 0.46-0.86, P<0.005) risk of myocardial infarction, and a 0.78-fold (95% CI: 0.59-0.97, P<0.05) risk of heart failure. In subgroup analyses, the combined use of metformin with any specific antidiabetic drug was associated with a significant risk reduction. The combination of metformin with insulin was most effective, reducing the risk of all target outcomes except stroke compared with insulin use alone.

In the individual-level prediction of long-term outcomes, the prediction models built on the full data domain performed best. The total number of predictors for the nine outcomes was 178, among which insulin was selected as a predictor of all-cause mortality, cancer mortality, and CVD mortality. In the validation set, the area under the curve (AUC) for the 9 outcomes was: all-cause mortality 0.83 (0.81-0.85), CVD mortality 0.84 (0.81-0.86), cancer mortality 0.82 (0.80-0.85), CKD 0.88 (0.86-0.91), cancer 0.63 (0.61-0.66), 4P-MACE 0.73 (0.71-0.74), myocardial infarction 0.72 (0.69-0.76), heart failure 0.78 (0.76-0.80), and stroke 0.72 (0.70-0.73). For the C-statistic and other evaluation indicators, the training set, validation set, and cross-validation all showed similar and favorable results.

Conclusions

1. Based on regional diabetes healthcare big data, this study designed and created a diabetes scientific data warehouse from raw healthcare big data through standardized data processing. The warehouse was then connected with the international OMOP-CDM to create a diabetes OMOP-CDM database for RWS.

2. This study proposed and developed an RWS statistical analysis strategy
including six steps: ① cohort and covariate configuration, ② quasi-randomization process, ③ quasi-randomization diagnosis, ④ average causal effect estimation, ⑤ bias diagnosis, and ⑥ results calibration. The strategy formed a complete study process by optimally combining statistical methods for bias control at all stages of statistical analysis. Combined with the new user cohort design, it can be used for estimating population-level average causal effects in RWS under big data.

3. Based on the scientific data warehouse and the study design and strategy, the potential impact of major glucose-lowering drugs on long-term outcomes was evaluated. This study confirmed that long-term use of metformin can reduce the risk of multiple outcomes and that the use of insulin could significantly increase the risk of all-cause mortality, cancer mortality, CKD, and cancer. The combination use of metformin with any antidiabetic drug (e.g. insulin) could reduce the risk of various outcomes, which provides evidence to support the retention of metformin in prescriptions at all times.

4. This research established predictive models based on healthcare big data and achieved favorable prediction performance for nine outcomes. It confirms the value of the scientific data warehouse in the prediction of long-term outcomes and provides a decision-making tool for identifying high-risk individuals and guiding the selection of optimal treatment measures.

Collectively, this study forms an RWS protocol based on healthcare big data through the proposed scientific data warehouse design and RWS analysis strategy. Combined with the results of the case study on the evaluation of long-term outcomes of diabetes medication, this study provides a complete reference case for RWS, from raw data processing to the generation of evidence for evidence-based medicine.

Innovation

1. This study proposes the design and creation process of a "scientific data warehouse" based on healthcare big data, and establishes a regional diabetic scientific data
warehouse, which has been connected with international common data models. This provides feasibility evidence and a reference for the creation of domestic medical scientific data warehouses.

2. By optimizing and combining the study design and the statistical analysis methods for bias control at each stage of research, this study proposed a strategy including ① cohort and covariate configuration, ② quasi-randomization process, ③ quasi-randomization diagnosis, ④ average causal effect estimation, ⑤ bias diagnosis, and ⑥ results calibration. The case study of T2DM using this strategy provides a reference for conducting similar RWS based on healthcare big data in the future.

3. This case study confirmed the protective effect of metformin on most clinical outcomes and the adverse effect of insulin. Additionally, this research provided new evidence that the combination use of metformin with any drug could reduce the underlying risk of the corresponding drug. This provides real-world evidence for the potential impact of diabetes drug therapy. Meanwhile, the established individual-level predictive model of long-term risk for regional diabetes patients could provide tools for identifying high-risk individuals and guiding drug selection.
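
Illustrative sketch (not part of the dissertation): step ⑥ of the analysis strategy, empirical calibration of the P value from negative control outcomes, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions and not the procedure actually used in the study. It assumes that systematic error on the log relative-risk scale follows a normal distribution N(mu, sigma^2), fits mu and sigma by a grid-based profile likelihood over the negative control estimates, and then computes a calibrated two-sided P value for a new estimate. The function names, the grid settings, and all example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def fit_null(log_rrs, ses, sigma_max=2.0, steps=2000):
    """Fit N(mu, sigma^2) to negative-control log-RR estimates.

    Each estimate is modelled as N(mu, sigma^2 + se_i^2). For a fixed sigma
    the best mu is the precision-weighted mean, so sigma is profiled over a
    simple grid (an assumption chosen for brevity). Standard errors must be > 0.
    """
    best = None
    for k in range(steps + 1):
        sigma = sigma_max * k / steps
        variances = [sigma ** 2 + se ** 2 for se in ses]
        weights = [1.0 / v for v in variances]
        mu = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
        # Log-likelihood up to an additive constant.
        ll = -0.5 * sum(
            math.log(v) + (x - mu) ** 2 / v for x, v in zip(log_rrs, variances)
        )
        if best is None or ll > best[0]:
            best = (ll, mu, sigma)
    return best[1], best[2]


def calibrated_p(log_rr, se, mu, sigma):
    """Two-sided P value of an estimate against the fitted empirical null."""
    z = (log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical negative-control estimates (true RR assumed to be 1).
negative_log_rrs = [0.10, 0.25, -0.05, 0.30, 0.15]
negative_ses = [0.10, 0.12, 0.08, 0.15, 0.10]

mu_hat, sigma_hat = fit_null(negative_log_rrs, negative_ses)

# Hypothetical estimate of interest: hazard ratio 0.72 with log-scale SE 0.16.
p_nominal = calibrated_p(math.log(0.72), 0.16, 0.0, 0.0)  # ordinary Wald P value
p_calibrated = calibrated_p(math.log(0.72), 0.16, mu_hat, sigma_hat)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f}")
print(f"nominal P={p_nominal:.4f} calibrated P={p_calibrated:.4f}")
```

When the negative control estimates are shifted away from RR=1, or are more dispersed than their standard errors imply, the calibrated P value departs from the nominal one accordingly. Setting mu and sigma to zero recovers the ordinary Wald test.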
Keywords/Search Tags:Real-World Study, Type 2 Diabetes, Scientific Data Warehouse, Study Design and Strategy, Drug Therapy Evaluation