With the development of artificial intelligence, causal inference has become one of the core problems in statistics and data science. Researchers increasingly ask how to draw causal conclusions about a phenomenon after it has occurred. Compared with machine learning, causal inference has attracted attention for its interpretability. The use of propensity scores in observational studies has long been a focus of causal inference research, but accurate estimation of the propensity score is hindered by problems such as the handling of covariates and the uncertain functional relationship between covariates and treatment. Some studies have proposed that introducing nonparametric machine learning methods into propensity score estimation may make the estimates more accurate. Building on standardized mortality ratio (SMR) weighting, this paper proposes optimizing the propensity score model with machine learning. The core idea is to replace the traditional logistic regression estimate of the propensity score weights with machine learning estimates, and thereby further reduce the difference in covariate distributions between the treatment group and the control group. This paper first examines the performance of several propensity score models on simulated data. In four scenarios that differ in the degree of linearity and additivity of the relationship between covariates and treatment, propensity score weights are estimated by logistic regression, support vector machine, neural network, random forest, and generalized boosted models (GBM). The results show that logistic regression performs poorly under non-additive and nonlinear conditions, while ensemble learning may be more useful for achieving covariate balance. Finally, the five propensity score models are analyzed empirically on real data from the SEER database. Based on these models, the effect of cancer-directed surgery (CDS) on the survival of patients with oligometastatic pancreatic ductal adenocarcinoma (PDAC) is evaluated, and the causal inference process and results of the different models are compared. After balancing the covariates and estimating the average treatment effect on the treated, the analysis confirms that CDS can effectively prolong the overall survival of patients with oligometastatic PDAC. The empirical analysis also shows that the neural network, random forest, and GBM models achieve better covariate balance and robustness than the logistic regression model.
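
To make the weighting step concrete, the following is a minimal sketch of SMR weighting and a covariate balance check, not the implementation used in this paper. It assumes the standard SMR form, in which a treated unit receives weight 1 and a control unit receives weight e/(1 - e), where e is the estimated propensity score from any of the models above. The function names, the toy data, and the choice of the treated-group standard deviation as the denominator of the standardized mean difference are illustrative assumptions.

```python
from math import sqrt


def smr_weights(treated, propensity):
    """SMR weights: 1 for treated units, e / (1 - e) for control units.

    `treated` holds 0/1 indicators; `propensity` holds estimated
    P(T = 1 | X) from any model (logistic regression, GBM, ...).
    """
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if t == 1 else e / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_mean_difference(x, treated, weights):
    """Weighted treated-vs-control mean difference for one covariate,
    scaled by the unweighted treated-group standard deviation
    (an assumed convention for this sketch)."""
    xt = [v for v, t in zip(x, treated) if t == 1]
    wt = [w for w, t in zip(weights, treated) if t == 1]
    xc = [v for v, t in zip(x, treated) if t == 0]
    wc = [w for w, t in zip(weights, treated) if t == 0]
    mean_t_unweighted = sum(xt) / len(xt)
    sd_t = sqrt(sum((v - mean_t_unweighted) ** 2 for v in xt) / len(xt))
    if sd_t == 0.0:
        raise ValueError("treated-group covariate has zero variance")
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / sd_t


# Toy data, invented purely for illustration.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.6, 0.5, 0.2]
covariate = [3.0, 5.0, 4.0, 0.0]

w = smr_weights(treated, propensity)
before = standardized_mean_difference(covariate, treated, [1.0] * len(treated))
after = standardized_mean_difference(covariate, treated, w)
print("SMD before weighting:", before)
print("SMD after SMR weighting:", after)
```

A smaller absolute standardized mean difference after weighting indicates that the reweighted control group resembles the treated group more closely on that covariate, which is the sense in which the propensity score models are compared on covariate balance.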