Objective: To describe the epidemiological characteristics of tuberculosis in China along three dimensions (temporal, spatial, and spatio-temporal) and to assess the feasibility of using machine learning for time series forecasting of the number of tuberculosis cases in China, so as to provide data support for more precise tuberculosis prevention and control in China.

Methods: The monthly number of tuberculosis cases in China (excluding Hong Kong, Macau and Taiwan) from 2006 to 2018, together with the monthly number of cases and the annual incidence in 31 provinces (excluding Hong Kong, Macau and Taiwan) over the same period, was obtained from the National Public Health Science Data Center. Year-end resident population data for the 31 provinces from 2006 to 2018 were obtained from the China Statistical Yearbook. The research methods were as follows. 1. A national spatio-temporal analysis of tuberculosis was conducted in three dimensions. In the temporal dimension, a time series plot was constructed and seasonal indices were calculated to analyze the epidemic trend and seasonal effect of tuberculosis in China from 2006 to 2018. In the spatial dimension, spatial autocorrelation analysis (both global and local) of tuberculosis incidence from 2006 to 2018 was conducted to explore the spatial distribution of tuberculosis in China. In the spatio-temporal dimension, Kulldorff’s space-time scan was conducted to detect how the clustering of tuberculosis in China changed over time from 2006 to 2018. 2. Time series forecasting analysis: based on the monthly number of tuberculosis cases in China from January 2006 to December 2018, a forecasting model was built using Extreme Gradient Boosting (XGBoost). The data were divided into a training set and a test set, with the training set being the monthly number of tuberculosis cases from January 2006 to December 2017 and the test set being
the monthly number of tuberculosis cases from January 2018 to December 2018. The Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) were used to evaluate model performance.

Results: 1. From 2006 to 2018, the number of tuberculosis cases in China showed a slow overall decrease and a clear seasonal pattern. Each year, the number of cases was relatively low in January and February and relatively high in March and April. 2. In the global spatial autocorrelation analysis, Moran’s I was greater than 0 in every year from 2006 to 2018 (all P < 0.05) and increased overall. In the local spatial autocorrelation analysis, the provinces with a high-high distribution of tuberculosis incidence from 2006 to 2018 were mainly Hunan Province, Qinghai Province, Sichuan Province, Guangxi Zhuang Autonomous Region, Tibet Autonomous Region and Xinjiang Uygur Autonomous Region. The provinces with a low-low distribution of incidence were mainly Hebei Province, Beijing Municipality, Jiangsu Province and Zhejiang Province. In addition, Yunnan Province, Qinghai Province and Sichuan Province showed a low-high distribution. 3. Kulldorff’s space-time scan of tuberculosis data from the 31 provinces detected four clusters from 2006 to 2018: one primary cluster and three secondary clusters. The primary cluster covered 10 regions, namely Guangxi, Hainan, Guizhou, Guangdong, Hunan, Jiangxi, Chongqing, Yunnan, Sichuan and Hubei, with a clustering period from January 2006 to June 2012. Secondary cluster 1 covered three regions, namely Xinjiang Uygur Autonomous Region, Qinghai Province and Tibet Autonomous Region, with a clustering period from July 2012 to December 2018. Secondary cluster 2 covered one region, Heilongjiang Province, with a clustering period from January 2006 to June 2012. Secondary cluster 3 covered one region, Henan Province, with a clustering period from January 2006 to June 2009. 4. The optimal parameter
set of the XGBoost model was as follows: colsample_bytree of 1, learning_rate of approximately 0.01, max_depth of 4, min_child_weight of 4, n_estimators of 636, and subsample of 0.5. In fitting and predicting the number of tuberculosis cases in China from 2006 to 2018, the model had an MAE, RMSE, and MAPE of 2515.16, 3628.33, and 2.32%, respectively, on the training set, and of 5187.82, 6487.95, and 5.44%, respectively, on the test set. The values predicted by XGBoost were close to the actual values.

Conclusions: 1. From 2006 to 2018, the number of tuberculosis cases in China showed an overall decreasing trend, suggesting that tuberculosis prevention and control measures in China have been effective and that the epidemic is being brought under control. 2. From 2006 to 2018, the incidence of tuberculosis in China showed clear positive spatial autocorrelation, and the degree of clustering tended to strengthen. The incidence of tuberculosis in three regions, Xinjiang Uygur Autonomous Region, Tibet Autonomous Region and Qinghai Province, remained at high risk, suggesting that targeted prevention and control should continue there. 3. The XGBoost model performed well and is suitable for forecasting tuberculosis case numbers, which is of positive significance for exploring tuberculosis epidemic trends in China.
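
The quantities named in the Methods can be illustrated with a minimal Python sketch. It is not the study's code. The abstract does not say which seasonal index method or which spatial weight matrix was used, so the simple-average seasonal index and the caller-supplied weight matrix below are assumptions, and all function names are hypothetical. The error metrics follow the standard definitions of MAE, RMSE, and MAPE.

```python
import math


def seasonal_indices(monthly_counts):
    """Simple-average seasonal index (assumed method): the mean count of each
    calendar month divided by the overall monthly mean.
    Assumes the series starts in January and covers whole years."""
    if len(monthly_counts) == 0 or len(monthly_counts) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    overall_mean = sum(monthly_counts) / len(monthly_counts)
    indices = []
    for month in range(12):
        same_month = monthly_counts[month::12]  # e.g. every January
        indices.append((sum(same_month) / len(same_month)) / overall_mean)
    return indices


def global_morans_i(values, weights):
    """Global Moran's I for n regions, given an n x n spatial weight matrix
    (for example binary adjacency; the choice of weights is an assumption)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; actual values must be nonzero."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

With the split described in the Methods, the three metrics would be computed once on the 2006 to 2017 fitted values and once on the twelve 2018 forecasts. A seasonal index above 1 marks a month with more cases than the monthly average. A Moran's I above 0 indicates that regions with similar incidence tend to be neighbours.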