Research On The Application Value Of Stacking Architecture And Transfer Learning In The Prediction Model Of Infectious Diseases

Posted on: 2021-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y Wang
GTID: 1484306134955009
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: 1. Based on the number of malaria cases reported in the provinces, municipalities and autonomous regions from 2011 to 2017, the incidence trend and spatial characteristics of malaria in China were described and analyzed. The correlation between malaria incidence and region was discussed, and the key provinces or autonomous regions requiring prediction were identified. 2. The application value of deep learning algorithms in predicting the number of infectious disease cases was discussed. The prediction performance of traditional time series models and deep learning algorithms in malaria case prediction was compared. 3. Stacking architecture was introduced to improve predictive performance by combining distinct algorithms and models. The application of stacking architecture in the field of infectious disease prediction was also explored. 4. Following the idea of transfer learning, the stacking-based prediction model with outstanding prediction performance was applied to the construction of prediction models for other infectious diseases.
Methods: 1. Source data include monthly reports of malaria cases in China's mainland from 1950 to 2017 and meteorological data. After data cleaning, least absolute shrinkage and selection operator (LASSO) regression was adopted for dimension reduction of the meteorological variables. 2. ArcGIS 10.2 (ESRI, USA) software was used to conduct visual analysis for each province and city during the malaria control phase (2007-2010) and the malaria elimination phase (2011-2016). Spatial autocorrelation analysis and Getis-Ord Gi* hot spot analysis were then conducted using data from 2007 to 2016. Moran's I index was adopted to reflect the spatial distribution of malaria cases in China's mainland. 3. The ARIMA, STL+ARIMA, BP-ANN and LSTM network models were separately applied in simulations using malaria data and meteorological data from Yunnan Province from 2011 to 2017. The predictive performance of each model was compared through evaluation measures. 4. Following the technical framework of stacking, the above four models were combined using a gradient boosting regression tree (GBRT) to reduce the generalization error and improve the prediction performance of the model. 5. Following the idea of transfer learning, a time series prediction model applicable to predicting the incidence of other infectious diseases was constructed from the model trained on monthly malaria case reports.
Results: 1. The Moran's I index results suggest that there is a spatial correlation between the numbers of malaria cases in the provinces, municipalities and autonomous regions. Yunnan Province was selected as the region with high malaria incidence for subsequent analysis. 2. The root mean square error (RMSE) values of the four sub-models were 13.176, 14.543, 9.571, and 7.208; the mean absolute error (MAE) values were 10.367, 10.426, 6.548, and 5.869; and the mean absolute scaled error (MASE) values were 0.469, 0.472, 0.296, and 0.266, respectively. After the outputs of the four sub-models were used as inputs and combined with GBRT, the RMSE, MAE, and MASE values of the ensemble model decreased to 6.810, 4.940, and 0.224, respectively. 3. The ensemble model was trained to predict the incidence of influenza in Shanxi Province, and its RMSE, MAE, and MASE values were 0.928, 0.769, and 0.035, respectively. For a side-by-side comparison of prediction performance, the four sub-models were also fitted to the influenza data of Shanxi Province. The RMSE values of the four sub-models were 1.046, 1.031, 1.550, and 1.486; the MAE values were 0.917, 0.926, 1.210, and 1.135; and the MASE values were 0.041, 0.042, 0.055, and 0.051, respectively.
Conclusions: A novel ensemble model based on the robustness of structured prediction and model combination through stacking was developed. The findings suggest that the predictive performance of the final model is superior to that of the four sub-models, indicating that stacking architecture may have significant implications for infectious disease prediction. The findings also suggest that the ensemble model still outperforms the four sub-models when the ensemble model, which showed outstanding prediction performance on the source task, is transferred to other target tasks, such as predicting the incidence of influenza. This indicates that transfer learning may have significant implications in the field of infectious disease prediction.
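The abstract reports RMSE, MAE, and MASE but does not state how they were computed. The sketch below shows the standard textbook definitions. It assumes that MASE is scaled by the in-sample mean absolute error of a naive forecast (lag `season`, default 1), which may differ from the scaling used in the dissertation. The function names and the short series are made up for illustration and are not the malaria or influenza data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mase(actual, predicted, train, season=1):
    """Mean absolute scaled error.

    Assumption: the scale is the in-sample MAE of a naive forecast that
    repeats the value observed `season` steps earlier in the training series.
    """
    naive_errors = [abs(train[i] - train[i - season])
                    for i in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, predicted) / scale


# Hypothetical monthly case counts (illustrative only).
train = [10, 12, 14, 13, 15]
actual = [16, 18, 17]
predicted = [15, 19, 17]

print(round(rmse(actual, predicted), 3),
      round(mae(actual, predicted), 3),
      round(mase(actual, predicted, train), 3))
```

Under this definition, a MASE below 1 means the forecast has a smaller average absolute error than the naive in-sample forecast. That is one way to read values such as the 0.224 reported for the ensemble model, provided the dissertation used the same scaling.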
Keywords/Search Tags:time series, prediction model, LSTM, ensemble learning, Stacking framework, transfer learning