
The Construction Of Support Vector Machine Regression Model To Forecast Influenza Epidemic By Integrating Baidu Search Queries And Traditional Surveillance Data In Beijing And Liaoning Province From 2011 To 2016

Posted on: 2020-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Liang
GTID: 2404330596995759
Subject: Public health

Abstract/Summary:
Objective: (1) To explore the application of the support vector machine (SVM) regression model in merging search engine query data with traditional influenza surveillance data. (2) To explore the application of the SVM regression model to influenza in Beijing and Liaoning province.

Methods: Monthly influenza incidence rates in Beijing and Liaoning province from January 2011 to December 2016 were obtained from the China National Scientific Data Center for Public Health. "Influenza" was used as the primary term on a free platform named ‘China Webmaster’ to find further influenza-related internet keywords. The monthly search volumes of these keywords in Beijing and Liaoning province from January 2011 to December 2016 were collected from Baidu Index. Correlation analysis was applied between the keywords at different lag periods and the influenza incidence rate. Baidu keywords whose correlation coefficient was greater than 0.4 and statistically significant were entered into the SVM regression model. The candidate values of the three SVM regression parameters (C, γ, ε) were enumerated exhaustively, and the better-performing values were selected by leave-one-out cross-validation. Two metrics, Root Mean Square Error (RMSE) and Root Mean Squared Percent Error (RMSPE), were used to evaluate model performance. Correlation analysis was performed in IBM SPSS 22.0, and SVM regression analysis was performed in R 3.4.2 using the e1071 package.

Results: The influenza incidence rate in Beijing varied widely with an obvious seasonality, and the incidence during the epidemic peak season increased year by year. The variation in peak-season incidence in Liaoning province was much smaller than in Beijing. After correlation analysis, 26 Baidu keywords entered the model for Beijing and 17 entered the model for Liaoning province. The selected parameters for the Beijing SVM regression model were C = 6, γ = 0.005, and ε = 0.01; the Liaoning province model performed well with C = 3, γ = 0.005, and ε = 0.01. For Beijing, the model based on Baidu keywords had the lowest RMSE and RMSPE, at 5.491561 and 0.605623 respectively. These values differed only slightly from those of the model based on integrated data, and both indicators for these two models were much lower than for the model based on past influenza incidence data. The Baidu keyword model and the integrated data model for Beijing can therefore be considered to predict better than the model based only on past influenza incidence data. For Liaoning, the SVM regression model based on integrated data (influenza incidence at a one-month lag plus Baidu keywords) had the lowest RMSPE, at 0.290522. Thus the integrated data model for Liaoning also performed better than the model based only on past influenza incidence data.

Conclusion: It is feasible to use internet search engine data as a supplementary data source for traditional influenza surveillance, and the SVM regression model was effective for tracking the influenza epidemic in Liaoning province. The SVM regression model based on influenza incidence and Baidu search query data was more stable in Liaoning province than in Beijing.
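
The keyword screening and the two evaluation metrics described in the Methods can be sketched in a few lines. This is a minimal illustration, not the thesis's own code (which used SPSS and R with e1071). It assumes a Pearson correlation, assumes that RMSPE follows its usual definition (root of the mean squared relative error), and omits the statistical significance test. All function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(search, incidence, lag):
    """Correlate search volume in month t - lag with incidence in month t."""
    if lag == 0:
        return pearson(search, incidence)
    return pearson(search[:-lag], incidence[lag:])


def select_keywords(volumes, incidence, lags=(0, 1, 2), threshold=0.4):
    """Keep keywords whose best lagged correlation exceeds the threshold.

    Returns {keyword: (best_lag, r)}. The significance test used in the
    abstract is not reproduced here.
    """
    selected = {}
    for keyword, series in volumes.items():
        best_lag, best_r = max(
            ((lag, lagged_correlation(series, incidence, lag)) for lag in lags),
            key=lambda pair: pair[1],
        )
        if best_r > threshold:
            selected[keyword] = (best_lag, best_r)
    return selected


def rmse(observed, predicted):
    """Root Mean Square Error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rmspe(observed, predicted):
    """Root Mean Squared Percent Error (assumed definition: relative errors)."""
    n = len(observed)
    return math.sqrt(
        sum(((o - p) / o) ** 2 for o, p in zip(observed, predicted)) / n
    )


# Made-up monthly data, for illustration only.
incidence = [2.0, 4.0, 8.0, 6.0, 3.0, 1.0]
volumes = {
    "kw_a": [40, 80, 60, 30, 10, 5],  # leads incidence by one month
    "kw_b": [5, 1, 4, 2, 6, 3],       # weakly related to incidence
}
selected = select_keywords(volumes, incidence)
print(selected)
print(rmse([2.0, 4.0, 8.0], [1.0, 5.0, 6.0]))
print(rmspe([2.0, 4.0, 8.0], [1.0, 5.0, 6.0]))
```

Because RMSPE divides each error by the observed value, it is undefined when an observed incidence is zero and weights errors in low-incidence months more heavily than RMSE does.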
Keywords/Search Tags: Infectious diseases, Seasonal influenza, SVM regression model, Baidu keywords, Flu surveillance system