Detecting Influenza Epidemics By Comparing And Optimizing Models Based On Internet Search Engine Query Data

Posted on: 2017-12-06; Degree: Master; Type: Thesis
Country: China; Candidate: R J Wang; Full Text: PDF
GTID: 2334330503992360; Subject: Information Science
Abstract/Summary:
Influenza (flu) is an acute respiratory infectious disease caused by influenza virus infection, characterized by strong infectivity and rapid transmission. Conventional influenza surveillance relies on monitoring influenza-like illness (ILI) and influenza virus infections reported by clinics and laboratories. As a result, the reported data always lag far behind the actual development of an epidemic. Epidemiologists have therefore been investigating alternative data sources and real-time tools for influenza surveillance. One emerging data source is internet search queries. In 2008, Google found that some influenza-related search queries are good indicators of influenza activity, and developed Google Flu Trends (GFT), which is based on the quantitative relationship between the number of flu-related search queries and the number of ILI cases.

In China, alternative search engines such as Baidu are more widely used than Google: the market share of Google in China is less than 20%, while that of Baidu is more than 80%. Previous research on influenza early warning has made some use of search engine data, but few scholars have presented a systematic method for preprocessing Baidu search data and comparing models. In this paper, we therefore collect search query data from Baidu to investigate the relationship between online information searches and conventional surveillance data in China. By developing and comparing early-warning models, this paper explores the possibility of detecting influenza epidemics from Internet data.

The contents and results of this study are mainly as follows:

(1) To begin with, this paper explores the logical relationship between online information searches and conventional surveillance data, drawing on concepts such as information behavior and information-seeking behavior. A theoretical framework is established which shows that individuals' health conditions may motivate their demand for health information, which in turn drives their health information-seeking behavior.

(2) Following the theoretical framework, we use a range selection method to select keywords from four areas: influenza prevention, influenza symptoms, influenza treatments, and frequent terms related to influenza. In the first step, 79 keywords are selected; after cross-correlation analysis, 22 keywords are used to build the models. The empirical results support the logical soundness of the theoretical framework: the keywords that reflect flu trends ten weeks in advance are related to influenza vaccines; those that lead by one week refer to influenza symptoms; and most of the simultaneous keywords are frequent terms related to influenza.

(3) Eight models are established according to differences in time correlation and in the underlying theory. The results indicate that the multiple linear regression model and the artificial neural network model have better goodness of fit, but a good fit does not necessarily mean an accurate forecast. In addition, although the principal component regression model can in theory reduce collinearity among the variables, in practice both its fit and its prediction accuracy are lower than those of the multiple linear regression model.

(4) Finally, historical ILI cases are introduced to optimize the models. Comparing models based on historical ILI cases alone, on search queries alone, and on both together shows that the two kinds of information are complementary in influenza surveillance, and that combining them achieves better monitoring results.
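The abstract does not describe how the cross-correlation analysis in step (2) was carried out. As a rough illustration of the idea only, the sketch below computes the Pearson correlation between a weekly keyword search series and a weekly ILI series at several leads and reports the lead with the highest correlation. The function names (`pearson`, `lagged_correlation`, `best_lead`) and the data are hypothetical and synthetic; none of them come from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(search, ili, lag):
    """Correlate search[t] with ili[t + lag]; lag is how many weeks the searches lead."""
    if lag < 0 or lag >= len(search):
        raise ValueError("lag must be between 0 and len(search) - 1")
    if lag == 0:
        return pearson(search, ili)
    return pearson(search[:-lag], ili[lag:])


def best_lead(search, ili, max_lag):
    """Return (lag, correlation) for the lead in 0..max_lag with the highest correlation."""
    scores = {lag: lagged_correlation(search, ili, lag) for lag in range(max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic weekly counts (assumption): the search series is the ILI series
# shifted two weeks earlier, so searches lead ILI by two weeks by construction.
base = [1, 1, 2, 4, 7, 9, 6, 3, 2, 1, 1, 1, 2, 3]
search = base[2:]
ili = base[:-2]

lag, score = best_lead(search, ili, max_lag=4)
print(f"best lead: {lag} week(s), correlation {score:.3f}")
```

In this toy series the best lead is two weeks because the data were built that way. With real data, the overlap shrinks as the lag grows, so correlations at long leads (such as ten weeks) rest on fewer points and should be checked against a minimum sample size before a keyword is kept.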
Keywords/Search Tags: influenza, search engine, Baidu Index, early-warning model, model comparing