The rapid development of the Internet has produced rapid growth in the data accumulated online, and most of these data are tied to spatial locations. In the Web 2.0 era, users participate in content creation and spontaneously generate geospatial content, including their access records on the Internet. A search engine's history contains the search keywords, the search time, and an IP address corresponding to a geographic location, which makes search engine data a typical source of geospatial big data. Such data are open, ubiquitous, and near real-time, so they can help solve problems that traditional data cannot. The application of search engine data to disease surveillance is a classic big data use case: when Google Flu Trends was first released it attracted widespread attention, and many scholars have followed up on it. Previous research has focused mainly on the temporal characteristics of search engine data; few studies examine the spatial distribution of web search behavior, so the spatial dimension of search engine data has not been fully exploited. This paper studies that dimension, and its contents include:

(1) Research on methods for acquiring search engine data. Taking the Baidu Index as an example, this paper introduces a framework for an automated crawler written in Python. The website provides no API for direct data access; the data are presented not as static text but as interactive charts, and the figures are rendered as collages of images, which makes collection difficult. Selenium, a Python package, is used to simulate the user's input, selection, and hovering. After moving the mouse to the position of the index and capturing a screenshot, an image recognition package can be used to read the search index of a keyword.

(2) Research on data preprocessing methods, including keyword selection and the treatment of multicollinearity among keywords' search indices. Keywords drawn from related research and recommended by keyword mining tools form the initial candidate set. Keywords highly correlated with real flu cases are then selected through correlation analysis. Stepwise regression and principal component analysis are applied to address multicollinearity, and the paper discusses when and where each is useful.

(3) Modeling the relationship between the selected keywords' search indices and real flu cases, together with its variation over time and space; the model is used to provide near-real-time estimates of the spatial distribution of influenza. Previous studies have pointed out that web search behavior differs across space, and when researchers study many regions at once they usually model each region separately, commonly with linear regression based on ordinary least squares. Considering the similarity between a spatial unit and its neighbors, this study models multiple study areas simultaneously and accounts for the distance decay effect. Ordinary least squares regression (OLS), geographically weighted regression (GWR), and geographically and temporally weighted regression (GTWR) are fitted, and their fitting and monitoring results are compared. The GTWR model, which accounts for spatial and temporal non-stationarity, performs best. The method can supplement traditional disease surveillance: combining the GTWR model with search engine data can identify high-influenza regions and monitor the spatial distribution of influenza in near real time, and it can also provide predictive models and statistical interpretations for spatial epidemiological studies.
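The crawling workflow in (1) — driving a browser with Selenium, hovering over the chart, screenshotting the chart region, and reading the number with OCR — can be sketched as follows. The landing URL, the CSS selector, and the pytesseract OCR step are illustrative assumptions; the real Baidu Index page must be inspected to fill in the navigation and selectors:

```python
import re


def parse_index(ocr_text: str) -> int:
    """Extract the numeric search index from raw OCR output.

    OCR of a chart screenshot typically yields stray characters around
    the number (e.g. "index: 1,234"), so keep only the digits.
    """
    digits = re.sub(r"\D", "", ocr_text)
    if not digits:
        raise ValueError(f"no number found in OCR text: {ocr_text!r}")
    return int(digits)


def fetch_baidu_index(keyword: str, chart_selector: str) -> int:
    """Simulate a user session and read one keyword's search index.

    `chart_selector` is a hypothetical CSS selector for the chart
    element; the actual selector and the navigation steps depend on
    the live Baidu Index page.
    """
    # Imported lazily so parse_index() works without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.action_chains import ActionChains
    from PIL import Image
    import pytesseract

    driver = webdriver.Chrome()
    try:
        # Navigating to the keyword's trend page is site-specific and elided.
        driver.get("https://index.baidu.com/")
        chart = driver.find_element(By.CSS_SELECTOR, chart_selector)
        # Hover so the image-encoded index value is rendered on the chart.
        ActionChains(driver).move_to_element(chart).perform()
        chart.screenshot("index.png")  # capture only the chart region
        text = pytesseract.image_to_string(Image.open("index.png"))
        return parse_index(text)
    finally:
        driver.quit()
```

Separating the OCR post-processing from the browser automation keeps the fragile, page-dependent part in one function and lets the parsing logic be tested offline.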
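The preprocessing in (2) — correlation-based keyword screening followed by principal component analysis to remove multicollinearity — can be sketched as below. The 0.7 correlation cutoff and the 95% variance threshold are illustrative values, not the thesis's:

```python
import numpy as np


def select_keywords(X, y, names, threshold=0.7):
    """Keep keywords whose search index series has absolute Pearson
    correlation with the flu-case series of at least `threshold`."""
    keep = [j for j in range(X.shape[1])
            if abs(np.corrcoef(X[:, j], y)[0, 1]) >= threshold]
    return X[:, keep], [names[j] for j in keep]


def pca_scores(X, var_kept=0.95):
    """Replace correlated search indices with uncorrelated principal
    components retaining `var_kept` of the total variance, via SVD of
    the standardized data matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)           # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), var_kept)) + 1
    return Z @ Vt[:k].T                       # scores of the first k components
```

The component scores, being mutually uncorrelated, can then be used as regressors in place of the raw, collinear search indices.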
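The core of GTWR in (3) is a weighted least-squares fit at each location and time, with kernel weights that decay with spatiotemporal distance so that nearby, recent observations count more. A minimal sketch assuming a Gaussian product kernel with fixed bandwidths (in practice the bandwidths are chosen by cross-validation, and the thesis's exact kernel specification may differ):

```python
import numpy as np


def gtwr_local_fit(X, y, coords, times, u, t, h_s, h_t):
    """Estimate local regression coefficients at location `u` and time
    `t` by weighted least squares with a Gaussian space-time kernel."""
    # Squared space-time distance, each dimension scaled by its bandwidth.
    d2 = np.sum((coords - u) ** 2, axis=1) / h_s**2 + ((times - t) / h_t) ** 2
    w = np.exp(-0.5 * d2)                        # nearby observations ~ 1
    Xd = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    XtW = Xd.T * w                               # equivalent to Xd.T @ diag(w)
    beta = np.linalg.solve(XtW @ Xd, XtW @ y)
    return beta                                  # local intercept and slopes
```

Calling this at every spatial unit and time step yields the surface of local coefficients; setting `h_t` very large reduces the model to GWR, and uniform weights reduce it to OLS.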