
Research on the Statistical Inference Problems in Web Search Data

Posted on: 2017-02-16  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Jiang  Full Text: PDF
GTID: 2310330518493403  Subject: Mathematics
Abstract/Summary:
Network data has grown at a surprising speed since the start of the Internet era. However, web search data is still not used efficiently enough, and the information hidden in it cannot be discovered without scientific methods and techniques. We therefore aim to make full use of web data by analyzing it with statistical methods. In this paper, we collect and organize the required web search data, then use statistical methods to study statistical inference problems in web search data from three aspects: exploring the spatial distribution characteristics of the elderly population using web search data; forecasting the CPI (Consumer Price Index) with the Baidu Index; and analyzing UCI datasets found through web search with a modified KNN (K-Nearest Neighbors) algorithm.

China's aging population is currently growing fast, with a significant impact on society and other areas. Once elderly population data has been obtained for the provinces and cities of China, statistical methods can be used to analyze the spatial correlation of the aging population. This will help the government understand the distribution and clustering of China's elderly population objectively, scientifically, and comprehensively.

The Internet has gradually replaced traditional media as an important way for users to obtain information. Users search the Internet for information, and their queries are recorded (for example, in the search volumes reported by the Baidu Index). The CPI is an important indicator of price levels and is related to web search data, so using Baidu Index data to study statistical inference problems for the CPI is meaningful.

Existing data mining technology can meet the needs of different applications. However, the complexity of network data poses a huge challenge to traditional data mining algorithms, which cannot process such complicated data efficiently. The classification accuracy of the traditional KNN algorithm is not good on the four UCI datasets, so it is urgent and necessary to study and improve the KNN algorithm.
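For readers unfamiliar with the baseline being improved, the traditional KNN classifier can be sketched as below. This is only a minimal illustration of standard majority-vote KNN with Euclidean distance. It is not the modified algorithm of the thesis, which the abstract does not describe. The function name `knn_predict` and the toy data are assumptions made for this example and do not come from the thesis or the UCI datasets.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration: standard (unmodified) KNN.
    """
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration (not from the thesis)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
```

Every neighbor and every feature counts equally in this baseline, which is one common reason its accuracy degrades on complex data. Typical modifications weight votes by distance or rescale features, but which modification the thesis uses is not stated here.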
Keywords/Search Tags: Web search data, Aging population, Spatial analysis, Forecasting, Improved KNN algorithm