
Research on the Application of Web Search Data in Predicting the Real Estate Price Index

Posted on: 2017-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Y D Tang
GTID: 2309330482989053 | Subject: Management Science and Engineering
Abstract/Summary:
In recent years, with the spread of the Internet and the rapid development of information technology, daily life has become increasingly inseparable from the network. People use search engines to look up news and other information of interest, and they use instant messaging software such as Tencent QQ and WeChat to chat. The Internet has thus become a huge database. Web search data reflects the interests and concerns of more than 300 million users. It can reveal consumers' behavioral trends and patterns, and it provides a foundation for using micro-level data to study macroeconomic questions.
The real estate industry is a pillar of the national economy, so studying Chinese housing prices matters both for people's lives and for social and economic development. Beijing is China's economic, political and cultural center and has a very high concentration of population. Its housing prices increasingly affect residents' daily lives and also bear on social stability. The National Bureau of Statistics publishes a housing price index for 70 large and medium-sized cities, which helps the public understand price trends and supplies data for researchers. This thesis uses the Beijing new residential housing price index together with web search data.
The thesis builds a conceptual framework on equilibrium price theory and transmission lag theory, analyzing supply and demand in the real estate market in terms of micro- and macro-level factors. Text mining is applied to online house price news to identify initial keywords, which are then expanded with methods such as long-tail keyword mining and the demand map.
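The abstract does not say which text-mining technique was used to extract initial keywords from house price news. A minimal sketch, assuming the simplest option, ranks terms by frequency across already-segmented articles. The token lists, the stop-word set and the `seed_keywords` helper are all hypothetical. Real Chinese news text would first need a word segmenter, which is not shown here.

```python
from collections import Counter

# Hypothetical, pre-segmented news snippets. The English stand-in tokens are
# illustrative only; real Chinese text would need word segmentation first.
NEWS_TOKENS = [
    ["beijing", "house", "price", "loan", "rate"],
    ["second-hand", "housing", "price", "beijing"],
    ["school", "district", "housing", "price"],
]

# Hypothetical stop words: terms too generic to serve as search keywords.
STOP_WORDS = {"beijing"}


def seed_keywords(docs, top_k=3, stop_words=frozenset()):
    """Return the top_k most frequent terms across all documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(tok for doc in docs for tok in doc if tok not in stop_words)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


print(seed_keywords(NEWS_TOKENS, top_k=3, stop_words=STOP_WORDS))
```

Frequency ranking only proposes candidate seeds. In the workflow described above, the candidate list is then expanded (long-tail keywords, demand map) and screened against the price index.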
The keywords are then screened by their correlation with, and lead over, the Beijing new housing price index, using the Pearson correlation coefficient and time-difference correlation analysis. Keywords whose correlation coefficient exceeds 0.5 in absolute value are retained and further processed with K-means clustering and principal component analysis to obtain measures for the Beijing new housing price index. In the empirical analysis, K-means clustering groups the keywords so that the explanatory variables that best represent each cluster can be selected. Principal component analysis forms two composite indicators, one macro and one micro. Finally, two regression models linking the Beijing new housing price index to web search data are estimated, and their goodness of fit and prediction accuracy are compared. The conclusions are as follows:
(1) For keywords tied to micro-economic factors, such as second-hand housing and property management, most people search about one year in advance. For macro-economic factors, prospective buyers typically search six months to one year in advance, paying close attention to prices, wage levels and the level of education available near the housing.
(2) The first-order lag of the Beijing housing price index also has significant explanatory power for the index.
(3) The model built with K-means cluster analysis has a goodness of fit of 0.86. Using it to predict the Beijing housing price index from August to December 2015 and comparing the predictions with the actual values gives a mean absolute error of 0.234.
By contrast, the model built with principal component analysis has a goodness of fit of 0.82 and a mean absolute error of 0.309. Selecting and optimizing keywords through cluster analysis therefore produces a better-fitting model and smaller prediction errors.
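The keyword screening step (Pearson correlation plus time-difference correlation, keeping keywords with |r| > 0.5) can be sketched as follows. This is an illustration under assumptions, not the thesis's code. A keyword's search series at month t is correlated with the price index at month t + lag, and the lag with the largest |r| is taken as that keyword's lead order. The helper names and toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(search, index, lag):
    """Correlate search volume at month t with the price index at month t + lag."""
    if lag == 0:
        return pearson(search, index)
    return pearson(search[:-lag], index[lag:])


def best_lead(search, index, max_lag):
    """Return (lag, r) with the largest |r| over lags 0..max_lag."""
    if len(search) - max_lag < 3:
        raise ValueError("series too short for the requested max_lag")
    candidates = [(lag, lagged_corr(search, index, lag)) for lag in range(max_lag + 1)]
    return max(candidates, key=lambda pair: abs(pair[1]))


def screen(keywords, index, max_lag=3, threshold=0.5):
    """Keep keywords whose best lagged |r| exceeds the threshold."""
    kept = {}
    for name, series in keywords.items():
        lag, r = best_lead(series, index, max_lag)
        if abs(r) > threshold:
            kept[name] = (lag, r)
    return kept


# Hypothetical monthly index and a keyword that leads it by exactly two months.
INDEX = [100.0, 100.4, 100.9, 101.1, 101.8, 102.5,
         102.7, 103.4, 104.0, 104.3, 105.1, 105.6]
SEARCH_LEAD = [50 * v for v in INDEX[2:]] + [5300.0, 5320.0]

print(screen({"kw_lead": SEARCH_LEAD}, INDEX, max_lag=3))
```

The thesis reports leads of six months to a year, so `max_lag` would be around 12 on real monthly data. The series then need to be much longer than 12 months so that each lagged correlation rests on enough overlapping observations.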
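One way to realize the K-means step is to cluster the standardized search series of the retained keywords and keep, from each cluster, the keyword closest to the centroid as that cluster's explanatory variable. The abstract states neither the clustering features nor the selection rule, so both are assumptions here, as are the helper names and the toy data. To keep the sketch deterministic, the initial centroids are simply the first k series.

```python
import math


def zscore(series):
    """Standardize a (non-constant) series to mean 0 and standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(keywords, k):
    """Cluster standardized keyword series; return the member nearest each centroid."""
    names = list(keywords)
    points = [zscore(keywords[name]) for name in names]
    labels, centroids = kmeans(points, k)
    reps = {}
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = min(members, key=lambda i: sq_dist(points[i], centroids[j]))
            reps[j] = names[best]
    return reps


# Hypothetical monthly search volumes: two rising and two falling series.
KEYWORDS = {
    "second-hand housing": [1, 2, 3, 4],
    "mortgage rate": [4, 3, 2, 1],
    "property management": [2, 4, 6, 9],
    "housing loan": [8, 6, 4, 1],
}

print(representatives(KEYWORDS, k=2))
```

Standardizing first makes the clustering depend on the shape of each search trend rather than its raw volume. This matters because popular and niche keywords can move together.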
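The principal-component alternative collapses a group of keywords (for example, the macro group or the micro group) into one composite indicator. The minimal sketch below assumes the composite is the first principal component of the standardized series. It computes that component by power iteration on the correlation matrix, which assumes the leading eigenvalue is strictly the largest. The function name and the toy data are hypothetical.

```python
import math


def standardize(series):
    """Standardize a (non-constant) series to mean 0 and standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]


def first_component(series_list, iters=200):
    """First principal component of standardized series via power iteration.

    Returns (loadings, composite scores per month, share of variance explained).
    """
    Z = [standardize(s) for s in series_list]  # p keywords x n months
    p, n = len(Z), len(Z[0])
    # Correlation matrix of the keywords (standardized data, divided by n).
    R = [[sum(Z[i][t] * Z[j][t] for t in range(n)) / n for j in range(p)]
         for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        v = [sum(R[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    if sum(w) < 0:  # eigenvector sign is arbitrary; prefer mostly positive loadings
        w = [-x for x in w]
    eigenvalue = sum(w[i] * sum(R[i][j] * w[j] for j in range(p)) for i in range(p))
    scores = [sum(w[i] * Z[i][t] for i in range(p)) for t in range(n)]
    return w, scores, eigenvalue / p


# Hypothetical micro-factor keywords whose search volumes move together.
MICRO_KEYWORDS = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
weights, composite, share = first_component(MICRO_KEYWORDS)
print(weights, composite, share)
```

The explained-variance share indicates how much of the group's joint movement the single composite retains. A low share would suggest the group mixes unrelated search behaviors.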
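The model comparison can be sketched as ordinary least squares with the one-month-lagged index and one search-based regressor. The model is fit on a training window and then judged by goodness of fit and by mean absolute error on a five-period hold-out. The model form `y_t = b0 + b1*y_{t-1} + b2*x_t`, the helper names and the synthetic data are assumptions for illustration, because the abstract does not give the exact specification.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def design(index, search):
    """Rows [1, y_{t-1}, x_t] and targets y_t for t = 1..n-1."""
    X = [[1.0, index[t - 1], search[t]] for t in range(1, len(index))]
    return X, index[1:]


def predict(beta, X):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


def r_squared(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def mae(y, yhat):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


# Synthetic data generated exactly from y_t = 2 + 0.5*y_{t-1} + 1.5*x_t.
SEARCH = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 7.0, 3.0, 8.0, 5.0, 4.0]
INDEX = [100.0]
for x in SEARCH[1:]:
    INDEX.append(2.0 + 0.5 * INDEX[-1] + 1.5 * x)

X, y = design(INDEX, SEARCH)
HOLDOUT = 5  # last five periods held out, like an August-December forecast window
beta = ols(X[:-HOLDOUT], y[:-HOLDOUT])
print("coefficients:", beta)
print("in-sample R^2:", r_squared(y[:-HOLDOUT], predict(beta, X[:-HOLDOUT])))
print("hold-out MAE:", mae(y[-HOLDOUT:], predict(beta, X[-HOLDOUT:])))
```

To compare the two approaches, the same fit-and-score routine would be run once with the K-means-selected keywords and once with the principal-component composites as regressors.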
Keywords/Search Tags: Web search data, Baidu Index, K-means clustering, Text mining