
Research And Implementation Of Smart City Housing Price Appraisal System Based On Spark

Posted on: 2018-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: J T Che
Full Text: PDF
GTID: 2348330518998646
Subject: Computer application technology
Abstract/Summary:
Smart cities aim to solve problems that arise during urban development, such as over-reliance on manual work, insufficiently intelligent design, and inaccurate prediction. The healthy development of the property market is an important part of smart city construction and is closely tied to people's livelihood. On the one hand, housing price appraisal currently relies on professionals with the necessary experience and domain knowledge to provide subjective analysis. Because such professionals are relatively few and no mathematical model is used, objective appraisals cannot be given and labor costs are relatively high. On the other hand, during early urban informatization, government units built business systems according to their own needs, and these systems still have the following problems: as cities grow rapidly and demand becomes more diverse, the amount of data increases dramatically and storage capacity becomes a bottleneck; in addition, the valuable information contained in the data of these systems has not been effectively mined. The traditional approach therefore can no longer meet these needs. The key to solving these problems is a system that supports large-scale storage and analyzes housing data effectively.

Based on these needs and problems, this thesis studies and implements a storage and analysis system for housing price appraisal, based on hedonic price theory, big data processing technology, and the random forest and linear regression algorithms from machine learning. First, to provide more accurate attribute information for housing price appraisal, a fast data acquisition tool is designed and implemented. Second, to cope with the sharp increase in data volume, a Hive-based real estate data warehouse is implemented. The data warehouse can import the appraisal-related data from the original database all at once, and can also import it incrementally and periodically. Third, a complete data preprocessing pipeline is designed and implemented, and correlation analysis is carried out on the preprocessed housing features to remove attributes that have little correlation with the housing price, yielding an input data set of housing attributes suitable for machine learning algorithms. Then, on the Spark platform, linear regression and random forest models for housing price appraisal are constructed according to hedonic price theory, and the data in the data warehouse are used for cross-validation and tuning. The model parameters are selected to balance the error between predicted and actual housing prices against the time needed to build the model. In addition, to improve the performance of access to appraisal results, a set of Redis-based data structures and access interfaces is designed and implemented. Finally, the price appraisal model is published as a RESTful Web Service for the convenience of users.

A big data processing framework is built on virtual machines, including HDFS (a distributed file system made highly available with Zookeeper), Sqoop (a data ETL tool), Hive (a data warehouse tool), Spark (a distributed in-memory computing framework), and Redis (an in-memory database used to cache results). Experimental results, with 893,200 samples used for tuning, show that the random forest model is better suited to housing price appraisal than the linear regression model. With the selected random forest parameters, the mean absolute error between the estimated and real housing prices is less than 0.03, and the time both algorithms need to build the model is within acceptable limits. Because the random forest model learns the characteristics of the housing data well, it is adopted for price appraisal in the real environment. Finally, the four modules and the performance of the price appraisal system are tested, and the main test process is recorded in tabular form. The detailed test results and evaluation criteria demonstrate that the requirements for housing price appraisal have largely been met.
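
The evaluation described in the abstract (cross-validation scored by the average absolute error between estimated and real prices) can be illustrated with a small sketch. This is not the thesis's Spark implementation: it is plain Python, it uses a one-feature least-squares line as a stand-in for the hedonic linear regression model, and the function names (`mean_absolute_error`, `fit_line`, `k_fold_mae`) and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_fold_mae(xs, ys, k):
    """Average held-out MAE over k folds (fold i holds out indices i, i+k, ...)."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predicted = [slope * xs[i] + intercept for i in test]
        fold_errors.append(mean_absolute_error([ys[i] for i in test], predicted))
    return mean(fold_errors)


# Hypothetical toy data (floor area, price in arbitrary units); not from the thesis.
areas = [50, 60, 70, 80, 90, 100]
prices = [101, 119, 141, 160, 179, 201]
cv_error = k_fold_mae(areas, prices, k=3)
```

In a tuning loop of the kind the abstract describes, a score like `cv_error` would be computed for each candidate parameter setting and weighed against the time taken to build the model.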
Keywords/Search Tags: Smart City, Housing Prices Appraisal, Random Forest, Hedonic Price Model, Spark