Modeling Urban Residential Land Price Distribution Based On Big Data And Machine Learning

Posted on: 2022-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zhang
GTID: 1529306743981869
Subject: Land Resource Management
Abstract/Summary:
Timely and accurate monitoring of the distribution and dynamics of urban residential land price (RLP) is of great significance for understanding urban real estate market trends, balancing residential land supply and demand, optimizing the layout of urban residential space, and promoting high-quality urbanization. The current urban land price system, based on the benchmark land price and the marked land price, is a relatively static standard: it reflects urban RLP only at a particular point in time, so its timeliness is limited. With the rapid development of China's social economy and the continuous expansion of urban space, urban infrastructure and environmental conditions are updated quickly, which leads to drastic changes in the spatial structure of urban RLP. Grasping the dynamic changes of land price quickly and accurately, and putting forward effective coping strategies in time, is therefore an important challenge. At the same time, the arrival of the big data era and the rapid progress of information mining technology are making urban governance and research more refined and quantitative. Against this background, how to model urban RLP and its dynamics in a timely and accurate way, so as to serve the refined management of land use and smart city construction, has become a focus of academic attention.

On the basis of a review of the relevant domestic and foreign literature, this study aims to model the urban RLP distribution at the block level from the perspective of spatial prediction, using geographic big data and machine learning methods. With the aid of data mining methods such as GIS spatial analysis and deep learning, urban residential land sales data and geographic big data, including urban points of interest (POI), areas of interest (AOI), and Tencent street view images, were used to map the location, neighborhood, and visual environment of urban residential land and to build a multi-level variable system for block-level RLP prediction. Nonlinear machine learning algorithms and ensemble learning methods were then used to develop block-level RLP prediction models, and an empirical study was carried out using Wuhan city as an example to test the validity and reliability of the models. Finally, based on the RLP data predicted by the machine learning regression model, the RLP at the block level in Wuhan was mapped, the contribution of several geographic variables to the RLP prediction was measured, and the nonlinear response of RLP to changes in the prediction variables was analyzed.

The main conclusions are summarized as follows:

(1) From the perspective of spatial prediction, a method system for modeling RLP distribution at the block scale using big data and machine learning was constructed. The method system has three core components: building block-level RLP prediction samples from limited land transaction sample data, constructing a multi-level variable system for RLP prediction supported by big data, and developing RLP prediction models based on machine learning methods. In constructing the block-level RLP samples, to address the shortage of land price samples within individual block units, the average price of the land samples within a certain buffer of each block unit was calculated and used as the land price of the block. The results show that the spatial distribution trend of land prices in the sample blocks is basically consistent with that of land prices at the sample sites, indicating that the constructed block-level RLP samples are reliable.

(2) Using open-access geospatial big data, a multi-level variable system for block-level RLP prediction was constructed. From the perspective of residents' demand for and perception of facilities, the distance from the block to the nearest facility, the number of facilities within the 15-minute living circle, the density of facilities within the urban group, and the spatial visual quality of surrounding streets were measured to map the location, neighborhood, and environmental characteristics of residential land. Based on this, several geographical variables were extracted at the block scale to predict RLPs. The characteristic factor measurement method used in this study considers people's needs, so it reflects the realistic considerations of housing selection, and it also saves the cost of surveying the scale of each facility and its scope of service. The implementation of this method benefits from the explosive growth of urban geospatial data and the rapid progress of data mining technology.

(3) The block-level RLP prediction model was constructed using nonlinear machine learning methods. Based on the same RLP sample data, four nonlinear machine learning regression algorithms, two linear regression algorithms, and stacking ensemble learning methods were each used to develop RLP prediction models at the block level, and their accuracy was compared. The results show that, compared with the linear regression models, the three nonlinear models of radial basis function-based support vector regression (RBF SVR), extra-trees regression (ETR), and random forest regression (RFR) perform better, with R² ranging from 0.729 to 0.814; of these, the RBF SVR model performs best. Using the stacking ensemble learning method to combine individual machine learning algorithms can effectively improve the RLP prediction accuracy. The Stacking #5 model, which integrates the RBF SVR, ETR, and RFR algorithms, has a smaller prediction error than any of the three individual models, and R² increases from the previous maximum of 0.814 to 0.828.

(4) The distribution of RLP generated by the Stacking #5 regression model is similar to that of the exponential function-based ordinary Kriging interpolation model, and both map the polycentric pattern of RLPs in Wuhan well. The frequency distribution of the difference between the two models' predictions is approximately normal, which indicates that the probability of a large discrepancy between their prediction results is very small. Given that the reliability of ordinary Kriging interpolation has been tested in practice, this result also indirectly shows that the Stacking #5 regression model is reliable. In local areas with few land price samples, the ordinary Kriging interpolation model often produces certain prediction errors. In this case, the prediction results of the machine learning regression model can be used as a supplementary reference.

(5) Based on the block-level RLP map predicted by the machine learning regression model, the spatial distribution pattern of RLP in Wuhan was analyzed. The results show that, as the distance from the city center increases, the RLP in Wuhan presents a nonlinear, fluctuating attenuation trend of "rapid decline, slow decline, steady fluctuation, slow rise, and slow decline again". In addition, the RLP in Wuhan presents an inverted U-shaped distribution of "high in the middle and low on both sides" in the east-west and south-north directions; however, due to the existence of urban sub-centers, the RLP at the eastern, southern, and northern edges of the study area has increased to a certain extent. From the perspective of RLP distribution, as of 2020 the polycentric urban spatial structure of "one chief, three deputies" had taken shape in Wuhan.

(6) Based on the machine learning regression model, the importance of the prediction variables was measured and compared using a permutation-based approach, and the nonlinear response of the RLP to changes in the prediction variables was analyzed using partial dependence analysis. The relative importance results show that the importance of variables associated with commerce, education, and medical facilities is relatively high, while the importance of variables associated with public transportation is relatively low. In addition, the impact of the same type of facility on RLP has a scale effect; for example, compared with the density of commercial facilities within the urban group or area, the density of commercial facilities within the 15-minute living circle contributes more to the RLP prediction. The partial dependence curves show the following. For most of the "distance variables", the RLP decreases as the distance from the nearest facility increases only within a certain range, and the distance attenuation effect no longer appears once that range is exceeded. For most of the "number variables", within a 15-minute living circle, RLP first increases with the number of facilities and then stabilizes. For most of the "density variables", the RLP shows a trend of "first slowly increasing, then rapidly increasing, and finally becoming stable" as facility density in the urban group or area increases. In addition, within the 15-minute living circle, the RLP increases with green space coverage, and when the coverage rate reaches 0.25, the RLP tends to stabilize. For the green view index (GVI), when the average GVI within the 15-minute living circle is greater than 0.15, RLP increases rapidly as the GVI increases.

In summary, the urban RLP distribution modeling method constructed in this study can provide new ideas and means for urban land price evaluation in the era of big data. The urban land price distribution modeling method based on big data and machine learning can map urban land prices quickly and accurately, which can provide technical support for the establishment of a dynamic land price monitoring system.

The main innovations of this study are as follows:

(1) From the perspective of residents' demand for and perception of facilities, a multi-level variable system for predicting RLPs at the block scale was constructed. With the help of data mining methods such as GIS spatial analysis and deep learning, urban residential land sales data and geographic big data, including urban POI, AOI, and Tencent street view images, were used to map the location, neighborhood, and visual environment of urban residential land. Based on this, several geographical variables were extracted at the block scale to predict RLPs.

(2) The block-level RLP prediction model was constructed using nonlinear machine learning methods. In this study, nonlinear machine learning methods were introduced to fit the complex interaction between RLP and its prediction variables, and several block-level RLP prediction models were constructed. Because the RLP transaction samples are limited, the SVR algorithm, which is suitable for small sample sizes, was selected to predict RLP. At the same time, the stacking ensemble learning method was used to combine individual machine learning algorithms to improve the RLP prediction accuracy. Furthermore, the relative importance of the prediction variables and the nonlinear response of the RLP to changes in the prediction variables were measured and analyzed. This can provide a theoretical reference for understanding the formation mechanism of urban RLP.
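
The block-level sample construction described in conclusion (1) can be illustrated with a minimal sketch. The abstract does not state the buffer radius, how the buffer is drawn, or the data format, so everything here is an assumption made for illustration: the buffer is a circle around a block centroid, coordinates are projected and measured in meters, and the function name `block_price_samples`, the 500 m radius, and the toy prices are all hypothetical.

```python
import math


def block_price_samples(blocks, parcels, buffer_m):
    """Assign each block the mean price of sold parcels within buffer_m of its centroid.

    blocks:  dict of block id -> (x, y) centroid in projected meters (assumed layout)
    parcels: list of (x, y, price) tuples for sold residential parcels (assumed layout)
    Blocks with no parcel inside the buffer get no sample and are left out.
    """
    samples = {}
    for block_id, (bx, by) in blocks.items():
        nearby = [
            price
            for (px, py, price) in parcels
            if math.hypot(px - bx, py - by) <= buffer_m
        ]
        if nearby:
            samples[block_id] = sum(nearby) / len(nearby)
    return samples


# Toy data: three blocks and three transactions (all values invented).
blocks = {"A": (0.0, 0.0), "B": (1000.0, 0.0), "C": (5000.0, 5000.0)}
parcels = [(100.0, 0.0, 8000.0), (0.0, 300.0, 10000.0), (900.0, 100.0, 6000.0)]

samples = block_price_samples(blocks, parcels, buffer_m=500.0)
print(samples)
```

Block C has no transaction inside its buffer, so it receives no sample price; in the workflow the abstract describes, such blocks would be the ones whose price is later predicted by the regression model.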
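
The permutation-based importance measure mentioned in conclusion (6) can be sketched as follows. This is a generic illustration of the technique, not the study's implementation: the abstract applies it to a trained machine learning regression model, whereas the sketch swaps in a hand-written linear function (`toy_predict`) so that it stays dependency-free. The variable names, coefficients, and data are invented and do not reproduce the study's results.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in R^2 when one feature column is shuffled, for each feature."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Stand-in for a fitted model; the coefficients are made up."""
    commerce_density, bus_stop_distance, unused_noise = row
    return 5000.0 + 300.0 * commerce_density - 0.1 * bus_stop_distance


data_rng = random.Random(42)
X = [
    [data_rng.uniform(0, 10), data_rng.uniform(0, 2000), data_rng.uniform(0, 1)]
    for _ in range(200)
]
y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
for name, value in zip(["commerce_density", "bus_stop_distance", "unused_noise"], importances):
    print(f"{name}: {value:.4f}")
```

A feature the model never uses scores zero, because shuffling it leaves every prediction unchanged, while a feature that drives most of the prediction variance produces a large drop in R². With a fitted scikit-learn estimator, `sklearn.inspection.permutation_importance` provides the same kind of measurement.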
Keywords/Search Tags:Urban residential land price, Spatial prediction, Big data, Machine learning, Wuhan city