
Research On Population Spatialization Based On Multi-source Data

Posted on: 2021-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zou
GTID: 2370330626958554
Subject: Cartography and Geographic Information Engineering
Abstract/Summary:
With rapid urbanization, the continuous growth and rapid agglomeration of the urban population have created many challenges for fine-grained city management. Shanghai is a typical megacity in China: its highly developed economy has produced a very large population base and a more complex population distribution than that of small and medium-sized cities. Given this pressure, it is important to establish a fine-grained system for monitoring the spatial distribution of the population. Such a system has practical value for improving comprehensive urban management and for supporting smart city construction. Its precondition is the ability to obtain spatial population distribution data rapidly and accurately.

After reviewing and summarizing previous research, we produced detailed population distribution data for Shanghai in 2010 at a 500 m grid scale, using both a traditional spatial regression model and a machine learning algorithm. Two models were constructed by integrating multi-source, high-precision data. First, we collected a large volume of spatial data on land use, nighttime lights, points of interest (POIs), roads and buildings, and extracted many features closely related to the spatial distribution of population, such as the area proportion of each land use type, nighttime light brightness and POI density. After screening the features using multicollinearity diagnostics, we built population spatial distribution models based on spatial lag regression and on the random forest algorithm. In addition, the trained random forest model was applied to predict the population distribution of Shanghai in 2017. The two population spatialization models were then compared and analyzed with an accuracy verification scheme that has both qualitative and quantitative dimensions, and comprehensively evaluated on that basis. Finally, the random forest model, which had the higher simulation accuracy, was interpreted in detail from three aspects: feature importance, the relationship between binned features and population distribution, and feature contribution.

The main achievements are as follows:

(1) The population spatial distribution model based on multiple categories of covariates and spatial lag regression performed well, with an R^2 of 0.86. The accuracy verification shows that its population spatialization result is more accurate than the LandScan and CNPOP datasets, but slightly less accurate than the GPW and WorldPop population datasets.

(2) The population spatial distribution model based on the multi-dimensional feature library and the random forest algorithm fits better than the spatial lag regression model, with an R^2 of 0.98. The accuracy verification shows that its population spatialization result has clear advantages over all of the comparison datasets. The 2017 population distribution of Shanghai, predicted by the trained model from the same feature vector, is also of good quality: compared with the validation datasets, it performs better on several accuracy indicators at the district level.

(3) The qualitative analysis shows that the population distribution in the spatial lag regression result is characterized by concentrated clusters, whereas in the random forest result it mostly appears as aggregation points and shows more detail. The linear regression R^2 between the estimated population and statistical data at the street level is 0.46 for the spatial lag regression model and 0.71 for the random forest model. The quantitative analysis shows that the random forest result is more accurate and that population underestimation and overestimation in the study area are significantly reduced. The random forest model is more accurate than the spatial lag regression model at low, medium and high population densities, and the improvement is largest in areas of high population density, such as the central urban area of Shanghai.

(4) The interpretation of the random forest model shows that the most important modeling features are the distances to catering, living services and education POIs, POI density, nighttime light brightness, residential buildings and building construction year. The contributions of these features follow different trends as the feature values increase and can be either positive or negative. The contribution values also show clear spatial differentiation.

In summary, this thesis studied a method for obtaining population spatialization results at a fine grid scale from multiple sources of high-precision data, and compared the fitting performance of different population spatialization models built on the same modeling factors. By fusing multi-source data, it offers new ideas and methods for fine-grained population research in the big data era.
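The street-level check described in (3) can be illustrated with a minimal sketch. It assumes that gridded estimates are summed into the unit containing each cell and that the reported R^2 is that of a simple linear regression, which equals the squared Pearson correlation. The function names, unit labels and all numbers are hypothetical toy values, not the thesis's code or data.

```python
from collections import defaultdict


def aggregate_to_units(cell_pop, cell_unit):
    """Sum gridded population estimates into the unit (e.g. street) holding each cell."""
    totals = defaultdict(float)
    for pop, unit in zip(cell_pop, cell_unit):
        totals[unit] += pop
    return dict(totals)


def linear_r2(estimated, observed):
    """R^2 of a simple linear regression of observed on estimated.

    For one predictor this equals the squared Pearson correlation.
    """
    n = len(estimated)
    mean_x = sum(estimated) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, observed))
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: six grid cells falling in three street-level units.
cell_pop = [120.0, 80.0, 300.0, 50.0, 150.0, 400.0]
cell_unit = ["A", "A", "B", "C", "C", "B"]
census = {"A": 210.0, "B": 690.0, "C": 190.0}  # hypothetical statistical counts

est = aggregate_to_units(cell_pop, cell_unit)
units = sorted(census)
r2 = linear_r2([est[u] for u in units], [census[u] for u in units])
print(f"street-level R^2 = {r2:.3f}")
```

Running the same comparison on the outputs of two models over the same units gives directly comparable R^2 values, which is the kind of comparison the abstract reports.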
Keywords/Search Tags: population spatialization, spatial regression, random forest model, refined model, feature contributions