| Commuting is an essential part of urban daily life. Commuting production, commuting attraction and commuting distribution are central concerns of traffic demand forecasting, and the related findings are of great significance for policy-making and for the evaluation of urban construction projects. In recent years, with the development of machine learning in the big data era, researchers have explored new models for commuting generation and distribution that use big data, represented by mobile signaling data, and promising technologies, represented by machine learning. These explorations compensate for the shortcomings of traditional household travel survey data, which are costly and time-consuming to collect, and to some extent alleviate the unreasonable assumptions, inflexible structure and low estimation accuracy of traditional approaches. However, most studies merely use big data as a substitute for traditional data by adding more impact factors to regression models or gravity models, or apply machine learning only to household travel survey data, so the potential of both to estimate and predict commuting volume is not fully exploited. Moreover, existing studies lack a quantitative discussion of how the factors influence commuting generation and distribution. Based on machine learning and multi-source big data for the Kunshan downtown area, including mobile signaling data, POI, transit data, land use data, trip planning data and housing price data, this thesis proposed approaches for modeling commuting generation and distribution that integrate Random Forest with multi-source data. It then identified the key factors and analyzed the complex nonlinear relationships between variables. The findings could be useful to policy-makers in traffic management, land use planning and other related policies, and could serve diverse commuting demand analysis and forecasting in Kunshan. The main contents, results and
conclusions are as follows: (1) Establish a feature set suitable for modeling commuting generation and distribution. Following the feature engineering process, the independent variables for commuting production, commuting attraction and commuting distribution were classified into six categories: demographic features, built-environment features, socioeconomic features, commuting-related features, location-related features and area-related features. Then, using the Mean Decrease Accuracy method, 16, 19 and 16 features that negatively influenced performance were selected for the three models respectively. (2) Propose new approaches for modeling commuting generation and distribution. Taking commuting production, commuting attraction and commuting distribution as the dependent variables, this thesis estimated them with Random Forest. The results show that: (1) The commuting production model and the commuting attraction model had high precision and generalization ability, with R² reaching 0.81 and 0.69 respectively. (2) The commuting distribution model, however, showed poor reliability and severe over-fitting, with R² below 0.2, which indicated that the precision and quality of current mobile signaling data are not sufficient to support modeling commuting distribution at a relatively small areal unit. (3) R² between the actual and estimated values was 0.87 for commuting production and 0.78 for commuting attraction, indicating reliable estimates for both. In addition, multicollinearity did not affect the estimation results. (4) Errors mainly came from limitations of the research data or research scale and from not considering the impact brought by green space. They manifested as an underestimation of commuting generation in dormitory-factory areas, a slight overestimation in the inner city and the peripheral area of Huaqiao, and an overestimation of commuting production in areas with large water bodies or green land. (3) Identify the key factors impacting on
commuting generation. By analyzing Variable Importance (VI), this thesis found that the key factors for commuting production and attraction are similar, and that the cumulative importance of the top seven factors was above 94% for both. Specifically, population density and employment density are the most critical factors for commuting production and attraction, with importance values of 0.60 and 0.52 respectively. Population density, employment density, unit area, average commuting distance, average transfer times and average rent are key factors for both. Moreover, the average housing price and the construction density have a certain impact on commuting production and attraction respectively. (4) Analyze the nonlinear relationship between the key factors and commuting generation. By examining the numerical distribution, changing trend and spatial distribution of Variable Contribution, this thesis found that: (1) Population density, unit area and employment density determined commuting generation to a large extent, while the other factors modified the results. This was mainly because the contribution range and the effective range of these three factors were wide, which meant that they could move the Random Forest output from the average of the input values to the estimated values, whereas the ranges of the other factors were relatively narrow. (2) The influence of the key factors on commuting generation followed different nonlinear trends as their values increased. The influence of population density and employment density increased rapidly and almost linearly while their values did not exceed 50,000 people/km² and 70,000 people/km² respectively, and then remained unchanged, which resulted from the restriction of unit area. The influence of average transit transfer times and average commuting distance decreased slightly at first, then increased rapidly and finally remained stable, with turning points at 0.8 and 1.3 transfers and at 5 km and 9 km respectively, which were related to the degree
of transit development and the type of commuting distance in the corresponding unit. As for commuting attraction, the trends for population density, employment density, average transfer times and average commuting distance were consistent with those found for commuting production; however, the influence thresholds of the former two factors changed, and the latter two factors contributed more to commuting attraction than to production. The influence of construction density was negatively associated with its value, but the negative impact was slight overall. (3) The influence of the key factors on commuting generation differed spatially, and the differences were related to the influence trends, the spatial distribution of commuting generation, the values of the factors, etc. Areas where population density had a great impact coincided with those where production density and population density were high, i.e. the residential areas in the inner city, along West Chengbei Road and along other main roads surrounding the inner city. Similarly, areas where employment density had a great impact corresponded to those where attraction density and employment density were high, and extended to areas outside the inner city with many enterprises. The influence of average transfer times and average commuting distance showed core-periphery spatial patterns, with lower influence inside than outside, which corresponded to their changing trends. The negative impact of construction density was mainly distributed in the inner city and the surrounding residential areas, while the positive impact was mainly distributed in the industrial parks. This resulted from high construction density with little commuting attraction in the inner city, and from low construction density attracting a large amount of commuting in the industrial parks. The whole thesis contains about 51,438 words and 123 pictures and charts. |
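The Mean Decrease Accuracy idea used for feature screening can be illustrated with a minimal sketch: shuffle one feature column at a time and record how much R² drops. Everything below is an assumption made for illustration and is not taken from the thesis: `toy_predict` is a hypothetical stand-in for a fitted Random Forest, the coefficients and the synthetic data are invented, and the feature names only echo those in the abstract.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination between observed and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_decrease_accuracy(predict, X, y, n_repeats=20, seed=0):
    """Average drop in R^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - r2_score(y, [predict(row) for row in X_perm])
        drops.append(total / n_repeats)
    return drops


def toy_predict(row):
    """Hypothetical stand-in for a trained model (NOT the thesis model)."""
    population_density, employment_density, unused_feature = row
    return 0.6 * population_density + 0.1 * employment_density


# Synthetic data (assumption): densities in thousand people per km^2.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 50), data_rng.uniform(0, 70), data_rng.uniform(0, 1)]
     for _ in range(200)]
y = [toy_predict(row) for row in X]

importances = mean_decrease_accuracy(toy_predict, X, y)
for name, value in zip(["population_density", "employment_density", "unused_feature"],
                       importances):
    print(f"{name}: {value:.3f}")
```

In this toy setting the feature the predictor ignores gets an importance of exactly zero, while the feature with the largest effect on the output shows the largest drop in R². With a real Random Forest the same procedure would normally be run on held-out data rather than on the training set.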