Research On Bus Passenger Flow Forecasting Method Based On Spark Platform

Posted on: 2018-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: M Zha
GTID: 2322330512483264
Subject: Computer application technology

Abstract/Summary:
Urban public transport is an important part of urban infrastructure and social life, and it has a far-reaching, comprehensive impact on the urban economy and on residents' lives. However, inefficient use of traffic resources, traffic congestion, and pollution are becoming increasingly serious, and these problems directly affect people's fundamental interests. Bus passenger flow forecasting is a scientific tool that provides important information for urban public transport policy development, system planning, and operations management, and it helps public transport managers draw up reasonable bus operation plans and policies. It is therefore an important means of improving the utilization of traffic resources, and it plays a significant role in alleviating traffic congestion and reducing traffic pollution.

Random forest is an ensemble model built from multiple decision trees, and it has advantages over many other algorithms. However, in stand-alone mode, random forest builds its decision trees and performs prediction voting serially, so it runs inefficiently; on large datasets, the random forest algorithm in a traditional stand-alone environment takes a long time. Spark is a distributed computing platform that can readily process massive amounts of data and makes large-scale distributed iterative computation practical. This thesis combines the strengths of the two, using random forest as the bus passenger flow forecasting model and Spark as the platform for parallelizing random forest.

Based on existing IC card data, this thesis extracts useful information to analyze passenger flow patterns, studying the temporal distribution of passenger flow and the dynamic factors that influence it. It identifies the patterns of bus passenger flow on weekdays and weekends, as well as the influence of weather, temperature, and holidays.

To address the inefficiency of random forest in stand-alone mode, this thesis proposes a Spark-based method for parallelizing random forest, which parallelizes its two main processes: decision tree construction and voting. Experimental results show that the parallelized random forest is more efficient than random forest in a traditional stand-alone environment. In addition, a comparison of several regression models shows that random forest performs well in both model fitting and prediction accuracy.

Existing improvements to random forest mostly target classification problems, and few studies address regression. Drawing on previous research, this thesis proposes a method for computing the similarity between random forest samples and uses it to optimize the voting process of random forest, yielding a weighted voting method. It also proposes an improved feature selection algorithm that shrinks the feature subset during feature selection and reduces the influence of unimportant features. Experimental results show that the improved random forest model achieves better accuracy than the original random forest.
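The two stages that the abstract parallelizes, building trees and voting, can be illustrated with a small pure-Python sketch. It is not the thesis's Spark implementation and rests on several assumptions: trees are depth-1 regression trees (stumps) to keep the example short; `concurrent.futures.ThreadPoolExecutor` only mirrors the independent per-tree "map" step that Spark would distribute across executors (CPython threads give no real CPU speedup here); and the weighted vote is one hypothetical similarity scheme, in which each tree's prediction is weighted by the mean similarity 1/(1 + Euclidean distance) between the query and the training samples in the leaf it reaches. The abstract does not give the thesis's actual similarity formula, so `similarity`, `weighted=True`, and the toy features (hour of day, weekend flag) are all illustrative.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# A training row is (features, target); features is a tuple of numbers.


def sse(values):
    """Sum of squared errors around the mean (the regression split criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(rows, feature_ids):
    """Fit a depth-1 regression tree on the candidate features.

    Returns (feature, threshold, left_rows, right_rows); leaves keep their
    training rows so the vote can use them. If no split is possible, returns
    (None, None, rows, []) and every query lands in the single leaf.
    """
    best = (float("inf"), None, None, rows, [])
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not right:  # the largest threshold leaves the right side empty
                continue
            cost = sse([y for _, y in left]) + sse([y for _, y in right])
            if cost < best[0]:
                best = (cost, f, t, left, right)
    return best[1:]


def build_tree(rows, n_sub, seed):
    """One independent tree: bootstrap sample plus a random feature subset."""
    rng = random.Random(seed)
    boot = [rng.choice(rows) for _ in rows]  # sample rows with replacement
    feats = rng.sample(range(len(rows[0][0])), n_sub)
    return fit_stump(boot, feats)


def build_forest(rows, n_trees=20, n_sub=1, seed=0):
    """'Map' stage: trees do not depend on each other, so they are built
    concurrently here; Spark would spread this work across executors."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda k: build_tree(rows, n_sub, seed + k), range(n_trees)))


def leaf_of(tree, x):
    """Route query x to the leaf (list of training rows) it falls into."""
    f, t, left, right = tree
    return left if f is None or x[f] <= t else right


def similarity(a, b):
    """Hypothetical sample similarity: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def predict(forest, x, weighted=False):
    """'Vote' stage: average the tree outputs, optionally weighting each tree
    by how similar x is to the training rows in the leaf it reaches."""
    total = norm = 0.0
    for tree in forest:
        leaf = leaf_of(tree, x)
        pred = sum(y for _, y in leaf) / len(leaf)
        w = sum(similarity(x, xv) for xv, _ in leaf) / len(leaf) if weighted else 1.0
        total += w * pred
        norm += w
    return total / norm


if __name__ == "__main__":
    # Synthetic toy data (not from the thesis): features are (hour, is_weekend),
    # and the flow is 10 before 7:00 and 80 from 7:00 onward.
    rows = [((h, w), 80.0 if h >= 7 else 10.0) for h in range(24) for w in (0, 1)]
    forest = build_forest(rows)
    for q in [(3, 0), (12, 0)]:
        print(q, round(predict(forest, q), 1), round(predict(forest, q, weighted=True), 1))
```

On this synthetic data, trees that happen to draw the hour feature split cleanly between the low early-morning flow and the higher daytime flow, so the daytime query receives the larger prediction under both plain and similarity-weighted voting. Because every tree reads only the shared training rows, the build step has no cross-tree dependencies, which is what makes it a natural fit for a distributed map.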
Keywords/Search Tags: Bus passenger flow forecasting, regression, random forest, Spark, parallelization