Research On Random Forest Algorithm Based On Hadoop And Design Of Traffic Flow Prediction System

Posted on: 2020-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Mao
GTID: 2392330596996999
Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, means of transportation have become increasingly abundant, yet cars remain the primary way people travel. In recent years, the field of intelligent transportation has undergone unprecedented changes, and in-depth mining of massive traffic flow data has received extensive attention in industry. As traffic data collection technology advances and collection methods diversify, the volume of traffic data is growing rapidly. By the end of 2018, 11 cities in China had a car ownership of nearly 3 million each, and medium-sized cities generated hundreds of terabytes of traffic flow data every day. Faced with such massive data, high-quality data preprocessing and fast, efficient traffic flow prediction can provide a basis for subsequent traffic control. Hadoop attracted wide attention from industry as soon as it was proposed. Its core components, HDFS and MapReduce, give researchers efficient data storage and a distributed computing model. The Hadoop-based parallel random forest algorithm (MapReduce Random Forest, MR_RF) combines the random forest algorithm with the MapReduce programming model. Random forests need little hyperparameter tuning, are convenient to use, and achieve high classification accuracy, so this thesis studies the algorithm in depth and, in the process, compares MR_RF with the standard RF. The thesis then proposes the MR_ONRF model, which improves MR_RF by optimizing node splitting and thereby improves the accuracy of traffic flow prediction.
The main work of this thesis is as follows:
1. It reviews the state of research on the Hadoop platform, random forests, and traffic flow prediction. It introduces the technologies related to the Hadoop platform in a big data environment, then describes in detail the computing principle and operating mechanism of Hadoop's MapReduce, as well as the parallel random forest MR_RF built on this programming model. Finally, it compares the advantages of MR_RF over RF.
2. Errors inevitably arise while traffic flow data are collected, transmitted, and stored, and using the raw data directly for prediction would introduce unnecessary error, so the data are preprocessed before forecasting. Preprocessing first identifies erroneous, missing, and redundant data and then handles each type: erroneous data are corrected, missing data are repaired, and redundant data are reduced. Because the volume of traffic flow data is so large, the thesis uses the MapReduce programming model of the Hadoop platform to process the massive data blocks.
3. The quality of the split nodes chosen while building each decision tree affects the final result of the forest. The thesis therefore proposes MR_ONRF, a combined multi-splitting algorithm that computes the best split node and splits on it, optimizing the decision tree splitting algorithm. Compared with MR_RF and the traditional RF, MR_ONRF improves both prediction accuracy and efficiency.
4. Building on this theoretical work, MR_ONRF is applied to a traffic flow prediction platform. A prototype traffic flow prediction system is designed and implemented. It quantifies the data into travel reports, providing an efficient basis for decisions in subsequent traffic control.
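The MR_RF idea described above splits training into a map step, where trees are built independently on separate data blocks, and a reduce step, where the trees are gathered into one forest. The following is a minimal, in-process sketch of that pattern. It is not the thesis's implementation and does not use the Hadoop API. The following are assumptions for illustration: the function names (`fit_stump`, `map_phase`, `reduce_phase`, `predict`), the use of depth-1 regression trees (stumps) instead of full trees to keep the sketch short, the `(features, target)` record format, and the toy data.

```python
import random
from statistics import mean

# Hypothetical record format: (features, target), e.g. features =
# (hour_of_day, other_feature) and target = traffic flow in the next interval.


def fit_stump(rows, feature_ids):
    """Fit a depth-1 regression tree on the given features by minimising squared error."""
    best = None
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # no usable split: the stump always predicts the sample mean
        m = mean(y for _, y in rows)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]  # (feature, threshold, left_mean, right_mean)


def map_phase(block, trees_per_block, n_features, seed):
    """Map step: train trees on one data block (bootstrap rows + random feature subset)."""
    rng = random.Random(seed)
    k = max(1, int(n_features ** 0.5))  # common sqrt(n_features) heuristic
    for _ in range(trees_per_block):
        sample = [rng.choice(block) for _ in block]   # bootstrap sample
        features = rng.sample(range(n_features), k)  # random feature subset
        yield "forest", fit_stump(sample, features)  # emit (key, tree)


def reduce_phase(pairs):
    """Reduce step: gather every tree emitted under the 'forest' key."""
    return [tree for key, tree in pairs if key == "forest"]


def predict(forest, x):
    """Regression output of the forest: the average of all tree predictions."""
    return mean(lm if x[f] <= t else rm for f, t, lm, rm in forest)


# Toy data: flow is 20 before noon and 100 after; the second feature is noise.
rng = random.Random(0)
data = [((h, rng.random()), 100.0 if h >= 12 else 20.0) for h in range(24) for _ in range(5)]
blocks = [data[i::4] for i in range(4)]  # stand-in for 4 HDFS blocks
emitted = [pair for i, b in enumerate(blocks) for pair in map_phase(b, 5, 2, seed=i)]
forest = reduce_phase(emitted)
print(len(forest))  # 20 trees: 4 blocks x 5 trees each
print(predict(forest, (18, 0.5)), predict(forest, (3, 0.5)))
```

In a real Hadoop job, the mappers would run on the nodes that hold each block, and the reducer would write the forest back to HDFS. Here, both phases run in a single process so that the data flow is easy to follow.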
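The preprocessing step in item 2 identifies erroneous, missing, and redundant records and then corrects, repairs, or reduces them, using MapReduce to cope with the data volume. The abstract does not give the exact rules, so the sketch below is only an assumed example of that shape. Records are keyed by detector in the map step, and each detector's series is cleaned in the reduce step. The following are hypothetical choices, not the thesis's method: the `(detector_id, timestamp, flow)` record format, the `MAX_FLOW` range check, "keep the first reading" for duplicates, and linear interpolation for repair.

```python
from collections import defaultdict

MAX_FLOW = 3000  # hypothetical upper bound on a valid per-interval count


def mapper(record):
    """Map step: key each (detector_id, timestamp, flow) record by detector."""
    detector_id, ts, flow = record
    yield detector_id, (ts, flow)


def reducer(values, interval, start, end):
    """Reduce step: clean one detector's time series."""
    series = {}
    for ts, flow in values:
        series.setdefault(ts, flow)  # redundant: keep the first reading per timestamp
    slots = list(range(start, end + 1, interval))
    cleaned = []
    for ts in slots:
        flow = series.get(ts)  # missing: no reading for this slot gives None
        if flow is not None and not 0 <= flow <= MAX_FLOW:
            flow = None  # erroneous: an out-of-range reading is treated as missing
        cleaned.append(flow)
    valid = [i for i, f in enumerate(cleaned) if f is not None]
    for i in range(len(cleaned)):
        if cleaned[i] is not None:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is not None and nxt is not None:  # repair by linear interpolation
            w = (i - prev) / (nxt - prev)
            cleaned[i] = cleaned[prev] + w * (cleaned[nxt] - cleaned[prev])
        elif prev is not None or nxt is not None:  # at an edge: copy the nearest value
            cleaned[i] = cleaned[prev if prev is not None else nxt]
    return list(zip(slots, cleaned))


def run_job(records, interval, start, end):
    """Simulate map, shuffle (group by key) and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(vals, interval, start, end) for key, vals in groups.items()}


raw = [
    ("D1", 0, 120), ("D1", 300, 130), ("D1", 300, 130),  # duplicate reading at 300 s
    ("D1", 600, -5),                                     # erroneous negative count
    ("D1", 1200, 160),                                   # reading at 900 s is missing
]
result = run_job(raw, interval=300, start=0, end=1200)
print(result["D1"])
```

In this toy run, the duplicate at 300 s is kept only once. The negative reading at 600 s and the missing slot at 900 s are both filled by interpolating between 130 and 160, giving values of about 140 and 150.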
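Item 3 depends on computing the best split node when the training data are spread over many blocks. The abstract does not describe how MR_ONRF's combined multi-splitting works, so this sketch shows only a generic, commonly used approach, not MR_ONRF itself. Each mapper emits partial sufficient statistics (count, sum, and sum of squares of the target) for every candidate (feature, threshold, side). The reducer merges these statistics and picks the split with the lowest total squared error, which is the variance-reduction criterion used for regression trees. The function names, the candidate thresholds, and the toy data are hypothetical.

```python
from collections import defaultdict


def map_split_stats(block, candidates):
    """Map step: partial (count, sum, sum of squares) of y per (feature, threshold, side)."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in block:
        for f, thresholds in candidates.items():
            for t in thresholds:
                s = stats[(f, t, x[f] <= t)]  # True = left child, False = right child
                s[0] += 1
                s[1] += y
                s[2] += y * y
    return list(stats.items())


def sse(count, total, total_sq):
    """Sum of squared errors around the mean, computed from sufficient statistics."""
    return total_sq - total * total / count if count else 0.0


def reduce_best_split(mapper_outputs):
    """Reduce step: merge partial statistics and return (sse, feature, threshold) of the best split."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for output in mapper_outputs:
        for key, (n, s, sq) in output:
            m = merged[key]
            m[0] += n
            m[1] += s
            m[2] += sq
    best = None
    for (f, t, is_left), left in merged.items():
        right = merged.get((f, t, False))
        if not is_left or right is None:  # skip right-side keys and one-sided splits
            continue
        score = sse(*left) + sse(*right)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best


# Hypothetical toy blocks: feature 0 is the hour of day, target is the traffic flow.
blocks = [
    [((8,), 300.0), ((9,), 320.0), ((14,), 150.0)],
    [((17,), 310.0), ((3,), 40.0), ((2,), 30.0)],
]
best = reduce_best_split(map_split_stats(b, {0: [5, 10, 15]}) for b in blocks)
print(best)  # (19450.0, 0, 5): split on hour <= 5
```

Only small per-split summaries cross the network, not the raw rows, so this design scales with the number of candidate splits rather than with the size of the data.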
Keywords/Search Tags: Hadoop, traffic flow preprocessing, random forest, decision tree, MR_RF, traffic flow prediction