| With economic development and rising living standards, accelerating urbanization has brought a series of problems. Urban car ownership has grown rapidly, and traffic congestion during weekday rush hours and on holidays now disrupts people’s daily lives. This thesis studies how to analyze congestion status with scientific methods and provide real-time information on congested areas to the police and the public, so that appropriate measures can be taken in time. It compares and analyzes the currently popular big data processing platforms and frameworks and, building on an understanding of the relevant technologies, designs a real-time big data processing system that collects, processes, analyzes, and displays traffic GPS data in real time. Built on the Hadoop platform, the system uses Kafka for real-time collection and transmission of traffic GPS data. It also improves the CluStream data stream clustering algorithm and runs it on the Spark Streaming stream processing framework for clustering analysis. The results are pushed into a MySQL database and a Redis database, accessed through the HTTP and WebSocket protocols, and finally displayed in the browser as real-time or historical traffic congestion areas. The main work of this thesis is as follows: (1) To meet users’ differing needs for traffic congestion area data, a storage and access optimization strategy combining MySQL and Redis is proposed. Exploiting the different storage strengths of MySQL and Redis makes the storage and access service more fine-grained. Historical data and real-time data are accessed separately through the HTTP protocol and WebSocket, so that storage and access are matched to each user need, giving the system a targeted, high-quality, and efficient processing strategy. (2) To address the timeliness of data processing and the parameter sensitivity of the CluStream data stream clustering algorithm, an improved CluStream algorithm is proposed that incorporates a variable-length sliding window and a genetic algorithm. This gives data stream processing better real-time performance and reliability. (3) Given the large volume and strict real-time requirements of traffic GPS data, the parallelization strategy of the CluStream data stream clustering algorithm is improved. Using the Spark Streaming distributed computing framework, the improved CluStream algorithm achieves real-time, efficient clustering. The real-time big data processing platform for urban traffic congestion areas designed in this thesis has been tested and analyzed. The results show that the proposed framework can process and analyze urban traffic congestion areas in real time, quickly, and accurately. The platform displays both real-time and historical data on urban traffic congestion areas, providing a reference for safety management, people’s travel, traffic management, and other purposes, and thereby helping to ease urban traffic congestion. |
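
To make the clustering step more concrete, the following is a minimal single-machine sketch of the online micro-cluster update that CluStream-style algorithms rely on, applied to GPS points. It is not the improved algorithm of this thesis: the fixed `window` length, the `max_radius` absorption threshold, and the names `MicroCluster` and `update` are all assumptions made for illustration, and the genetic-algorithm parameter tuning, the variable-length window, and the Spark Streaming parallelization are left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Additive summary of a group of points (count, sums, timestamp sum)."""
    n: int = 0
    ls: list = field(default_factory=lambda: [0.0, 0.0])  # linear sum of (lon, lat)
    ss: list = field(default_factory=lambda: [0.0, 0.0])  # sum of squares of (lon, lat)
    ts_sum: float = 0.0                                   # sum of timestamps

    def add(self, point, ts):
        # Absorb one point by updating the additive statistics.
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v
        self.ts_sum += ts

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square deviation of the absorbed points from the centroid.
        var = sum(self.ss[i] / self.n - (self.ls[i] / self.n) ** 2
                  for i in range(len(self.ls)))
        return math.sqrt(max(var, 0.0))

    def mean_ts(self):
        return self.ts_sum / self.n


def update(clusters, point, ts, max_radius=0.01, window=300.0):
    """Process one GPS point; max_radius and window are hypothetical settings."""
    # Drop micro-clusters whose mean timestamp has left the (fixed) window.
    clusters[:] = [c for c in clusters if ts - c.mean_ts() <= window]
    # Find the nearest remaining micro-cluster, if any.
    nearest = min(clusters,
                  key=lambda c: math.dist(c.centroid(), point),
                  default=None)
    if nearest is not None and math.dist(nearest.centroid(), point) <= max_radius:
        nearest.add(point, ts)       # close enough: absorb into existing cluster
    else:
        fresh = MicroCluster()       # otherwise start a new micro-cluster
        fresh.add(point, ts)
        clusters.append(fresh)
    return clusters
```

Because every statistic is a plain sum, two micro-clusters can be merged by adding their fields, which is the property that makes this kind of summary convenient to compute per partition and combine in a distributed framework. Standard CluStream additionally bounds the number of micro-clusters and uses a radius-relative absorption test; the fixed threshold here is a simplification.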