
Research On Big Data Clustering Analysis Method And Application Of Public Transportation Operation

Posted on: 2019-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: M Huang
GTID: 2392330578468514
Subject: Agricultural informatization
Abstract/Summary:
Intelligent public transportation is one of the main components of today's Smart City construction. Using big data analysis technology to help solve traffic problems, improve public transport efficiency and the public transport environment, and achieve Smart Transportation is both the demand and the development trend of the intelligent public transport industry. Hadoop is an open-source distributed cloud computing platform that meets the needs of big data processing and is suited to the distributed processing of large data sets. It is widely used in data mining, where it serves as the data processing carrier that improves processing efficiency. For the data mining algorithm, this work uses the popular K-means clustering algorithm, which processes large data sets effectively and is easy to use. However, the value of k must be chosen manually and the initial cluster centers are selected at random, which makes the algorithm highly unstable. In addition, computing the distance from every data sample to every centroid introduces considerable computational redundancy and greatly reduces the algorithm's efficiency. Against this big data background, this paper studies a Hadoop data analysis system and the K-means clustering algorithm for intelligent public transportation. The main work and innovations are as follows:

(1) To address the manual selection of k and the random selection of initial cluster centroids in the traditional K-means algorithm, an improved Canopy-Kmeans algorithm is proposed. The Canopy algorithm adopts a "median and maximum distance product" principle, and an additional distance calculation step is introduced into the iterative process of the K-means algorithm. Simulation experiments on the improved algorithm were carried out in Matlab, and they show that it has better timeliness and accuracy.

(2) Based on the development status of intelligent public transportation, the paper analyzes the current characteristics of public transportation data: massive volume, decentralization, multiple sources, and heterogeneity. Exploiting the Hadoop cluster's advantages in massive data storage, analysis, and cluster expansion, a data analysis and processing system based on the Hadoop framework is designed and implemented. It has four modules: data acquisition, data storage, data analysis, and result visualization.

(3) Using this system, the historical operation data of Xiantao Public Transport Company were analyzed. The K-means clustering algorithm was used to classify bus routes according to passenger trips and revenue. The MapReduce programming model was used to analyze the company's first-quarter passenger volume, the daily income of each bus line, and the 24-hour passenger fluctuations. The experimental results show that the system is practical and effective, and the analysis results offer useful guidance for the operation of bus companies.

(4) A MapReduce-based parallel version of the improved Canopy-Kmeans algorithm is designed. It is used to analyze the bus company's historical data within the data analysis system, and the performance of the parallelized algorithm is evaluated. The application and performance results show that the improved algorithm parallelizes well and is more efficient.
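The Canopy-Kmeans idea in item (1) can be sketched as follows. This is a minimal Python sketch of the classic Canopy pass followed by standard K-means. It is not the thesis's improved variant, because the "median and maximum distance product" refinement is not reproduced here. The function names `canopy_centers` and `kmeans` and the threshold `t2` are hypothetical. The Canopy pass fixes both k (the number of canopies) and the initial centers, which are the two sources of instability the abstract attributes to plain K-means.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points: List[Point], t2: float) -> List[Point]:
    """Classic Canopy pass (hypothetical helper): returns one center per canopy.

    Points closer than the tight threshold t2 to a chosen center are removed
    from the candidate list. The loose threshold T1 only decides overlapping
    canopy membership, which is not needed to seed K-means, so it is omitted.
    """
    remaining = list(points)
    centers: List[Point] = []
    while remaining:
        c = remaining.pop(0)  # the next unassigned point becomes a new center
        centers.append(c)
        remaining = [p for p in remaining if dist(p, c) > t2]
    return centers


def kmeans(points: List[Point], seeds: List[Point], max_iter: int = 100,
           tol: float = 1e-9) -> Tuple[List[Point], List[int]]:
    """Standard (Lloyd) K-means started from the given seeds; k = len(seeds)."""
    centers = [tuple(s) for s in seeds]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            # Every point is compared with every centroid: the redundant cost
            # that the abstract points out.
            j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[j].append(p)
        # New center = mean of its members; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    labels = [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]
    return centers, labels


pts = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0), (10.3, 9.8), (9.7, 10.2)]
seeds = canopy_centers(pts, t2=3.0)
centers, labels = kmeans(pts, seeds)
print(len(seeds), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

On this toy data the Canopy pass yields two seeds, so k = 2 is obtained without manual input. The threshold t2 still has to be chosen, however, so the result remains sensitive to that parameter.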
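The abstract does not define the "median and maximum distance product" principle used in the Canopy step. One possible reading, offered purely as an assumption, works as follows. Start from the sample nearest the coordinate-wise median. Then repeatedly add the sample whose product of distances to all chosen centers is largest. Stop when no sample lies farther than t2 from every center. The sketch below implements that reading only. The name `product_seeds` and the parameter `t2` are hypothetical, and the thesis's actual rule may differ.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, ...]


def product_seeds(points: List[Point], t2: float) -> List[Point]:
    """Hypothetical 'median + maximum distance product' seeding (an assumed reading)."""
    if t2 <= 0:
        raise ValueError("t2 must be positive")
    # Assumption: the first center is the sample closest to the coordinate-wise median.
    med = tuple(statistics.median(xs) for xs in zip(*points))
    centers = [min(points, key=lambda p: math.dist(p, med))]
    while True:
        # Candidates must lie outside the t2-neighbourhood of every existing center.
        candidates = [p for p in points
                      if min(math.dist(p, c) for c in centers) > t2]
        if not candidates:
            break
        # Maximize the product of distances to all centers; summing logs gives the
        # same ordering without overflow (every distance here is > t2 > 0).
        best = max(candidates,
                   key=lambda p: sum(math.log(math.dist(p, c)) for c in centers))
        centers.append(best)
    return centers


pts = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0), (10.3, 9.8), (9.7, 10.2)]
print(product_seeds(pts, t2=3.0))  # [(0.5, 0.2), (10.3, 9.8)]
```

Unlike the order-dependent classic Canopy pass, this seeding does not depend on the input order. It spreads the seeds apart, which is the usual motivation for distance-product rules.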
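Item (1) also adds a distance calculation step to the K-means iteration to reduce redundant work, but the abstract gives no details. As a swapped-in, well-known technique that is not necessarily the thesis's, the triangle inequality lets the assignment step skip a centroid c_j whenever d(c_best, c_j) ≥ 2·d(p, c_best). In that case d(p, c_j) ≥ d(c_best, c_j) − d(p, c_best) ≥ d(p, c_best), so c_j cannot be closer. This is the first lemma of Elkan's accelerated K-means. The function name `assign_with_pruning` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def assign_with_pruning(points: List[Point], centers: List[Point]) -> Tuple[List[int], int]:
    """Nearest-center assignment that skips provably useless distance computations.

    If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best), so
    c_j cannot be strictly closer. Returns the labels and the number of
    point-to-center distances actually computed.
    """
    k = len(centers)
    # k*k center-to-center distances, computed once per iteration.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels: List[int] = []
    computed = 0
    for p in points:
        best, d_best = 0, math.dist(p, centers[0])
        computed += 1
        for j in range(1, k):
            if cc[best][j] >= 2 * d_best:
                continue  # pruned: c_j cannot beat the current best
            d = math.dist(p, centers[j])
            computed += 1
            if d < d_best:
                best, d_best = j, d
        labels.append(best)
    return labels, computed


pts = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0), (10.3, 9.8), (9.7, 10.2)]
labels, computed = assign_with_pruning(pts, [(0.2, 0.2), (10.0, 10.0)])
print(labels, computed)  # [0, 0, 0, 1, 1, 1] 9
```

Here only 9 of the 12 point-to-center distances are computed, and the labels match brute-force assignment. The saving grows when clusters are well separated relative to their spread.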
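The statistics in item (3) are classic group-and-aggregate jobs: daily income per bus line and passenger counts per hour of the day. The following plain-Python simulation of the map, shuffle, and reduce phases illustrates the idea. It is not Hadoop's Java API. The record layout (route ID, boarding timestamp, fare, one record per boarding), the function names, and the toy records are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (route_id, boarding timestamp, fare); one record per boarding.
Record = Tuple[str, str, float]
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def map_daily_income(rec: Record) -> Iterator[Tuple[Tuple[str, str], float]]:
    """Emit ((route, day), fare) so the reducer can sum income per line per day."""
    route, ts, fare = rec
    day = datetime.strptime(ts, TS_FORMAT).date().isoformat()
    yield (route, day), fare


def map_hourly_passengers(rec: Record) -> Iterator[Tuple[int, int]]:
    """Emit (hour of day, 1) so the reducer can count boardings per hour."""
    _, ts, _ = rec
    yield datetime.strptime(ts, TS_FORMAT).hour, 1


def run_job(records: Iterable[Record], mapper: Callable, reducer: Callable) -> Dict:
    """Single-process stand-in for a MapReduce job: map, shuffle (group by key), reduce."""
    groups: Dict[Hashable, List] = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(values) for key, values in sorted(groups.items())}


records = [
    ("R1", "2019-01-02 07:15:00", 1.0),
    ("R1", "2019-01-02 07:40:00", 1.0),
    ("R2", "2019-01-02 18:05:00", 2.0),
    ("R1", "2019-01-03 07:05:00", 1.0),
]
daily_income = run_job(records, map_daily_income, sum)
hourly = run_job(records, map_hourly_passengers, sum)
print(daily_income)  # {('R1', '2019-01-02'): 2.0, ('R1', '2019-01-03'): 1.0, ('R2', '2019-01-02'): 2.0}
print(hourly)  # {7: 3, 18: 1}
```

Because the reducer (`sum`) is associative, the same function could also run as a combiner on each mapper's output, which is what keeps such jobs cheap on a cluster.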
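Item (4) parallelizes the clustering with MapReduce. A common way to do this for the K-means iteration is sketched below as a single-process simulation, not as the thesis's implementation. Each mapper assigns the points of its input split to the nearest current center and emits per-cluster partial sums, acting as a combiner. The reducer then merges the partial sums into new centers. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (vector sum, point count)


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans_map(split: List[Point], centers: List[Point]) -> Iterator[Tuple[int, Partial]]:
    """Mapper for one input split; emits per-cluster partial sums (combiner folded in)."""
    partial: Dict[int, Partial] = {}
    for p in split:
        cid = nearest(p, centers)
        s, n = partial.get(cid, ([0.0] * len(p), 0))
        partial[cid] = ([a + b for a, b in zip(s, p)], n + 1)
    yield from partial.items()


def kmeans_reduce(values: List[Partial]) -> Point:
    """Reducer: merge the partial sums for one cluster and return its new center."""
    total = [0.0] * len(values[0][0])
    count = 0
    for s, n in values:
        total = [a + b for a, b in zip(total, s)]
        count += n
    return tuple(x / count for x in total)


def parallel_kmeans(splits: List[List[Point]], seeds: List[Point],
                    max_iter: int = 50, tol: float = 1e-9) -> List[Point]:
    centers = [tuple(c) for c in seeds]
    for _ in range(max_iter):
        groups: Dict[int, List[Partial]] = defaultdict(list)  # shuffle by cluster id
        for split in splits:  # on Hadoop each split would run in its own map task
            for cid, partial in kmeans_map(split, centers):
                groups[cid].append(partial)
        new_centers = list(centers)  # clusters that received no points keep their center
        for cid, vals in groups.items():
            new_centers[cid] = kmeans_reduce(vals)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


splits = [[(0.0, 0.0), (0.5, 0.2)], [(0.1, 0.4), (10.0, 10.0)], [(10.3, 9.8), (9.7, 10.2)]]
print(parallel_kmeans(splits, seeds=[(0.0, 0.0), (10.0, 10.0)]))
```

Only per-cluster partial sums cross the shuffle. The data moved per iteration therefore grows with the number of clusters and splits rather than with the number of points. The seeds could come from a Canopy pass run as an earlier job.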
Keywords/Search Tags: Intelligent transportation, Big data, Clustering analysis, K-means, Canopy-Kmeans