With the rapid expansion of China's urban infrastructure, residents' travel modes have become more diverse and the range of their travel continues to grow. To improve the efficiency of urban operation and the travel experience of residents, the management and study of traffic information has become an urgent task. Taxis are among the most frequently used means of urban transportation, and the taxi big data generated around the clock is highly valuable for information extraction and analysis. With the exponential growth of computing power and the continuous improvement of algorithms in artificial intelligence, applying the latest techniques to mine the deep information contained in large volumes of taxi data, statistically analyzing the patterns in that data, identifying urban hotspot areas, and forecasting taxi demand within those hotspots are of great significance for improving taxi operating efficiency and building smart cities. Based on these requirements, this thesis starts from taxi big data and carries out the following work.

First, this thesis cleans and preprocesses the raw taxi data according to its content and characteristics. A coordinate conversion algorithm converts the original WGS-84 GPS coordinates into the GCJ-02 and BD-09 geographic coordinate systems. A ray-casting reverse geocoding algorithm determines the administrative district in which each GPS point lies, and it is further optimized for accuracy: points whose district assignment is ambiguous are recorded and flagged, and the direction of the ray is changed to test them again. Experimental comparison with the corresponding functions of map service providers shows that the algorithm is faster and more economical, and that the optimized version achieves a notable improvement in accuracy. The processed data are then analyzed, with features extracted along the temporal and spatial dimensions.

Second, starting from the theory of clustering algorithms for GPS data, this thesis studies the categories of clustering algorithms and the principles and steps of K-means, BIRCH, DBSCAN and OPTICS, which belong to different categories. By implementing these algorithms and comparing their differences, strengths and weaknesses, an OPTICS+K-means algorithm is proposed. It addresses two problems: density-based clustering produces non-spherical clusters, while partition-based clustering cannot eliminate noise points in non-hotspot areas. A comprehensive evaluation shows that this algorithm is better suited to clustering hotspot areas in taxi big data.

Finally, after studying time series analysis models, this thesis constructs ARIMA, SARIMA and LSTM models and applies all three to the hotspot areas obtained by the OPTICS+K-means algorithm to forecast taxi demand in each hotspot. Comparative analysis shows that the LSTM forecasts fit the real data more closely, with higher accuracy and smaller errors, which demonstrates that the LSTM model is better suited to demand prediction in taxi big data hotspot areas.
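The abstract does not give the formulas behind the WGS-84 to GCJ-02 and BD-09 conversion. A minimal Python sketch, assuming the widely circulated open-source approximation (Krasovsky 1940 ellipsoid, a = 6378245.0, eccentricity squared ≈ 0.00669342) rather than the thesis's own implementation, might look like this. The function names are hypothetical, and the "outside mainland China" check that such libraries usually include is omitted because GCJ-02 offsets are only meaningful for points inside China.

```python
import math

# Constants of the widely circulated open-source GCJ-02/BD-09 approximation
# (assumption: not taken from the thesis itself).
A = 6378245.0                    # semi-major axis of the Krasovsky 1940 ellipsoid
EE = 0.00669342162296594323      # eccentricity squared
X_PI = math.pi * 3000.0 / 180.0  # constant used by the BD-09 transform


def _delta_lat(x: float, y: float) -> float:
    """Latitude offset polynomial (x, y are lng - 105, lat - 35)."""
    ret = -100.0 + 2.0 * x + 3.0 * y + 0.2 * y * y + 0.1 * x * y + 0.2 * math.sqrt(abs(x))
    ret += (20.0 * math.sin(6.0 * x * math.pi) + 20.0 * math.sin(2.0 * x * math.pi)) * 2.0 / 3.0
    ret += (20.0 * math.sin(y * math.pi) + 40.0 * math.sin(y / 3.0 * math.pi)) * 2.0 / 3.0
    ret += (160.0 * math.sin(y / 12.0 * math.pi) + 320.0 * math.sin(y * math.pi / 30.0)) * 2.0 / 3.0
    return ret


def _delta_lng(x: float, y: float) -> float:
    """Longitude offset polynomial (x, y are lng - 105, lat - 35)."""
    ret = 300.0 + x + 2.0 * y + 0.1 * x * x + 0.1 * x * y + 0.1 * math.sqrt(abs(x))
    ret += (20.0 * math.sin(6.0 * x * math.pi) + 20.0 * math.sin(2.0 * x * math.pi)) * 2.0 / 3.0
    ret += (20.0 * math.sin(x * math.pi) + 40.0 * math.sin(x / 3.0 * math.pi)) * 2.0 / 3.0
    ret += (150.0 * math.sin(x / 12.0 * math.pi) + 300.0 * math.sin(x / 30.0 * math.pi)) * 2.0 / 3.0
    return ret


def wgs84_to_gcj02(lng: float, lat: float) -> tuple:
    """Offset a WGS-84 point to GCJ-02 (assumes the point lies inside mainland China)."""
    x, y = lng - 105.0, lat - 35.0
    rad_lat = math.radians(lat)
    magic = 1.0 - EE * math.sin(rad_lat) ** 2
    sqrt_magic = math.sqrt(magic)
    # Scale the polynomial offsets (in metres-like units) into degrees.
    dlat = (_delta_lat(x, y) * 180.0) / ((A * (1.0 - EE)) / (magic * sqrt_magic) * math.pi)
    dlng = (_delta_lng(x, y) * 180.0) / (A / sqrt_magic * math.cos(rad_lat) * math.pi)
    return lng + dlng, lat + dlat


def gcj02_to_bd09(lng: float, lat: float) -> tuple:
    """Apply Baidu's additional BD-09 offset to a GCJ-02 point."""
    z = math.sqrt(lng * lng + lat * lat) + 0.00002 * math.sin(lat * X_PI)
    theta = math.atan2(lat, lng) + 0.000003 * math.cos(lng * X_PI)
    return z * math.cos(theta) + 0.0065, z * math.sin(theta) + 0.006
```

A raw GPS fix is converted with `gcj02_to_bd09(*wgs84_to_gcj02(lng, lat))` when BD-09 output is needed. The resulting shifts are small (on the order of hundreds of metres), which is why mixing coordinate systems silently misplaces points near district borders.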
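The optimized ray-casting district test is described only in outline. A minimal Python sketch of the idea follows, under these assumptions: each district boundary is a simple polygon given as a list of (lng, lat) vertices; a ray is considered unreliable when it passes through a vertex, runs along an edge, or starts on an edge; such a point is flagged as ambiguous and retested with another ray direction. The helpers `ray_parity` and `locate` and the list `DIRECTIONS` are hypothetical names, not the thesis's code.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
EPS = 1e-12
# Hypothetical retry order for ray directions (dx, dy).
DIRECTIONS = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.7071)]


def _cross(ax: float, ay: float, bx: float, by: float) -> float:
    return ax * by - ay * bx


def ray_parity(p: Point, polygon: Sequence[Point], d: Point) -> Optional[bool]:
    """Cast a ray from p along d and count edge crossings.

    Returns True (inside) for an odd count, False (outside) for an even count,
    or None when the result is ambiguous for this ray direction.
    """
    px, py = p
    dx, dy = d
    hits = 0
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        ex, ey = bx - ax, by - ay          # edge vector
        wx, wy = ax - px, ay - py          # from p to the edge start
        denom = _cross(dx, dy, ex, ey)
        if abs(denom) < EPS:
            # Ray parallel to the edge: ambiguous only if they are collinear.
            if abs(_cross(wx, wy, dx, dy)) < EPS:
                return None
            continue
        t = _cross(wx, wy, ex, ey) / denom  # distance along the ray
        s = _cross(wx, wy, dx, dy) / denom  # position along the edge (0..1)
        if t < -EPS or s < -EPS or s > 1 + EPS:
            continue                         # no intersection with this edge
        if t < EPS or s < EPS or s > 1 - EPS:
            return None                      # p on the edge, or ray hits a vertex
        hits += 1
    return hits % 2 == 1


def locate(p: Point, polygon: Sequence[Point]) -> Tuple[Optional[bool], bool]:
    """Return (inside, flagged); inside is None if every direction was ambiguous."""
    flagged = False
    for d in DIRECTIONS:
        result = ray_parity(p, polygon, d)
        if result is not None:
            return result, flagged
        flagged = True  # record the fuzzy point and retry with a new ray direction
    return None, flagged
```

For example, with the pentagon `[(0, 0), (4, 0), (5, 2), (4, 4), (0, 4)]`, the horizontal ray from `(2, 2)` passes exactly through the vertex `(5, 2)`, so the point is flagged and then classified as inside by the vertical ray. A point lying on a boundary edge stays unresolved (`None`) and would need a separate tie-breaking rule.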
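One possible reading of the OPTICS+K-means combination is a two-stage pipeline: OPTICS removes low-density points as noise, and K-means then partitions the remaining points into compact clusters whose centres serve as hotspots. The sketch below follows that reading using scikit-learn's `OPTICS` and `KMeans`. The choice of k (the number of OPTICS clusters), `min_samples=10`, the hypothetical `optics_kmeans_hotspots` helper and the synthetic pick-up points are all assumptions, and plain Euclidean distance on degrees is used for brevity where a projected or haversine metric would be more accurate.

```python
import numpy as np
from sklearn.cluster import KMeans, OPTICS


def optics_kmeans_hotspots(points: np.ndarray, min_samples: int = 10):
    """Hypothetical two-stage hotspot extraction.

    Stage 1: OPTICS labels low-density points as noise (label -1).
    Stage 2: K-means runs on the remaining points, with k equal to the number
    of OPTICS clusters, giving one compact centre per hotspot.
    """
    labels = OPTICS(min_samples=min_samples).fit(points).labels_
    kept = points[labels != -1]
    k = len(set(labels[labels != -1]))
    if k == 0:
        return np.empty((0, points.shape[1])), kept
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(kept)
    return kmeans.cluster_centers_, kept


# Synthetic (lng, lat) pick-up points: three dense blobs plus sparse noise.
rng = np.random.default_rng(0)
centres = np.array([[114.05, 22.55], [114.10, 22.60], [114.00, 22.52]])
blobs = np.vstack([c + rng.normal(scale=0.002, size=(100, 2)) for c in centres])
noise = rng.uniform([113.95, 22.45], [114.15, 22.65], size=(30, 2))
pts = np.vstack([blobs, noise])

hotspots, kept = optics_kmeans_hotspots(pts)
print(len(hotspots), "hotspot centres from", len(kept), "of", len(pts), "points")
```

The design choice here is that OPTICS decides which points belong to any hotspot at all, while K-means only decides how the surviving points are grouped, so noise in non-hotspot areas no longer pulls the centres.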
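Before a hotspot's demand series can be fed to an LSTM, it is commonly reframed as supervised samples in which a fixed-length window of past values predicts a later value. A minimal pure-Python sketch of that framing is shown below. The window length, the one-step-ahead default and the `make_windows` helper are assumptions, since the abstract does not state the settings used, and the hourly counts are made-up illustrative numbers, not data from the thesis.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Split a demand series into (input window, target) pairs.

    Each sample uses `window` consecutive observations as input and the value
    `horizon` steps after the end of the window as the target.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Illustrative hourly pick-up counts for one hotspot (made-up numbers).
hourly = [12, 15, 20, 18, 25, 30]
X, y = make_windows(hourly, window=3)
# X -> [[12, 15, 20], [15, 20, 18], [20, 18, 25]], y -> [18, 25, 30]
```

The same windows can be reused to score ARIMA, SARIMA and LSTM forecasts on identical targets, which keeps the model comparison like-for-like.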