With the continuing acceleration of urbanization and the increasingly complex travel needs of residents, taxis have become an indispensable part of the urban transportation system. They provide convenient, personalized service that meets the diverse travel needs of urban residents, and they have become an important mode of transportation. However, the randomness of residents' trips distributes demand unevenly, unoptimized taxi cruising leads to high empty-load rates, and the growth in both demand and fleet size causes problems such as supply-demand imbalance and traffic congestion. Accurate prediction of taxi demand is therefore of great practical significance. Taxis are now commonly equipped with GPS devices, which generate large volumes of GPS data while driving, including time, location, and other information. These data make it possible to apply deep learning methods to taxi demand forecasting. The emergence of the Hadoop platform addresses the storage and processing of such large data volumes, enabling the mining of massive taxi GPS datasets. This thesis aims to make effective use of taxi GPS data on the Hadoop platform: it analyzes the spatiotemporal characteristics of taxi demand, mines demand hotspots based on that analysis, and finally predicts taxi demand in these hotspots to gain insight into their patterns and trends.

In the research on taxi data preprocessing and spatiotemporal demand analysis, a Hadoop data processing platform is first built to provide distributed storage of the taxi GPS data. The MapReduce programming model is then used to preprocess the raw taxi data, including coordinate conversion, secondary sorting, and data cleaning, and the pick-up and drop-off locations are extracted from the result. Statistical methods are then applied to the preprocessed data to calculate taxi demand and passenger trip duration in different time periods within the study area, and the temporal and spatial distribution of taxi demand on weekdays and weekends is analyzed.

In the research on mining taxi demand hotspots, the K-Means algorithm is improved to address two weaknesses: the choice of the initial number of clusters and the sensitivity to the initial cluster centers. The silhouette coefficient method is used to determine the number of clusters, and the max-min distance method is used to select the initial cluster centers. To improve the clustering result, the similarity between data objects is measured with a Gini index-weighted Manhattan distance. Experimental results show that the improved algorithm is more accurate. The improved K-Means algorithm is further parallelized with MapReduce, which improves its efficiency. Building on the spatiotemporal demand analysis, the parallelized improved K-Means algorithm is applied to the taxi passenger location data to mine taxi demand hotspots during peak hours on weekdays and weekends.

In the research on taxi demand forecasting in hotspot areas, the focus is on predicting taxi demand during peak passenger hours. Taking the temporal and spatial characteristics of demand into account, the activation function of the LSTM in the CNN-BiLSTM model is improved, and a taxi demand forecasting model is then built on the improved CNN-BiLSTM. The model is assessed with evaluation metrics, and experimental results show superior predictive performance. Incorporating external factors such as weather data further improves prediction accuracy, confirming the model's effectiveness and applicability for forecasting taxi demand in hotspot areas.
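The two K-Means improvements named above (max-min distance seeding and a weighted Manhattan similarity) can be illustrated with a minimal sketch. Several details are assumptions not given in the abstract: the first center is simply `points[0]`, the per-attribute weight vector `w` (which the thesis derives from the Gini index) is passed in ready-made, and the update step keeps the arithmetic mean of standard K-Means. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def weighted_manhattan(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Manhattan distance with one non-negative weight per attribute (weights assumed precomputed)."""
    return sum(wi * abs(x - y) for x, y, wi in zip(a, b, w))


def max_min_init(points: List[Point], k: int, w: Sequence[float]) -> List[Point]:
    """Max-min distance seeding: take points[0] as the first center (an assumption),
    then repeatedly add the point whose distance to its nearest chosen center is largest."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(weighted_manhattan(p, c, w) for c in centers))
        centers.append(farthest)
    return centers


def kmeans(points: List[Point], k: int, w: Sequence[float], max_iter: int = 100):
    """K-Means seeded by max-min distance; assignment uses the weighted Manhattan distance,
    while the update step keeps the arithmetic mean of standard K-Means."""
    centers = max_min_init(points, k, w)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment: each point joins its nearest center under the weighted distance.
        labels = [min(range(k), key=lambda j: weighted_manhattan(p, centers[j], w)) for p in points]
        # Update: each center becomes the mean of its members; an empty cluster keeps its center.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, labels
```

With equal weights, for example `kmeans(pts, 2, (1.0, 1.0))`, the distance reduces to the plain Manhattan distance. Because the max-min rule deterministically spreads the seeds apart, repeated runs give the same clustering, which is the sensitivity problem the improvement targets.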
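Choosing the number of clusters with the silhouette coefficient can be sketched as follows. For each point, `a` is its mean distance to the other members of its own cluster and `b` is its smallest mean distance to any other cluster; the point's score is `(b - a) / max(a, b)`, and the candidate `k` with the highest average score is kept. Treating singleton clusters as scoring 0 is a common convention assumed here, and `cluster_fn` is a hypothetical placeholder for whatever clustering routine is being tuned.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def silhouette(points: List[Sequence[float]], labels: List[int],
               dist: Callable = manhattan) -> float:
    """Mean silhouette coefficient over all points."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, li in enumerate(labels):
        own = clusters[li]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != li)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, candidate_ks: Iterable[int], cluster_fn: Callable, dist: Callable = manhattan) -> int:
    """Return the candidate k whose clustering (cluster_fn(points, k) -> labels) scores highest."""
    return max(candidate_ks, key=lambda k: silhouette(points, cluster_fn(points, k), dist))
```

Scores near 1 indicate compact, well-separated clusters, and negative scores indicate points that sit closer to another cluster than to their own, so taking the maximum over candidate values of `k` favors the most clearly separated partition.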
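The MapReduce parallelization of one K-Means iteration can be simulated in plain Python to show the data flow. This is not Hadoop code: the input splits are ordinary lists, the "shuffle" is a dictionary merge, and all function names are hypothetical. Each mapper assigns its records to the nearest center under the weighted Manhattan distance, a combiner pre-sums coordinates and counts per center to reduce shuffled data, and the reducer divides the merged sums by the counts to obtain the new centers.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Dict[int, Tuple[List[float], int]]  # center id -> (coordinate sums, count)


def weighted_manhattan(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * abs(x - y) for x, y, wi in zip(a, b, w))


def map_phase(split: Iterable[Point], centers: Sequence[Point], w) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Mapper: emit (nearest-center id, (point, 1)) for every record in one input split."""
    for p in split:
        cid = min(range(len(centers)), key=lambda j: weighted_manhattan(p, centers[j], w))
        yield cid, (p, 1)


def combine(pairs: Iterable[Tuple[int, Tuple[Point, int]]]) -> Partial:
    """Combiner: pre-aggregate coordinate sums and counts locally, per center id."""
    acc: Partial = {}
    for cid, (p, n) in pairs:
        s, c = acc.get(cid, ([0.0] * len(p), 0))
        acc[cid] = ([si + pi for si, pi in zip(s, p)], c + n)
    return acc


def reduce_phase(partials: Iterable[Partial], old_centers: Sequence[Point]) -> List[Point]:
    """Reducer: merge partial sums by center id, then divide by counts to get new centers."""
    merged: Partial = {}
    for part in partials:  # stands in for the shuffle that groups values by key
        for cid, (s, c) in part.items():
            ms, mc = merged.get(cid, ([0.0] * len(s), 0))
            merged[cid] = ([x + y for x, y in zip(ms, s)], mc + c)
    new_centers: List[Point] = []
    for cid, old in enumerate(old_centers):
        if cid in merged:
            s, c = merged[cid]
            new_centers.append(tuple(x / c for x in s))
        else:
            new_centers.append(old)  # a center with no assigned points stays where it was
    return new_centers


def kmeans_iteration(splits: Sequence[Sequence[Point]], centers: Sequence[Point], w) -> List[Point]:
    """One parallel iteration: map + combine per split, then a single reduce."""
    partials = [combine(map_phase(split, centers, w)) for split in splits]
    return reduce_phase(partials, centers)
```

In a real job the driver would rerun this iteration, broadcasting the updated centers to the mappers each time, until the centers stop changing. Because addition is associative, the result does not depend on how records are distributed across splits, which is what makes the combiner safe to use.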