| Waiting spot recommendation and waiting time prediction are important for helping passengers formulate travel plans and for reflecting a city’s traffic conditions to traffic managers. They are also an important means of implementing traffic control. However, current studies on waiting spot recommendation and waiting time prediction still face problems on traditional centralized mining platforms, such as limited processing ability, low mining efficiency, insufficient consideration of passenger mobility, non-stationary time series, and excessive differences between values. To address these problems, this paper studies waiting spot recommendation and waiting time prediction using mobile trajectory big data. Based on the Spark parallel computing framework, it establishes distributed models and parallel algorithms that analyze taxi hotspot data and taxi travel time information to give passengers and drivers more accurate information in both time and space. The main work and innovations of this paper are summarized as follows: 1. Data preprocessing. Traditional mining platforms cannot handle the computation and storage of mobile trajectory big data, nor the problem of mixed taxi driving directions. Firstly, we build a parallel processing framework based on Spark to provide a platform for computing on mobile trajectory big data. Secondly, by matching the coordinate axes with the GPS direction data, we propose a method that divides roads based on GPS direction and coordinate-axis quadrant. Finally, we combine the GPS direction data with the range of taxi direction variation to partition the processed mobile trajectory big data, which provides the data support for waiting time prediction. 2. Waiting spot recommendation. To address the parameter sensitivity, the difficulty of boundary point identification on Spark, the different initial positions of passengers, and the center of the non-convex
clustering graph of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), we propose a parallel DBSCAN optimization algorithm on Spark that uses the silhouette coefficient and the pick-up rate, named SP-DBSCAN. Firstly, we use the silhouette coefficient and the boarding ratio to select the optimal Eps and MinPts parameters, which avoids the loss of hot spots caused by low-density clustering and the boundary point recognition errors caused by unreasonable parameter settings. Secondly, we use the K-Means algorithm to find two centroids within one cluster, which resolves the irrationality of having only one waiting spot in a non-convex area. Finally, we use the recognition rate as the evaluation index for our algorithm. Experimental results demonstrate that, compared with C-DBSCAN on four data sets, the recognition rate of SP-DBSCAN increases by 3.20%, 4.97%, 1.60%, and 6.20%, and, compared with P-DBSCAN on two data sets, the recognition rate of SP-DBSCAN increases by 3.47% and 5.80%. 3. Waiting time prediction. To address the problems that trajectory data are not divided by driving direction, that travel characteristics differ, that the time series is non-stationary, and that values differ excessively, we propose a GRU forecasting model on Spark with the Empirical Mode Decomposition (EMD) algorithm and normalization, named EMDN-GRU. Firstly, we distinguish the features of each travel period, and then use the EMD algorithm, which decomposes the time series into a finite number of Intrinsic Mode Functions (IMFs) and a residual (Res), to reduce the non-stationarity of the time series. Secondly, we predict every normalized series with the GRU model, convert the predicted values back to the original scale, and sum them. Finally, we use measures of effectiveness (MOEs) as evaluation indices to verify the accuracy of EMDN-GRU. Experimental results on real-world GPS mobile trajectory big data demonstrate that, compared with GRU, LSTM, EMD-LSTM, and EMD-GRU, the MAPE of the EMDN-GRU predictions on weekends, weekdays,
and one week is reduced by 84.19%, 74.67%, 92.48%, and 92.13%; 88.22%, 85.82%, 91.72%, and 95.01%; and 62.20%, 63.27%, 83.07%, and 84.54%, respectively. |
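
The normalize, predict, restore, and sum steps of the waiting time prediction stage can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the IMF and residual components are already available (the EMD step is not shown), assumes min-max scaling as the normalization, and uses a hypothetical `persistence_forecast` function as a stand-in for the GRU so that the example stays self-contained. All function names and the toy data are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def min_max_normalize(series: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale a series to [0, 1]; also return (min, max) so the scaling can be undone."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in series], lo, hi
    return [(x - lo) / span for x in series], lo, hi


def denormalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map scaled values back to the original range."""
    return [v * (hi - lo) + lo for v in values]


def persistence_forecast(scaled: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the GRU: predict step t as the value at step t-1."""
    return list(scaled[:-1])  # predictions for steps 1 .. n-1


def forecast_from_components(
    components: Sequence[Sequence[float]],
    predictor: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Normalize each component, predict it, restore its scale, and sum the results."""
    total: List[float] = []
    for comp in components:
        scaled, lo, hi = min_max_normalize(comp)
        pred = denormalize(predictor(scaled), lo, hi)
        total = pred if not total else [a + b for a, b in zip(total, pred)]
    return total


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Toy data (assumed): one oscillating "IMF" and one trending "residual".
imf = [1.0, -1.0, 1.0, -1.0]
res = [10.0, 11.0, 12.0, 13.0]
observed = [a + b for a, b in zip(imf, res)]  # the original series

predicted = forecast_from_components([imf, res], persistence_forecast)
error = mape(observed[1:], predicted)  # compare against steps 1 .. n-1
print(f"MAPE: {error:.2f}%")
```

Normalizing each component separately keeps a large-amplitude residual from dominating small-amplitude IMFs during training, which is one plausible reading of how normalization addresses the "excessive difference in values" mentioned above.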