With the rapid development of urban railway transit, it is vital to improve service quality and the passenger travel experience. Passenger demand variation caused by special events (regular and unplanned events) poses great challenges for the operation of urban railway transit. Therefore, mining short-term passenger flow features and forecasting passenger flow under special events are of great value for formulating the operation plan quantitatively. Current studies focus on short-term prediction under normal conditions. However, passenger demand characteristics differ between normal conditions and special events, so existing findings do not apply to special events. Hence, with Automatic Fare Collection (AFC) data, regular event information on the Internet, and recorded unplanned event data, the study aims to predict short-term passenger flow under regular and unplanned events. Specifically, six issues are studied, including the mining of short-term passenger demand characteristics, identification of prediction scenarios and challenges, event information extraction, the imbalanced training dataset caused by limited observations under unplanned events, prediction model development, the uncertainty of the prediction scenario, and fusion mechanism design given uncertain prediction scenarios. The main research contents and findings are as follows:

1. Using AFC data and the Ward criterion-based agglomerative hierarchical clustering (AHC) algorithm, we summarize the station-based short-term passenger demand characteristics under normal conditions. Then, fusing multi-source data (i.e., AFC data, regular event publicity data, and unplanned event record data), we extract the short-term temporal-spatial passenger flow propagation characteristics under events. Finally, the typical prediction scenarios, prediction challenges, and corresponding solutions under events are summarized.

2. Assuming events do not impact the targeted passenger flow, the problem of evaluating and selecting predictor variables is solved. Seasonal Autoregressive
Integrated Moving Average (SARIMA), Gradient Boosting Decision Tree (GBDT), and Random Forest (RF) models are separately developed. Results show that GBDT outperforms SARIMA and RF in both one- and two-step predictions. SARIMA performs worst, implying that a non-linear relationship exists among passenger flows. GBDT and RF are the most robust in one- and two-step predictions, respectively. The negative effect of the previous-step prediction on multi-step predictions should be reduced. Simply adding more predictor variables does not necessarily improve the prediction accuracy; the key variables should be selected when building predictor variables.

3. Assuming regular events impact the targeted passenger flow, two key issues, event information extraction and prediction model development, are solved. 1) By exploring the propagation characteristics of network-based passenger flow under regular events, the event information is extracted. 2) Considering the input variable attributes and the relationship between input and output variables, a deep learning (DL)-based short-term passenger flow prediction framework is proposed. It is flexible and explainable. Results show that the extracted event information enables the DL model to capture the passenger flow variation caused by the events and thus improves the prediction accuracy. As a result, the DL model outperforms GBDT. Under normal conditions, GBDT outperforms the DL model, and the prediction accuracy of all models is slightly reduced by the extracted event information. Hence, a single set of predictor variables and a single model cannot achieve the optimal prediction performance simultaneously under different prediction scenarios, implying that predictor variables should be developed separately for each scenario. Also, a fusion mechanism should be devised to exploit the predictive advantages of the sub-models.

4. Assuming unplanned events impact the targeted passenger flow, two key issues, including event information extraction and the
imbalanced training dataset caused by the limited observations under unplanned events (which are combined with normal observations to build the training dataset), are solved. The event information is extracted after exploring the temporal-spatial propagation characteristics of network-based passenger flow under unplanned events. Also, the Synthetic Minority Over-sampling Technique (SMOTE), Borderline SMOTE (BLSMOTE), and Adaptive Synthetic Sampling Approach (ADASYN) are developed to oversample the limited passenger flow observations under unplanned events. This balances the training dataset and enriches the passenger demand characteristics. Results show that DL-SMOTE, DL-BLSMOTE, and DL-ADASYN outperform GBDT and the DL model, highlighting the effectiveness of the extracted event information and the over-sampling techniques; DL-BLSMOTE performs best and is the most robust. On the other hand, GBDT performs best under normal conditions, and DL-SMOTE, DL-BLSMOTE, and DL-ADASYN perform worst, implying that the over-sampling techniques and extracted event information degrade the DL model performance under normal conditions. Thus, we should separately develop the predictor variables and devise a fusion mechanism to exploit the predictive advantages of the sub-models.

5. Events may not impact the targeted passenger flow. Thus, the Naive Bayesian (NB) classifier-based, dual dynamic track (DT), and neural network (NN) fusion mechanisms are separately developed to exploit the predictive advantages of GBDT and the DL model under normal and event scenarios. Results show that the three fusion mechanisms further improve the prediction accuracy in most cases; GBDT-DL-DT performs best and is the most robust, GBDT-DL-NB is the least robust, and GBDT-DL-NN performs worst. The development of the three fusion mechanisms gives insights into short-term passenger flow prediction given the uncertainty of prediction scenarios.
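
As an illustration of the over-sampling idea described in item 4, the following is a minimal sketch of basic SMOTE: each synthetic sample is interpolated between a randomly chosen minority sample and one of its nearest minority-class neighbours. It is not the study's implementation. The function name `smote`, the parameters `k` and `seed`, and the two-feature passenger flow vectors are all hypothetical and chosen only for this example; Borderline SMOTE and ADASYN differ mainly in how they choose the base samples and are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority
    samples and their k nearest minority-class neighbours (basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors: [flow at t-1, flow at t-2] per time interval.
normal = [[120.0, 118.0], [125.0, 121.0], [119.0, 122.0], [130.0, 127.0],
          [128.0, 131.0], [122.0, 120.0], [126.0, 124.0], [121.0, 123.0]]
event = [[300.0, 280.0], [320.0, 310.0], [290.0, 305.0]]

# Oversample the event observations until both groups have the same size.
new_event = smote(event, n_new=len(normal) - len(event), k=2)
balanced_event = event + new_event
```

Because every synthetic point lies on a segment between two observed event samples, the oversampled set stays inside the range of the observed event data; this is also why it cannot represent event patterns that were never observed.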