| In recent years, railway construction has developed rapidly, and rail transportation has gradually become one of the main modes of transportation. Public demand for rail transportation keeps increasing, and passengers pay more and more attention to train punctuality. Ensuring punctuality has therefore become one of the basic requirements for the quality of railway passenger services. Train delays not only affect passengers but also reduce the transport efficiency of the line, which has a negative impact on railway transportation as a whole. To control delays more reasonably, coordinate train operation better and reduce the impact of delays faster, this paper uses a large set of real data from the UK's West Coast Main Line (WCML) to design a prediction model for railway train delays, with the goal of improving prediction accuracy. The main research contents are as follows.
(1) Based on the real WCML data, the original data are preprocessed (deduplication, outlier processing, discrete variable processing and missing value processing), and the distribution of delays on the line is analyzed visually. Based on the principle of the time-event graph, the dynamic propagation of delays in train operation is analyzed, and a data feature set containing 14 variables is established.
(2) According to the characteristics of primary and secondary delays, the input feature set for modeling is selected, and a delay clustering model is established using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. Visualizing the clustering results makes it possible to identify the primary and secondary delays in the data. The delay data are finally grouped into 4 categories, each with distinct characteristics in the horizontal and vertical directions. Based on the characteristics of these 4 types of delay, the delay propagation factor is calculated, and the delay propagation chains in the horizontal and vertical directions are identified from the data set.
(3) To solve the parameter selection problem of the Gradient Boosting Decision Tree (GBDT), a parameter optimization method based on the Particle Swarm Optimization (PSO) algorithm is proposed. This paper designs the parameter optimization process and verifies it through a case study: compared with the cross-validation method, using PSO to optimize the GBDT parameters yields a better parameter combination and greatly reduces the time cost.
(4) This paper designs a train delay prediction model that combines the automatic delay classification model with a PSO-optimized GBDT model (the PSO-GBDT model). The structure of the model is designed and its performance is verified on real data. The model achieves a prediction accuracy of 95% within an allowable error of 3 minutes, which is higher than that of the random forest, support vector regression and neural network models. In addition, a visual interface for the train delay prediction model is designed, with its controls implemented in PyQt5, so that the prediction results can be displayed visually.
Based on real train operation data and combining data mining with machine learning methods, this paper establishes a train delay prediction model. The model predicts train delays quickly and accurately, which is of great significance for optimizing train operation diagrams and train operation command, and it can improve the quality of railway transportation services. |
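The delay propagation analysis in item (1) rests on the time-event graph. The sketch below illustrates the general principle under stated assumptions; it is not the thesis's own procedure.

It assumes that each arc from one event (an arrival or a departure) to the next carries a slack in minutes, such as a running-time supplement, a dwell margin or a headway buffer, and that this slack absorbs upstream delay. Under that assumption, an event's delay is the larger of two values: its own primary delay, and each predecessor's delay minus the arc's slack.

The helper `propagate_delays`, the event names and the slack values are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def propagate_delays(arcs, primary):
    """Propagate delays through an acyclic time-event graph (illustrative sketch).

    arcs: event -> list of (predecessor_event, slack_minutes); slack is ASSUMED
          to absorb upstream delay (running supplement, dwell margin, buffer).
    primary: event -> primary (initial) delay in minutes.
    Returns event -> total delay in minutes (primary plus knock-on).
    """
    events = set(primary) | set(arcs)
    for preds in arcs.values():
        events.update(p for p, _ in preds)
    # Map each event to its predecessors so static_order() visits them first.
    graph = {e: [p for p, _ in arcs.get(e, [])] for e in events}
    delay = {}
    for event in TopologicalSorter(graph).static_order():
        d = primary.get(event, 0.0)
        for pred, slack in arcs.get(event, []):
            # Upstream delay minus the slack on the arc, never below zero.
            d = max(d, delay[pred] - slack)
        delay[event] = d
    return delay


# Hypothetical example: train A departs X with a 5-minute primary delay and runs
# to Y (1 min running supplement, 0.5 min dwell margin); train B departs Y after
# A with a 2-minute headway buffer.
arcs = {
    "A_arr_Y": [("A_dep_X", 1.0)],
    "A_dep_Y": [("A_arr_Y", 0.5)],
    "B_dep_Y": [("A_dep_Y", 2.0)],
}
delays = propagate_delays(arcs, {"A_dep_X": 5.0})
print(delays["B_dep_Y"])  # 1.5
```

In this toy run, the supplements absorb 1.5 minutes of train A's delay. Train B then inherits 1.5 minutes as a secondary (knock-on) delay, which is the kind of chain that item (2) later recovers from data.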
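Item (2) clusters delay records with DBSCAN. For reference, the following is a small, dependency-free sketch of textbook DBSCAN:

- A point with at least `min_pts` neighbours (itself included) within radius `eps` is a core point.
- Clusters grow through chains of core points.
- Points not reachable from any core point are labelled noise (-1).

The two-feature toy points, `eps=1.5` and `min_pts=3` are hypothetical. They are not the thesis's 14-variable feature set or its tuned parameters. In practice one would more likely call scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`).

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) or NOISE (-1) for every point."""

    def neighbours(i):
        # All indices within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point: keep expanding
    return labels


# Hypothetical toy data: two dense groups of delay records and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

DBSCAN does not need the number of clusters in advance and it flags isolated records as noise. This is consistent with the abstract's use of it to let the delay categories emerge from the data.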
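Item (3) replaces the cross-validation method for choosing GBDT parameters with PSO. The following is a minimal sketch of a standard global-best PSO.

Several details are assumptions rather than values from the thesis:

- The inertia and acceleration values (`w=0.7`, `c1=c2=1.5`).
- The parameter bounds.
- Rounding the integer-valued parameters inside the objective.
- The objective `toy_validation_error`, which is hypothetical. It only stands in for what would, in practice, be the validation error of a GBDT trained with the candidate parameters. An example is scikit-learn's `GradientBoostingRegressor` with `learning_rate`, `n_estimators` and `max_depth`.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO minimising objective(x) over a box given as [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)  # clip to box
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


# HYPOTHETICAL stand-in for "validation error of a GBDT with these parameters";
# its minimum (0.1, 300, 5) is invented purely for illustration.
def toy_validation_error(params):
    lr, n_est, depth = params[0], round(params[1]), round(params[2])
    return (lr - 0.1) ** 2 + ((n_est - 300) / 1000) ** 2 + ((depth - 5) / 10) ** 2


# Assumed search ranges: learning_rate, n_estimators, max_depth.
BOUNDS = [(0.01, 0.3), (50, 1000), (2, 10)]
best, err = pso_minimize(toy_validation_error, BOUNDS)
print(f"learning_rate={best[0]:.3f} n_estimators={round(best[1])} "
      f"max_depth={round(best[2])} error={err:.2e}")
```

Each objective call here is cheap. With a real GBDT, every call is a full training run. The number of evaluations (`n_particles * (n_iter + 1)`) is therefore what PSO trades against the size of an exhaustive cross-validated grid.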
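The headline result in item (4), 95% accuracy within a 3-minute allowable error, is a tolerance-based accuracy. Below is a minimal sketch of that metric. It assumes a prediction counts as correct when its absolute error is at most 3 minutes, inclusive. The function name `accuracy_within` and the sample delays are hypothetical and are not the thesis's data.

```python
def accuracy_within(predicted, actual, tolerance=3.0):
    """Share of predictions whose absolute error is at most `tolerance` minutes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical predicted vs. observed delays (minutes); errors are 2, 5, 1, 3.
print(accuracy_within([2, 10, 7, 0], [4, 15, 6, 3]))  # 0.75
```

Unlike a mean error, this metric reports how often a prediction is usable at a fixed operational tolerance. The comparison with the random forest, support vector regression and neural network models is meaningful only when all models are scored with the same tolerance.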