Generative Adversarial Network Based Rear-end Accident Data Filling And Analysis Of Factors Influencing Severity

Posted on: 2024-04-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang
GTID: 2542307157469844  Subject: Traffic and Transportation Engineering
Abstract/Summary:
As the level of motorization gradually increases, traffic accidents occur frequently around the world, and safety research based on historical traffic accident data is growing in order to prevent accidents and reduce their severity. Collecting traffic accident information remains difficult: many countries have established traffic accident databases, but data quality varies and values are often missing. Rear-end collisions are a frequent accident type, and missing values in rear-end accident records can reduce the accuracy of statistical analysis, introduce bias into its results, increase the risk of model misclassification, and lower model accuracy.

The generative adversarial network (GAN) is one of the most promising recent methods for unsupervised learning on complex data distributions, and its application has gradually been extended to missing data filling. To address the missing data in rear-end accident records, this paper proposes using the Generative Adversarial Imputation Network (GAIN) to fill the missing values and form a complete rear-end accident dataset. On this basis, a three-class model of the factors influencing accident severity is constructed to analyze in depth the influence mechanism of each characteristic variable.

Because domestic accident data are difficult to obtain, this paper selects 101,452 rear-end accident records from Chicago for 2016-2021 as the research object. The data contain 20 independent variables and 1 dependent variable. The independent variables cover driver, vehicle, road, and environment information, and the dependent variable is the accident severity level, with three categories: no-injury accident, minor-injury accident, and serious-injury or fatal accident.

After analyzing the missing data, GAIN is used to fill the missing values. It is compared with Multiple Imputation by Chained Equations (MICE) and Expectation Maximization (EM) among statistical filling methods, and with MissForest and K-Nearest Neighbor (KNN) among machine learning methods, to verify the filling effect of GAIN in terms of filling speed, change in data variance, and fitting effect. The results show that the GAIN algorithm better simulates the original data distribution and generates results closest to the original data.

Based on the filled data, three-class accident severity models were constructed with XGBoost and LightGBM to model and analyze the original dataset and the datasets filled by the five different algorithms. Because the data are imbalanced, this paper applies random oversampling and uses accuracy, F1, and AUC (Area Under Curve) as model evaluation indexes. The results show that the evaluation indexes of the LightGBM model are better than those of the XGBoost model, and that the evaluation indexes of the GAIN dataset are better than those of the other datasets. In the LightGBM model, the evaluation indexes improve after the missing values are filled: compared with the original dataset, the GAIN dataset improves accuracy by 0.0456, F1 by 0.0322, and AUC by 0.0543. The evaluation indexes improve slightly after the data imbalance treatment, and the model accuracy of the GAIN dataset improves further after that treatment.

Finally, SHAP was used to visualize and analyze the model results, both globally and for individual accident cases. Based on this, risk prevention and control strategies for rear-end accidents were proposed from the perspectives of drivers, vehicles, roads, and environment, in order to reduce the severity of rear-end accidents.

This paper is supported by the National Natural Science Foundation of China (NSFC) under the project "Research on the Causal Mechanism of Traffic Accident Severity Considering Data Imbalance and Model Interpretability" (Grant No. 52102404).
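As an illustration of the random oversampling step named in the abstract, the following is a minimal sketch, not the thesis's own implementation. The function name random_oversample, the toy records, and the class labels "none", "minor", and "severe" are assumptions made for this example. It duplicates randomly chosen minority-class records until every severity class has as many records as the largest class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Balance classes by re-drawing minority-class rows with replacement.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep all original records and add random duplicates as needed.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumed): 6 no-injury, 3 minor-injury, 1 serious/fatal record.
rows = [[i] for i in range(10)]
labels = ["none"] * 6 + ["minor"] * 3 + ["severe"] * 1

bal_rows, bal_labels = random_oversample(rows, labels)
counts = Counter(bal_labels)
print(counts)
```

In common practice, oversampling of this kind is applied only to the training split, so that duplicated records do not leak into the data used to compute accuracy, F1, and AUC.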
Keywords/Search Tags: Data filling, Generative adversarial networks, GAIN algorithm, LightGBM model, XGBoost model, SHAP principle