
Research On Mechanism And Interpolation Strategies Of Missing Traffic Crash Data

Posted on: 2020-03-23  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W Bai  Full Text: PDF
GTID: 1482306473470824  Subject: Transportation planning and management
Abstract/Summary:
Exploring crash characteristics from historical crash data is an important approach in traffic safety research; many traffic safety theories and methods have been proposed on the basis of post-event analysis of such data. In reality, however, a large amount of data is missing from crash databases because of the crash type (e.g. hit-and-run crashes), the data collection process (e.g. negligence of investigators, flawed data acquisition equipment or methods), and post-processing (e.g. values lost when records are entered into the system). This is an unavoidable problem in traffic safety research and a concern for related work such as data mining. Missing data complicates the analysis of results, biases the conclusions, and prevents the overall characteristics of the research subject from being shown. The integrity and reliability of the sample data are the key premise for correct data mining. Deletion and other traditional methods have serious limitations, while recent developments in data mining and processing technology make more advanced mathematical methods applicable. To use the sample data fully and make the conclusions more realistic, the "interpolation" (imputation) method is adopted to fill in the missing data and obtain complete records, on which the corresponding data mining methods can then be carried out. The purpose of imputation is not to recover each missing value exactly, but to predict the distribution or regularity that the missing data follow.

The main achievements of the thesis are as follows:

1. Based on the complete crash samples, the study employed a logistic model, a binary probit model with random parameters, and linear mixed-effects models to identify the factors that significantly affect the occurrence of hit-and-run behavior. The results indicated that the fleeing driver's age and sex, the fleeing vehicle's type and usage, whether the perpetrator or the innocent party was injured, the staying driver's age, the staying vehicle's type, the crash time (nighttime or daytime), road conditions, speed limit, road hierarchy, and crash location (road lane) had significant impacts on the occurrence of fleeing behavior in two-vehicle crashes. The binary probit model with random parameters and the linear mixed-effects models had similar goodness of fit, and both fit better than the logistic model.

2. An improved Apriori algorithm was employed to explore the missing rules of the characteristic variables in hit-and-run crashes. The results indicated that the missing crash information mainly concerned the characteristics of the fleeing and staying drivers and vehicles. The scale of missing characteristic variables in hit-and-run crashes is considerably larger than in non-hit-and-run crashes. The alcohol involvement of the escaped driver is the variable most seriously affected by missing data, and its missingness is clearly associated with missing data on the fleeing driver's age and the staying vehicle's type. The study therefore explored the association rules among the injury severities of the fleeing and staying drivers, the fleeing driver's sex, and the fleeing vehicle's type.

3. Considering the potential heterogeneity in the crash samples, the study built an optimized logistic model and a linear mixed-effects model, selected by goodness of fit, to impute single missing data; these differ from the traditional regression model. In addition, a single missing data imputation method based on Apriori association rules was established from the association results of the Apriori algorithm. The imputation based on Apriori association rules was found to have higher interpolation efficiency, as justified by the evaluation criteria of integrating degree, root-mean-square error, and mean absolute error.

4. The study refined the methods for judging the missing mechanism and established a multiple imputation method based on random forest theory (Fully Conditional Specification). The characteristics of the influencing factors in hit-and-run crashes were then identified based on the model, and the factors contributing to hit-and-run behavior were discussed in comparison with the results of the full sample data analysis.

The thesis provides an avenue for dealing with missing data in road traffic crash databases, with the aim of forming complete data samples so that crash mechanisms and propensities can be explored with the corresponding data mining methods. The application of interpolation theory and methods provides guidance and reference for improving the quality of road traffic crash databases, reducing road traffic crashes, and mitigating crash injury severities.
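
The idea of single imputation based on association rules can be illustrated with a minimal sketch. It is not the thesis's improved Apriori algorithm: it only mines one-antecedent rules of the form (attribute = value) -> (target = value) from records where the target is observed, then fills a missing target with the consequent of the highest-confidence matching rule. The field names (`vehicle`, `night`, `fled_driver_sex`), the toy records, and the support and confidence thresholds are all hypothetical and chosen for illustration only.

```python
from collections import Counter

def mine_rules(records, target, min_support=0.2, min_confidence=0.6):
    """Mine one-antecedent rules (attr=value) -> (target=value).

    Only records whose target is observed are used. For each antecedent,
    the consequent with the highest confidence is kept.
    """
    complete = [r for r in records if r.get(target) is not None]
    n = len(complete)
    antecedent_counts = Counter()
    pair_counts = Counter()
    for r in complete:
        for attr, value in r.items():
            if attr == target or value is None:
                continue
            antecedent_counts[(attr, value)] += 1
            pair_counts[(attr, value, r[target])] += 1

    rules = {}
    for (attr, value, tval), count in pair_counts.items():
        support = count / n                                   # P(antecedent and consequent)
        confidence = count / antecedent_counts[(attr, value)]  # P(consequent | antecedent)
        if support >= min_support and confidence >= min_confidence:
            best = rules.get((attr, value))
            if best is None or confidence > best[1]:
                rules[(attr, value)] = (tval, confidence)
    return rules

def impute(records, target, rules):
    """Fill a missing target using the highest-confidence rule that matches the record."""
    filled = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r.get(target) is None:
            candidates = [rules[(a, v)] for a, v in r.items() if (a, v) in rules]
            if candidates:
                r[target] = max(candidates, key=lambda c: c[1])[0]
        filled.append(r)
    return filled

# Hypothetical toy records; None marks a missing value.
records = [
    {"vehicle": "motorcycle", "night": "yes", "fled_driver_sex": "male"},
    {"vehicle": "motorcycle", "night": "no", "fled_driver_sex": "male"},
    {"vehicle": "motorcycle", "night": "yes", "fled_driver_sex": "male"},
    {"vehicle": "car", "night": "no", "fled_driver_sex": "female"},
    {"vehicle": "car", "night": "no", "fled_driver_sex": "female"},
    {"vehicle": "car", "night": "yes", "fled_driver_sex": "male"},
    {"vehicle": "motorcycle", "night": "no", "fled_driver_sex": None},
]

rules = mine_rules(records, "fled_driver_sex")
completed = impute(records, "fled_driver_sex", rules)
print(completed[-1])
```

In this sketch a record that matches no rule is left missing rather than guessed. A fuller treatment would mine multi-item antecedents, as Apriori does, and would compare imputed values against held-out true values using criteria such as root-mean-square error and mean absolute error.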
Keywords/Search Tags: Hit-and-run crash, Missing data, Apriori algorithm, Association rules, Multiple imputation, Random forest, Influencing factors identification