Research On Mechanism And Interpolation Strategies Of Missing Traffic Crash Data

Posted on:2020-03-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W Bai

Full Text:PDF

GTID:1482306473470824

Subject:Transportation planning and management

Abstract/Summary:


Exploring crash characteristics with historical crash data is an important approach in traffic safety research, and many traffic safety theories and methods have been developed from such post-event analyses. In practice, however, crash databases contain a large amount of missing data because of issues related to the crash type (e.g., hit-and-run crashes), data collection (e.g., investigator negligence or flawed data acquisition equipment and methods), and post-processing (e.g., omissions during data entry). Missing data is an unavoidable problem in traffic safety research and a concern for related studies such as data mining: it complicates the analysis, can bias the conclusions, and can prevent a study from capturing the overall characteristics of the population. The completeness and reliability of the sample are key prerequisites for correct data mining results. Deletion and other traditional methods have serious limitations, while recent advances in data mining and processing technology make more advanced mathematical methods feasible. To use the sample data fully and make the conclusions more realistic, this thesis adopts imputation ("interpolation") methods to fill in the missing values and obtain complete data, on which the corresponding data mining methods can then be carried out. The purpose of imputation is not to recover each missing value exactly, but to reproduce the distribution or regularities that the missing data follow.

The main achievements of the thesis are as follows:

1. Based on the complete crash samples, the study employed a logistic model, a binary probit model with random parameters, and linear mixed-effects models to identify the factors that significantly affect the occurrence of hit-and-run behavior. The results indicated that the fleeing driver's age and sex, the fleeing vehicle's type and usage, whether the perpetrator or the innocent party was injured, the staying driver's age, the staying vehicle type, crash time (nighttime or daytime), road conditions, speed limit, road hierarchy, and crash location (road lane) had significant impacts on the occurrence of fleeing behavior in two-vehicle crashes. In addition, the binary probit model with random parameters and the linear mixed-effects models had similar goodness of fit, both better than that of the logistic model.

2. An improved Apriori algorithm was employed to explore the missingness patterns (rules) of the characteristic variables in hit-and-run crashes. The results indicated that the missing information concerned mainly the characteristics of the fleeing and staying drivers and vehicles, and that the extent of missingness is considerably larger in hit-and-run crashes than in non-hit-and-run crashes. The fleeing driver's alcohol involvement is the variable with the most severe missingness, and its missingness is clearly associated with missing data on the fleeing driver's age and the staying vehicle type. The study further explored the association rules among the injury severities of the fleeing and staying drivers, the fleeing driver's sex, and the fleeing vehicle type.

3. Considering the potential heterogeneity in the crash samples, the study built an optimized logistic model and a linear mixed-effects model, selected by goodness-of-fit criteria, to impute single missing values, an approach that differs from traditional regression-based imputation. In parallel, a single-imputation method based on Apriori association rules was established from the results of the Apriori analysis. Judged by the evaluation criteria of integrating degree, root-mean-square error (RMSE), and mean absolute error (MAE), single imputation based on Apriori association rules achieved higher imputation efficiency.

4. The study refined the methods for judging the missing-data mechanism and established a random-forest-based multiple imputation method under the Fully Conditional Specification (FCS) framework. The characteristics of the factors influencing hit-and-run crashes were then identified based on this model, and the factors contributing to hit-and-run behavior were discussed in comparison with the results of the full-sample analysis.

The thesis provides an approach to handling missing data in road traffic crash databases, with the aim of forming complete data samples so that crash mechanisms and propensities can be explored with appropriate data mining methods. The imputation theory and methods applied here can provide guidance and reference for improving the quality of road traffic crash databases, reducing road traffic crashes, and mitigating crash injury severity.
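Item 2 mines missingness patterns with an improved Apriori algorithm. The abstract does not describe the improvement, so the following is a sketch of the plain textbook Apriori (join, prune, count) plus rule generation, not the author's variant. It assumes each crash is encoded as a set of items such as `alcohol=missing`; those item names and the six records are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1: support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: keep candidates whose support reaches the threshold.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent

def association_rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X and Y) / support(X)."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                confidence = support / frequent[ante]
                if confidence >= min_confidence:
                    found.append((ante, itemset - ante, support, confidence))
    return found

# Hypothetical toy records: each set lists which fields are missing in one crash.
records = [
    {"alcohol=missing", "flee_age=missing", "stay_vtype=missing"},
    {"alcohol=missing", "flee_age=missing"},
    {"alcohol=missing", "stay_vtype=missing"},
    {"alcohol=missing", "flee_age=missing", "stay_vtype=missing"},
    {"flee_age=missing"},
    set(),  # a fully observed record
]
frequent = apriori(records, min_support=0.3)
for ante, cons, sup, conf in association_rules(frequent, min_confidence=0.7):
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f}")
```

Encoding missingness as items lets the same support/confidence machinery express statements such as "when alcohol involvement is missing, the staying vehicle type tends to be missing too", which is the kind of pattern item 2 reports.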
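Item 3 compares single-imputation methods by RMSE and MAE. Below is a minimal sketch of one way association rules could drive single imputation: the highest-confidence rule whose antecedent matches the record supplies the value, and the observed mode is used when no rule fires. The rule format, the mode fallback, the ordinal `age_group` coding, and the data are all assumptions; the "integrating degree" criterion is not reproduced because the abstract does not define it.

```python
import math
from collections import Counter

def matches(record, antecedent):
    """True if every (field, value) condition in the antecedent holds for the record."""
    return all(record.get(field) == value for field, value in antecedent.items())

def impute_with_rules(records, target, rules):
    """Single imputation of record[target] (None = missing) from association rules.

    rules: list of (antecedent_dict, imputed_value, confidence) for the target field.
    The highest-confidence matching rule wins; if none fires, the observed mode
    is used as a fallback (an assumption of this sketch).
    """
    observed = [r[target] for r in records if r[target] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    ranked = sorted(rules, key=lambda rule: rule[2], reverse=True)
    completed = []
    for record in records:
        record = dict(record)  # copy so the input stays untouched
        if record[target] is None:
            record[target] = next(
                (value for ante, value, _ in ranked if matches(record, ante)), mode
            )
        completed.append(record)
    return completed

def rmse_mae(true_values, imputed_values):
    """Root-mean-square error and mean absolute error between two equal-length lists."""
    errors = [t - p for t, p in zip(true_values, imputed_values)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Hypothetical complete records; age_group is an ordinal code (1 = young ... 3 = older).
truth = [
    {"vehicle": "motorcycle", "time": "night", "age_group": 1},
    {"vehicle": "motorcycle", "time": "day", "age_group": 1},
    {"vehicle": "truck", "time": "night", "age_group": 2},
    {"vehicle": "car", "time": "day", "age_group": 3},
    {"vehicle": "car", "time": "night", "age_group": 2},
    {"vehicle": "truck", "time": "day", "age_group": 2},
]
hidden = [0, 2, 3]  # positions whose age_group is masked for the evaluation
masked = [dict(r, age_group=None) if i in hidden else dict(r) for i, r in enumerate(truth)]
rules = [  # hypothetical rules, e.g. as mined by Apriori
    ({"vehicle": "motorcycle"}, 1, 0.8),
    ({"vehicle": "truck"}, 2, 0.7),
    ({"time": "night"}, 2, 0.6),
]
true_vals = [truth[i]["age_group"] for i in hidden]
for name, rule_set in [("rules", rules), ("mode only", [])]:
    completed = impute_with_rules(masked, "age_group", rule_set)
    print(name, rmse_mae(true_vals, [completed[i]["age_group"] for i in hidden]))
```

The evaluation follows the usual mask-and-compare pattern: values known in complete records are hidden, imputed, and compared with the truth. On this hand-made data the rules give lower errors than the mode-only baseline, which only illustrates the mechanics and says nothing about the thesis's results.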
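Item 4 mentions refined methods for judging the missing-data mechanism without detailing them. One common first check, shown here as an assumption rather than the thesis's procedure, is to test whether the missingness of a field is independent of an observed variable such as crash type. A significant association is evidence against MCAR (missing completely at random), although it cannot distinguish MAR from MNAR. The counts below are hypothetical.

```python
import math

def missingness_chi2(table):
    """Pearson chi-square test on a 2x2 table, without continuity correction.

    table[i][j]: number of crashes in group i (0 = hit-and-run, 1 = other)
    whose field is missing (j = 0) or observed (j = 1).
    Returns (statistic, p-value) using 1 degree of freedom.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch handles 2x2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical counts for "is the fleeing driver's alcohol involvement missing?"
table = [[40, 60],   # hit-and-run: missing, observed
         [10, 90]]   # non-hit-and-run: missing, observed
stat, p = missingness_chi2(table)
print(f"chi2 = {stat:.1f}, p = {p:.2g}")
```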
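Multiple imputation (item 4) produces m completed datasets, each analysed separately. The abstract does not say how the per-dataset results were combined; Rubin's rules are the standard combination rule and are sketched here as an assumption. With estimates Q_i and variances U_i, the pooled estimate Q̄ is the mean of the Q_i, the within-imputation variance Ū is the mean of the U_i, the between-imputation variance B is the sample variance of the Q_i, and the total variance is T = Ū + (1 + 1/m)B. The random-forest FCS step that produces the datasets is not shown, and the numbers are invented.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine one parameter's estimates from m imputed datasets with Rubin's rules.

    estimates: point estimates Q_i, one per imputed dataset.
    variances: their squared standard errors U_i.
    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b    # total variance T
    return q_bar, u_bar, b, total

# Hypothetical coefficients (e.g. for nighttime) from m = 5 imputed datasets.
est = [0.50, 0.55, 0.45, 0.52, 0.48]
var = [0.010, 0.012, 0.011, 0.009, 0.010]
q, u, b, t = pool_rubin(est, var)
print(f"estimate = {q:.3f}, SE = {math.sqrt(t):.4f}")
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations, so the pooled standard error reflects uncertainty about the missing values as well as ordinary sampling error.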

Keywords/Search Tags:

Hit-and-run crash, Missing data, Apriori algorithm, Association rules, Multiple imputation, Random forest, Influencing factors identification


Related items

1	Analysis Of Causes Of Traffic Accidents On Different Sections Of Expressway Based On Improved Association Rules
2	Improvement And Application Of Apriori Algorithm For Association Rules
3	Research On Accidents Causes Based On AHP-Apriori Algorithm
4	Characteristic Analysis Of Urban Highway Traffic Accidents Based On Data Mining
5	Research On Application Of Apriori Improved Algorithm In Traffic Illegal Data Analysis
6	The Application Research Of Association Rules In Large Data Mining And Optimal Operation Of Power Plant Boilers
7	Research On Residential District Livability Evaluation Based On Association Rules
8	Study On Road Traffic Accident Influencing Factors Based On Association Rules And Spatial Autocorrelation
9	Study On The Transient Stability Assessment Method Based On Association Rules
10	Research On Data Mining Of Low-grade Highway Accidents Based On Multidimensional Association Rules