| Predicting the injury severity of traffic crashes, and proposing methods to reduce it, is a major focus of automobile safety research. In this paper, a prediction model for the injury severity of traffic crashes is built on a feature selection algorithm. As an example, the model uses data on rear-end crashes caused by non-driver factors, recorded in FARS (Fatality Analysis Reporting System) of the United States National Highway Traffic Safety Administration, to explore the relationship between the influencing factors and injury severity. Based on the analysis results, safety measures and improvement suggestions are provided for driverless vehicles and assisted-driving vehicles to reduce injury severity. The main work is as follows: (1) This paper combines the two classes of feature selection algorithm, Filter and Wrapper, which are distinguished by their working principle. Two information measures are taken from Filter feature selection: the Gini index and mutual information. Then, within the Wrapper feature selection algorithm, a criterion based on the Gini index and mutual information is used to comprehensively evaluate variable importance, and this evaluation drives a generalized sequential backward search strategy. The improved Wrapper feature selection algorithm uses three classifiers, the C4.5 decision tree (C4.5), random forest (RF) and support vector machine (SVM), to construct prediction models of traffic crash injury severity, which are then compared. (2) First, the causes and influencing factors of rear-end crashes were preliminarily analyzed. Then, the rear-end crash data from 2010 to 2017 in the FARS traffic crash database were used as the data source. Records involving driver misbehavior and records with missing values were excluded. After data integration and data format preprocessing, a total of 6,295 rear-end crash records were obtained. The severity of occupant injuries was taken as the dependent variable, and 
23 potential factors were selected from people, vehicles, roads and environment as the independent variables. (3) In each cycle of the feature selection loop, the 23 variables in the data set are first evaluated and ranked by the comprehensive evaluation criterion. Then the classification accuracy of the three classifiers in the current loop is calculated. Next, the M variables with the smallest comprehensive evaluation scores are removed, following the generalized sequential backward search. A new data set is generated and the loop restarts, terminating when no variables remain in the data set. The results show that, by introducing the comprehensive evaluation criterion, the model can quickly and accurately identify the important variables, which are: seat belt, relative speed, airbag, vehicle weight, vehicle damage degree, vehicle type, age, rollover, fire, road environment, weather, time period and occupant type. At the same time, the classification performance of the prediction model is improved. (4) The classification performance of the three classifiers is compared by classification accuracy and recall rate. The results show that SVM performs best overall, followed by RF and C4.5. Pseudo-elasticity analysis and sensitivity analysis were carried out on the selected important variables with the SVM classifier to determine the influence of each variable on traffic crash injury severity. Based on the analysis results, corresponding safety measures and improvement suggestions are proposed for driverless vehicles and assisted-driving vehicles. |
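
The selection loop described in items (1) and (3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact form of the comprehensive evaluation criterion, so an equal-weight average of max-normalised Gini gain and mutual information is assumed here. The `holdout_accuracy` function is a hypothetical stand-in for the C4.5, RF and SVM classifiers, and the toy variables `belt`, `airbag` and `noise` are invented for illustration.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(feature, labels):
    """Decrease in Gini impurity after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups.values())

def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and the labels."""
    n = len(labels)
    px, py = Counter(feature), Counter(labels)
    pxy = Counter(zip(feature, labels))
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def combined_scores(columns, labels):
    """ASSUMPTION: equal-weight average of the two max-normalised measures."""
    g = {k: gini_gain(v, labels) for k, v in columns.items()}
    mi = {k: mutual_information(v, labels) for k, v in columns.items()}
    g_max = max(g.values()) or 1.0    # avoid dividing by zero
    mi_max = max(mi.values()) or 1.0
    return {k: 0.5 * g[k] / g_max + 0.5 * mi[k] / mi_max for k in columns}

def backward_search(columns, labels, evaluate, m=1):
    """Generalized sequential backward search: score the current subset,
    then drop the m lowest-ranked variables, until none remain."""
    remaining = dict(columns)
    history = []
    while remaining:
        scores = combined_scores(remaining, labels)
        history.append((sorted(remaining), evaluate(remaining, labels)))
        for name in sorted(scores, key=scores.get)[:m]:
            del remaining[name]
    best = max(history, key=lambda h: h[1])  # first subset with top accuracy
    return best, history

def holdout_accuracy(columns, labels):
    """Hypothetical classifier stand-in: majority vote per feature pattern,
    fitted on even-indexed rows and scored on odd-indexed rows."""
    names = sorted(columns)
    rows = list(zip(*(columns[k] for k in names)))
    train = range(0, len(labels), 2)
    test = range(1, len(labels), 2)
    default = Counter(labels[i] for i in train).most_common(1)[0][0]
    votes = {}
    for i in train:
        votes.setdefault(rows[i], Counter())[labels[i]] += 1
    hits = 0
    for i in test:
        seen = rows[i] in votes
        pred = votes[rows[i]].most_common(1)[0][0] if seen else default
        hits += pred == labels[i]
    return hits / len(test)

if __name__ == "__main__":
    # Invented toy data: 1 = severe injury, 0 = minor injury.
    severity = [0, 0, 0, 0, 1, 1, 1, 1]
    data = {
        "belt":   [1, 1, 1, 1, 0, 0, 0, 0],
        "airbag": [1, 1, 1, 0, 1, 0, 0, 0],
        "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
    }
    best, history = backward_search(data, severity, holdout_accuracy, m=1)
    print(best)
```

In a real study, `evaluate` would be replaced by the cross-validated accuracy of an actual classifier, and `m` controls how many variables are discarded per cycle.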