Statistical Machine Learning Based Classification And Prediction Of High Risk Drivers

Posted on:2024-06-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wang

Full Text:PDF

GTID:2542307106986109

Subject:Applied statistics

Abstract/Summary:

As urbanization becomes a global development trend, vehicle ownership is rising, and with it the number of traffic accidents and casualties, which brings great loss and distress to society and individuals. At the same time, urbanization and growing traffic congestion have promoted the development of connected vehicle technology. However, in expanding their auto insurance business, companies often focus on improving the profit model and reaching sales targets, and neglect the identification of high-risk drivers and the control of claims costs. Therefore, whether from the perspective of public safety or from that of auto insurance operations and connected vehicle development, driver risk identification is an important research direction. In addition, in the big data era it has become increasingly easy to obtain massive user data, machine learning algorithms have become relatively mature, and the technology for customer risk identification has become more advanced. However, there are still few studies on driver risk identification and control, most of them are based on theoretical analysis, and there is still much room for research on using data for the quantitative classification and prediction of driver risk.

Therefore, this thesis classifies and predicts driver risk from four aspects: the personal characteristics of the accident driver, vehicle characteristics, road characteristics, and the driving environment. First, features that may affect driver risk are selected based on the literature and experience, missing values and outliers are handled, and the variable values are encoded and standardized. Next, a random forest model is used to rank the importance of the selected features, and the features with importance greater than 0.01 are kept as input variables for the models. The class imbalance in the data is then addressed with three resampling methods: SMOTE oversampling, NearMiss undersampling, and SMOTETomek hybrid sampling. The original imbalanced dataset and the three balanced datasets are each used to train a CART decision tree, an AdaBoost ensemble learning model with CART as the base classifier, and an ANN deep learning model. In addition, an EasyEnsemble imbalanced-classification model based on AdaBoost ensemble learning is trained on the imbalanced dataset. All models are evaluated on a test set.

Comparing the models shows that those trained directly on the original imbalanced dataset perform extremely poorly, close to random classification. The three resampling methods and the EasyEnsemble model all alleviate, to a certain extent, the inability to make effective classification predictions on imbalanced data: the G-mean, F-measure, and AUC values of the models improve, and high-risk drivers can be effectively identified. Overall, SMOTETomek-AdaBoost has the best classification and prediction performance, with an AUC of 0.94. This thesis recommends applying this model to the problem of identifying and classifying high-risk drivers.
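The role of G-mean and F-measure in the comparison above can be illustrated with a small worked example. The sketch below assumes the standard definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall); the abstract does not state the exact formulas used in the thesis. The function names and the 100-driver labels are hypothetical and are not taken from the thesis data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the high-risk class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all drivers classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fp, tn, fn):
    """Harmonic mean of precision and recall (assumed definition, beta = 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical test set: 5 high-risk drivers (1) and 95 low-risk drivers (0).
y_true = [1] * 5 + [0] * 95

# Classifier A labels every driver as low-risk.
pred_all_low = [0] * 100

# Classifier B finds 4 of the 5 high-risk drivers but raises 10 false alarms.
pred_balanced = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("all low-risk", pred_all_low), ("balanced", pred_balanced)]:
    counts = confusion_counts(y_true, y_pred)
    print(name,
          "accuracy=%.3f" % accuracy(*counts),
          "G-mean=%.3f" % g_mean(*counts),
          "F-measure=%.3f" % f_measure(*counts))
```

In this toy setting, the classifier that never flags a high-risk driver has the higher accuracy, yet its G-mean and F-measure are both zero. This is why metrics of this kind, rather than accuracy, are informative when comparing models on imbalanced data.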

Keywords/Search Tags:

Driver serious accident risk level, Imbalanced data, Machine learning, Ensemble learning, Deep learning

Related items

1	Research On Risky Bus Driver Identification Combining Ensemble Learning And Interpretability Methods
2	Research On Machine Learning Method And Application For Class-imbalanced Automotive Big Data
3	Research On Photovoltaic Power Generation Forecast Based On Integrated Deep Learning
4	Research On Classification Method Of Bearing Quality Grade Based On Imbalanced Data Augmentation
5	Research On Intelligent Fault Diagnosis Method Of Railway Train Wheelset Bearing Based On Deep Learning
6	Research On Bus Passenger Flow Prediction And Bus Plan Optimization Model Based On Big Data And Machine Learning
7	Study On Densely Connected Deep Extreme Learning Machine Algorithm
8	Research On Imbalanced Sample Learning For Fastener Defect Detection
9	Research And Application Of The Accident Risk Prediction Model Of Mountain Expressway Based On Machine Learning
10	Research On Fault Diagnosis Method Of Rolling Bearing Based On Integrated Deep Learning