| With the continual upgrading of vehicle data collection equipment and the steady growth of operating companies' fleets, the vehicle data collected by enterprises has grown exponentially. At the same time, data show that most major traffic accidents are directly related to poor driving behavior by the drivers of operating vehicles. Operating companies therefore urgently need a cost-saving data analysis method and a computationally efficient analysis platform to process this growing volume of vehicle data. This thesis studies the construction of a distributed driving behavior evaluation model from three aspects: vehicle data pre-processing and key feature extraction, driving behavior evaluation algorithms, and construction of the model computing platform. (1) This thesis optimizes the pre-processing of vehicle data by first removing duplicate records and then detecting and repairing outliers algorithmically. Because traditional outlier filtering algorithms tend to cause data loss, an optimized outlier handling algorithm based on LOF (Local Outlier Factor) is proposed. The algorithm builds on the observation that most anomalies in vehicle data are anomalies in the latitude and longitude values. Starting from a normal record, it uses the distance between two points, the direction angle, and that record's latitude and longitude to calculate the latitude and longitude of the next record, and then replaces the latitude and longitude of the abnormal record with the calculated values. Abnormal points are thereby converted into normal points rather than discarded, which preserves the integrity and consistency of the data. For key feature extraction, a wrapper feature selection algorithm is used. (2) This thesis proposes a driving behavior evaluation algorithm to address the inaccuracy and lack of objectivity of traditional 100-point driving behavior evaluation 
methods, and the complexity of collecting feature data. The algorithm consists of two parts: a driving behavior recognition algorithm and the DBSCAN clustering algorithm. First, the driving behavior recognition algorithm identifies the driving behavior feature data; then the density-based DBSCAN clustering algorithm clusters the driving behaviors so that the grading is more scientific and objective. (3) Traditional distributed computing platforms built on virtual machine nodes consume substantial resources, are complicated to set up, and make resource scheduling among nodes tedious. This thesis addresses these problems by building a Spark computing platform on a Kubernetes cluster, and it provides a visualization dashboard to present the evaluation results. Testing shows that the Spark distributed computing platform on a Kubernetes cluster and the vehicle data analysis method used in this thesis can help enterprises evaluate driving behavior from vehicle data, while reducing the company's investment in personnel and hardware and thus lowering costs. |
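
The outlier repair step in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a spherical Earth, takes the outlier flags as given (for example, from an LOF pass that is not shown), and estimates the travelled distance as speed times elapsed time, which is an assumption made here for illustration. The record fields `t`, `lat`, `lon`, `speed_mps`, and `bearing_deg` are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; the spherical model is an assumption


def destination_point(lat_deg, lon_deg, bearing_deg, distance_m):
    """Return (lat, lon) in degrees reached from a start point along a bearing."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalize longitude to the range [-180, 180).
    lon2_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg


def repair_outliers(records, is_outlier):
    """Replace lat/lon of flagged records with a position projected
    from the previous (already repaired) record.

    records: list of dicts with the hypothetical keys
             t, lat, lon, speed_mps, bearing_deg.
    is_outlier: list of booleans of the same length, supplied externally.
    """
    repaired = []
    for rec, bad in zip(records, is_outlier):
        rec = dict(rec)  # copy so the input list is left untouched
        if bad and repaired:
            prev = repaired[-1]
            # Assumption: distance travelled = previous speed * elapsed time.
            distance = prev["speed_mps"] * (rec["t"] - prev["t"])
            rec["lat"], rec["lon"] = destination_point(
                prev["lat"], prev["lon"], prev["bearing_deg"], distance
            )
        repaired.append(rec)
    return repaired
```

The flagged record is kept and only its coordinates are overwritten, which matches the stated goal of converting abnormal points into normal points instead of dropping them.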
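
The clustering step in (2) can be sketched in the same hedged way. The following is a plain single-machine DBSCAN written for illustration only; the thesis runs its analysis on Spark, and its actual features and parameters are not given in this abstract. Each point is assumed to be a per-driver feature vector (for example, hypothetical rates of harsh acceleration and harsh braking), and `eps` and `min_pts` are placeholder parameters.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return a cluster label (0, 1, ...) or NOISE for each point.

    points: list of equal-length numeric tuples (feature vectors).
    eps: neighborhood radius (Euclidean distance).
    min_pts: minimum neighborhood size, counting the point itself,
             for a point to be treated as a core point.
    """
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbors(i):
        # Brute-force O(n) scan; fine for a sketch, not for large data.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels
```

Because DBSCAN derives groups from the density of the data rather than from fixed score thresholds, drivers that fall in no dense region are reported as noise instead of being forced into a grade.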