With social progress and rapid economic development, the number of motor vehicles in cities keeps increasing, which causes a series of road traffic safety problems. In China, more than 260 thousand people are killed in traffic accidents every year, with heavy casualties and property losses. Road traffic safety analysis has therefore become a hot research topic for scholars at home and abroad. Early analyses of the causes of traffic accidents relied mainly on macroscopic accident analysis and statistical analysis of data, focusing on the impact of factors such as people, vehicles, roads and environment. Further studies have shown that the causes of traffic accidents are multifaceted, systematic and internal. With the development of intelligent big data analysis, data mining and machine learning techniques can be used to analyze traffic data and uncover the causes and potential risk factors of accidents, which helps to formulate targeted measures to prevent traffic accidents and has good application value.

Given the diversity of traffic accidents, and considering the authenticity and timeliness of news reports, this paper uses traffic accident news data to mine and analyze the risk factors of traffic accidents. The paper takes traffic accident news from the Sina website as the data source, extracts the related risk factors from the news events, and stores them in the Neo4j graph database. To overcome two shortcomings of the classical Apriori algorithm, namely that it supports only single-dimensional association mining and must scan the database repeatedly, an improved multi-valued attribute algorithm, MA-Apriori, is proposed, which takes provinces and cities as the focus and mines the combinations of risk factors that lead to accidents. In order to improve the ability
of online learning for new data, this paper draws on the idea of the Fast Update (FUP) incremental mining algorithm, further improves the association rule learning algorithm, and establishes an incremental mining algorithm for traffic accident risk factors. Finally, based on the mining results, the patterns of traffic accidents in provinces and cities are summarized, and a traffic accident risk prediction model is built on a Bayesian network, which provides a basis for the early prevention of traffic accidents.

The main work and innovations of this paper are as follows:

1. Build a database of traffic accident records from news. Based on news data related to traffic accidents, and considering the influence of weather on traffic, a database of traffic accident records is built from news reports combined with real-time weather for mining accident factors. Crawler technology is used to collect news data from the Sina website from 2015 to 2017, and data cleaning based on text keyword extraction filters out news unrelated to traffic accidents. The risk factors of traffic accidents are then extracted from the selected news, covering six attributes: time, place, weather, accident cause, traffic mode and accident type. The records are stored in the Neo4j graph database, yielding a traffic accident record database of 1,177 records.

2. Propose an improved algorithm based on multi-valued attribute association rules. This paper uses the Apriori algorithm to mine association rules among traffic accident factors. Because accident factors are related across multiple dimensions, multidimensional association rules are used for data mining. The classical Apriori algorithm is suitable only for single-dimensional association mining and needs to scan the database many times, which leads to low computing efficiency. In this paper, an improved
algorithm, MA-Apriori, is proposed. Based on the characteristics of traffic accident news data, a check of whether "location" and "accident type" are included is added to the join step. In addition, the pruning step considers only the generation of frequent itemsets involving "accident type", which improves computational efficiency while still mining the required rules. Experiments compare the performance of the traditional Apriori algorithm and the MA-Apriori algorithm on the traffic accident data set. The results show that the lower the minimum support, the greater the efficiency advantage of the improved algorithm.

3. Propose an improved incremental association rule algorithm. As new data keeps accumulating, more effective traffic accident risk factors and relationships can be mined. However, the traditional Apriori algorithm must rescan the whole database, including both the original and the new data, and cannot make full use of previously acquired knowledge, which costs considerable time and space. Inspired by the FUP algorithm, this paper proposes an improved incremental algorithm, UMA-Apriori, which uses the frequent itemsets of the original database to compute the frequent itemsets of the new data set. The two collections of frequent itemsets are then compared, and the itemsets common to both are retained. For the itemsets that differ, the support in the database is calculated and those that satisfy the minimum support are kept. Finally, the combined frequent itemsets are used to compute strong association rules. Experiments on mining the new data with incremental association rules verify that UMA-Apriori is more efficient than the traditional algorithm, and the more incremental data there is, the more obvious the improvement.

4. Construct a traffic accident risk prediction model. The Bayesian network is one of the most effective models in the field of uncertain knowledge and reasoning, and has great advantages in data
analysis and prediction. Therefore, this paper chooses the Bayesian network as the basic model for traffic accident risk prediction. Based on the mined accident risk factors, the probability of an accident is computed from the conditional probabilities among the factors, which achieves accident risk prediction. The original traffic accident data set and the new data set are used to test the risk prediction model and analyze its effectiveness. The prediction accuracy is 94.2% on the training set and 86.1% on the test set, which indicates that the constructed prediction model is effective.
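
The multidimensional mining idea in item 2 can be illustrated with a minimal Python sketch. It is not the paper's MA-Apriori implementation. It assumes each accident record is a dictionary mapping an attribute name to a single value, treats every (attribute, value) pair as an item, and rejects candidates that carry two values of the same attribute during the join step. The function names `support`, `mine_frequent` and `focus_on`, and the attribute keys `place` and `accident_type`, are hypothetical. For simplicity, the focus on location and accident type is applied here as a filter on the result rather than inside the join step.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_frequent(records, min_support):
    """Level-wise Apriori over (attribute, value) items.

    Assumption: each record maps an attribute name to exactly one value.
    """
    transactions = [frozenset(r.items()) for r in records]
    items = {item for t in transactions for item in t}

    # Level 1: frequent single items.
    level = {}
    for item in items:
        s = support(frozenset([item]), transactions)
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != k:
                continue
            # Multidimensional constraint: at most one value per attribute.
            if len({attr for attr, _ in union}) != k:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support only for the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
    return frequent


def focus_on(frequent, required_attrs):
    """Keep only itemsets that mention every attribute in `required_attrs`."""
    required = set(required_attrs)
    return {
        itemset: s
        for itemset, s in frequent.items()
        if required <= {attr for attr, _ in itemset}
    }
```

Calling `focus_on(mine_frequent(records, 0.5), {"place", "accident_type"})` would return only the frequent combinations that tie a place to an accident type, which is the kind of rule antecedent the abstract describes.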
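
The incremental step in item 3 can be sketched in the same spirit. The code below is an illustration of the FUP-style idea, not the paper's UMA-Apriori algorithm. It assumes that the frequent itemsets of the original database and of the new data have already been mined with the same relative minimum support and are stored together with their absolute counts. The names `count` and `merge_frequent` are hypothetical. An itemset that is frequent in neither part cannot be frequent in the combined database, so only the union of the two collections needs to be examined, and an itemset that is frequent in both parts needs no rescan at all.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)


def merge_frequent(old_freq, new_freq, old_db, new_db, min_support):
    """Combine frequent itemsets of the original and the incremental data.

    old_freq / new_freq map frozenset itemsets to absolute counts in
    old_db / new_db (lists of frozensets), mined at the same min_support.
    """
    total = len(old_db) + len(new_db)
    merged = {}
    for itemset in old_freq.keys() | new_freq.keys():
        # Reuse stored counts; scan a part only when its count is unknown.
        if itemset in old_freq:
            old_c = old_freq[itemset]
        else:
            old_c = count(itemset, old_db)
        if itemset in new_freq:
            new_c = new_freq[itemset]
        else:
            new_c = count(itemset, new_db)
        if old_c + new_c >= min_support * total:
            merged[itemset] = old_c + new_c
    return merged
```

The saving over a full rerun comes from the itemsets shared by both collections: their combined counts are obtained by addition, and only the itemsets that differ trigger a scan of one part of the data.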
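
The conditional probabilities mentioned in item 4 can be estimated from the accident records by simple counting. The following sketch shows this for a single node of a Bayesian network. The abstract does not specify the network structure, so the choice of child and parent attributes is an assumption, as are the function name `estimate_cpt` and the use of plain relative frequencies without smoothing.

```python
from collections import Counter


def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) by relative frequency.

    Returns a dict keyed by (parent_values_tuple, child_value).
    Assumption: every record contains the child and all parent attributes.
    """
    joint = Counter()
    parent_totals = Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        joint[(key, r[child])] += 1
        parent_totals[key] += 1
    return {
        (key, value): c / parent_totals[key]
        for (key, value), c in joint.items()
    }
```

For example, `estimate_cpt(records, "accident_type", ["weather"])` would give the estimated probability of each accident type under each observed weather condition. A complete risk prediction model would combine one such table per node according to the network structure.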