Traffic Accident Text Analysis Based On BERT+BiLSTM+CRF Model And Improved Apriori Algorithm

Posted on: 2022-08-10 | Degree: Master | Type: Thesis
Country: China | Candidate: H Sun | Full Text: PDF
GTID: 2492306566498164 | Subject: Traffic and Transportation Engineering
Abstract/Summary:
The analysis of traffic accidents is of great significance to traffic safety. Current analysis relies mostly on the structured, coded data provided by traffic control departments, while the textual data of traffic accidents remain underused: key heterogeneous information in the text, such as time, location, numbers, and casualties, is hard to extract effectively, which hinders further accident analysis. Existing methods for extracting information from traffic accident text mostly use deep learning models based on static word vectors. Although such methods extract the features of heterogeneous data automatically and avoid the time-consuming manual design of rules, the accuracy of the extracted information is poor. In view of the better performance of the pre-trained language model BERT in other text mining fields, this thesis studies the application of a fusion model of BERT and a deep learning network to extract key heterogeneous information from traffic accident text. The proposed approach keeps the efficiency of deep learning models while improving the accuracy of the extracted information, and combines it with an improved association rule mining algorithm to carry out accident analysis. The main work of the thesis includes:

(1) A BERT+BiLSTM+CRF model built on a pre-trained language model is constructed to extract heterogeneous information from traffic accident text. BERT maps text characters to dynamic vectors, addressing ambiguity and insufficient context dependence at the level of data representation; BiLSTM extracts features from the vectorized text and outputs high-level feature sequences; CRF computes the globally optimal output sequence and refines the sequence labeling result. Comparative experiments indicate that the average accuracy of the information extracted by this model is 0.924 and the F1 score is 0.918, which are 3.8% and 8.1% higher, respectively, than those of the best model without BERT.

(2) To address the problems of the traditional Apriori algorithm in mining traffic accident data, such as the huge number of candidate sets and the single dimension of the generated rules, an improved Apriori algorithm with multi-attribute constraints is designed. According to the characteristics of multi-valued attributes in traffic accident data, the data are uniformly valued and sorted, and, by adjusting the threshold of the rule dimension, multi-attribute constraints are designed to eliminate invalid rules generated by the Apriori algorithm. Experiments show that the running time of the improved algorithm is reduced by 38.3% on average compared with the traditional algorithm, and that the generation of invalid rules is also reduced.

(3) A traffic accident text mining analysis system is designed and developed. Its functions include traffic accident text data collection, accident text information extraction, association rule analysis, and result display and export. The system can be used by relevant departments to analyze the text data of traffic accidents and to make corresponding decisions.

In summary, the constructed BERT+BiLSTM+CRF model can effectively extract key data from traffic accident texts; the improved Apriori algorithm mines traffic accident rules more efficiently and obtains more scientifically sound results; and the designed traffic accident text mining analysis system can basically meet the analysis requirements of traffic accident text data.
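The abstract does not spell out the multi-attribute constraints, so the following Python sketch only illustrates the general idea of an attribute-constrained Apriori pass. It rests on two assumptions that are not taken from the thesis: each accident record is a mapping from attribute to a single value, and the constraints are (a) at most one value per attribute in an itemset and (b) a cap on itemset size. The function name `constrained_apriori`, the `max_dim` parameter, and the sample records are all hypothetical.

```python
from itertools import combinations


def constrained_apriori(transactions, min_support, max_dim):
    """Level-wise frequent-itemset mining over (attribute, value) items.

    Assumed constraints (illustrative, not the thesis's exact rules):
      * an itemset holds at most one value per attribute;
      * itemsets larger than max_dim are never generated.
    """
    n = len(transactions)
    rows = [frozenset(t.items()) for t in transactions]

    def support(itemset):
        # Fraction of records that contain every item of the itemset.
        return sum(itemset <= row for row in rows) / n

    # Level 1: frequent single (attribute, value) items, in sorted order.
    level = {}
    for item in sorted({item for row in rows for item in row}):
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 1
    while level and k < max_dim:
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Attribute constraint: skip candidates that repeat an attribute,
            # so their support is never counted.
            if len({attr for attr, _ in union}) != k + 1:
                continue
            # Classic Apriori pruning: every k-subset must be frequent.
            if all(frozenset(sub) in level for sub in combinations(union, k)):
                candidates.add(union)
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical records, standing in for fields extracted from accident text.
records = [
    {"weather": "rain", "time": "night", "severity": "serious"},
    {"weather": "rain", "time": "night", "severity": "serious"},
    {"weather": "rain", "time": "day", "severity": "minor"},
    {"weather": "clear", "time": "day", "severity": "minor"},
]
result = constrained_apriori(records, min_support=0.5, max_dim=2)
```

In this sketch the attribute check discards candidates such as {time=night, time=day} before their support is counted, which is one plausible way to shrink the candidate set. Raising `max_dim` allows higher-dimensional itemsets, from which rules can then be derived.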
Keywords/Search Tags: Deep learning, BERT, BiLSTM, Conditional random field (CRF), Association rule mining