Predicting the anti-terrorism situation for public security and formulating response plans require data in many forms, including Internet information and case materials. Terrorism-related texts contain rich information about terrorism-related entities, such as the people involved and the means and locations of terrorist attacks. Identifying these entities provides basic support for information extraction and knowledge graph construction in the anti-terrorism field. This thesis takes terrorism-related news text published on the Internet as its main research object, with the aim of improving entity recognition technology for the anti-terrorism field. Entity recognition in this field currently faces several challenges: public corpora and entity standards are lacking, entity types are uncertain, entity recognition models have a single structure, and the extracted entities cannot be applied directly to anti-terrorism event extraction or to the construction of an anti-terrorism domain knowledge base. To address these challenges, a whole-process scheme for recognizing and utilizing entities in the anti-terrorism field is proposed. To recognize and associate terrorism-related entities, this thesis identifies terrorism-related news in massive data using text classification, extracts fine-grained entities from that news using named entity recognition, and stores and associates the entities using a graph database. The main work of this thesis includes:

(1) In view of the lack of entity standards and the uncertainty of entity types, a set of fine-grained entity labels for the anti-terrorism field is formulated. The label specification reduces the impact of entity classification ambiguity on the entity recognition model, and the fine-grained labels encode role information into entities. On this basis, the entity-labeled dataset Anti-Terr-Corpus for the anti-terrorism field is constructed.

(2) Considering the single structure of the traditional entity recognition model and its inability to handle polysemy, a fine-grained entity recognition model, MacBERT-Stacked BiLSTM-CRF, is proposed, following a framework of pretrained language model, semantic encoding layer, and label decoding layer. MacBERT produces dynamic word vector representations fused with contextual information. Two stacked BiLSTM layers capture richer contextual information from both the preceding and following parts of the sequence, and a CRF learns the labeling rules to improve recognition accuracy. Compared with the traditional model based on static word vectors, the proposed model increases F1 by 24.5 percentage points; compared with the single-layer BiLSTM model, F1 increases by 1.1 percentage points. The adaptability of the model is verified on the RMRB dataset and the CEC emergency corpus, and experiments show that the model can effectively capture important entities in anti-terrorism texts.

(3) Because little effectively labeled data is available in practical anti-terrorism work, a method for applying the entity recognition model under data scarcity is proposed. An active learning algorithm identifies, within a massive unlabeled dataset, the data most valuable for model learning, thereby reducing the amount of data that actually has to be labeled. Experimental results on specific datasets indicate which strategy-based active learning algorithm suits each dataset, providing an important theoretical reference for deploying the model.

(4) To effectively select terrorism-related news for entity recognition, a classification model, MacBERT-CLS BiLSTM-Softmax, is proposed to support the application of the domain entity recognition model. Instead of truncating the text as is traditionally done, news articles are split into segments, a sentence vector is generated for each segment, and the vectors are then merged. This captures the complete semantic information of the text more fully and thereby improves the classification accuracy for long terrorism-related news.

Finally, a prototype system for the associated visualization of terrorism-related news entities is designed and implemented to verify the application of the above techniques, and association analysis of terrorism-related entities is realized in combination with the graph database.
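The segment-then-merge idea behind the long-news classifier in contribution (4) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the sentence-based splitting rule, the `max_len` value, and the toy encoder are all assumptions made for illustration. In particular, mean pooling is used here as a simple stand-in for the merging step, whereas the thesis model merges the per-segment vectors with a BiLSTM before the Softmax layer.

```python
import re
from typing import Callable, List, Sequence


def split_into_segments(text: str, max_len: int = 510) -> List[str]:
    """Greedily pack sentences into segments of at most max_len characters.

    Hypothetical helper: the splitting rule and max_len are assumptions.
    """
    # Split after sentence-ending punctuation (Chinese and Western).
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]
    segments: List[str] = []
    current = ""
    for sent in sentences:
        # Hard-split a single overlong sentence so that no text is dropped.
        while len(sent) > max_len:
            if current:
                segments.append(current)
                current = ""
            segments.append(sent[:max_len])
            sent = sent[max_len:]
        if len(current) + len(sent) > max_len:
            segments.append(current)
            current = sent
        else:
            current += sent
    if current:
        segments.append(current)
    return segments


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element-wise (stand-in for the merge step)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_document(
    text: str,
    encode_segment: Callable[[str], Sequence[float]],
    max_len: int = 510,
) -> List[float]:
    """Encode every segment separately, then merge into one document vector."""
    segments = split_into_segments(text, max_len)
    return mean_pool([encode_segment(segment) for segment in segments])


def toy_encoder(segment: str) -> List[float]:
    """Placeholder for a real sentence encoder such as a [CLS] vector."""
    return [float(len(segment)), float(segment.count("attack"))]
```

In practice `toy_encoder` would be replaced by a pretrained encoder that returns one vector per segment. The point of the sketch is only that every part of a long article contributes to the final representation, which truncation to a fixed length cannot guarantee.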