| The frequent occurrence of telecommunication fraud has enriched the information sources of the case base in this field. Event graphs are good at mining the logical relationships between events from unstructured and semi-structured information, which helps the police familiarize themselves with fraud routines, analyze the key links in anti-fraud work, and uncover the deeper factors behind telecom fraud incidents, so as to control or mitigate the social harm caused by telecommunication fraud. However, because relevant research in this field is scarce and the algorithms involved are complex, many problems remain in constructing an event graph for telecommunication fraud. This research addresses the whole process of building such an event graph. It divides the process into several parts, including dataset construction, event trigger word and event element extraction, event element recognition under small-sample conditions, and event relation extraction, then explores the implementation algorithms of each part and improves them. The main contributions of this paper are as follows. To address the low accuracy of event trigger word and event element extraction in the telecom fraud domain, an extraction method based on ERNIE-BIGRU-CRF is proposed. ERNIE, a pre-trained language model with knowledge-enhanced semantic representation, is used to obtain the semantic representation of the input, and its three-stage masking strategy (word mask, phrase mask and entity mask) implicitly learns knowledge such as entity attributes and entity relations. To further extract contextual information from the text, the word vectors produced by ERNIE are fed into a bidirectional GRU layer to extract sentence-level 
features for further semantic encoding. Finally, the CRF layer performs sequence labeling and outputs the globally optimal label sequence. The experimental results show that the F1 score of the ERNIE-BIGRU-CRF model constructed in this paper is significantly higher than that of the baseline model on both the event trigger word extraction task and the event element extraction task. On the TFG (Telecom Fraud Graph) dataset of new illegal cases, compared with the baseline model BERT-Tags, the proposed ERNIE-BIGRU-CRF model improves the F1 score by 13% on trigger word extraction, and by 11% and 8% on event element recognition and event element classification, respectively. To address the low accuracy of event element recognition and the inability to use label semantic information under small-sample conditions, this paper proposes a Template-based UNILM model for Chinese event element recognition that relies on generative template prompts. The event element recognition task is recast as a generation task under the seq2seq framework. Following the idea of the UNILM model, the attention mask in BERT is modified so that BERT's Transformer architecture can be used for seq2seq generation, which yields better generation performance than ordinary seq2seq models such as BART and is fully adapted to the Chinese environment. The experimental results show that, under small-sample conditions, the F1 score of the Template-based UNILM model for event element recognition is significantly higher than that of ERNIE-BIGRU-CRF, with an average improvement of about 6% per event element, which improves the accuracy of event element recognition under small-sample conditions. For the task of extracting relations between events, this paper divides the relations into explicit and implicit event relations, using pattern matching for extraction and introducing semantic dependency analysis into relation extraction, giving full 
play to the meaning-driven (paratactic) character of Chinese and working directly at the semantic level rather than going through syntactic analysis. The complementary advantages of the two methods are used to complete event relation extraction. The experimental results show that the accuracy of explicit and implicit relation extraction on the TFG dataset is 89% and 72%, respectively. Finally, this paper gives an example of the construction process of an event graph in the field of telecommunication fraud. |
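
The attention-mask modification mentioned in the abstract can be illustrated with a small sketch. It shows the sequence-to-sequence mask generally associated with UNILM: source tokens attend to each other in both directions, while target tokens attend to the whole source and only to earlier target tokens. This is an assumption-laden illustration, not the paper's implementation; the function name `seq2seq_attention_mask` and the segment lengths are hypothetical.

```python
from typing import List


def seq2seq_attention_mask(src_len: int, tgt_len: int) -> List[List[int]]:
    """Build a UNILM-style seq2seq mask (illustrative sketch only).

    mask[i][j] == 1 means position i may attend to position j.
    Positions 0..src_len-1 are the source segment (e.g. the input text),
    the remaining tgt_len positions are the target segment (e.g. the
    filled-in template to be generated).
    """
    total = src_len + tgt_len
    mask = [[0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < src_len:
                # Every position, source or target, sees the full source.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions see themselves and earlier targets only.
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    for row in seq2seq_attention_mask(3, 2):
        print(row)
```

With three source tokens and two target tokens, the three source rows block the target columns, and the two target rows form a lower-triangular pattern over the target columns. A mask of this shape is what lets a single BERT-style encoder be trained for generation without a separate decoder.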