Event extraction is an important technique for analyzing and storing military encyclopedia information. As one of the main tasks of natural language processing, event extraction is widely used in information retrieval and public opinion analysis. Although existing event extraction methods achieve good results in other domains, this research found that the following problems remain in the military encyclopedia domain: (1) Word boundaries in Chinese text are ambiguous. In addition, encyclopedia military texts are long, but pre-trained models based on absolute position encoding cannot directly process texts longer than 512 characters. The text fed to the model is therefore semantically incomplete, and the model's event extraction performance is poor. (2) There is no open-source dataset for event extraction in the encyclopedia military domain. This paper builds a Chinese military event extraction dataset by crawling Chinese military data from an encyclopedia website, but the dataset still suffers from insufficiently diverse data samples and a wide distribution of event argument entities.

Taking the Baidu DuEE dataset and the Chinese Military Event Extraction dataset as research objects, this paper studies encyclopedia military event extraction based on MacBERT (MLM as Correction BERT) to address the above problems. The main research contents and results are as follows.

(1) To address ambiguous Chinese word boundaries and the length of encyclopedia military texts, a joint military event extraction algorithm based on an improved MacBERT, HDMacBERT-CRF, is proposed. The algorithm partitions the input text at character granularity, which avoids ambiguous Chinese word boundaries, and uses hierarchical decomposition so that the model can directly process texts longer than 512 characters. Compared with four typical algorithms on two datasets, the algorithm achieves better extraction results and is well suited to Chinese event extraction tasks.

(2) To address the wide distribution of encyclopedia military event argument entities and the lack of diversity in data samples, a deep semantic event extraction algorithm based on FGM (Fast Gradient Method) and BiLSTM (Bi-directional Long Short-Term Memory), HDMacBERT-FGM-BiLSTM, is proposed. The BiLSTM processes the sequence in both directions to strengthen the extraction of semantic information from the preceding and following context. FGM adversarial training dynamically adds perturbations to the model's embedding matrix, which increases the diversity of semantic samples and the robustness of the model. Experiments on the Chinese military event extraction dataset show that, compared with the classic BERT-CRF algorithm, the algorithm significantly improves event extraction: micro-averaged precision, recall, and F-score (micro-P, micro-R, and micro-F) increase by 4.00%, 6.75%, and 5.45%, respectively.
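Partitioning the input at character granularity makes every Chinese character one token, so argument spans can be labeled without first segmenting words. The following is a minimal sketch assuming a BIO tagging scheme over character offsets. The function name `char_bio_tags`, the role labels, and the example sentence are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, role label) as character offsets


def char_bio_tags(text: str, spans: List[Span]) -> Tuple[List[str], List[str]]:
    """Split text into single characters and assign BIO tags to argument spans.

    Assumes spans do not overlap; a later span would overwrite an earlier one.
    """
    chars = list(text)  # character-granularity partition: no word segmentation step
    tags = ["O"] * len(chars)
    for start, end, role in spans:
        if not 0 <= start < end <= len(chars):
            raise ValueError(f"invalid span {(start, end, role)}")
        tags[start] = "B-" + role
        for k in range(start + 1, end):
            tags[k] = "I-" + role
    return chars, tags


# Hypothetical example: "某舰队抵达港口" ("a certain fleet arrived at the port")
chars, tags = char_bio_tags("某舰队抵达港口", [(0, 3, "Subject"), (5, 7, "Destination")])
print(list(zip(chars, tags)))
```

A CRF layer on top of the encoder would then predict one such tag per character.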
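The abstract does not spell out how hierarchical decomposition lifts the 512-character limit. One published technique with this name is hierarchically decomposed position encoding (proposed by Su Jianlin), and the sketch below assumes that is what is meant. Given n trained absolute position embeddings p_1, ..., p_n, it defines u_i = (p_i - α p_1) / (1 - α) and builds n² new embeddings q_{(i-1)n+j} = α u_i + (1 - α) u_j, so that the first n new positions reproduce the originals exactly. The function name and the value α = 0.4 are illustrative assumptions. Plain Python lists stand in for the model's embedding tensor.

```python
from typing import List

Vector = List[float]


def hierarchical_decompose(p: List[Vector], alpha: float = 0.4) -> List[Vector]:
    """Extend n trained position embeddings to n*n positions (0-based indices).

    Extended position i*n + j is alpha*u[i] + (1 - alpha)*u[j], where
    u[k] = (p[k] - alpha*p[0]) / (1 - alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n, dim = len(p), len(p[0])
    # Base vectors u, chosen so that positions 0..n-1 of the result equal p.
    u = [[(p[k][d] - alpha * p[0][d]) / (1.0 - alpha) for d in range(dim)]
         for k in range(n)]
    extended = []
    for i in range(n):      # "high-order digit" of the new position
        for j in range(n):  # "low-order digit" of the new position
            extended.append([alpha * u[i][d] + (1.0 - alpha) * u[j][d]
                             for d in range(dim)])
    return extended


# Toy example: 3 "trained" 2-dimensional embeddings become 9 positions.
p = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
q = hierarchical_decompose(p)
print(len(q))  # → 9
```

With BERT's n = 512 trained positions, this would yield 512 × 512 = 262,144 positions. Inputs no longer than 512 characters see exactly the original embeddings.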
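FGM adversarial training perturbs the embedding matrix along its loss gradient g by r_adv = ε · g / ‖g‖₂, runs a second forward and backward pass on the perturbed embeddings so the adversarial gradient accumulates, and then restores the clean embeddings before the optimizer step. The following is a dependency-free sketch of the perturb and restore steps, assuming the norm is taken over the whole embedding matrix. The helper names `fgm_perturbation`, `attack`, and `restore` and the default ε = 1.0 are hypothetical. In a real model these would operate on framework tensors rather than lists.

```python
import math
from typing import List

Matrix = List[List[float]]


def fgm_perturbation(grad: Matrix, epsilon: float = 1.0) -> Matrix:
    """Return r_adv = epsilon * g / ||g||_2, with the L2 norm over the whole matrix."""
    norm = math.sqrt(sum(x * x for row in grad for x in row))
    if norm == 0.0 or math.isnan(norm):
        # No usable gradient direction: return a zero perturbation (skip the attack).
        return [[0.0] * len(row) for row in grad]
    return [[epsilon * x / norm for x in row] for row in grad]


def attack(emb: Matrix, grad: Matrix, epsilon: float = 1.0) -> Matrix:
    """Add the FGM perturbation to emb in place and return a backup for restore()."""
    backup = [row[:] for row in emb]
    r_adv = fgm_perturbation(grad, epsilon)
    for r, row in enumerate(r_adv):
        for c, x in enumerate(row):
            emb[r][c] += x
    return backup


def restore(emb: Matrix, backup: Matrix) -> None:
    """Copy the clean embedding values back after the adversarial pass."""
    for r, row in enumerate(backup):
        emb[r][:] = row


# Schematic training step:
#   1. loss.backward()                      -> gradient g on the embedding matrix
#   2. backup = attack(embeddings, g)
#   3. adv_loss.backward()                  -> adversarial gradients accumulate
#   4. restore(embeddings, backup); optimizer.step()
emb = [[1.0, 2.0], [3.0, 4.0]]
grad = [[3.0, 0.0], [0.0, 4.0]]  # ||g||_2 = 5
backup = attack(emb, grad, epsilon=1.0)  # emb is shifted by [[0.6, 0.0], [0.0, 0.8]]
restore(emb, backup)
```

Restoring from a saved copy, rather than subtracting r_adv, guarantees the clean values come back exactly despite floating-point rounding.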
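Micro-averaged metrics pool true positives, false positives, and false negatives over all documents before taking ratios, so frequent event types weigh more than rare ones. The following is a minimal sketch, assuming exact-match comparison of (event type, role, argument text) triples. The abstract does not state the matching criterion actually used, and the function name and sample triples below are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (event type, argument role, argument text)


def micro_prf(gold_docs: List[Set[Triple]],
              pred_docs: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Return (micro-P, micro-R, micro-F1) from pooled TP/FP/FN counts."""
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold and predicted document lists differ in length")
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)  # predicted and correct
        fp += len(pred - gold)  # predicted but wrong
        fn += len(gold - pred)  # missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = [{("attack", "attacker", "unit A"), ("attack", "target", "base B")},
        {("deploy", "subject", "fleet C")}]
pred = [{("attack", "attacker", "unit A"), ("attack", "target", "base D")},
        {("deploy", "subject", "fleet C")}]
print(micro_prf(gold, pred))  # TP = 2, FP = 1, FN = 1
```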