| Enterprise support policy documents are usually unstructured text, and traditional natural language processing methods perform poorly on them. Named entity recognition (NER) gives such text structure by identifying instances of predefined entity types in the corpus. However, applying traditional NER models to enterprise support policy texts yields low recognition accuracy and missing entity information because of the large number of proprietary terms. In addition, annotated corpora for enterprise support policy texts are scarce, which further degrades recognition performance on such texts. This study uses NER to identify important entity information in enterprise support policy texts. (1) To address poor model training caused by the scarcity of annotated corpora, we first preprocessed a dataset of 5512 corpus entries obtained from government websites to obtain unstructured enterprise support policy text data. We then analyzed the structure and content of these texts and defined the information that enterprises need when applying for support policies as entity categories. We manually labeled the dataset with the defined entity categories to obtain a standardized annotated corpus for training NER models, contributing a resource for research in this field. (2) To address the poor recognition performance of traditional NER models on enterprise support policy texts caused by proprietary terms, we propose an NER model for these texts that combines RoBERTa-wwm with BiLSTM-CRF. RoBERTa-wwm is used to train dynamic word vectors that represent word polysemy, and a BiLSTM network further extracts contextual information and semantic features from the texts. Finally, a conditional random field (CRF) produces the best predicted label sequence. The proposed model achieved an F1 score of 91.70% on the enterprise support policy dataset, demonstrating its effectiveness in recognizing named entities in enterprise support policy texts. (3) To address the RoBERTa-wwm model’s weak ability to capture lexical-level features, we employ Word2vec to produce word embedding vectors and obtain more comprehensive lexical-level feature information. The character embedding vectors generated by RoBERTa-wwm and the word embedding vectors generated by Word2vec are concatenated to capture both character-level and word-level feature information. To further improve entity recognition accuracy, BiGRU and IDCNN are employed to extract features at different granularities. This model achieved an F1 score of 93.98% on the enterprise support policy dataset. |
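
The CRF step mentioned in the abstract, which selects the best label sequence from per-token scores, is commonly done with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The tag set `TAGS`, the function name `viterbi_decode`, and all emission and transition scores are illustrative assumptions; in a trained model the emissions would come from the encoder (for example a BiLSTM) and the transitions would be learned.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set; the paper's entity categories are not listed in the abstract.
TAGS = ["O", "B", "I"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions[t][j]   : score of tag j at token t (assumed encoder output).
    transitions[i][j] : score of moving from tag i to tag j (assumed CRF parameters).
    """
    num_tags = len(transitions)
    # score[j] holds the best score of any path that ends in tag j at the current token.
    score = list(emissions[0])
    backpointers: List[List[int]] = []

    for emit in emissions[1:]:
        next_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at this token.
            best_prev = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            next_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        backpointers.append(pointers)
        score = next_score

    # Follow the back-pointers from the best final tag to recover the full path.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Made-up scores for a three-token input. The O -> I transition is heavily
# penalised because an "inside" tag cannot follow an "outside" tag in BIO.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B
    [0.0, 0.0, 1.0],    # from I
]
emissions = [
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 2.0],
    [1.5, 0.0, 1.0],
]

path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path])  # ['B', 'I', 'I']
```

Picking the highest emission at each token independently would give `O, I, O` here, which is not a valid BIO sequence. Decoding over the whole sequence with transition scores avoids that, which is the usual motivation for placing a CRF layer on top of the encoder.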