Research On Entity Recognition For Enterprise Support Policy Text

Posted on:2024-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:W F Zhu

Full Text:PDF

GTID:2568307124971499

Subject:Computer technology

Abstract/Summary:

Enterprise support policy documents are usually unstructured text, and traditional natural language processing methods perform poorly on them. Named entity recognition (NER) can turn this unstructured text into structured data by extracting entities of predefined types from the corpus. However, applying traditional NER models to enterprise support policy texts results in low recognition accuracy and missing entity information because of the large number of proprietary terms. In addition, annotated corpora for enterprise support policy texts are scarce, which further degrades recognition performance on such texts. This study uses NER to identify the important entity information in enterprise support policy texts.

(1) To address poor model training caused by the scarcity of annotated corpora, we first preprocessed a dataset of 5512 texts obtained from government websites to obtain unstructured enterprise support policy text data. We then analyzed the structure and content of these texts and defined the information that enterprises need when applying for support policies as entity categories. We manually labeled the dataset with the defined entity categories to obtain a standardized entity annotation corpus for training NER models, which also provides a resource for further research in this field.

(2) To address the poor performance of traditional NER models on enterprise support policy texts caused by proprietary terms, we propose a named entity recognition model for enterprise support policy text that combines RoBERTa-wwm and BiLSTM-CRF. RoBERTa-wwm is used to train dynamic word vectors that represent polysemy, and a BiLSTM network further extracts contextual information and semantic features from the texts. Finally, the best predicted tag sequence is obtained through a conditional random field (CRF). The proposed model achieved an F1 score of 91.70% on the enterprise support policy dataset, demonstrating its effectiveness in recognizing named entities in enterprise support policy texts.

(3) To address the RoBERTa-wwm model’s weak ability to capture lexical-level features, we employ Word2vec to produce word embedding vectors that carry more comprehensive lexical-level information. The character embedding vectors generated by RoBERTa-wwm and the word embedding vectors generated by Word2vec are concatenated so that the model receives both character-level and word-level features. To further improve entity recognition accuracy, BiGRU and IDCNN are employed to obtain features at different granularities. The proposed model achieved an F1 score of 93.98% on the enterprise support policy dataset.
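
The abstract states that the best predicted sequence is obtained through a conditional random field. As a minimal sketch of what that decoding step typically involves, the following pure-Python Viterbi search picks the highest-scoring tag sequence from per-character emission scores (such as a BiLSTM layer might output) and tag-to-tag transition scores. The tag set (`O`, `B-ORG`, `I-ORG`), all score values, and the function name `viterbi_decode` are hypothetical and chosen only for illustration; the abstract does not list the thesis's entity categories or describe its implementation.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   : score of tag j at position t (illustrative values)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    # score[j] = best score of any path that ends in tag j at the current position
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # best previous tag for arriving at tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[i] for i in path], score[best_last]


# Hypothetical tag set and scores (assumptions, not taken from the thesis).
tags = ["O", "B-ORG", "I-ORG"]
transitions = [
    [0.5, 0.0, -1e4],  # from O: moving to I-ORG is effectively forbidden
    [0.0, -1.0, 1.0],  # from B-ORG
    [0.0, -1.0, 0.5],  # from I-ORG
]
emissions = [
    [1.0, 0.8, 0.0],   # position 0: O is slightly preferred in isolation
    [0.2, 0.1, 1.5],   # position 1: I-ORG is strongly preferred
    [1.2, 0.0, 0.3],   # position 2: O is preferred
]

best_path, best_score = viterbi_decode(emissions, transitions, tags)
print(best_path)  # ['B-ORG', 'I-ORG', 'O']
```

Choosing the best tag at each position independently would give `O, I-ORG, O` here, which is not a valid BIO sequence. The transition scores let the decoder trade a slightly worse first tag for a consistent sequence, which is the usual reason for placing a CRF layer on top of a BiLSTM.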

Keywords/Search Tags:

Named entity recognition, Supporting policy text, RoBERTa-wwm, Word fusion

Related items

1	Research On Chinese Named Entity Recognition Based On XLNet And Word Segmentation Fusion Coding
2	Research On Chinese Named Entity Recognition Based On RoBERTa-WWM
3	Research And System Implementation Of Network Text Named Entity Recognition Based On Large Events
4	Research On Named Entity Recognition Method For Network Security Domain
5	Research On Named Entity Recognition For Science And Technology Terms Based On Dependent Entity Word Vector
6	Chinese-Slavic Mongolian Named Entity Translation Based On Word Alignment
7	Research On Chinese Text Oriented Named Entity Intelligent Recognition Model
8	The Research On Improvement Of Chinese Named Entity Recognition Method Based On Deep Learning
9	Research On Named Entity Recognition And Entity Link Method For Short Text Questions
10	Research On Surgical Intelligence Question Answering Based On Knowledge Graph