Research And Application Of Korean Named Entity Recognition Method Based On Multi-Granularity Fusion

Posted on: 2023-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: J L Gao
GTID: 2545306617993829
Subject: Electronic Information (in the field of computer technology) (professional degree)
Abstract/Summary:
Named entity recognition (NER) is one of the key research topics in natural language processing. It not only lets readers quickly access the key information in articles but also provides fundamental entity information for numerous downstream tasks. Research on intelligent information processing for Chinese Korean is still at an early stage of development, and studying the fundamental task of named entity recognition can provide a basis for the more in-depth natural language processing tasks that follow. Therefore, the study of named entity recognition in Korean has considerable academic significance and research value. To address the unique linguistic features of Korean, this dissertation combines a Korean text representation based on multi-granularity deep fusion with pre-trained Korean language models to improve the effectiveness of Korean named entity recognition, and implements a full-text retrieval prototype system based on the proposed NER model.

First, this dissertation proposes a Korean multi-granularity fusion method. Korean words carry a large number of suffix endings, and Korean text offers several linguistic processing units; together these lead to inaccurate delineation of named entity boundaries. Building on the traditional vector concatenation fusion pattern, the proposed method fuses the differences and connections between the granularities and controls the fusion weights by weighted averaging to achieve deep fusion across the granularities.

Second, to improve the representation of Korean text, the fastText and KLUE-BERT pre-trained models are used for the embedding representation at each granularity, and TENER, a Transformer-based named entity recognition model, is used as the feature extraction model. KLUE-BERT, with its bidirectional Transformer encoding, embeds the Korean morpheme granularity to improve feature representation at the basic granularity; the static fastText word embedding model encodes the syllable and phoneme granularities to improve text representation at the fine granularities. The proposed model extracts features more effectively than traditional RNN and CNN models.

Finally, an entity-recognition-based Korean full-text retrieval prototype system was designed and implemented using the proposed NER model. To measure the degree of correlation between two texts, this dissertation proposes an entity-based text representation method that represents each text by its NER results, determines the correlation between texts from the Ochiai coefficient of the two entity sets, and returns the retrieval results ranked by that coefficient.

The experimental results show that the proposed Korean NER method outperforms the compared methods on different datasets. On the KLUE-NER dataset, the F1-score improves by 4.22% over the BERT-based method. On the Klpexpo2016 dataset, the F1 value improves by 3.18% over the Bi-LSTM+CRF-based method, currently the most effective one. Meanwhile, test results for the implemented full-text retrieval prototype system show that it has good entity-oriented retrieval performance.
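
To make the syllable and phoneme granularities concrete, the following is a minimal sketch of how a Korean string might be split into those two finer units. It relies only on the standard Unicode arithmetic for precomposed Hangul syllables; the function names `syllable_to_jamo` and `fine_granularities` are illustrative assumptions, not names from the dissertation. Morpheme segmentation is not shown because it requires a morphological analyzer.

```python
# Sketch only: splits Korean text into syllable and phoneme (jamo) units.
# Function names are hypothetical; the arithmetic follows the Unicode
# layout of precomposed Hangul syllables (U+AC00..U+D7A3).

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 11172          # 19 initials * 21 vowels * 28 finals
VOWEL_COUNT = 21
FINAL_COUNT = 28              # includes the "no final consonant" slot

INITIALS = [chr(0x1100 + i) for i in range(19)]        # leading consonants
VOWELS = [chr(0x1161 + i) for i in range(21)]          # medial vowels
FINALS = [""] + [chr(0x11A8 + i) for i in range(27)]   # trailing consonants


def syllable_to_jamo(ch):
    """Return the conjoining jamo of one Hangul syllable, or [ch] otherwise."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < HANGUL_COUNT:
        return [ch]  # not a precomposed Hangul syllable; leave unchanged
    initial, rest = divmod(index, VOWEL_COUNT * FINAL_COUNT)
    vowel, final = divmod(rest, FINAL_COUNT)
    jamo = [INITIALS[initial], VOWELS[vowel]]
    if final:
        jamo.append(FINALS[final])
    return jamo


def fine_granularities(text):
    """Return (syllables, phonemes) for a text, ignoring whitespace."""
    syllables = [ch for ch in text if not ch.isspace()]
    phonemes = [j for s in syllables for j in syllable_to_jamo(s)]
    return syllables, phonemes


if __name__ == "__main__":
    syllables, phonemes = fine_granularities("한국어")
    print(syllables)
    print(phonemes)
```

Each of the resulting unit sequences could then be embedded separately before fusion; how the dissertation aligns and weights them is not specified in the abstract.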
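
The entity-based retrieval step can be sketched as follows. The Ochiai coefficient of two sets A and B is |A ∩ B| / sqrt(|A| · |B|); the code below ranks documents by that value. The function names `ochiai` and `rank_documents`, the document identifiers, and the entity sets are all hypothetical placeholders, and the entity sets are assumed to have been produced already by some NER model.

```python
import math

# Sketch only: ranks documents by the Ochiai coefficient between the
# query's entity set and each document's entity set. Names and data
# are hypothetical; entity extraction itself is assumed to be done.


def ochiai(a, b):
    """Ochiai coefficient |A ∩ B| / sqrt(|A| * |B|); 0.0 if either set is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_documents(query_entities, doc_entities):
    """Return (doc_id, score) pairs sorted by descending Ochiai coefficient."""
    scored = [
        (doc_id, ochiai(query_entities, entities))
        for doc_id, entities in doc_entities.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical entity sets standing in for NER output.
    docs = {
        "doc1": {"서울", "삼성전자"},
        "doc2": {"부산", "현대자동차"},
        "doc3": {"서울", "부산", "삼성전자", "현대자동차"},
    }
    query = {"서울", "삼성전자"}
    for doc_id, score in rank_documents(query, docs):
        print(doc_id, round(score, 3))
```

Because the coefficient depends only on set overlap, two texts that share no recognized entities score zero regardless of how similar their remaining wording is, which matches the entity-oriented retrieval goal stated above.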
Keywords/Search Tags: Korean named entity recognition, multi-granularity fusion, pre-trained language model, entity-based text representation