Research And Application Of Korean Named Entity Recognition Method Based On Multi-Granularity Fusion

Posted on:2023-07-16

Degree:Master

Type:Thesis

Country:China

Candidate:J L Gao

GTID:2545306617993829

Subject:Electronic Information (in the field of computer technology) (professional degree)

Abstract/Summary:

Named entity recognition (NER) is one of the key research topics in natural language processing. It not only enables people to quickly access the key information in articles but also provides fundamental entity information for numerous downstream tasks. Research on intelligent information processing for Chinese Korean is still at an early stage of development, and studying the fundamental task of named entity recognition can provide the basis for the more in-depth natural language processing tasks that follow. Therefore, the study of named entity recognition in Korean has great academic significance and research value. To address the unique linguistic features of Korean, this dissertation combines a Korean text representation based on multi-granularity deep fusion with pre-trained Korean language models to improve the effectiveness of Korean named entity recognition, and implements a full-text retrieval prototype system based on the proposed NER model.

First, this dissertation proposes a Korean multi-granularity fusion method. Korean words carry a large number of suffix endings and Korean texts contain rich linguistic processing units, which leads to inaccurate delineation of named entity boundaries. Building on the traditional vector-concatenation fusion pattern, the proposed method fuses the differences and connections between the granularities and controls the fusion weights by weighted averaging to achieve deep fusion between the granularities.

Second, to improve the representation of Korean texts, the fastText and KLUE-BERT pre-trained language models are adopted for word embedding representation at each granularity, and TENER, a Transformer-based named entity recognition model, is used as the feature extraction model. The KLUE-BERT pre-trained model with bidirectional Transformer encoding is used for word embedding at the Korean morpheme granularity to improve the feature representation ability of the basic granularity; the fastText static word embedding model is used to encode the syllable and phoneme granularities to improve the text representation ability of the fine granularities. The proposed model has a better feature extraction effect than traditional RNN and CNN models.

Finally, an entity-recognition-based Korean full-text retrieval prototype system was designed and implemented using the proposed NER model. To measure the degree of correlation between two texts, this dissertation proposes an entity-based text representation method that represents each text by its NER results, determines the correlation between texts with the Ochiai coefficient between the two entity sets, and returns the text retrieval results in order of the correlation coefficients.

The experimental results show that the proposed Korean NER method outperforms the compared methods on different datasets. On the KLUE-NER dataset, the F1-score improves by 4.22% over the BERT-based method. On the Klpexpo2016 dataset, the F1-score improves by 3.18% over the currently most effective Bi-LSTM+CRF-based method. Meanwhile, test results of the implemented full-text retrieval prototype system show that the system has good entity-oriented retrieval performance.
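
The entity-based retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each text has already been reduced to a set of entity strings by some NER model; the function names `ochiai` and `rank_documents`, the sample entity sets, and the choice to drop documents with zero overlap are hypothetical and not taken from the dissertation. The Ochiai coefficient of two sets A and B is |A ∩ B| / sqrt(|A| · |B|).

```python
import math


def ochiai(a: set, b: set) -> float:
    """Ochiai coefficient |A & B| / sqrt(|A| * |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_documents(query_entities: set, doc_entities: dict) -> list:
    """Rank documents by Ochiai similarity between entity sets, highest first.

    doc_entities maps a document id to the set of entities an NER model
    extracted from that document (assumed to be precomputed).
    Documents sharing no entity with the query are dropped (an assumption).
    """
    scored = [(doc_id, ochiai(query_entities, ents))
              for doc_id, ents in doc_entities.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative, made-up entity sets.
docs = {
    "d1": {"서울", "삼성"},
    "d2": {"서울"},
    "d3": {"부산"},
}
ranking = rank_documents({"서울"}, docs)
```

In this toy example the query shares its only entity with `d1` and `d2`; `d2` ranks first because its entity set is smaller, so the overlap makes up a larger share of it.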

Keywords/Search Tags:

Korean named entity recognition, multi-granularity fusion, pre-training language model, entity-based text representation

Related items

1	A Study On Acquiring Chinese-English Named Entity Translation Equivalents Based On Comparable Corpus
2	Research On Chinese-Vietnamese Entity Alignment Technology Based On Named Entity Recognition
3	Named Entity Recognition For The Field Of Ancient Chinese
4	Recognition Of Uyghur Musical Named Entity Based On CRF
5	Research On Tibetan Named Entity Recognition Based On Pre-training
6	A Named Entity Recognition Method For Text Of Han Dynasty Paintings
7	Research On Named Entity Recognition Based On Ancient Book Corpus
8	Research On Urdu Named Entity Recognition Based On Attention Bi-LSTM-CRF
9	Cambodian Named Entity Recognition Based On The Topic Model Word Vector
10	Research On Chinese Named Entity Recognition Based On Annotation Schemes And Character-word Fusion