
Chinese Named Entity Recognition Based On Font And Character Relation Graph Features

Posted on: 2023-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2568307127483174
Subject: Engineering
Abstract/Summary:
Named entity recognition is a key technology in natural language processing. Existing studies do not make full use of glyph semantics or of the semantic information carried by word relationships, and existing named entity recognition models have low recognition rates. To address these problems, this research covers two aspects: Chinese character representation and Chinese named entity recognition. The main work is as follows:

(1) A Chinese character vector graphics dataset and a word relation dataset are constructed to provide data for the subsequent research on character vector representation and named entity recognition. The Chinese character vector graphics dataset contains 3908 vector graphics of Chinese characters; the word relation dataset contains 66252 pairs of phrases, synonyms and antonyms.

(2) A glyph-based representation method that maps scalable vector graphics to vectors (svg2vec) is proposed to address the problem that the semantic information of glyphs is not fully used in the representation of Chinese characters. First, a glyph autoencoder is built on a variational autoencoder to extract the glyph features of Chinese characters and obtain glyph vectors, which are then compared with word2vec, GloVe, gnm2vec and CWE vectors. In Chinese word segmentation experiments on the MSR dataset, the F1 value of the glyph vectors is 1.67, 0.12, 1.69 and 1.34 higher than that of word2vec, GloVe, gnm2vec and CWE, respectively. In short text similarity experiments, the average F1 value of the glyph vectors across the CNN, self-attention and LSTM models is 3.28, 2.03, 0.04 and 0.31 higher than that of the same four baselines, respectively.

(3) A word vector representation method based on a word relation graph (relagraph2vec) is proposed to address the problem that the semantic information of word relationships is not fully used in the representation of Chinese characters. First, the word relation graph data are built; word relation graph vectors are then obtained by training with a graph neural network algorithm; finally, comparison experiments are carried out against word2vec, GloVe, gnm2vec and CWE vectors. In Chinese word segmentation experiments on the MSR dataset, the F1 value of the word relation graph vectors is 2.23, 0.68, 2.25 and 1.9 higher than that of these four baselines, respectively. In short text similarity experiments, the average F1 value of the word relation graph vectors across the CNN, self-attention and LSTM models is 4.47, 3.22, 1.23 and 1.5 higher, respectively.

(4) A Chinese named entity recognition method that fuses glyph and character relation graph features is proposed to address the low recognition rate of Chinese named entity recognition models. A feature fusion embedding layer is added to the BiLSTM-CRF model, yielding a Chinese named entity recognition model that integrates glyph and character relation graph features. In comparative experiments against the BiLSTM-CRF model, its F1 value on the MSRA NER dataset is 0.4 higher.

(5) An automatic database table structure extraction system is built, and the proposed named entity recognition method is applied in it to verify its effectiveness. The method is used to extract table structures automatically: the system first analyzes the user's requirement text, then extracts table names and field names, adapts the field data types, and finally creates the table automatically. System tests show that table names and field names are recognized with 96% accuracy.

In summary, the two word vector representation methods improve the semantic representation ability of word vectors, and the Chinese named entity recognition method based on glyph and character relation graph features improves the recognition rate of the model.
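The abstract adds a feature fusion embedding layer in front of BiLSTM-CRF but does not say how the character, glyph (svg2vec) and relation graph (relagraph2vec) vectors are combined. The sketch below assumes the simplest reading, per-character concatenation, with a zero-vector fallback for characters missing from a table. The names `fuse_embeddings` and `lookup`, the zero fallback, and the toy vectors are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def lookup(table: Dict[str, Vector], ch: str, dim: int) -> Vector:
    """Return a copy of the vector for ch, or a zero vector if ch is missing.

    Hypothetical fallback: unseen characters get zeros so every fused row
    has the same width.
    """
    vec = table.get(ch)
    if vec is None:
        return [0.0] * dim
    if len(vec) != dim:
        raise ValueError(f"vector for {ch!r} has length {len(vec)}, expected {dim}")
    return list(vec)


def fuse_embeddings(
    sentence: str,
    char_emb: Dict[str, Vector],
    glyph_emb: Dict[str, Vector],
    relation_emb: Dict[str, Vector],
    dims: Sequence[int],
) -> List[Vector]:
    """Build one fused input row per character for a BiLSTM-CRF tagger.

    Assumption: the fusion operator is plain concatenation
    [char ; glyph ; relation]; the abstract does not name the operator.
    """
    d_char, d_glyph, d_rel = dims
    return [
        lookup(char_emb, ch, d_char)
        + lookup(glyph_emb, ch, d_glyph)
        + lookup(relation_emb, ch, d_rel)
        for ch in sentence
    ]


if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    char_emb = {"北": [0.1, 0.2], "京": [0.3, 0.4]}
    glyph_emb = {"北": [1.0], "京": [2.0]}
    relation_emb = {"北": [0.5, 0.5, 0.5]}  # "京" has no relation vector
    rows = fuse_embeddings("北京", char_emb, glyph_emb, relation_emb, dims=(2, 1, 3))
    print(rows[0])  # [0.1, 0.2, 1.0, 0.5, 0.5, 0.5]
    print(rows[1])  # [0.3, 0.4, 2.0, 0.0, 0.0, 0.0]
```

In a full model, each fused row (width 2 + 1 + 3 = 6 in this toy case) would replace the plain character embedding as the BiLSTM input. Learned alternatives such as gated or attention-weighted fusion are equally plausible readings of the abstract.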
Keywords/Search Tags: Word vector, Chinese character vector diagram, Graph neural network, Chinese named entity recognition