A Named Entity Recognition Method For Text Of Han Dynasty Paintings

Posted on: 2023-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: J H Xu
Full Text: PDF
GTID: 2545306614972619
Subject: Computer technology
Abstract/Summary:
Han painting refers to paintings on stones, bricks, and other artefacts of the Han Dynasty. It is one of the most iconic art forms of the period and presents the rich and varied life of the Han Dynasty. Extracting information from textual data in the field of Han painting helps to sort out the historical context of the Han Dynasty and to understand the deeper connotations of traditional culture. Named entity recognition (NER), a basic task in text data processing, achieves good results in conventional domains, but it faces many adaptation problems when applied to the highly specialized texts of the Han painting domain. This paper therefore proposes a named entity recognition method for texts in the field of Han painting. These texts contain many syntactic structures that mix ancient and modern Chinese, as well as many unfamiliar domain terms, and existing named entity recognition methods cannot supply the high-quality entity lists and corpora that the task requires, which makes named entity recognition for Han painting texts challenging. The contributions of this paper are summarized as follows.

(1) The grammar and terminology of texts in the field of Han painting are analyzed, and a lexicon-based word segmentation method for these texts is proposed. On the one hand, the method generates tagged data from the lexicon by combining words selected at random from the lexicon into pseudo-tagged sentences. A large number of pseudo-tagged sentences, a modern Chinese corpus, and a small expert-annotated word segmentation corpus are then combined to build the training corpus. On the other hand, the method replaces rare terms in the text based on the lexicon, so that rare terms do not degrade word segmentation. The experimental results show that the proposed word segmentation method achieves an F1 score of 94.37% on the closed dataset of the Han painting domain.

(2) A part-of-speech tagging and recognition method is proposed on top of the word segmentation. Recognizing each word as a whole reduces the error rate of character-level recognition. First, similarity annotation and transfer learning are used to expand the corpus, and then a part-of-speech tagging and recognition model is constructed for texts in the field of Han painting. The experimental results show that the proposed named entity recognition method improves the F1 score by about 1.5% over traditional tagging-based methods.

(3) Texts in the field of Han painting are processed with the proposed named entity recognition method, and a knowledge graph for the field of Han painting is constructed. A Han painting image retrieval system based on the knowledge graph is also implemented. Its functions include querying the images related to an entity, querying the synonyms and near-synonyms of an entity, and querying the entities associated with an image.
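The pseudo-tagged sentence generation described in contribution (1) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name a tagging scheme, so the common B/M/E/S character tags (begin, middle, end, single) are assumed here, and the lexicon entries and the function names `bmes_tags` and `make_pseudo_sentence` are hypothetical.

```python
import random

# Illustrative lexicon entries only; these are not taken from the thesis.
EXAMPLE_LEXICON = ["西王母", "伏羲", "女娲", "画像石", "龙"]


def bmes_tags(word):
    """Return one B/M/E/S segmentation tag per character of a word."""
    if not word:
        raise ValueError("word must not be empty")
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def make_pseudo_sentence(lexicon, n_words, rng):
    """Concatenate randomly chosen lexicon words into a tagged pseudo-sentence.

    Returns the chosen words, the character sequence, and the aligned tags.
    The result is not grammatical text; it only exposes a model to the
    word boundaries of lexicon terms.
    """
    words = [rng.choice(lexicon) for _ in range(n_words)]
    chars, tags = [], []
    for word in words:
        chars.extend(word)
        tags.extend(bmes_tags(word))
    return words, chars, tags


# Usage: build one pseudo-tagged sentence of four lexicon words.
words, chars, tags = make_pseudo_sentence(EXAMPLE_LEXICON, 4, random.Random(0))
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

Because every tag is derived from the lexicon itself, such sentences need no manual annotation, which is presumably why they can be produced in large numbers and mixed with the modern Chinese corpus and the small expert-annotated corpus.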
Keywords/Search Tags: Chinese word segmentation, Named entity recognition, Part-of-speech tagging, Deep learning, Han painting knowledge graph