Research On Textual Similarity Of Ancient Chinese Annotated Corpus Based On Deep Learning

Posted on:2024-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:J Wang

GTID:2555307154964989

Subject:Linguistics and Applied Linguistics

Abstract/Summary:

Research on Chinese sound and meaning requires bringing together materials that share phonetic-semantic relationships in order to achieve form-phonetic-semantic interpolation. However, information is gained and lost at the interpretive level as documents are transmitted, through differences in glossing and interpretation methods, synonym substitution, and word usage, which makes the manual association of similar interpretive texts inefficient and inaccurate. A deeper text similarity calculation method is therefore needed. Text similarity calculation is already widely used in ancient Chinese search engines and in targeted document recommendation, but existing similarity algorithms have limitations related to feature dimensionality and redundant features when applied to the ancient Chinese annotated corpus, which leads to unsatisfactory clustering results. To address these problems, this study proposes a similarity calculation and text clustering method for ancient Chinese annotated texts. The method helps researchers quickly identify possible differences in phonetic-semantic matching and supports the work of form-phonetic-semantic interpolation.

This study takes the paraphrase texts in the ancient Chinese annotated corpus as its research material. With the help of pre-trained language models, it performs automatic word segmentation and similarity calculation on the annotated corpus, and then clusters and associates similar texts. The work addresses three aspects: the corpus, the construction of a word segmentation model, and the similarity calculation method.

First, we constructed a database of the ancient Chinese annotated corpus, labeled the annotated texts by field, and extracted their core fields as the experimental corpus for similarity calculation and text clustering. Second, because many current automatic word segmentation methods perform poorly on the ancient Chinese annotated corpus, we established word segmentation specifications to improve the accuracy of manual segmentation annotation, and fine-tuned a BERT-based pre-trained model so that it fully incorporates the text features of the annotated corpus. The result is Cishu BERT, an automatic word segmentation model for the domain of the ancient Chinese annotated corpus. Third, starting from the fine-tuned Cishu BERT model, we trained it further on manually annotated similar texts to improve its feature learning ability, and thereby realized similarity calculation and text clustering for the ancient Chinese annotated corpus. Finally, the texts are clustered by similarity and a knowledge graph of the ancient Chinese annotated corpus is constructed to represent phonetic-semantic relationships, which helps researchers discover potential phonetic-semantic relationships conveniently.
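
The abstract does not specify how similarity scores are turned into clusters, so the following is only a minimal sketch of one common approach, not the thesis's actual method. It assumes each annotation text has already been encoded as a fixed-length vector (for example by a fine-tuned BERT encoder), compares vectors with cosine similarity, and groups texts whose similarity reaches a threshold. The function names, the toy vectors, and the threshold value 0.8 are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_threshold(
    vectors: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[List[int]]:
    """Single-link grouping: texts i and j end up in the same cluster if a
    chain of pairs with similarity >= threshold connects them.
    The threshold is an assumed, illustrative value."""
    n = len(vectors)
    parent = list(range(n))  # union-find forest over text indices

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy stand-ins for sentence embeddings of four annotation texts.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
clusters = cluster_by_threshold(toy_embeddings, threshold=0.8)
print(clusters)
```

With real data, the toy vectors would be replaced by the encoder's output for each annotation text, and the threshold would need to be tuned against manually annotated similar pairs.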

Keywords/Search Tags:

The study of Chinese Sound and Meaning, Ancient Chinese automatic word segmentation, text clustering, text similarity, Knowledge Graph

Related items

1	Research On Automatic Texts Segmentation And Word Segmentation For Ancient Chinese Texts
2	The Study Of Automatic Chinese Phoneticize Label Based On Automatic Word Segmentation
3	The Study On Chinese Text Segmentation
4	Research On Ancient Chinese Character Recognition Based On Object Detection And Knowledge Graph
5	A Named Entity Recognition Method For Text Of Han Dynasty Paintings
6	A Study On Cantonese Word Segmentation Specification For Information Processing
7	Research On Named Entity Recognition And Knowledge Graph Construction Of Chinese Classical Literature Texts
8	The Quantitative Study About The Impact Of Chinese Text Segmentation On Learning Efficiency For The Students From Japan And South Korea
9	Research And Design Of The English Essay Similarity Detection System For Chinese College Students
10	Effect Of Word Segmentation On International Students Reading Chinese Text As A Second Language