
Research On Semantic Differences In Ancient Chinese Based On Word2vec And Bert

Posted on: 2024-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
Full Text: PDF
GTID: 2545306938479654
Subject: Applied Statistics
Abstract/Summary:
In recent years, the study of semantic differences in ancient Chinese has attracted much attention, and Chinese-Japanese homographs are an especially active topic. On the one hand, studying Chinese-Japanese homographs deepens our understanding of the two languages. On the other hand, such homographs easily cause lexical misunderstanding and ambiguity. A review of prior work shows that scholars mainly rely on dictionaries and case studies to examine the semantics of homographs one by one, and that few papers quantify their semantic differences. This paper therefore aims to design an algorithm that directly calculates a semantic difference value for Chinese-Japanese homographs. From the size of this value, one can judge directly whether a homograph is a homonym (same form, different meaning) or a synonym (same form, same meaning).
This paper studies Chinese and Japanese homographs in depth using the Dadi corpus constructed by the research group of "Construction and Research of Japanese Kanji Corpus". Word2vec and Bert models can transform words into vectors, and the distance between word vectors can represent the semantic difference between words. Based on this idea, the paper proposes three methods, Word2vec-mask, Word2vec-distance and Bert-distance, to quantify the semantic differences between Chinese and Japanese homographs. Bert-distance, in addition, encodes a word into different vectors depending on its context and represents semantic difference by the distance between those vectors. Evaluation on the test set shows that Bert-distance performs well, with a best AUC of 79.58%, a precision of 57.89%, a recall of 82.09% and an F1 score of 0.68. ChatGPT, evaluated on the same test set, reaches a precision of 30.77% and a recall of 11.59%, both much lower than Bert-distance.
The proposed method can help scholars quickly screen out homographs that may differ in meaning, and it can also corroborate traditional research results with calculated values, making the conclusions more convincing. For example, when Bert-distance is used to select 50 single-character words and 50 two-character words with large semantic difference values, 32% of the selected words are found to have real semantic differences, which considerably speeds up the search for Chinese-Japanese homonyms. In addition, cultural exchange inevitably changes semantic differences. By measuring how the mean semantic difference value changes over time, this paper quantifies the frequency of cultural exchange between China and Japan, which extends the applications of the semantic difference value. The results show that the semantic difference values for the Sui and Tang Dynasties and for the Song and Yuan Dynasties are low; at that time, Japan was at the peak of learning Han culture.
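The distance-based idea and the reported F1 score can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the distance metric, so cosine distance is an assumption here, and the toy vectors and the function names `cosine_distance` and `f1_score` are hypothetical. The only figures taken from the abstract are the precision and recall values.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two word vectors (assumed metric, not confirmed by the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy vectors standing in for the Chinese and Japanese
# embeddings of one homograph; real vectors would come from a trained model.
zh_vec = [1.0, 0.0, 1.0]
ja_vec = [1.0, 1.0, 0.0]
print(round(cosine_distance(zh_vec, ja_vec), 2))

# Precision and recall reported for Bert-distance in the abstract.
print(round(f1_score(0.5789, 0.8209), 2))
```

With the reported precision of 57.89% and recall of 82.09%, the harmonic mean rounds to 0.68, which agrees with the F1 score stated in the abstract. A larger distance between the two vectors would correspond to a larger semantic difference value under this assumed metric.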
Keywords/Search Tags: Semantic difference, Chinese-Japanese homographs, Word2vec, Bert