
Research On The Methods Of Ancient Chinese Word Segmentation And Part-of-speech Tagging

Posted on: 2019-10-07 | Degree: Master | Type: Thesis
Country: China | Candidate: S C Yang
GTID: 2405330563490737 | Subject: Computer application technology
Abstract/Summary:
In recent years, research on word segmentation and part-of-speech (POS) tagging for modern Chinese has produced fruitful results, but the study of ancient Chinese still has shortcomings. Because most ancient Chinese words are monosyllabic, the key to improving segmentation and POS tagging performance is enabling the model to recognize polysyllabic words, so that they are neither split apart nor mislabeled. In addition, word usage in ancient Chinese is highly flexible: many words belong to more than one part of speech, and words are frequently used in a part of speech other than their usual one. As a result, many character sequences are segmented differently in different contexts, and the same word can take different parts of speech in different contexts.

Most recent research on ancient Chinese word segmentation and POS tagging is based on the Conditional Random Field (CRF) model. Although this approach achieves good segmentation performance, it requires manually designed feature templates, and because the feature window is limited, the model cannot learn long-range contextual features well. With the development of neural networks, deep learning methods have shown strong performance on sequence data tasks: they extract features from sequence data efficiently and have been applied to speech recognition and text generation. This thesis uses deep learning methods to automatically extract long-distance contextual information from ancient Chinese text, removing the need for the manually developed, experience-based feature templates that earlier ancient Chinese word segmentation and POS tagging methods depend on. This is of great significance to the study of ancient Chinese word segmentation and POS tagging.

In this thesis, a POS tag set is developed from an analysis of ancient Chinese parts of speech, their usage phenomena, and dictionaries; it serves as the label set for POS tagging. Based on the distributional hypothesis, ancient Chinese characters (words) are trained into computer-recognizable, computable character (word) vectors that capture meaning at the semantic level. A general model structure suited to ancient Chinese word segmentation and POS tagging is proposed, and with this model good results are achieved on both the word segmentation and the POS tagging tasks.
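Both CRF and neural taggers usually treat segmentation and POS tagging as one character-level sequence-labeling problem: each character receives a single tag that encodes both its position in a word and the word's part of speech. The abstract does not say which tagging scheme the thesis uses, so the following is only a minimal sketch assuming the widely used B/M/E/S position prefixes joined to a POS label (e.g. `B-n`). The function names `encode` and `decode` and the POS labels `n`, `v`, `nr` are hypothetical and are not the thesis's tag set; the example phrase 君子務本 is from the Analects, with an illustrative segmentation.

```python
from typing import List, Tuple

# A (word, POS) pair. The POS labels used in the demo ("n", "v", "nr") are
# hypothetical placeholders, not the thesis's tag set.
Word = Tuple[str, str]


def encode(words: List[Word]) -> List[Tuple[str, str]]:
    """Convert (word, POS) pairs into one joint tag per character.

    Position prefixes (an assumed B/M/E/S scheme): S = single-character word,
    B = first, M = middle, E = last character of a multi-character word.
    """
    pairs: List[Tuple[str, str]] = []
    for word, pos in words:
        if len(word) == 1:
            pairs.append((word, f"S-{pos}"))
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else ("E" if i == last else "M")
            pairs.append((ch, f"{prefix}-{pos}"))
    return pairs


def decode(chars: str, tags: List[str]) -> List[Word]:
    """Rebuild (word, POS) pairs from characters and predicted joint tags.

    A word closes on an S or E tag; a B or S tag that arrives while a word is
    still open flushes the open word first, so malformed predictions still
    yield a segmentation. The POS is taken from the word's last character.
    """
    words: List[Word] = []
    buffer, buffer_pos = "", ""
    for ch, tag in zip(chars, tags):
        prefix, pos = tag.split("-", 1)
        if prefix in ("B", "S") and buffer:
            words.append((buffer, buffer_pos))
            buffer = ""
        buffer += ch
        buffer_pos = pos
        if prefix in ("S", "E"):
            words.append((buffer, buffer_pos))
            buffer = ""
    if buffer:  # an unfinished word at the end of the sequence
        words.append((buffer, buffer_pos))
    return words


if __name__ == "__main__":
    sentence = [("君子", "n"), ("務", "v"), ("本", "n")]
    pairs = encode(sentence)
    print(pairs)  # [('君', 'B-n'), ('子', 'E-n'), ('務', 'S-v'), ('本', 'S-n')]
    chars = "".join(ch for ch, _ in pairs)
    tags = [tag for _, tag in pairs]
    assert decode(chars, tags) == sentence  # round trip recovers the input
```

Under this scheme, the context-dependent segmentation described in the abstract appears as competing tag sequences over the same characters: 君子 read as one word is `B-n E-n`, whereas a context that called for two single-character words would need `S-n S-n`. Choosing between them is exactly what the tagger must learn from the surrounding context.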
Keywords/Search Tags: ancient Chinese, Word Segmentation, Part-of-speech tagging, Deep Learning