Research On The Methods Of Ancient Chinese Word Segmentation And Part-of-speech Tagging

Posted on:2019-10-07

Degree:Master

Type:Thesis

Country:China

Candidate:S C Yang

GTID:2405330563490737

Subject:Computer application technology

Abstract/Summary:

In recent years, research on word segmentation and part-of-speech tagging for modern Chinese has achieved fruitful results, while the corresponding work on ancient Chinese still has shortcomings. Because ancient Chinese words are mostly monosyllabic, the ability of a model to pick out the polysyllabic words and segment and label them correctly is the key to improving the performance of word segmentation and part-of-speech tagging systems. In addition, word usage in ancient Chinese is quite flexible: many words belong to more than one part of speech, and words are often used outside their usual part of speech. As a result, the same character sequence may be segmented differently in different contexts, and the same word may take different parts of speech in different contexts. Most recent research on word segmentation and part-of-speech tagging in ancient Chinese is based on the Conditional Random Field (CRF) model. Although this approach achieves good segmentation performance, it requires a manually designed feature template, and because the feature window is limited, the model cannot learn long-range contextual features well.

With the development of neural networks, deep learning methods have shown strong performance on sequence data tasks. They extract features from sequence data efficiently and have been applied to speech recognition and text generation. This thesis uses deep learning methods to extract long-distance contextual information from ancient Chinese automatically, removing the need for the manually designed, experience-based feature templates that earlier methods for ancient Chinese word segmentation and part-of-speech tagging require. This is of great significance to the study of ancient Chinese word segmentation and part-of-speech tagging. First, a part-of-speech tag set is developed by analyzing the parts of speech of ancient Chinese, their flexible usage, and dictionaries; it serves as the label set for the part-of-speech tagging task. Second, based on the distributional hypothesis, ancient Chinese characters (words) are trained into character (word) vectors that a computer can recognize and compute with and that carry semantic information. Finally, a general model structure suited to ancient Chinese word segmentation and part-of-speech tagging is proposed, and with this model both tasks achieve good results.
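The abstract does not say how segmentation is encoded for the CRF or neural models. A common choice, assumed here purely for illustration, is to treat segmentation as per-character labeling with B/M/E/S tags (begin, middle, end, single-character word). The minimal sketch below shows why the polysyllabic words are the hard cases: in a mostly monosyllabic sentence almost every character is tagged S, and the model has to detect the few B/E spans. The function names and the example segmentation are hypothetical and not taken from the thesis.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character (monosyllabic) word
        else:
            # first char B, last char E, any chars in between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the word list from characters and their predicted tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E/S
        words.append(current)
    return words


# Illustrative segmentation (an assumption, not data from the thesis):
# only "君子" is treated as a two-character word.
segmented = ["人", "不", "知", "而", "不", "慍", "不", "亦", "君子", "乎"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Under this encoding, a sequence model only needs to predict one tag per character, and the segmentation is recovered by cutting after every E or S tag.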

Keywords/Search Tags:

ancient Chinese, Word Segmentation, Part-of-speech tagging, Deep Learning

Related items

1	Research On The Integrated Processing Technology Of Sentence Segmentation And Lexical Analysis Of Ancient Texts Based On Deep Learning
2	Research On Thai Word Segmentation And Part-of-speech Tagging Based On Multi-granularity Feature
3	Research On Tibetan Word Segmentation And Part-of-speech Tagging Based On Pre-trained Language Models
4	Research On Tibetan Word Segmentation And Part-of-speech Tagging Based On GNN
5	Tibetan Segmentation And POS Tagging Study
6	A Named Entity Recognition Method For Text Of Han Dynasty Paintings
7	Research And Implementation Of The Tibetan Part Of Speech Tagging System
8	Research On Word Segmentation And Part-of-speech Of Tibetan On Neural Network
9	Text Analysis Of Speech Synthesis Based On Statistical Parameters Of Tibetan Language In Specific Fields
10	Information Processing On Mencius And Its Commentations And Annotations