Research On Automatic Word Segmentation Of Zuo Zhuan Based On Conditional Random Field

Posted on:2019-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:Q W Lu

Full Text:PDF

GTID:2415330602970092

Subject:Library and Information Science

Abstract/Summary:

Chinese automatic word segmentation is an important branch of Chinese information processing. At present, most research on Chinese automatic word segmentation targets modern Chinese, and work on the automatic segmentation of ancient Chinese remains comparatively weak. The pre-Qin classics are an important source for understanding the culture and history of the pre-Qin period, and Zuo Zhuan is one of the representative historical works of that period. We therefore select Zuo Zhuan as the object of study. Based on the characteristics of ancient Chinese information processing, we adopt the reverse maximum matching method and the conditional random field (CRF) model to realize automatic word segmentation of Zuo Zhuan.

The main work is as follows:
(1) We designed an automatic word segmentation algorithm for Zuo Zhuan based on the conditional random field model, covering the selection of the corpus, the annotation of the corpus, the selection of features, and the formulation of the feature template. We adopted a four-tag character tagging scheme and selected character classification, part of speech, ancient voice, tone, rhyme, fanqie ("reverse cut"), ancient sound, and other features for CRF training.
(2) Following this algorithm, we carried out word segmentation experiments. We used the CRF++ 0.58 toolkit to run CRF segmentation experiments under different features and feature combinations.
(3) We tested the algorithm. We took the segmentation of Zuo Zhuan produced by the reverse maximum matching algorithm, and the segmentation produced by the conditional random field method with no added features, as baselines, called Baseline1 and Baseline2 respectively. We compared all experimental results against the baselines and against one another, charted the comparisons, and offered suggestions for the future automatic segmentation of pre-Qin texts.

Analysis and evaluation of the experimental results lead to the following conclusions:
(1) Automatic word segmentation of ancient Chinese with the conditional random field performs better than segmentation with the reverse maximum matching method. The F value for Zuo Zhuan with the reverse maximum matching method is 93.4631%, while the F value with the conditional random field can exceed 95%.
(2) In the automatic segmentation of Zuo Zhuan, adding the "tone" and "ancient sound" features improves the segmentation precision of the system, whereas the "character classification", "sound", "fanqie", and "rhyme" features do not improve segmentation performance and in fact weaken it. The best-performing feature in the experiments greatly improved the segmentation accuracy for Zuo Zhuan, with an F value above 99%.
(3) The effect of a feature in a single-feature segmentation experiment cannot be assumed to correlate positively with its effect in a multi-feature experiment; no obvious correlation between the two was observed. A feature that segments poorly on its own may do better in a feature combination, and a feature that segments well on its own may do worse in a combination.
(4) Since most words in ancient Chinese are monosyllabic, the feature template window for segmenting ancient Chinese with conditional random fields should not be too long. In our Zuo Zhuan segmentation experiments, the feature template with a window length of 1 gave the best segmentation results.

The main contributions of this thesis are as follows:
(1) It designs an automatic word segmentation method for Zuo Zhuan based on the conditional random field model, which combines character classification, part of speech, ancient sound, tone, rhyme, and other features to improve segmentation.
(2) In training the conditional random field model, it adds different numbers of features and evaluates the influence of different feature combinations on automatic word segmentation. In future automatic segmentation of ancient Chinese, the feature combinations that performed well on Zuo Zhuan can be tried first. This offers some guidance for the automatic segmentation of pre-Qin texts.

The thesis also has limitations. The set of segmentation methods used as baselines is narrow, and the word list used in the word-list-based segmentation experiment is relatively simple. In future work, a wider range of methods can be used for comparison, and a richer word list (such as the note list) can be used for the word-list-based segmentation experiment.
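The reverse maximum matching baseline described in the abstract can be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the function name, the `max_len` limit, and the toy lexicon are assumptions introduced here, and the abstract does not describe the actual word list.

```python
def reverse_maximum_matching(text, lexicon, max_len=4):
    """Segment `text` by scanning right to left and taking, at each step,
    the longest lexicon entry that ends at the current position."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback word.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


# Hypothetical toy lexicon; not taken from the thesis.
toy_lexicon = {"鄭伯", "克", "段", "于", "鄢"}
segmented = reverse_maximum_matching("鄭伯克段于鄢", toy_lexicon)
print(segmented)
```

Because ancient Chinese is largely monosyllabic, the single-character fallback is expected to fire often, which is one reason a purely lexicon-driven baseline depends heavily on the quality of its word list.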
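The four-tag scheme and the F value mentioned in the abstract can be illustrated with a short sketch. The abstract does not name the tags or define the F value; the B/M/E/S tag set and the word-level precision/recall definition below are common conventions assumed here, and all function names are hypothetical.

```python
def words_to_tags(words):
    """Map a segmented sentence to one tag per character:
    S = single-character word, B/M/E = begin/middle/end of a longer word."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def word_spans(words):
    """Return the set of (start, end) character offsets of each word."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_f(gold_words, predicted_words):
    """Word-level precision, recall, and F value for one sentence."""
    gold, predicted = word_spans(gold_words), word_spans(predicted_words)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Illustrative gold and predicted segmentations; not data from the thesis.
gold = ["鄭伯", "克", "段", "于", "鄢"]
predicted = ["鄭", "伯", "克", "段", "于", "鄢"]
gold_tags = words_to_tags(gold)
precision, recall, f_value = precision_recall_f(gold, predicted)
print(gold_tags, precision, recall, f_value)
```

In this toy case four of the six predicted words match the five gold words, so precision is 4/6, recall is 4/5, and the F value is 8/11.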
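The feature template with a window length of 1 can be sketched as a small generator for a CRF++ template file. The `U..:%x[row,col]` macro syntax and the trailing `B` line follow the CRF++ template format; the reading of "window length 1" as one character on each side of the current character, the function name, and the two-column layout are assumptions made here, since the abstract does not give the actual template.

```python
def build_crfpp_template(window=1, feature_columns=1):
    """Build CRF++ unigram template lines covering offsets -window..+window
    for each feature column, followed by a bigram ("B") line."""
    lines = []
    index = 0
    for column in range(feature_columns):
        for offset in range(-window, window + 1):
            # %x[row,col]: row is relative to the current token,
            # col is the feature column in the training file.
            lines.append(f"U{index:02d}:%x[{offset},{column}]")
            index += 1
    lines.append("B")  # adds features over adjacent output tags
    return "\n".join(lines)


# Assumed layout: column 0 = character, column 1 = one extra feature.
template = build_crfpp_template(window=1, feature_columns=2)
print(template)
```

A wider window only adds more offsets per column, so the comparison of window lengths described in the abstract amounts to regenerating the template with a different `window` value and retraining.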

Keywords/Search Tags:

Zuo Zhuan, automatic word segmentation, conditional random field, reverse maximum matching method

Related items

1	Experimental Study On The Fusion Of Dictionary Segmentation And Model Word Segmentation In Chinese
2	The Research On Tibetan Automatic Word Segmentation Technology
3	The Study Of Automatic Chinese Phoneticize Label Based On Automatic Word Segmentation
4	Research On Automatic Texts Segmentation And Word Segmentation For Ancient Chinese Texts
5	Design And Implement Of Parser Based On Grammar Function And Collocation
6	Research On Color Restoration Of Mural Image Based On Convolutional Neural Network
7	Research On Automatic Recognition Of Tibetan Word Words
8	Information Processing On Mencius And Its Commentations And Annotations
9	A Study On Cantonese Word Segmentation Specification For Information Processing
10	Tibetan Segmentation And POS Tagging Study