Research On English Clause Identification For Machine Translation System

Posted on:2007-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:F Ma

Full Text:PDF

GTID:2155360212975715

Subject:Military Intelligence

Abstract/Summary:

To parse compound sentences correctly, the first step is to identify their clauses. A clause is a grammatical unit that includes, at minimum, a predicate and an explicit or implied subject, and that expresses a proposition. Clause identification is the process of identifying clauses and tagging them in the text. It belongs to partial parsing (shallow parsing), which aims to recognize and parse chunks. As the basis for further analysis, clause identification makes syntactic analysis much easier.

In natural language processing, both the feature template and the feature description can affect tagging quality. This paper proposes a new method for describing grammatical rules by word features and sentence features, inspired by Xavier's features. The experimental results show very good performance, especially in clause end identification.

The Maximum Entropy algorithm and the Bagging algorithm are used to implement clause identification. Firstly, clause identification is divided into three parts: identification of the clause start, identification of the clause end, and complete clause segmentation. In the third part, to deal with the high complexity, three subtasks are presented: multi-start identification, clause candidate identification and complete clause tagging. Multi-start identification and clause candidate identification, as well as the first two parts, can be regarded as classification problems, so the Maximum Entropy model is employed. In addition, considering the sentence structure and the analysis process of the brain, a clause extraction algorithm is proposed. Secondly, a new clause identification method based on the Bagging algorithm is proposed, which is built upon the Maximum Entropy model. Multiple classifiers are trained on bootstrap replicates of the training set, each replicate serving as a new training set. The results of the classifiers are then aggregated by a weighted sum method.

Experimental results showed the validity of both methods, and showed that the Bagging method achieves higher identification accuracy than the original Maximum Entropy model. Besides the development of the complete clause identification systems, extra experiments were designed to determine the best parameters: a feature experiment, a smoothing experiment for Maximum Entropy, and a parameter experiment for Bagging.
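
The Bagging step described above (bootstrap replicates, one Maximum Entropy classifier per replicate, weighted-sum aggregation) can be illustrated with a minimal sketch. It is not the thesis's implementation. The toy features, the training data, the gradient-based trainer and the choice of training accuracy as the aggregation weight are all assumptions made for illustration, since the abstract does not specify them. The classifier is a binary Maximum Entropy model (equivalent to logistic regression) that decides whether a token is a clause start.

```python
import math
import random

# Hypothetical toy data: (binary feature set, label), label 1 = clause start.
TRAIN = [
    ({"word=that", "pos=IN"}, 1),
    ({"word=which", "pos=WDT"}, 1),
    ({"word=because", "pos=IN"}, 1),
    ({"word=when", "pos=WRB"}, 1),
    ({"word=the", "pos=DT"}, 0),
    ({"word=dog", "pos=NN"}, 0),
    ({"word=runs", "pos=VBZ"}, 0),
    ({"word=a", "pos=DT"}, 0),
]

def predict_proba(weights, feats):
    """P(label = 1 | feats) under a binary maximum entropy model."""
    score = sum(weights.get(f, 0.0) for f in feats)
    score = max(-30.0, min(30.0, score))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-score))

def train_maxent(samples, epochs=200, lr=0.5):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, label in samples:
            error = label - predict_proba(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights

def accuracy(weights, samples):
    hits = sum((predict_proba(weights, f) >= 0.5) == bool(y) for f, y in samples)
    return hits / len(samples)

def bagging_train(samples, n_models=5, seed=0):
    """Train one classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap replicate: sample with replacement, same size as the original.
        replicate = [rng.choice(samples) for _ in samples]
        weights = train_maxent(replicate)
        # Assumption: each classifier is weighted by its accuracy on the full set.
        models.append((weights, accuracy(weights, samples)))
    return models

def bagging_predict(models, feats):
    """Aggregate the classifiers' probabilities by a normalized weighted sum."""
    total = sum(w for _, w in models)
    if total == 0:  # fall back to a uniform average
        return sum(predict_proba(m, feats) for m, _ in models) / len(models)
    return sum(w * predict_proba(m, feats) for m, w in models) / total

if __name__ == "__main__":
    models = bagging_train(TRAIN)
    print(bagging_predict(models, {"word=because", "pos=IN"}))
    print(bagging_predict(models, {"word=the", "pos=DT"}))
```

A token is tagged as a clause start when the aggregated probability exceeds 0.5. The same scheme would apply unchanged to clause end identification by swapping in end labels.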

Keywords/Search Tags:

clause identification, Maximum Entropy, ensemble learning theory, Bagging, feature

Related items

1	Research On The Methods Of Chinese Noun Compounds Identification And Classification
2	Identification Of Non-clauses Among Language Fragments Between Punctuation In Chinese Complex Sentences
3	The Entropy Theory Of Film&TV Arts Communication
4	Research On Automatic Chinese Q&A System Based On Syntax Analysis And Machine Learning
5	Identification Of Dependency Relations Between Clauses Within Clause Complexes Based On The MOOD System
6	Entropy And Its Application In Human Minds In Something Happened
7	Mining And Recognition Of English Learning Patterns For Mobile Users Based On Time Series Clustering And Ensemble Learning
8	The Research On Sentiment Analysis Of Movie Reviews Based On Improved Word2vec And Ensemble Learning
9	Prediction Research On Movie Box-office Based On Stacking Ensemble Learning
10	Research On Electronic Music Feature Analysis And Genre Classification