Research On English Clause Identification For Machine Translation System

Posted on: 2007-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: F Ma
Full Text: PDF
GTID: 2155360212975715
Subject: Military Intelligence

Abstract/Summary:
To parse compound sentences correctly, the first step is to identify their clauses. A clause is a grammatical unit that includes, at minimum, a predicate and an explicit or implied subject, and that expresses a proposition. Clause identification is the process of identifying clauses in a text and tagging them. It belongs to partial parsing (shallow parsing), which aims to recognize and parse chunks. As the basis for further analysis, clause identification makes syntactic analysis much easier.

In natural language processing, both the feature template and the feature description can affect tagging quality. This paper proposes a new method that describes grammatical rules through word features and sentence features, inspired by Xavier's features. The experimental results show very good performance, especially in clause end identification.

The Maximum Entropy algorithm and the Bagging algorithm are used to implement clause identification. First, clause identification is divided into three parts: identification of clause starts, identification of clause ends, and complete clause segmentation. To deal with the high complexity of the third part, three subtasks are defined: multi-start identification, clause candidate identification, and complete clause tagging. Multi-start identification and clause candidate identification, like the first two parts, can be regarded as classification problems, so the Maximum Entropy model is employed for them. In addition, considering sentence structure and the way the brain analyzes sentences, a clause extraction algorithm is proposed. Second, a new clause identification method based on the Bagging algorithm is proposed, built upon the Maximum Entropy model. Multiple classifiers are trained on bootstrap replicates of the training set, each replicate serving as a new training set. The results of the classifiers are then aggregated by a weighted sum.
The experimental results show the validity of both methods, and the Bagging method achieves higher identification accuracy than the original Maximum Entropy model.

Besides the development of the complete clause identification systems, additional experiments are designed to determine the best parameters: a feature experiment, a smoothing experiment for Maximum Entropy, and a parameter experiment for Bagging.
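The Bagging scheme described in the abstract (bootstrap replicates of the training set, one Maximum Entropy classifier per replicate, weighted-sum aggregation) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The binary token features (`is_subordinator`, `follows_comma`, `is_verb`), the toy data, the plain gradient-ascent trainer, and the choice of training-set accuracy as the aggregation weight are all assumptions made for illustration; the abstract does not say how the weights were chosen.

```python
import math
import random

def predict_proba(w, x):
    """P(label = 1 | x) under a binary maximum-entropy (logistic) model."""
    z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))  # w[-1] is the bias
    z = max(-30.0, min(30.0, z))                      # keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(samples, epochs=200, lr=0.5):
    """Fit the model by batch gradient ascent on the log-likelihood."""
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in samples:
            err = y - predict_proba(w, x)
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[dim] += err
        for i in range(dim + 1):
            w[i] += lr * grad[i] / len(samples)
    return w

def bagging_train(samples, n_models=11, seed=0):
    """Train one classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_models):
        # Bootstrap replicate: sample with replacement, same size as the original.
        replicate = [rng.choice(samples) for _ in samples]
        w = train_maxent(replicate)
        # Assumption: weight each classifier by its accuracy on the original set.
        hits = sum((predict_proba(w, x) >= 0.5) == bool(y) for x, y in samples)
        ensemble.append((hits / len(samples), w))
    return ensemble

def bagging_predict(ensemble, x):
    """Aggregate the classifiers' probabilities by a normalized weighted sum."""
    total = sum(a for a, _ in ensemble) or 1.0
    score = sum(a * predict_proba(w, x) for a, w in ensemble) / total
    return 1 if score >= 0.5 else 0

# Hypothetical toy data: [is_subordinator, follows_comma, is_verb] -> clause start?
PATTERNS = [([1, 0, 0], 1), ([1, 1, 0], 1), ([1, 0, 1], 1), ([1, 1, 1], 1),
            ([0, 0, 0], 0), ([0, 1, 0], 0), ([0, 0, 1], 0), ([0, 1, 1], 0)]
TRAIN = PATTERNS * 5

ensemble = bagging_train(TRAIN)
print(bagging_predict(ensemble, [1, 0, 0]), bagging_predict(ensemble, [0, 1, 1]))
```

In this sketch each token is classified independently as a clause start or not. The same pattern would apply to clause end and clause candidate identification, with a different feature set for each task.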
Keywords/Search Tags:clause identification, Maximum Entropy, ensemble learning theory, Bagging, feature