Research On English Long Sentence Partitioning For Machine Translation

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zuo
GTID: 2235330395487137
Subject: Computer application technology

Abstract/Summary:
Machine translation is an important research area in natural language processing. Long sentence processing, which serves machine translation, is a difficult research problem. With the development of information technology, sentences with many words and complex structures have become common in everyday life. How to translate these sentences effectively and reasonably is a tough problem for present machine translation systems.

Sentence splitting, which can improve translation quality, is adopted in this paper for processing long English sentences. To split a sentence reasonably, two rule-based approaches are proposed.

The first approach relies on part-of-speech information to split sentences. First, to benefit the subsequent steps, some components within a sentence are merged so as to "shorten" it. Second, the sentence is split by recognizing coordinate sub-sentences, which are relatively independent of one another and are easier for machine translation systems to process. Finally, clauses within the coordinate sub-sentences are recognized and split off. After the clauses are processed, the main frame of the sentence becomes quite clear and its structure is simplified.

However, problems such as the low coverage of the rules, the small number of linguistic features used, and a heavy dependence on part-of-speech tags have limited the application of this approach. To address these problems, a second, error-driven splitting method is introduced. In the new method, sentence components that might affect splitting and some simple phrases are first combined to reduce split errors; then natural split points are used to split long sentences, including coordinate sub-sentences and clauses; finally, linguistic features such as excerpt length and excerpt grammatical structure are applied to correct split errors.

Experiments on the NTCIR-9 corpus confirmed that both approaches are effective. To check the actual effect of sentence splitting on translation, the split results were submitted to the Google online translation platform. The BLEU values of the two approaches improved by 4.42% and 11.26%, respectively.
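The coordinate sub-sentence step of the first approach can be illustrated with a minimal sketch. The abstract does not give the thesis's actual rules, so everything below is an assumption made for illustration: the input is a hand-tagged list of (word, Penn Treebank tag) pairs, the tag sets `SUBJECT_START_TAGS` and `FINITE_VERB_TAGS` are hypothetical, and the single rule used (split at a coordinating conjunction only when both sides contain a finite verb and the right side starts like a subject) is a stand-in for the thesis's rule set, not a reproduction of it.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag), tagged by hand here

# Hypothetical tag sets chosen for this sketch; the thesis's rules are not given.
SUBJECT_START_TAGS = {"DT", "PRP", "NN", "NNS", "NNP", "NNPS"}
FINITE_VERB_TAGS = {"VBD", "VBP", "VBZ", "MD"}


def has_finite_verb(tokens: List[Token]) -> bool:
    """True if any token carries a finite-verb (or modal) tag."""
    return any(tag in FINITE_VERB_TAGS for _, tag in tokens)


def next_chunk(tokens: List[Token], start: int) -> List[Token]:
    """Tokens from `start` up to (not including) the next comma or conjunction."""
    chunk: List[Token] = []
    for word, tag in tokens[start:]:
        if tag in {",", "CC"}:
            break
        chunk.append((word, tag))
    return chunk


def split_coordinate(tokens: List[Token]) -> List[List[Token]]:
    """Split a tagged sentence at conjunctions that join two clause-like spans."""
    segments: List[List[Token]] = []
    seg_start = 0
    for i, (_, tag) in enumerate(tokens):
        if tag != "CC":
            continue
        left = tokens[seg_start:i]
        right = next_chunk(tokens, i + 1)
        starts_like_clause = bool(right) and right[0][1] in SUBJECT_START_TAGS
        if starts_like_clause and has_finite_verb(left) and has_finite_verb(right):
            # Drop the comma that often precedes the conjunction.
            if left and left[-1][1] == ",":
                left = left[:-1]
            segments.append(left)
            seg_start = i + 1  # the conjunction itself is discarded
    segments.append(tokens[seg_start:])
    return segments


def words(tokens: List[Token]) -> str:
    """Join the words of a tagged span back into a string."""
    return " ".join(word for word, _ in tokens)
```

Under these assumptions, a conjunction that joins two clauses ("The system parses the input , and the decoder translates each segment .") produces two segments, while a conjunction that joins two nouns ("She bought apples and oranges .") leaves the sentence whole, because the span after "and" contains no finite verb. A rule this narrow misses many real cases, which is consistent with the low rule coverage the abstract reports for the purely part-of-speech-based approach.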
Keywords/Search Tags: machine translation, long sentence partitioning, regular match, error driven, sentence pattern