
A Study of Long Sentence Parsing in English-Chinese Machine Translation

Posted on: 2011-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Li
Full Text: PDF
GTID: 2155360308454401
Subject: English Language and Literature
Abstract/Summary:
After decades of development, and with the steady maturing of information and computer technology, research on Machine Translation (MT) has grown from its origins in computational linguistics into a comprehensive discipline that draws on many other fields, including semantics, mathematics, corpus linguistics, computer science, Artificial Intelligence (AI), and biology. However, the translation quality of MT still falls short of requirements, particularly because systems lack the capacity to analyze long sentences. Although computer technology and many related technologies have made qualitative leaps over the past decades, parsing long sentences remains a formidable obstacle in this field.

Parsing long sentences is quite different from looking up word meanings in a dictionary. Lexical translation needs only tokenization and lemmatization to find the base forms of words in the database. Lexical analysis, however, is only the first step of the parsing process; the subsequent steps must also identify context-sensitive ambiguities and words related over long distances. Then, when the sentence structure is parsed, problems such as phrases embedded in subordinate clauses, phrases that introduce clauses, and the relations between them must also be handled properly. Whether these complex sentence structures can be identified and transformed into the Target Language (TL) with correct word order has therefore become the limiting factor in the development of MT. Complex long sentences are everywhere in English, and some run to an entire paragraph. The result of long-sentence analysis affects the quality and readability of MT output.

At the same time, translation services fall severely short of demand, and this shortage may limit the ability to capture useful information. The first feasible solution to this problem is therefore to use MT to fill the gap. As a natural consequence, how to analyze long sentences has become a key problem in carrying out the Informationization Strategy in China.

This thesis begins by introducing the history of MT and some basic translation methods, and then describes in detail the sentence parsing methods of present MT systems. On that basis, the latter part of the thesis discusses in detail the structure of long-sentence parsing systems and several significant difficulties in current MT research. Chapter 3 in particular treats long-sentence parsing in full, with emphasis on the basic principles and methods of sentence parsing, the definition and identification of long sentences, and the difficulties of analyzing them. The following part discusses the MT dictionary and its structure and proposes some suggestions for improving the design of MT dictionaries. A sample sentence is then used to demonstrate how an MT system works with an improved dictionary. The final chapter discusses some questions encountered in the course of this study.
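The lexical step that the abstract contrasts with sentence parsing (tokenization, then lemmatization, then a dictionary lookup) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy `LEXICON`, the suffix rules, and the function names `tokenize`, `lemmatize`, and `lookup` are all hypothetical, and a real MT dictionary would be far richer.

```python
import re

# Hypothetical toy lexicon: English lemma -> Chinese gloss (illustrative only).
LEXICON = {
    "machine": "机器",
    "translate": "翻译",
    "long": "长",
    "sentence": "句子",
    "parse": "分析",
}

# Hypothetical suffix rules, tried in order: (suffix, replacement).
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", "e"),
    ("ing", ""),
    ("ed", "e"),
    ("ed", ""),
    ("es", "e"),
    ("s", ""),
]


def tokenize(text):
    """Split text into alphabetic words and single non-space symbols."""
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)


def lemmatize(token):
    """Return a base form found in LEXICON, or the lowercased token."""
    word = token.lower()
    if word in LEXICON:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate
    return word  # unknown word: leave it unchanged


def lookup(text):
    """Return (token, lemma, gloss) triples; gloss is None if not in LEXICON."""
    result = []
    for token in tokenize(text):
        lemma = lemmatize(token)
        result.append((token, lemma, LEXICON.get(lemma)))
    return result


if __name__ == "__main__":
    for triple in lookup("Machines parsed long sentences."):
        print(triple)
```

The sketch works one word at a time, so it cannot resolve context-sensitive ambiguity, long-distance relations between words, or target-language word order. Those are the problems the abstract assigns to sentence parsing.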
Keywords/Search Tags: machine translation, long sentence, sentence parsing, FAHQMT, Right-to-Left algorithm