Chinese-Thai Translation and Extension of Cross-Language Query

Posted on: 2017-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
Full Text: PDF
GTID: 2175330488465665
Subject: Instrumentation engineering
Abstract/Summary:
In recent years, as relations between China and Thailand have developed, exchanges between the two countries in culture, economy, politics, and other areas have deepened. However, because Chinese and Thai are different languages, there are many obstacles to exchanging information online. To share information on the Internet more effectively and to promote information exchange between Chinese and Thai speakers, research on cross-language information retrieval has been placed on the agenda, since it can help overcome these communication obstacles. To improve the performance of cross-language information retrieval, we study query translation and query expansion for Chinese and Thai.

Currently, research on Chinese and Thai language information technology focuses mostly on the lexical, syntactic, and semantic analysis of each language on its own. Research on information processing between Chinese and Thai, such as Chinese-Thai machine translation and query expansion, is still relatively scarce. For translation between Chinese and Thai, there are no suitable translation resources, such as a bilingual dictionary; there are many unknown words and named entities; and translation ambiguity is a common problem. For query expansion, the absence of a suitable knowledge base to serve as a source of expansion words has made Thai word expansion difficult. To address these problems, this thesis carries out the following work:

(1) A translation method for Chinese queries based on Word2Vec. For translating Chinese queries, this thesis presents a Chinese-Thai translation method based on Word2Vec. First, the words in a Chinese-Thai comparable corpus are trained with the Word2Vec tool to produce word vectors, and a linear mapping relationship can be observed between the bilingual word vectors. This linear mapping holds across languages: two words that express similar concepts are mapped to similar spatial distributions. The similar spatial distribution of the Chinese and Thai word vectors is used to train a translation matrix, and Thai translation candidate words are obtained through the translation matrix.

(2) A selection method for Thai translation candidate words. To address ambiguity, this thesis presents a method for selecting among Thai translation candidate words. The method uses the translation probabilities between bilingual words, combined with the relationships between monolingual words, to obtain the best disambiguated Thai words.

(3) Construction of a Thai query expansion model. To improve the performance of the retrieval system, we propose a Thai query expansion method based on pseudo feedback. The core idea is as follows. First, a set of related documents is retrieved using Lucene search, and these documents serve as the source of expansion terms. Second, KL distance and word co-occurrence are combined using the Borda count to select query expansion words. Finally, the expansion words are added to the original query to obtain the expanded Thai query.

(4) A prototype system. We design a prototype system for cross-language query expansion, which provides a platform for further study of cross-language information retrieval and a foundation for the study of cross-language query expansion.
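The term selection in step (3) can be illustrated with a minimal sketch of a standard Borda count. The abstract does not give the exact scoring details, so the following is an assumption for illustration only: each scorer (KL distance, word co-occurrence) is assumed to produce a best-first ranking of candidate terms, and the function names `borda_count` and `select_expansion_terms`, as well as the placeholder terms, are hypothetical.

```python
from typing import Dict, List


def borda_count(rankings: List[List[str]]) -> Dict[str, int]:
    """Combine several best-first ranked lists with a Borda count.

    In a list of length n, the top term earns n - 1 points, the next
    n - 2, and so on. A term absent from a list earns nothing from it.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, term in enumerate(ranking):
            scores[term] = scores.get(term, 0) + (n - 1 - position)
    return scores


def select_expansion_terms(rankings: List[List[str]], k: int) -> List[str]:
    """Return the k terms with the highest combined Borda score."""
    scores = borda_count(rankings)
    # Sort by descending score; break ties alphabetically so the
    # result is deterministic.
    ordered = sorted(scores, key=lambda term: (-scores[term], term))
    return ordered[:k]


# Hypothetical candidate terms drawn from the feedback documents,
# ranked best-first by each scorer (placeholder data, not from the thesis).
kl_ranking = ["term_a", "term_b", "term_c", "term_d"]
cooc_ranking = ["term_b", "term_d", "term_a", "term_c"]

expansion_terms = select_expansion_terms([kl_ranking, cooc_ranking], k=2)
print(expansion_terms)
```

Because the Borda count works on rank positions rather than raw scores, it can merge the two rankings without first normalizing KL distance and co-occurrence values to a common scale.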
Keywords/Search Tags: Cross-language Information Retrieval, Query Expansion, Query Translation, Translation Disambiguation, Word2Vec