Language identification for Instant Chat translation

Posted on: 2012-01-31
Degree: M.Sc
Type: Thesis
University: The University of Regina (Canada)
Candidate: Bailey, Robert Bruce
Full Text: PDF
GTID: 2465390011968356
Subject: Artificial Intelligence
Abstract/Summary:
If two users speaking different languages wish to communicate with each other using an internet chat program, a machine translation system must be present, and a means of identifying the languages of both users must be provided for this machine translation system. This thesis presents the Instant Chat Translator system, which fulfills these two requirements in a unified manner. The task is difficult because language identification in a chat environment has three issues not typical of language identification in general: the texts are very short, the channel is noisy, and nonnative character sets are used. The Instant Chat Translator system combines a novel high-quality language identification system with three existing software packages: the D-Bus interprocess communication system, the Pidgin chat system, and the Moses machine translation system. The overall system catches messages received as text input by the chat system, identifies the language of these messages, translates them if necessary, and presents the possibly translated messages to the user. It is presented as a proof-of-concept work to demonstrate the feasibility of providing an instant translator for a chat system. Testing demonstrated that very high levels of identification accuracy are obtained even when dealing with tiny amounts of often noisy input text. An average accuracy of 99.61% was obtained for identifying sentences 10 words in length across 7 languages. For the same 7 languages, the accuracy of identifying the language of individual words was 75%. Another goal of this research was to assess training using text solely from conventional corpora versus a combination of such texts with some from noisy channel environments. Experiments showed that the latter may lead to higher accuracy.
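
The message flow described above (catch a message, identify its language, translate if necessary, present the result) can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's identifier: it swaps in a simple character-trigram overlap score, the toy training sentences are invented for the example, and `translate` is a hypothetical callback standing in for a Moses call. The D-Bus and Pidgin integration is omitted entirely.

```python
from collections import Counter
from typing import Callable, Dict


def trigrams(text: str) -> Counter:
    """Count character trigrams; space padding lets very short words yield features."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def build_profiles(samples: Dict[str, str]) -> Dict[str, Counter]:
    """Build one trigram profile per language from sample text (assumed training data)."""
    return {lang: trigrams(text) for lang, text in samples.items()}


def identify_language(message: str, profiles: Dict[str, Counter]) -> str:
    """Pick the language whose profile shares the most trigrams with the message."""
    feats = trigrams(message)

    def score(profile: Counter) -> int:
        return sum(count for gram, count in feats.items() if gram in profile)

    return max(profiles, key=lambda lang: score(profiles[lang]))


def handle_message(
    message: str,
    user_lang: str,
    profiles: Dict[str, Counter],
    translate: Callable[[str, str, str], str],
) -> str:
    """Identify the message language and translate only when it differs from the user's."""
    source = identify_language(message, profiles)
    if source == user_lang:
        return message
    # `translate` is a hypothetical hook; a real system would call an MT engine here.
    return translate(message, source, user_lang)
```

A caller would build the profiles once from per-language sample text and then pass each incoming message through `handle_message`. Overlap counting on tiny profiles is far weaker than what the reported accuracy figures would require; it is used here only to make the control flow concrete.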
Keywords/Search Tags: Chat, Language, Translation, Accuracy