Research And Implementation Of Uyghur-Chinese Translation Software Based On Optical Character Recognition

Posted on: 2019-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: E D T L J Mai
GTID: 2428330566466613
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of science and technology, smartphones have become widespread thanks to their portability and increasingly sophisticated functions, and the number of mobile Internet users continues to grow. OCR technology for English, Chinese, and other languages is now quite mature both in China and abroad, and OCR-based text translation has become increasingly popular. In Xinjiang, however, research on using OCR to recognize and translate Uyghur text is not yet mature. Research on Uyghur OCR and machine translation can therefore play an active role in Xinjiang's economic development, in cultural exchange among ethnic groups, and in accelerating the development of Uyghur text information processing.

This thesis studies Uyghur OCR (UCR) and Uyghur-Chinese statistical machine translation. It trains a Uyghur recognition model on the Tesseract-OCR platform and, on that basis, develops an Android application that integrates Uyghur optical character recognition and translation, recognizing text in Uyghur images and translating it in real time.

First, for Uyghur character image recognition, the system preprocesses the target image with local adaptive threshold binarization and a morphological closing operation to improve the recognition success rate of Tesseract-OCR. A scaled watershed segmentation algorithm segments the Uyghur images, and the Tesseract engine is then trained on Uyghur. Then, for vocabulary storage and translation, 49,000 Uyghur words and parallel sentence pairs were prepared. The NiuTrans Server toolkit was used to build the Uyghur-Chinese translation system, which is deployed on the Azure cloud platform and exposes translation APIs to the client. Finally, the Android client was implemented in Java in an Android integrated development environment.
Keywords/Search Tags: OCR, Azure cloud platform, Android, Tesseract, Statistical machine translation
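
The preprocessing step named in the abstract (local adaptive threshold binarization followed by a morphological closing) can be illustrated with a small pure-Python sketch. This is not the thesis implementation: the mean-based local threshold, the square window, and the `radius` and `offset` parameters are assumptions chosen for illustration, and the function names are hypothetical. The abstract does not state which variant or parameters were used.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of pixels; grayscale 0..255 or binary 0/1


def _window(img: Image, y: int, x: int, radius: int) -> List[int]:
    """Collect the in-bounds pixels of the square window centred on (y, x)."""
    h, w = len(img), len(img[0])
    return [
        img[j][i]
        for j in range(max(0, y - radius), min(h, y + radius + 1))
        for i in range(max(0, x - radius), min(w, x + radius + 1))
    ]


def adaptive_threshold(gray: Image, radius: int = 1, offset: float = 5.0) -> Image:
    """Assumed variant: a pixel is ink (1) if darker than its local mean minus offset."""
    out = [[0] * len(gray[0]) for _ in gray]
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            window = _window(gray, y, x, radius)
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if value < local_mean - offset else 0
    return out


def _neighborhood_op(binary: Image, radius: int,
                     reduce: Callable[[List[int]], int]) -> Image:
    """Apply max (dilation) or min (erosion) over each pixel's square window."""
    return [
        [reduce(_window(binary, y, x, radius)) for x in range(len(binary[0]))]
        for y in range(len(binary))
    ]


def closing(binary: Image, radius: int = 1) -> Image:
    """Morphological closing: dilation followed by erosion, bridging small gaps."""
    dilated = _neighborhood_op(binary, radius, max)
    return _neighborhood_op(dilated, radius, min)
```

Calling `closing(adaptive_threshold(gray))` on a grayscale image given as a list of rows returns a binary image in which one-pixel breaks inside strokes are filled. Bridging such breaks is plausibly useful for a connected script such as Uyghur before the image is passed to an OCR engine, although the abstract does not spell out this motivation.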