Uyghur Interlanguage Corpus Construction Management System And Automatic Tagging Technology

Posted on: 2017-09-04  Degree: Master  Type: Thesis
Country: China  Candidate: W L J A Y T Mai  Full Text: PDF
GTID: 2335330488469819  Subject: Agricultural information technology
Abstract/Summary:
Building a corpus is a large undertaking, as the many existing corpora and the body of interlanguage corpus research attest, and it requires the involvement of specialists at several levels. A Uyghur interlanguage corpus system can therefore contribute substantially to the quality of Uyghur teaching and to the training of Uyghur language professionals. The research and design work in this thesis covers three aspects. First, an overall plan is proposed for constructing the Uyghur interlanguage corpus, covering the design of the annotation codes, the analysis of the content and scope of the data, and the data collection. Second, a Uyghur interlanguage corpus management system is designed and developed on Java EE, resolving the formatting errors that arise when Uyghur letters, digits, English letters, and punctuation are mixed in one text. A JavaScript Uyghur input package was also developed, so users can type Uyghur directly in the system without installing a third-party input method. The system supports entry, review, annotation, and retrieval of interlanguage corpus data. Third, because the collected corpus material is highly varied, tagging requires manual annotation, which is a heavy workload. To reduce it, two automatic tagging methods were studied: string matching against an error dictionary, and training a language model. The Uyghur interlanguage corpus system has been tested and put into use, and tests of the automatic tagging methods indicate that they meet the intended objectives.
Keywords/Search Tags: Uyghur language, interlanguage, corpus construction, automatic annotation, language model
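
The first of the two automatic tagging methods named in the abstract, string matching against an error dictionary, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the dictionary entries are Latin-script placeholders rather than Uyghur data, and the `SP` error code, the bracketed tag format, and the function names `tag_errors` and `annotate` are hypothetical. The thesis's actual annotation codes and matching procedure are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical error dictionary: erroneous form -> (error code, correction).
# Placeholder entries only; not taken from the thesis.
SAMPLE_DICT: Dict[str, Tuple[str, str]] = {
    "teh": ("SP", "the"),
    "recieve": ("SP", "receive"),
    "recieved": ("SP", "received"),
}

Hit = Tuple[int, int, str, str]  # (start, end, error code, correction)


def tag_errors(text: str, error_dict: Dict[str, Tuple[str, str]]) -> List[Hit]:
    """Scan left to right and record non-overlapping dictionary hits,
    preferring the longest entry that matches at each position."""
    # Longest keys first; empty keys are skipped so the scan always advances.
    keys = [k for k in sorted(error_dict, key=len, reverse=True) if k]
    hits: List[Hit] = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                code, correction = error_dict[key]
                hits.append((i, i + len(key), code, correction))
                i += len(key)
                break
        else:
            i += 1  # no entry matches here; move one character on
    return hits


def annotate(text: str, error_dict: Dict[str, Tuple[str, str]]) -> str:
    """Wrap each hit in a hypothetical inline tag: [CODE:wrong->right]."""
    parts: List[str] = []
    pos = 0
    for start, end, code, correction in tag_errors(text, error_dict):
        parts.append(text[pos:start])
        parts.append(f"[{code}:{text[start:end]}->{correction}]")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


if __name__ == "__main__":
    print(annotate("I recieved teh book", SAMPLE_DICT))
```

This sketch matches at the character level, so an entry can also fire inside a longer word. A practical tagger would probably match on token or morpheme boundaries, and the language model method mentioned in the abstract would be needed for errors that no fixed dictionary entry covers.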