
Design Principle And Application Platform Text Annotations

Posted on: 2015-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X M Yang
Full Text: PDF
GTID: 1265330431966226
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
Large databases of Chinese languages have developed rapidly for phonetics and vocabulary but slowly for syntax, for three reasons. First, prevailing academic concepts have led text-based language resources to be neglected. Second, research methods have restricted the development of text-based syntactic annotation. Third, researchers are few relative to the large number of minority languages in China. More and more linguists now recognize the importance of text resources, and text annotation has produced some results in the study of syntax. However, the existing methods of annotating and analyzing syntax are not well suited to the languages in China, especially tone languages. It is therefore necessary to design and develop a computer-supported research platform that processes texts and carries out syntactic annotation.

The main objective of this study is to design a practical and efficient syntax research platform suited to text annotation of Chinese languages, so that linguists can move from raw materials to annotated materials efficiently and accurately, and thereby build corpora efficiently.

This paper focuses on two aspects: first, expanding the corpora that researchers build themselves by improving the sources of data; second, completing syntactic annotation accurately and efficiently by improving the methods of processing text resources.
The basic technology consists of three components: input technology, text processing technology, and output technology. The design principles and strategies of these three parts form the overall framework of the platform, which gives researchers a syntax study platform that is more appropriate for Chinese languages and can be used for grammatical parsing and text annotation.

The thesis is divided into eight chapters:
Chapter One: Analyzes the current situation of language resources and syntactic annotation, and thereby establishes the necessity and importance of the study.
Chapter Two: Introduces the overall framework of the text annotation platform and the design principles of the main technical methods in this paper.
Chapter Three: Shows how a variety of text resources can be obtained through the input technology provided in this paper, and introduces new ways of creating texts through quick entry.
Chapter Four: Introduces the importance of dictionaries in the research platform, the interactive technology between text and dictionaries, interlaced control, the jump-insert method, and dictionary editing.
Chapter Five: Syntactic analysis. The matching algorithm used for multilingual text improves the efficiency and accuracy of text segmentation and matching annotation. This chapter also introduces the importance of text word segmentation and its implementation strategies.
Chapter Six: Morphological analysis. Introduces feasible solutions for annotating phonetic, syntactic, and semantic phenomena in text: inflection, agglutination, tone, reduplication, and polysemy.
Chapter Seven: Presents the ways of outputting a variety of resource products, including corpora, example sentences, collated copies, dictionaries, and thesauri.
Chapter Eight: Summarizes the main conclusions and innovations of this paper, and outlines future work.

This study describes the sources of text resources, the methods of syntactic annotation, and the output technologies for the various resource products.
In this paper, syntactic annotation of a large number of text resources is completed by means of dictionary strategies, text segmentation, interlacing control, match tagging, morphological processing, technical methods for deep and surface forms, and word grammar rules. This study improves the methods of researching Chinese language resources, promotes the development of syntactic study of minority languages and Chinese dialects, and, in particular, helps protect endangered languages and intangible cultural heritage.
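As a rough illustration of how dictionary-driven segmentation and match tagging can work together, the sketch below uses forward maximum matching, a common textbook technique. The abstract does not say which matching algorithm the platform uses, so this is a stand-in and not the author's method. The function name `segment_and_gloss`, the `unknown` marker, and the toy lexicon are all hypothetical.

```python
from typing import Dict, List, Tuple


def segment_and_gloss(
    text: str, lexicon: Dict[str, str], unknown: str = "?"
) -> List[Tuple[str, str]]:
    """Segment `text` by forward maximum matching and attach a gloss to each piece.

    Assumption: `lexicon` maps surface forms to gloss labels. This is an
    illustrative stand-in, not the matching algorithm of the thesis.
    """
    # Longest entry in the lexicon bounds how far ahead we need to look.
    max_len = max((len(form) for form in lexicon), default=1)
    annotated: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # whitespace separates tokens but is not annotated
            continue
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                annotated.append((piece, lexicon[piece]))
                i += size
                break
        else:
            # No dictionary entry starts here: emit one character as unknown.
            annotated.append((text[i], unknown))
            i += 1
    return annotated


# Toy lexicon with invented forms and gloss labels (purely hypothetical).
toy_lexicon = {"ab": "GLOSS-1", "abc": "GLOSS-2", "d": "GLOSS-3"}
print(segment_and_gloss("abcd e", toy_lexicon))
```

Pieces that come back marked as unknown are natural candidates for new dictionary entries, which is one way a text and its dictionary could be kept in step during annotation.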
Keywords/Search Tags:language resources, text, dictionary, grammatical analysis, text annotation