A Research Of Two-word Structure Independent Application In Modern Chinese Based On Collocation

Posted on: 2015-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Chen
GTID: 2255330428965539
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
This thesis uses word-frequency statistics to count all two-word structures that are used independently in a corpus. Preset thresholds are then applied to confirm that these two-word structures can indeed be used independently, and the results are compared with the common phrases of modern Chinese, in order to reveal the rules governing the composition, function, and independence of two-word structures. The thesis is divided into five chapters.

The first chapter introduces the concept of "collocation" and reviews relevant experimental research on word collocation in China and abroad, laying the foundation for this study. It also briefly sets out the research purpose and significance, the research methods, and the organization of the thesis.

The second chapter introduces Chinese word segmentation and part-of-speech (POS) tagging in turn, covering their definitions, methods, and difficulties, and then builds a POS tag set suited to the experiments that follow, based on the actual needs of this study. Finally, segmentation and annotation assessments are used to evaluate the accuracy of the segmentation and POS tagging software and to verify its reliability.

The third chapter uses experiments on a test corpus to determine appropriate thresholds, so as to obtain valid experimental data on two-word structures and to prepare for the subsequent large-scale corpus experiment. First, all two-word structures that occur between two Chinese punctuation marks are extracted and assumed to be capable of independent use. Three counts are then compiled for each structure: the number of times it is used independently, the total number of times it occurs in the full text, and the number of times each of its component words occurs on its own in the full text. These counts are recorded in the word lists Fre1, Fre2, and Fre3 respectively. Next, text1 divides the data into a low-frequency region and a high-frequency region, and the two-word structures in the high-frequency region are selected. text2 (the ratio Fre1/Fre2, which shows whether a two-word structure's occurrences in the full text are accidental) and text3 (the mutual information value, which shows whether the two words are connected and how strong the connection is) then jointly set the thresholds that exclude two-word structures that cannot be used independently.

The fourth chapter presents the large-scale corpus experiment. It mainly compares the experimental data obtained with the common phrases of modern Chinese in order to determine the formation rules and functions of two-word structures, and examines, from the perspective of collocation, which types of two-word structure can serve as two-word sentences. Because the structural principles of Chinese phrases are basically consistent with those of sentences, analyzing two-word sentences helps in studying the rules and features of word combination. First, the thresholds set in the test-corpus experiment are applied to filter the high-frequency data and obtain the final results. A random sample of 1,000 two-word structures is then drawn from the final results for manual analysis. The sample is compared with the rule table of the ten common phrase types of modern Chinese to observe the structure of the two-word structures, and with the function list of modern Chinese phrases to observe their functions. Finally, the types of two-word sentences are discussed.

The fifth chapter is the conclusion. It summarizes the thesis, raises some objective problems with the present study, and discusses work that could be carried out in the future.
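
The counting and filtering procedure described for the third chapter can be sketched roughly as follows. This is a minimal illustration and not the thesis's actual implementation. It assumes the corpus has already been segmented into token lists, one list per span between punctuation marks. The function names, the threshold values, and the pointwise mutual information formula (log2 of the pair probability over the product of the word probabilities) are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def score_two_word_structures(clauses):
    """Score every two-word structure that forms a whole clause.

    clauses: list of token lists, one per punctuation-delimited span
    (assumed to be segmented beforehand).
    Returns {pair: (fre1, fre2, ratio, pmi)}.
    """
    fre1 = Counter()  # times the pair makes up an entire clause
    fre2 = Counter()  # times the pair occurs adjacently anywhere
    fre3 = Counter()  # times each single word occurs anywhere

    for tokens in clauses:
        fre3.update(tokens)
        fre2.update(zip(tokens, tokens[1:]))
        if len(tokens) == 2:
            fre1[tuple(tokens)] += 1

    total_words = sum(fre3.values())
    total_pairs = sum(fre2.values())

    scores = {}
    for pair, independent in fre1.items():
        w1, w2 = pair
        ratio = independent / fre2[pair]  # Fre1 / Fre2
        # Assumed PMI definition; the thesis may use a different variant.
        p_pair = fre2[pair] / total_pairs
        p_w1 = fre3[w1] / total_words
        p_w2 = fre3[w2] / total_words
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scores[pair] = (independent, fre2[pair], ratio, pmi)
    return scores


def filter_structures(scores, min_fre1, min_ratio, min_pmi):
    """Keep pairs that pass all three (hypothetical) thresholds."""
    return {
        pair
        for pair, (f1, _f2, ratio, pmi) in scores.items()
        if f1 >= min_fre1 and ratio >= min_ratio and pmi >= min_pmi
    }


# Toy data for illustration only; not taken from the thesis corpus.
clauses = [
    ["很", "好"],
    ["他", "说", "很", "好"],
    ["很", "好"],
    ["我", "去"],
    ["我", "去", "学校"],
]
scores = score_two_word_structures(clauses)
kept = filter_structures(scores, min_fre1=2, min_ratio=0.6, min_pmi=0.0)
```

In this sketch the frequency cut-off plays the role of the high-frequency region, the Fre1/Fre2 ratio filters out structures whose independent use looks accidental, and the mutual information value filters on connection strength. In practice the three thresholds would be tuned on the test corpus before being applied to the large-scale corpus.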
Keywords/Search Tags:Collocation, Word Frequency Statistics, Two-word Structure, Independent Application