Disambiguation of the Grammatical Functions of Homographs

Posted on: 2009-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Yu
GTID: 2205360245976514
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Automatic syntactic parsing is both a current focus and a difficult problem in computational linguistics, and certain characteristics of Chinese make parsing Chinese harder still. While building an automatic parser for modern Chinese based on function-matching grammar, we found that homographs in modern Chinese carry a great deal of functional ambiguity, which adds a large number of branches to the parse tree and makes analysis very difficult. It is therefore necessary to study homographs separately in terms of their syntactic functions. Feasible measures for handling homographs at an early stage were put into practice in the project; they reduced the number of branches in the parse tree and improved the speed and quality of the parser.

The dissertation carries out a statistical analysis of homographs in the 973 treebank. The statistics show that homographs account for a fairly large proportion of modern Chinese and play a very important role in it, so analyzing them matters both for research on modern Chinese itself and for automatic parsing. Our function-matching parser uses only the syntactic functions of words and phrases from the 973 treebank and discards information such as morphological features and word meaning. We therefore used syntactic function alone and eliminated ambiguity within a certain range. With morphological features and word meaning set aside, context information is the useful information that remains for resolving functional ambiguity.

Drawing on lessons from earlier small-scale tests that used morphological features and word meaning, the dissertation chooses to disambiguate homographs on the basis of context information. The disambiguation work is divided into two parts. The first is based on homograph collocation: ambiguity is eliminated according to the grammatical function the word carries in a particular context. The second is based on the linguistic context of the homograph: ambiguity is eliminated by calculating the semantic similarity of the homograph in context. Because both strategies rely on statistics from a large corpus and involve less analysis and description from a linguistic perspective, the credibility of the results improves greatly.

The dissertation tested both methods on the words "打" and "花". The closed test uses the 973 treebank, and the open test uses one month of the Peking corpus. For the collocation method, recall and precision reach 80.4% and 91.5% in the closed test and 69.16% and 70.00% in the open test. For the context method, recall and precision reach 93.68% and 92.56% in the closed test and 72.06% and 62.50% in the open test. The test results are very good. Because collocation knowledge is difficult to extract from a large corpus, the context-calculation method should be used for automatic parsing.
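The context-based strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not say how similarity is computed, so the sketch assumes bag-of-words context windows compared by cosine similarity. The function names (`context_vector`, `cosine`, `train`, `disambiguate`), the window size, the labels `verb` and `noun`, and the toy sentences are all hypothetical.

```python
from collections import Counter
from math import sqrt

def context_vector(tokens, index, window=2):
    """Count the words within `window` positions of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(left + right)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(examples, window=2):
    """Sum context vectors per function label from (tokens, index, label) triples."""
    profiles = {}
    for tokens, index, label in examples:
        profiles.setdefault(label, Counter()).update(context_vector(tokens, index, window))
    return profiles

def disambiguate(profiles, tokens, index, window=2):
    """Return the label whose context profile is most similar to this occurrence."""
    vec = context_vector(tokens, index, window)
    return max(profiles, key=lambda label: cosine(profiles[label], vec))

# Hypothetical toy training data for the homograph "花" (not from the 973 treebank).
examples = [
    (["他", "花", "了", "很多", "钱"], 1, "verb"),
    (["我", "花", "了", "两", "小时"], 1, "verb"),
    (["公园", "里", "的", "花", "开", "了"], 3, "noun"),
    (["这", "朵", "花", "很", "香"], 2, "noun"),
]
profiles = train(examples)
print(disambiguate(profiles, ["她", "花", "了", "不少", "时间"], 1))
print(disambiguate(profiles, ["那", "朵", "花", "真", "香"], 2))
```

A real system would build the profiles from treebank annotations and would need smoothing or a larger window to handle unseen context words; this sketch only shows how context statistics alone can select among candidate functions.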
Keywords/Search Tags: homograph, parsing, grammatical function, contextual calculation, disambiguation