
Sample Selection For Statistical Parsing

Posted on: 2007-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Sun
Full Text: PDF
GTID: 2178360185986114
Subject: Computer Science and Technology
Abstract/Summary:
Parsing is one of the fundamental problems in natural language processing, and statistical parsing is the main approach to it. A statistical parser relies on many hand-parsed sentences as training examples, but labeling so many sentences is labor-intensive. We propose selecting samples to reduce the number of sentences needed in the training data.

Currently, the major approaches to sample selection are based on active learning, in which the parser itself selects the samples. This ties the selection closely to the parsing model. To overcome this disadvantage, we put forward a model-independent sample selection method. In this method, we select from the unlabeled sentences a subset whose syntactic rule distribution is similar to that of the set of all sentences. The precision and recall of a parser trained on the labeled subset should then be similar to those of a parser trained on all sentences, while the labeling effort is smaller.

We compared the syntactic rule distribution of the selected subset with that of the whole set to see whether they were similar, using the distribution similarity between a randomly selected subset and the whole set as a baseline. The results showed that the distribution of the selected subset was highly similar to that of the whole set. We then trained a PCFG parser on the labeled subset and found that the number of training sentences could be reduced by 50% while keeping the performance of the statistical parser approximately the same.
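The comparison between a subset's rule distribution and the whole set's can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, so Jensen-Shannon divergence is used here as a stand-in, and the sketch assumes the rules of each sentence are already available as strings (how the thesis obtains them for unlabeled sentences is not described). The function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def rule_distribution(sentences):
    """Relative frequency of each rule over a collection of sentences.

    Each sentence is a list of rule strings such as "S -> NP VP".
    Returns an empty dict if there are no rules at all.
    """
    counts = Counter(rule for rules in sentences for rule in rules)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rule: n / total for rule, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed measure, not the thesis's).

    0 means identical distributions; 1 means no rule in common.
    """
    # Mixture distribution over the union of both supports.
    m = {r: 0.5 * (p.get(r, 0.0) + q.get(r, 0.0)) for r in set(p) | set(q)}

    def kl(a):
        # KL(a || m); m[r] > 0 whenever a[r] > 0, so the log is defined.
        return sum(pa * math.log2(pa / m[r]) for r, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy data (hypothetical): each sentence is represented by its rules.
whole = [
    ["S -> NP VP", "NP -> DT NN", "VP -> VB NP", "NP -> DT NN"],
    ["S -> NP VP", "NP -> PRP", "VP -> VB"],
    ["S -> NP VP", "NP -> DT NN", "VP -> VB"],
    ["S -> NP VP", "NP -> PRP", "VP -> VB NP", "NP -> DT NN"],
]

whole_dist = rule_distribution(whole)

# A candidate subset and a randomly drawn baseline of the same size.
candidate = [whole[0], whole[1]]
baseline = random.Random(0).sample(whole, k=2)

candidate_score = js_divergence(rule_distribution(candidate), whole_dist)
baseline_score = js_divergence(rule_distribution(baseline), whole_dist)
print(f"candidate: {candidate_score:.4f}  random baseline: {baseline_score:.4f}")
```

A lower divergence means the subset's rule distribution is closer to the whole set's. A selection procedure in the spirit of the abstract would prefer subsets that score below the random baseline.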
Keywords/Search Tags: parsing, sample selection, syntactic rule