Sample Selection For Statistical Parsing

Posted on:2007-12-06

Degree:Master

Type:Thesis

Country:China

Candidate:J Sun

GTID:2178360185986114

Subject:Computer Science and Technology

Abstract/Summary:

Parsing is one of the fundamental problems in natural language processing, and the mainstream approach is statistical parsing. A statistical parser is trained on a large number of hand-parsed sentences, but annotating that many sentences is labor-intensive. We propose selecting the training samples so that fewer sentences need to be annotated.

Current approaches to sample selection are mostly based on active learning, in which the parser itself chooses the samples. The drawback is that the selection is tightly coupled to the parsing model. To avoid this, we propose a model-independent sample selection method. From the unlabeled sentences, we select a subset whose syntactic rule distribution is similar to that of the full set. A parser trained on the labeled subset should then reach precision and recall close to those of a parser trained on all the sentences, while requiring much less annotation effort.

To test whether the distributions are similar, we compared the syntactic rule distribution of the selected subset with that of the whole set. As a baseline, we used the similarity between a randomly selected subset and the whole set. The results showed that the selected subset's rule distribution was highly similar to that of the whole set. We then trained a PCFG parser on the labeled subset. The amount of training data could be reduced by 50% while the parser's performance stayed approximately the same.
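The selection idea above can be illustrated with a small sketch: pick sentences one at a time so that the pooled rule counts of the chosen subset stay as close as possible to the rule counts of the whole set, then compare the result against randomly chosen subsets of the same size. The abstract does not say how rules are obtained for unlabeled sentences or which similarity measure was used. This sketch therefore assumes each sentence is already represented as a list of rule strings (for example, from some preliminary automatic analysis). It also assumes cosine similarity over rule counts and greedy forward selection. All function names are hypothetical and are not taken from the thesis.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def rule_distribution(rule_lists: Sequence[Sequence[str]]) -> Counter:
    """Pool the rule occurrences of several sentences into one count vector."""
    counts: Counter = Counter()
    for rules in rule_lists:
        counts.update(rules)
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (assumed similarity measure)."""
    dot = sum(a[r] * b[r] for r in a if r in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_select(sentence_rules: List[List[str]], k: int) -> List[int]:
    """Hypothetical greedy selection: repeatedly add the sentence that makes the
    subset's rule counts most similar to the full set's rule counts."""
    target = rule_distribution(sentence_rules)
    chosen: List[int] = []
    current: Counter = Counter()
    remaining = set(range(len(sentence_rules)))
    for _ in range(min(k, len(sentence_rules))):
        best_idx, best_sim = -1, -1.0
        for i in sorted(remaining):  # sorted so ties break deterministically
            sim = cosine_similarity(current + Counter(sentence_rules[i]), target)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        chosen.append(best_idx)
        current.update(sentence_rules[best_idx])
        remaining.remove(best_idx)
    return chosen


def random_baseline(sentence_rules: List[List[str]], k: int,
                    trials: int = 100, seed: int = 0) -> float:
    """Average similarity between random size-k subsets and the full set."""
    rng = random.Random(seed)
    target = rule_distribution(sentence_rules)
    total = 0.0
    for _ in range(trials):
        idx = rng.sample(range(len(sentence_rules)), k)
        subset = rule_distribution([sentence_rules[i] for i in idx])
        total += cosine_similarity(subset, target)
    return total / trials


if __name__ == "__main__":
    # Toy corpus: each inner list is one sentence's (hypothetical) rule list.
    toy = [["A", "B"], ["A"], ["B"], ["C"]]
    picked = greedy_select(toy, 2)
    target = rule_distribution(toy)
    subset = rule_distribution([toy[i] for i in picked])
    print(picked, round(cosine_similarity(subset, target), 3))  # prints [0, 3] 0.962
    print(round(random_baseline(toy, 2), 3))
```

Cosine similarity ignores the absolute size of the count vectors, so it compares the shape of the two rule distributions. That matters here because the subset is always much smaller than the full set. The greedy loop costs roughly O(k · n · |rules|), which is fine for illustration. A practical version would update the similarity incrementally, or might use a different divergence such as KL divergence.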

Keywords/Search Tags:

parsing, sample selection, syntactic rule

Related items

1	Research On Syntactic Parsing Based On Treebank Without Phrase Labels
2	Research On Chinese Syntactic Structure-Tree Based On Data-Oriented Parsing
3	The Study, Based On Chinese Syntactic Subcategorization Analysis Approach
4	Study On Sample Selection By Using SFL Algorithm
5	Sample Selection Algorithms Based On Sample Entropy And Pre-clustering
6	Research On Chinese Syntactic Parsing Based On SEARN Algorithm
7	Chunk Based Chinese Syntactic Parsing And Its Application
8	Efficient non-deterministic search in structured prediction: A case study on syntactic parsing
9	Research On Reranking Technology For Chinese Syntactic Parsing
10	A New Algorithm For Sample Selection Based On The Reachable And Coverage