Study On Chinese Constituent Parsing

Posted on:2015-11-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z Liu

GTID:2348330473453646

Subject:Computer technology

Abstract/Summary:

Constituent parsing (also known as phrase-structure parsing) is one of the core tasks of natural language processing and supports many other NLP tasks, such as statistical machine translation, semantic role labeling, question answering, and information extraction. Since the release of human-annotated corpora (called treebanks in parsing), data-driven approaches have become the mainstream of constituent parsing. Many state-of-the-art parsers achieve high accuracy but parse sentences very slowly, and slow parsers cannot meet the demands of modern applications. This thesis studies a fast parsing technique, the shift-reduce parser, which has a large advantage in parsing speed while maintaining high accuracy. We also propose methods to improve this parser. The content of the thesis is as follows.

First, we build a Chinese constituent parser as the baseline system. The parser is based on the shift-reduce algorithm, a bottom-up parsing algorithm that turns the parsing process into a search for the optimal action sequence. Because it uses a linear model, it can parse efficiently. We train the parsing model with the perceptron algorithm and decode with beam search, which together ensure linear time complexity and high accuracy. All of our subsequent work builds on this baseline parser.

Second, we investigate how to improve Chinese constituent parsing performance. After analyzing the experimental results, we propose two methods. The first uses richer features to improve the parsing model. The second uses a simple semi-supervised method to expand the training set, which improves the precision of action decisions and thereby the baseline performance.

The contributions of this work are as follows. We build a fast Chinese constituent parser that parses more than 80 sentences per second in our experiments. We propose two methods to improve the parser and conduct a number of experiments to demonstrate their effectiveness. The experiments show that most of the proposed methods help improve Chinese constituent parsing performance. On the Chinese Penn Treebank, the parser achieves an F1 score of 84.55%.
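
To make the baseline described in the abstract more concrete, the following is a minimal sketch of shift-reduce constituent parsing with a linear model, beam-search decoding, and a perceptron update. It is not the thesis implementation. The label set, the feature templates, the function names, and the plain (non-averaged, full-sequence) update are all assumptions made for illustration, and the action set is reduced to SHIFT and binary REDUCE so that every parse of n words takes exactly 2n - 1 actions. Input sentences are assumed to be non-empty lists of (word, POS) pairs.

```python
# Hypothetical constituent labels; a real treebank uses a much larger set.
LABELS = ("NP", "VP", "IP")


def features(stack, buffer, i, action):
    """Illustrative feature templates pairing the action with local context."""
    s0 = stack[-1][0] if stack else "<none>"            # label of stack top
    s1 = stack[-2][0] if len(stack) > 1 else "<none>"   # label below the top
    q0 = buffer[i][1] if i < len(buffer) else "<end>"   # POS of next word
    return [
        f"s0={s0}|a={action}",
        f"s1={s1}|s0={s0}|a={action}",
        f"s0={s0}|q0={q0}|a={action}",
    ]


def valid_actions(stack, i, n):
    """SHIFT needs a remaining word; binary REDUCE needs two stack items."""
    actions = ["SHIFT"] if i < n else []
    if len(stack) >= 2:
        actions.extend(f"REDUCE-{label}" for label in LABELS)
    return actions


def step(stack, buffer, i, action):
    """Apply one action and return the new (stack, next word index)."""
    if action == "SHIFT":
        word, pos = buffer[i]
        return stack + ((pos, word),), i + 1
    label = action.split("-", 1)[1]
    return stack[:-2] + ((label, stack[-2], stack[-1]),), i


def score(weights, feats):
    """Linear model: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in feats)


def beam_parse(buffer, weights, beam_size=4):
    """Search for a high-scoring action sequence, keeping beam_size items."""
    n = len(buffer)
    beam = [(0.0, (), 0, [])]  # (score, stack, next word index, actions)
    for _ in range(2 * n - 1):
        candidates = []
        for total, stack, i, history in beam:
            for action in valid_actions(stack, i, n):
                s = total + score(weights, features(stack, buffer, i, action))
                new_stack, j = step(stack, buffer, i, action)
                candidates.append((s, new_stack, j, history + [action]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    _, stack, _, history = beam[0]
    return stack[0], history


def sequence_features(buffer, actions):
    """Collect the features fired along a whole action sequence."""
    stack, i, feats = (), 0, []
    for action in actions:
        feats.extend(features(stack, buffer, i, action))
        stack, i = step(stack, buffer, i, action)
    return feats


def perceptron_update(weights, buffer, gold_actions, beam_size=4):
    """Plain perceptron step: reward gold features, penalize predicted ones."""
    _, predicted = beam_parse(buffer, weights, beam_size)
    if predicted == gold_actions:
        return False
    for f in sequence_features(buffer, gold_actions):
        weights[f] = weights.get(f, 0.0) + 1.0
    for f in sequence_features(buffer, predicted):
        weights[f] = weights.get(f, 0.0) - 1.0
    return True
```

In this sketch, training would loop `perceptron_update` over (sentence, gold action sequence) pairs for several passes, and `beam_parse` would then return a tree as nested tuples together with the chosen actions. Because the number of actions is linear in sentence length and the beam width is fixed, decoding time grows linearly with the number of words, which is the property the abstract attributes to the baseline. Refinements commonly paired with this setup, such as weight averaging or early update, are omitted here, and the abstract does not say which of them the thesis uses.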

Keywords/Search Tags:

Natural Language Processing, Constituent Parsing, Shift-Reduce, Perceptron, Beam-Search, Semi-supervised

Related items

1	On Constituent Parsing With Multiple Data Sources
2	Research On Shift-Reduce Incremental AMR Parsing
3	Research On Kazakh Syntactic Parsing Auxiliary Feature Extraction
4	Research On Natural Language Syntactic Parsing Based On Deep Learning
5	Research On Technology Of Chinese Dependency Parsing
6	Research On The Application Of Semi-supervised Learning In Natural Language Processing
7	Research On Chinese Syntactic Parsing Based On SEARN Algorithm
8	Research On Pre-training Model For Text Analysis Based On Semi-supervised
9	Implementing Nature Language Interfaces To Chinese GIS Based On Semantic Parsing
10	Large-Scale Semi-Supervised Learning for Natural Language Processing