Study On Chinese Constituent Parsing

Posted on: 2015-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liu
GTID: 2348330473453646
Subject: Computer technology
Abstract/Summary:
Constituent parsing (also known as phrase-structure parsing) is one of the core tasks of natural language processing, and its output is used in many other NLP tasks, such as statistical machine translation, semantic role labeling, question answering and information extraction. Since the release of human-annotated corpora (called treebanks in parsing), data-driven approaches have become the mainstream of constituent parsing. Many current state-of-the-art parsers achieve high accuracy, but many of them parse sentences slowly, and slow parsers cannot meet the demands of modern applications. In this thesis, we study a fast parsing technique, shift-reduce parsing. This model has a clear advantage in parsing speed while maintaining high accuracy. In addition, we propose several methods to improve this parser. The main content of this thesis is as follows.

First, we study and build a Chinese constituent parser as the baseline system. Our parser is based on the shift-reduce algorithm, a bottom-up parsing algorithm that turns parsing into a search for the optimal action sequence. Moreover, the model is linear and the number of actions grows linearly with sentence length, so parsing is efficient. To build the baseline system, we train the parsing model with the perceptron algorithm and decode with beam search; together they ensure linear time complexity and high accuracy. All of our subsequent work builds on this baseline parser.

Second, we study methods to improve Chinese constituent parsing performance. Based on an analysis of the experimental results, we propose two methods. The first uses richer features to improve the parsing model.
The second uses a simple semi-supervised method to expand the training set, which improves the accuracy of action decisions and, in turn, the performance of the baseline.

Our contributions are as follows. We study and build a fast Chinese constituent parser that parses more than 80 sentences per second in our experiments. We propose two methods to improve the parser's performance and conduct a number of experiments to verify their effectiveness; the results show that most of the proposed methods improve Chinese constituent parsing performance. On the Chinese Penn Treebank corpus, our parser achieves an F1-score of 84.55%.
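The baseline pipeline described above (shift-reduce actions, perceptron training, beam-search decoding) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and every detail here is an assumption made for illustration: the tuple encoding of binarized trees, the action set (SHIFT, REDUCE-X, UNARY-X and FINISH, without the head-direction distinction that many shift-reduce constituent parsers use), the toy feature templates, the beam width, the use of early update during perceptron training, and the helper names `oracle`, `legal`, `features`, `step`, `replay`, `beam_search` and `train`, which are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from functools import reduce

# Hypothetical tree encoding for binarized trees: leaf = (POS, word),
# unary node = (X, child), binary node = (X, left, right).
def oracle(tree):
    """Gold action sequence for a binarized tree (post-order traversal)."""
    def walk(node):
        if isinstance(node[1], str):  # leaf: (POS, word)
            return [("SHIFT",)]
        if len(node) == 2:
            return walk(node[1]) + [("UNARY", node[0])]
        return walk(node[1]) + walk(node[2]) + [("REDUCE", node[0])]
    return tuple(walk(tree) + [("FINISH",)])

@dataclass(frozen=True)
class State:
    stack: tuple = ()    # partial trees; the top of the stack is stack[-1]
    i: int = 0           # index of the next word in the buffer
    history: tuple = ()  # actions taken so far
    done: bool = False
    score: float = 0.0   # cumulative model score of the history
    feats: tuple = ()    # all features fired so far (used for updates)

def legal(state, n, labels):
    acts = [("SHIFT",)] if state.i < n else []
    if len(state.stack) >= 2:
        acts += [("REDUCE", x) for x in labels]
    # Forbid two unary actions in a row so every action sequence is finite.
    if state.stack and state.history[-1][0] != "UNARY":
        acts += [("UNARY", x) for x in labels]
    if state.i == n and len(state.stack) == 1:
        acts.append(("FINISH",))
    return acts

def features(state, sent, action):
    """A few illustrative templates, each conjoined with the action."""
    s0 = state.stack[-1][0] if state.stack else "<none>"
    s1 = state.stack[-2][0] if len(state.stack) > 1 else "<none>"
    w0, t0 = sent[state.i] if state.i < len(sent) else ("<end>", "<end>")
    a = "-".join(action)
    return (a, f"{a}|s0={s0}", f"{a}|s1={s1}", f"{a}|s0s1={s0}/{s1}",
            f"{a}|w0={w0}", f"{a}|t0={t0}")

def step(state, sent, action, weights):
    """Apply one action and add its feature weights to the running score."""
    f = features(state, sent, action)
    stack, i = state.stack, state.i
    if action[0] == "SHIFT":     # push the next word as a (POS, word) leaf
        stack, i = stack + ((sent[i][1], sent[i][0]),), i + 1
    elif action[0] == "REDUCE":  # combine the top two items into an X node
        stack = stack[:-2] + ((action[1], stack[-2], stack[-1]),)
    elif action[0] == "UNARY":   # wrap the top item in an X node
        stack = stack[:-1] + ((action[1], stack[-1]),)
    return State(stack, i, state.history + (action,), action[0] == "FINISH",
                 state.score + sum(weights.get(x, 0.0) for x in f), state.feats + f)

def replay(sent, actions):  # follow a fixed action sequence, e.g. the oracle
    return reduce(lambda s, a: step(s, sent, a, {}), actions, State())

def beam_search(sent, weights, labels, k, gold=None):
    """Beam search; given `gold`, return its prefix once it leaves the beam."""
    beam, t = [State()], 0
    while not all(s.done for s in beam):
        t += 1
        cands = [s for s in beam if s.done]  # finished states wait for the rest
        for s in beam:
            if not s.done:
                cands += [step(s, sent, a, weights) for a in legal(s, len(sent), labels)]
        beam = sorted(cands, key=lambda s: s.score, reverse=True)[:k]
        if gold is not None and all(s.history != gold[:t] for s in beam):
            return beam[0], gold[:t]
    return beam[0], None

def train(treebank, labels, epochs=10, k=4):
    """Structured perceptron with early update (weight averaging omitted)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent, tree in treebank:
            gold = oracle(tree)
            pred, prefix = beam_search(sent, weights, labels, k, gold)
            if prefix is None:  # gold survived to the final beam
                if pred.history == gold:
                    continue
                prefix = gold
            for f in replay(sent, prefix).feats:
                weights[f] += 1.0  # reward the gold (prefix) features
            for f in pred.feats:
                weights[f] -= 1.0  # penalize the best-scoring wrong hypothesis
    return weights

# Toy usage: a one-sentence "treebank" with CTB-style labels (illustrative only).
LABELS = ("NP", "VP", "IP")
sent = [("我", "PN"), ("喜欢", "VV"), ("音乐", "NN")]
tree = ("IP", ("NP", ("PN", "我")), ("VP", ("VV", "喜欢"), ("NP", ("NN", "音乐"))))
weights = train([(sent, tree)], LABELS, epochs=5)
print(beam_search(sent, weights, LABELS, k=4)[0].stack[0])
```

Because unary actions let hypotheses for the same sentence differ in length, finished hypotheses are simply carried over until the whole beam has finished; padding shorter sequences with an explicit idle action is a common alternative. With a fixed beam width, the number of search steps grows linearly with sentence length; for readability the sketch copies history and feature tuples at every step, which a practical parser would replace with back-pointers.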
Keywords/Search Tags: Natural Language Processing, Constituent Parsing, Shift-Reduce, Perceptron, Beam-Search, Semi-supervised