Syntactic Parsing of English Functional Clauses Based on CRFs

Posted on: 2015-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: M Zong
Full Text: PDF
GTID: 2285330467980422
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Automatic syntactic parsing is a key field in Natural Language Processing (NLP), but progress has reached a bottleneck. Improving parsing results requires not only a suitable algorithm or parsing model but also a well-founded linguistic grammar. Systemic Functional Grammar (SFG) is a grammar that emphasizes the function of language. When analyzing the structure of a clause, SFG also takes semantics and context into account, which may help solve semantic problems in automatic parsing. Because functional clause syntax is a new field in SFG, applying functional clause theory to parsing is worth studying.

Building on previous research on the functional clause, this thesis divides the functional constituents of the clause into seven types: Subject, Predicator, Residue of Predicator, Complement, Complement2/3/4, Residue of Complement, and Adjunct. A small self-built corpus in the Business English domain was manually annotated with these seven function types. Six-fold experiments were then conducted with Conditional Random Fields (CRFs). The system achieved an overall precision of 92.5%, recall of 91.96%, and F-score of 92.18%, results the author regards as quite favorable. Predicator and Subject were identified best, with precision, recall, and F-score all above 97%. The first complement was second best, with a precision of 93.39%, recall of 88.62%, and F-score of 90.86%. Adjunct, complement, the second complement, and residue of the complement were identified less well.

To improve the parsing results, an error analysis was conducted from a linguistic perspective. The corpus used for the error analysis consisted of 5021 parsed sentences. Analysis with SQL Server identified 193 kinds of errors, which were divided into three kinds of first-class errors, seven kinds of second-class errors, and 38 kinds of third-class errors. The classification and data showed that manual annotation errors accounted for the smallest proportion of all errors, while errors in identifying adjuncts and complements accounted for the largest. Possible causes of the errors were also analyzed and classified into four kinds with 13 sub-kinds. The four kinds are sentence structure problems, omission problems, manual problems, and problems of separation by punctuation. Each cause is illustrated with one or two examples. The error analysis indicated that parsing quality may be influenced not only by the size of the corpus; the machine's ability to learn and apply linguistic knowledge may also be a key factor.

This thesis is a tentative study of applying the functional clause to parsing. The parsing system still needs improvement.
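
The precision, recall, and F-score figures above can be illustrated with a minimal sketch. It assumes token-level scoring, where each token carries one function label; the abstract does not say whether the thesis scored tokens or whole constituents. The function names and the toy label sequences are hypothetical and are not taken from the thesis corpus.

```python
from collections import Counter


def f_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f_score)} for two aligned label lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct, predicted, actual = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        actual[g] += 1
        predicted[p] += 1
        if g == p:
            correct[g] += 1
    scores = {}
    for label in set(actual) | set(predicted):
        # Precision: share of predictions for this label that were right.
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        # Recall: share of gold occurrences of this label that were found.
        recall = correct[label] / actual[label] if actual[label] else 0.0
        scores[label] = (precision, recall, f_score(precision, recall))
    return scores


if __name__ == "__main__":
    # Toy example only: one Complement is mislabelled as Adjunct.
    gold = ["Subject", "Predicator", "Complement", "Adjunct", "Subject", "Predicator"]
    pred = ["Subject", "Predicator", "Adjunct", "Adjunct", "Subject", "Predicator"]
    for label, (p, r, f) in sorted(per_label_scores(gold, pred).items()):
        print(f"{label:12s} P={p:.2f} R={r:.2f} F={f:.2f}")
```

Applied to the reported overall precision and recall, `f_score(92.5, 91.96)` gives roughly 92.23 rather than the reported 92.18. One possible explanation, which the abstract does not confirm, is that the reported F-score is an average over the six folds and not a value computed from the averaged precision and recall.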
Keywords/Search Tags: Systemic Functional Grammar, functional clause, parsing, error analysis