Forest-based algorithms in natural language processing

Posted on:2009-08-16

Degree:Ph.D

Type:Dissertation

University:University of Pennsylvania

Candidate:Huang, Liang


GTID:1443390005452458

Subject:Computer Science

Abstract/Summary:

Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation among exponentially many candidates. For example, a parser aims to find the best syntactic tree for a given sentence among all derivations under a grammar, and a machine translation (MT) decoder explores the space of all possible translations of the source-language sentence. In these cases, a packed forest provides a compact representation of a huge search space by sharing common sub-derivations, which makes efficient algorithms based on Dynamic Programming (DP) possible.

Building upon the hypergraph formulation of forests and well-known 1-best DP algorithms, this dissertation develops fast and exact k-best DP algorithms on forests, which are orders of magnitude faster than previously used methods on state-of-the-art parsers. We also show empirically how the improved output of our algorithms has the potential to improve results from parse reranking systems and other applications.

We then extend these algorithms to approximate search when the forests are too big for exact inference. We discuss two particular instances of this new method: forest rescoring for MT decoding and forest reranking for parsing. In both cases, our methods run orders of magnitude faster than conventional approaches. In the latter, faster search also leads to better learning: our approximate decoding makes whole-Treebank discriminative training practical and results in an accuracy better than that of any previously reported system trained on the Treebank.

Finally, we apply the above techniques to the problem of syntax-based translation and propose a new paradigm, forest-based translation. This scheme translates a packed forest of the source sentence into a target sentence, rather than just using 1-best or k-best parses as in usual practice. By considering exponentially many alternatives, it alleviates the propagation of parsing errors into translation, yet comes with only a fractional overhead in running time. We also push this direction further to extract translation rules from packed forests. In large-scale experiments, the combination of forest-based decoding and rule extraction shows significant improvements in translation quality and consistently outperforms the hierarchical system Hiero, one of the best-performing systems to date.
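
The packed-forest idea in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the toy forest, its node names, its scores, and the functions `count_derivations`, `one_best`, and `k_best` are all hypothetical, and scores are assumed to be additive and maximized. The `k_best` function is a naive bottom-up baseline that enumerates every combination of sub-derivations per hyperedge, which is the kind of cost that faster k-best algorithms aim to avoid.

```python
from heapq import nlargest
from itertools import product

# A forest as a hypergraph (toy data, assumed for illustration).
# Each node maps to its incoming hyperedges; a hyperedge is
# (tail_nodes, score). Nodes are listed bottom-up, so every tail
# appears before the node that uses it. Leaves have one edge with no tails.
FOREST = {
    "X": [((), 0.0)],
    "Y": [((), 0.0)],
    "B": [(("X",), 1.0), (("Y",), 2.0)],
    "C": [(("X",), 0.5), (("Y",), 3.0)],
    "S": [(("B", "C"), 0.0), (("B",), 2.5)],
}


def count_derivations(forest):
    """Count the derivations rooted at each node without enumerating them."""
    counts = {}
    for node, edges in forest.items():
        total = 0
        for tails, _score in edges:
            n = 1
            for t in tails:
                n *= counts[t]  # shared sub-derivations multiply
            total += n
        counts[node] = total
    return counts


def one_best(forest):
    """Viterbi-style 1-best DP: best additive score of any derivation per node."""
    best = {}
    for node, edges in forest.items():
        best[node] = max(
            score + sum(best[t] for t in tails) for tails, score in edges
        )
    return best


def k_best(forest, k):
    """Naive k-best: combine the k-best lists of the tails of every hyperedge.

    This tries up to k ** len(tails) combinations per hyperedge, so it is
    a simple baseline rather than an efficient algorithm.
    """
    top = {}
    for node, edges in forest.items():
        candidates = []
        for tails, score in edges:
            for combo in product(*(top[t] for t in tails)):
                candidates.append(score + sum(combo))
        top[node] = nlargest(k, candidates)  # sorted, best first
    return top
```

On this toy forest, `count_derivations(FOREST)["S"]` is 6, `one_best(FOREST)["S"]` is 5.0, and `k_best(FOREST, 3)["S"]` is `[5.0, 4.5, 4.0]`. The forest stores 9 hyperedges in total, while the number of derivations grows multiplicatively with each shared sub-derivation.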

Keywords/Search Tags:

Algorithms, Translation, Forest, Search

Related items

1	Research Of The Theory For Optimal Water Distribution And The Management Decision-Making System In Irrigation District
2	The Influence Of The Translation Initiation Region And A Single Nucleotide Mutation Of 2A Of Foot-and-mouth Disease Virus On The Translation Initiation And Cleavage Efficiency
3	Study On The Intelligent Grain Monitoring Device Based On The Auto-focus
4	Research And Application Of Grain Level Measurement In Intelligent Grain Depot
5	Research On The Intelligent Distribution And Prediction For Crop Water
6	RNA Structure Basis And Molecular Evolution Of Cap-independent Translation In Tobacco Bushy Top Virus
7	Research On Focused Search Engine For Forestry
8	Newcastle Disease Virus Infection Upregulates The Assembly Of Eukaryotic Translation Initiation Factor4F To Benefit Viral Translation And Replication
9	Research On Models And Algorithms For Agricultural Machinery Scheduling Problem With Time Window
10	Molecular mechanisms of cap and poly(A) independent translation of barley yellow dwarf virus RNA